January 31, 2020

Telling the story about the serial crisis from the opposite perspective

The serial crisis is usually told from the perspective of the libraries. The plot, in short, is that large capitalist companies like Elsevier have increased the prices for academic journals, and the poor libraries don't know how to pay for them. What is ignored is the financial situation of the academic publishers themselves.

Let us first take a look at mainstream publishing companies. Newspapers like the New York Times and book publishers like Penguin Random House reached their peak decades ago. That means the business model of selling printed information is outdated and has been replaced by the business model of the internet company. The result is that the printed book as a product can't be sold anymore.

Academic publishers have the same problem. In the past they were focused on a single product, the printed academic journal. There was a high demand for this product and the companies made a big profit. With the rise of the internet, the demand for printed academic journals has dropped. As a result, the economic situation of academic publishers is bad: I've found at least two reports about bankruptcy or near bankruptcy of academic publishers:

https://www.chronicle.com/article/Publishers-Bankruptcy-Filing/140103

https://www.insidehighered.com/news/2015/01/02/swets-bankruptcy-will-cost-libraries-time-money

Ironically, the libraries and the students are in the strong position, while the academic publishers are in the weak position. The simple fact is that the printed academic journal is a product from the past, and if a publishing house is focused on this single product, the probability is high that its economic outlook is negative. Sure, large companies like Elsevier and Springer are in a good position. They recognized early that they had to change their business model towards electronic distribution. But many smaller academic publishers don't have the resources for doing so. They will run into bankruptcy or they will merge with larger companies. Simply put, all the publishing companies are under pressure.

Let us investigate what an academic publishing house is doing from the economic perspective. It has customers who pay money for the journals; in most cases these are research libraries. This money is used to create printed journals, which includes printing, typesetting, technical peer review, grammar review and marketing. If the customers of the academic publishing house reduce their demand, the publishing house can no longer cover its costs and it will run into bankruptcy. That means the research library is not the victim of the serial crisis; it's the academic publisher who is under pressure.

January 29, 2020

Showing Javascript in Blogger.com

According to a Stack Overflow post https://stackoverflow.com/questions/6449733/how-can-i-add-javascript-inside-blogger it's possible to embed Javascript code into a blogger.com blogpost. I have tried it out, and it works great.

https://trollheaven.blogspot.com/2020/01/javascript-test.html

What is shown in the blogpost is a canvas element for drawing pictures, plus a simple form to enter text. I'm not sure what the limits are, but it seems that basic Javascript programs run fine in the blogging software.
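To give an impression of what such an embedded snippet can look like, here is a minimal sketch of a canvas plus a text form in plain HTML and Javascript. The element ids and the drawing itself are made up for illustration; the original post may use different code.

<canvas id="democanvas" width="300" height="150">
Error, HTML5 canvas isn't working
</canvas>
<input id="demotext" type="text" value="Hello world.">
<button onclick="drawText()">Draw</button>
<script>
function drawText() {
  // draw a rectangle and the entered text onto the canvas
  var canvas = document.getElementById("democanvas");
  var ctx = canvas.getContext("2d");
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.fillStyle = "lightblue";
  ctx.fillRect(10, 10, 120, 60);
  ctx.fillStyle = "black";
  ctx.font = "16px sans-serif";
  ctx.fillText(document.getElementById("demotext").value, 15, 120);
}
drawText();  // draw once when the post is loaded
</script>

The text inside the canvas tag is the fallback message which appears when the browser doesn't support the canvas element.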

January 28, 2020

Bug: node.js is slow

To test the performance of node.js, a simple prime number generator was programmed in Javascript and in C as well. The node.js version reached about half the speed of the C version. During runtime, the CPU was utilized at 100%. The problem is that node.js is not a native ahead-of-time compiler; it is a runtime written in C++ which executes the Javascript code at runtime. The result is that node.js is not able to reach the same performance as C++. For high performance applications the language is not fast enough.

/*
gcc -O2 prime.c
time ./a.out > out.txt
real 0m14.145s
user 0m14.120s
sys 0m0.006s

*/

#include <stdio.h>
int min=2, max=500000;
int main() {
  int flag;
  for (int i=min;i<max;i++) {      /* test every number in the range */
    flag=0;
    for (int a=2;a<=i/2;a++) {     /* look for a divisor */
      if (i%a==0) { 
        flag=1;
        break;
      }
    }
    if (flag==0) printf("%d\n",i); /* no divisor found: i is prime */
  }
  return 0;
}
/*
time node prime.js > out.txt
real 0m26.749s
user 0m26.641s
sys 0m0.127s
*/

var min=2, max=500000
for(var i=min;i<max;i++) {                 // test every number in the range
  var flag=0
  for(var a=2;a<=Math.trunc(i/2);a++) {    // look for a divisor
    if (i%a==0) { 
      flag=1
      break
    }
  }
  if(flag==0) { 
    console.log(i)                         // no divisor found: i is prime
  }
}

January 27, 2020

Understanding how Wikipedia works

Instead of describing the website from its own perspective, the better idea is to take a bird's eye perspective and observe what the role of Wikipedia in a capitalist context is. The education sector, which includes universities, privately owned libraries and book publishers, operates as a for-profit business. That means a book publisher likes to earn money, and a university takes the money from the students to pay the rent for its buildings. If the management of a book publisher is doing everything right, the company generates revenue and is able to pay all the bills. This is what it means to run a successful business.

Some companies, fewer than 10%, are not managed well enough. Their financial situation looks bad, and in the past some wrong decisions were made by the top management. That means the company doesn't generate a profit but a loss. And here Wikipedia comes into the game. What Wikipedia is doing is to meet universities and unprofitable book publishers and explain to them the benefits of free knowledge. For example, if an academic book publisher goes into bankruptcy, the content which was generated by the employees has no value anymore and can be transferred to an Open Educational Resources platform. And if a university with a long tradition is no longer able to pay the bills, the former professors and students are angry at their institution and they are invited to contribute their knowledge to the Wikipedia website.

Simply put, Wikipedia is not a successful company but a kind of failed project which is fed with negative information. Between a well running academic publisher and Wikipedia there is a large gap, and this gap exists for a good reason, because the book publisher has understood what capitalism is about. The idea of capitalism is to make profit, to earn money and to grow. It's about becoming rich and famous and being competitive in the market.

To understand what the difference between success and failure is, we have to describe the mechanism of the value chain. A normal company, for example a university, gets money from the students and pays the money to the professors. That means the professor is paid for his work. If the professor is doing a good job and the university is attractive to the students, the company is able to increase its profit. In contrast, Wikipedia and Open Science in general aren't operating under such a constraint. Simply put, Wikipedia is an anti-corporation.

Let us listen to what the Commercial Manager of Swets has to say about the future of academic publishing. Swets was a Dutch content broker for libraries which went into bankruptcy in 2014:

quote: “The pressure is high. We expect more and more from them. Nobody likes to change really. But if you bring change in a positive way, that's important.”, source “Swets Enjoys Change: Changing the view”, https://www.youtube.com/watch?v=M9BgD0FcSCY

Why Wikipedia is wrong

The main problem with Wikipedia is that the project is not focused on getting famous and earning money. In contrast to established universities like Stanford and respected academic publishers like Elsevier, the primary concern is not to increase the monetary value but to provide information for nothing. That means Wikipedia isn't a success machine but the opposite. It's a community of losers.

The reason why the Elsevier company isn't contributing free medical content to Wikipedia is that Elsevier has understood how capitalism works. It has to do with earning money, becoming rich and famous and staying on top of the wave. Wikipedia is doing the opposite. They are doing everything wrong and as a result the project is respected by no one. Asking authors to write articles for free and putting high quality content under a CC-BY license is a kind of anti-pattern in academic excellence.

The only way to interact with Wikipedia is to boycott the project, or at least to stay away from it. If a university announces an intensive partnership with Wikipedia, this means that the university is in trouble. It has no future in the education business and has lost the competition with other universities.

January 26, 2020

Building a simple webserver with node.js



The node.js virtual machine is a practical tool for creating server applications. A simple hello world example is given in the screenshot. To make things more interesting, a for loop is used to generate a list which is sent to the webbrowser. The most important advantage of nodejs over browser Javascript is that the program can be tested and improved in the IDE. It's possible to debug the function until it works great, and then the webbrowser can show the output from the user's perspective.
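A minimal sketch of such a server could look like the following. The port number and the loop range are chosen arbitrarily for illustration; the program from the screenshot may differ.

// server.js - run with "node server.js" and open http://localhost:8080
const http = require('http');

http.createServer(function (request, response) {
  response.writeHead(200, { 'Content-Type': 'text/html' });
  let body = '<h1>Hello world</h1><ul>';
  for (let i = 1; i <= 10; i++) {          // generate a simple list in a for loop
    body += '<li>item ' + i + '</li>';
  }
  body += '</ul>';
  response.end(body);                      // send the generated page to the webbrowser
}).listen(8080);

console.log('Server is listening on port 8080');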

The workflow for creating sourcecode is very similar to Python, which means node.js is able to replace the Python language. The only disadvantage is that the number of Python tutorials is higher and they have better quality than the few node.js tutorials. That means the nodejs community isn't there yet but has to be built from scratch. To shorten the explanation a bit, we can say that the programming language of 2020, 2021 and 2022 as well is node.js/Javascript. That means the advantages over plain Python are so overwhelming that every Python user will migrate to Javascript in the future. The same is true for C++ programmers, Java programmers and of course all the C# programmers.

To understand the node.js ecosystem we have to focus on the failed classical programming languages. The Python language was an early attempt to build a scripting language which supports object oriented programming. Before Python existed, object oriented programming meant compiled programming with C++. The Perl language was well known, but it didn't support classes. Python has replaced Perl entirely.

But Python struggles with performance. The Python virtual machine was always too slow. It was not able to compete with Java or C++. So the community was divided into users who prefer Python and others who don't. We can say that node.js has solved a lot of these problems. It offers the following features:

- the performance is very good

- it is a scripting language with fast edit-run-cycles, no need for compilation

- it runs under different operating systems with the same code

Simply put, nodejs combines the advantages of Python, Java, C++ and PHP in a single language. Similar to Python, it can be mastered by beginners. Similar to Java, it runs everywhere; similar to PHP, it is very good at creating websites; and similar to C++, it is very fast. The prediction is that nodejs will replace all these languages. Or let me explain it the other way around: it's not possible that a node.js programmer will switch back to PHP once he is familiar with node.js. That means there is only one direction, towards node.js, but not the other way around.

January 25, 2020

Node.js for Python programmers



The main reason why Python has a large number of users is that it combines a prototyping language with object oriented programming. Similar to C++ and Java, the user can create objects and classes, which allows him to realize more advanced projects than purely procedural programs would. On the other hand, Python is easier to learn than C++ because its syntax is high level. The combination of both features explains most of today's widespread usage of the Python language.

What many people don't know is that node.js and Javascript are the better Python. Similar to Python, Javascript can be used as a prototyping language. An easy example is the canvas element in HTML, which allows the programmer to create graphics and even animations. At the same time, Javascript is capable of object-oriented programming, which allows the creation of large scale apps. Its main advantage over Python is that the underlying virtual machine is really fast. It outperforms Python easily.

It's not very hard to predict that Javascript will become the most successful language and will be used by more users than Python, PHP and Java combined. The only language which can't be replaced by Javascript is Forth. Forth is a different kind of language which is more powerful than Javascript but more complicated to learn. The reason is that Forth can be realized on different machine architectures and provides its own operating system. This is not possible with Javascript.

The reason why Python but not Javascript is widely used in the year 2020 is historical. Javascript and especially node.js are very young projects. The first version of node.js was released in 2009 and most programmers decided to ignore it because they were familiar with classical back end languages like C++, Java or Ruby. The comparison between the classical languages was focused on the question whether compiled languages or modern scripting languages are the preferred choice. C++ programmers are convinced that only compiled languages provide maximum performance, while Python developers emphasize the advantages of an interpreted, easy to learn language. The discussion about the pros and cons was easy because both paradigms had clear features: Python is a slow language but can be written fast, while C++ is complicated to learn but executes fast.

The node.js framework is something which outperforms both languages. It's easier to program than C++ and it's faster than Python. This makes node.js the perfect choice for all applications. Additionally, it runs under all operating systems, can be used for backend and frontend development, supports object oriented programming and has a large number of libraries. The only thing missing in the node.js ecosystem is a long history and reference handbooks which introduce the subject to a larger audience. What we see today are some quickly created examples of Javascript code scattered over the internet. The result is that the average user thinks Javascript isn't a real programming language but something which can be ignored.

From a bird's eye perspective, node.js is the successor to Python 3. The virtual machine is programmed more efficiently and it runs on different operating systems. The main feature is that Javascript is used for creating production code. That means it's a prototyping language and a practical language at the same time. There is no need to rewrite the prototype code in C++, because the same Javascript code is used on the production server. This simplifies the programming workflow.

From the technical perspective it's pretty easy to write a hello world program in node.js. All the user has to do is type the sourcecode into his normal programming IDE and configure the execute button with the nodejs interpreter. A click on run works similarly to executing a Python program. That means no webbrowser is needed, and if an error occurs it will be shown in the console log. The difference between Python sourcecode and Javascript is minimal. The user has the choice whether he likes to introduce functions or complete classes into his project. He can create a GUI application in HTML/Javascript or he can decide to write a GTK+ application for the normal desktop environment. That means nodejs can be used very well outside the context of web programming for normal desktop applications. If the user likes, he can additionally create complex LAMP applications which utilize an SQL server and multithreading. But newbies can start with a normal Logo-style graphics project as well.
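As a small illustration of this Python-like workflow, the following script (file name and content invented for this example) can be started directly with the nodejs interpreter and prints its output to the console, no webbrowser involved:

// hello.js - run with "node hello.js"
class Greeter {
  constructor(name) {
    this.name = name;
  }
  greet() {
    return 'Hello ' + this.name;
  }
}

const g = new Greeter('world');
console.log(g.greet());   // the output appears in the console, like in a Python script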

The good news is that in the past different users have tested how nodejs performs in comparison to Python 3. The answer is that nodejs is 20x faster than a Python 3 program, https://stackoverflow.com/questions/49925322/significant-node-js-vs-python-3-execution-time-difference-for-the-same-code That means nodejs is at about the same speed as C or even a bit faster. And the example measures only the CPU performance, not the performance of a webserver which works with parallel threads. The advantage of nodejs is much more visible there.

A personal TIOBE index

The original TIOBE index with the most famous programming languages is given here https://www.tiobe.com/tiobe-index/ My personal ranking looks different:

1. AutoIt. A macro language used for Windows scripting. The AutoIt community builds aiming bots for leveling and harvesting in videogames.

2. Forth (the most powerful programming language ever invented; it's widely used by expert programmers and difficult to learn)

3. Javascript (the nodejs runtime engine has introduced Javascript to a wider audience and can replace outdated languages like PHP and Java)

4. PDDL, a planning language for building advanced robots

The Commodore 64 lives forever

Sometimes it is argued that the Commodore 64 homecomputer wasn't a big innovation at its time. Other computers, for example the Apple II, were more impressive, and the Unix System V operating system was superior to the Commodore 64. But can we judge the C-64 only by its technical specification? It's important to mention the complete picture. Most users were not interested in the machine itself; they were fascinated by the additional equipment.

In the basic form this meant some games, a hole punch for notching floppy disks and a joystick. But the most interesting equipment for a Commodore 64 were the books published at that time. If we take a look into the journals and printed books of the 1980s, we will perceive a surprisingly powerful community. There were books available on how to program text adventures in Basic, how to use the GEOS software, how to make music with the SID chip, and even graphics programming in Assembly language was explained in the books.

What is often ignored is that it's possible to read such books without using a real Commodore 64. That means the books, not the 6510 CPU, were the most fascinating part of the Commodore 64 legend. These books weren't available before the Commodore 64 was released. They discuss subjects like databases, game programming and mathematics from a certain standpoint which marked the beginning of the home computer revolution.

Another interesting fact is that the books about the Commodore 64 were available in normal bookshops and even in public libraries. The 1980s were the first decade in which such literature was produced.

The future of Open Source licenses

There are two major Open Source licenses available, the MIT license and the GPL license. The advantage of both licenses is that the user gets access to the sourcecode. He can read the code and he can compile the binary file from scratch. The disadvantage is that problems arise if the end user forks an existing software project. Let us go into the details.

In the easiest case, the enduser is interested in downloading, reading and executing existing sourcecode. This is perfectly possible with the MIT license and the GPL license. Both licenses are written for exactly this purpose. The created code can spread freely over the internet and the costs for the end user are low.

The bottleneck becomes visible if the end user tries to do more with the sourcecode than only execute it on his computer. This is called forking. Forking means building a new program on top of the sourcecode. In the domain of software engineering, a library is linked into one's own program and then the new software is distributed. If the end user is planning to do such things, he will run into a lot of trouble.

This is the case for the MIT license and the GPL license as well. The reason is that every piece of code was written by a person, and this person holds the copyright on the code. Suppose the idea is to use the libc library in one's own project. The libc library is copyright protected. That means a person in the world holds the copyright, and what is allowed and what is forbidden with the code is defined by this person. In the case of libc, the LGPLv2.1 license applies https://en.wikipedia.org/wiki/GNU_C_Library Other libraries are available under an MIT license. What will happen in any case is that after the library is used in a new project, the original author will check whether the new project fulfills the license. That means the enduser is not really free, but has to negotiate with the license holder.

Suppose the enduser is not interested in doing so. Suppose the idea is to fork an existing project and not talk to the original author. Unfortunately, such a software license is not available. No matter whether the project was licensed under MIT, GPL or anything else, in all cases the copyright is reserved for the original author. The reason is that it is a demanding task to write software. If the sourcecode contains 100k lines of code, many man-hours were invested in the past, and the copyright protects this invested time.

The only option for the end user to become independent from the original author and the Open Source license is to reinvent the code from scratch. That means the user has to program his own libc library from scratch which doesn't use the original sourcecode. That's the reason why the SCO vs. Linux case was opened a long time ago. Reprogramming software from scratch is the only option if somebody needs complete freedom from the original author.

The problem is that even with modern software development tools like compilers and well documented sourcecode it's a demanding task to reprogram a software project from scratch. In a science fiction movie, the programmer would start a code generator which produces its own version of the Linux kernel. That means the code generator gets some constraints as input and produces software from scratch. The software is different from existing sourcecode, so it's not copyright protected. This gives a hint of how the future of Open Source will look.

Today's licenses like the GPL license protect fixed code stored in files. The more advanced technique is to develop code generators which are able to produce an unlimited amount of code. The generated code is new and it's not protected by copyright law at all. This can be imagined as an advanced level generator in the Mario AI challenge. A level generator is able to produce a meaningful map from scratch which is different from any level created before. This newly generated game map isn't protected by a software license. The problem with the Linux kernel, the glib library and most other Open Source projects is that the code wasn't generated automatically but was typed in manually, and therefore it's possible for the original author to protect the code with a software license.

January 24, 2020

Node.js with GTK plugin



Installing the nodejs programming language is surprisingly easy in Fedora. A simple “sudo dnf install nodejs” will do the job. Additional packages can be installed with “npm install node-gtk”, which is not recommended by the official Fedora manual https://developer.fedoraproject.org/tech/languages/nodejs/modules.html but on my local PC it works.

What the user gets in return is a Python-like environment for easily creating GUI applications. But it's not normal Python code; it is written in node-js. The advantage is that the underlying just-in-time compiler is more efficient. It has a similar – or even better – speed than C code and it can outperform Python programs easily. It's not very hard to predict that nodejs is the next big thing in programming, or perhaps it has even reached its peak already, but some programmers have ignored the situation in the past.

The full potential of nodejs is that it can easily be combined with existing browser technologies, for example three.js, WebGL and similar projects. This allows writing operating system independent software. So can we say that nodejs is the better Java? It's unclear how to classify the language, because it's very new. In contrast to most of the other languages like C++, Java, Python and C#, the amount of literature is small and many things are changing. What we can say for sure is that serious security bugs exist, https://linuxsecurity.com/advisories/fedora/fedora-31-nodejs-fedora-2020-595ce5e3cc-12-08-57

What makes nodejs so interesting is that it tries to reinvent the wheel in many places at the same time. There are tutorials available in which PHP programmers are taught to switch to nodejs because it's the better programming language. Ok, many languages argue in this way. But there are also books available in which Java programmers are told that nodejs is here to stay. The same is true for C programmers, Python programmers, Go programmers and so on. Simply put, nodejs promises to become the better alternative to all programming languages, except Forth ;-) That means nodejs doesn't work with reverse Polish notation, which makes it a poor choice for professional programmers, but this is the only disadvantage.

January 23, 2020

Creating DOI spam in Wikipedia

Hello world,

today I'd like to show how to use a bibtex converter to insert DOI spam into Wikipedia. What we need as input is a long bibtex file which contains a lot of bibliographic references.



This file is copied and pasted into the bibtex converter, which was written in the famous Javascript language.



The resulting text is copied and pasted into the sandbox.
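The converter itself is not shown here, but a stripped-down version of such a bibtex-to-wiki converter could look like the following sketch. The field parsing is simplified and the entry is a made-up example; the real tool certainly handles more cases.

// bibtex2wiki.js - convert a simplified bibtex entry into a {{cite journal}} template
function parseEntry(entry) {
  const fields = {};
  const re = /(\w+)\s*=\s*[{"]([^}"]*)[}"]/g;     // match fields like title={...}
  let m;
  while ((m = re.exec(entry)) !== null) {
    fields[m[1].toLowerCase()] = m[2];
  }
  return fields;
}

function toWikiCitation(entry) {
  const f = parseEntry(entry);
  return '{{cite journal |title=' + (f.title || '') + ' |author=' + (f.author || '') +
         ' |journal=' + (f.journal || '') + ' |year=' + (f.year || '') +
         ' |doi=' + (f.doi || '') + '}}';
}

const example = '@article{smith2019, title={An example paper}, author={Smith, John}, ' +
                'journal={Journal of Examples}, year={2019}, doi={10.1000/xyz123}}';

console.log(toWikiCitation(example));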



Is node.js the hidden champion?

The Javascript language is used routinely on the Internet, but compared to other programming languages the public awareness is low. A detailed look into node.js shows that it's a surprisingly powerful concept. According to Stack Overflow, it can outperform the C language easily, https://stackoverflow.com/questions/27432973/why-is-this-nodejs-2x-faster-than-native-c/30058978

And it also outperforms old school website programming tools like PHP. The WordPress software is the most widespread blogging software in the world. A possible replacement, called Ghost, was written in node.js and according to its users it's easier to install and has better performance. So let us analyze the facts: node.js is better than C, and node.js is better than PHP. Does this mean that both languages are obsolete? It's too early to answer the question, but it seems the project should be taken seriously.

A possible approach to replace a Python GUI app with node.js is described here https://www.npmjs.com/package/node-gtk#example It is based on the node-gtk library and the sourcecode for the hello world app is only 11 lines in total. The only difference to Python is that node.js needs normal curly brackets.
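From memory, the hello world example on that npm page looks roughly like the sketch below; the exact function names and the GTK version string should be checked against the linked documentation, because the node-gtk API has changed between releases.

// gtkhello.js - a GTK window created from node.js via the node-gtk bindings
const gi = require('node-gtk');
const Gtk = gi.require('Gtk', '3.0');   // version string is an assumption

gi.startLoop();
Gtk.init();

const win = new Gtk.Window();
win.on('destroy', () => Gtk.mainQuit());   // quit the main loop when the window is closed
win.setDefaultSize(200, 80);
win.add(new Gtk.Label({ label: 'Hello world' }));
win.showAll();

Gtk.main();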

January 18, 2020

The old debate about which software license is more free

In the development of open source software licenses, it was sometimes questioned whether the GPL license is really the most open license. Sometimes the BSD license is called more open because it doesn't force the other side to do something.

The short answer to the conflict is that only the GPL v3.0 license is the most open license available. If somebody likes to get more rights, he has to reverse engineer the software from scratch. This allows him to become the owner of the code. Let us go into the details.

Suppose somebody takes the Linux kernel, which has a GPL license, puts the sourcecode into his own closed source project and sells the software on the market. This is a clear violation of the GPL license and the user/company will have many problems. Critics of the GPL license argue that this example shows that the GPL is not really a free license. What they forget is that with an MIT license the same problem exists. The MIT license works similarly to the GPL license with respect to ownership. That means user1 creates the software and he holds the copyright. If user2 tries to do something with the software, he may violate the copyright.

To overcome the conflict, user2 has to create the software from scratch. That means the sourcecode of user2 needs to be different from the sourcecode written by user1. Let us construct an example. User2 takes the Linux kernel. He reprograms the software from scratch. He is not using the C language but the C++ language for doing so. The resulting sourcecode is not the same as the original Linux project. So the original GPL license is no longer valid. User2 can choose any license he likes for the C++ software and he is allowed to use it in a closed source project.

Simply put, the bottleneck is not the license agreement which is formulated in the GPL license; the bottleneck is the question whether user1 and user2 are using the same software. If the software is different, user2 can choose a new license. To understand the situation we have to analyze the technical aspect of copying software. The easiest way of doing so is to use the Unix cp command:

cp file1.c file2.c

If file1 was licensed under the GPL license, then file2 can be used only with restrictions. That means it's possible to use file2.c in a wrong way and a copyright violation may be the result. To overcome the issue, a more powerful tool than cp is needed. Unfortunately, there is no computer program available which can convert C code into C++ code. But if a company starts a project to reprogram given C code into C++ code manually, the resulting file2.cpp is independent from the original one:

file1.c -> manual reprogramming -> file2.cpp

The user can do with file2.cpp whatever he likes, no matter which license was used for file1. The only thing that is important is that file2 is very different from file1. That means it should use a different programming language, different subfunctions and also a different GUI interface. This is similar to an artist redrawing an existing photograph with a pencil. The hand-drawn image can be licensed from scratch.

Risks of software licenses

The debate around Open Source licenses consists of theoretical explanations plus the real world scenario. In most cases only the theoretical side is discussed. For example, somebody may ask whether the GPL or the MIT license is more open. The focus on theoretical definitions is not enough, because in reality the world works quite differently. If somebody tries to minimize the risks of copyright violations, the easiest way of doing so is:

1. use a gpl licensed software as template, for example the latest version of the linux kernel

2. reprogram the software from scratch in a different programming language and with some modifications

3. give the newly created software any license you want

A possible copyright dispute is only possible if software1 is equal to software2. If both projects use different sourcecode, it's a different project and the license is different. If step #2 (reprogram the code) is missing, it's possible that in reality some problems will arise. For example, reusing a GPL licensed software in a commercial context will produce a GPL violation, and re-using an MIT licensed software will also generate some trouble with the copyright owner. The reason is that each piece of software is linked to an author. If user1 has created the software, he can define what user2 is allowed to do with the code. The only option for real freedom is not the GPL license; it's a situation in which user2 reprograms the code from scratch. This will produce a new copyright which is owned by user2.

The question left open is under which constraints software1 is different from software2. For example, if only some comments in the code are different, then software2 is not different. But if it's a code rewrite from scratch in a different programming language and with different features, then the new project stands on its own and it can be licensed from scratch.

How to get in conflict with the Creative Commons license

The Creative Commons license is described very differently in theory and in reality. In theory, Creative Commons licensed images and texts can be modified by anybody, which includes forking an entire project. According to the theory, a user takes a Creative Commons photo, puts it in his own version control system, adds some additional colors to the photo, and the resulting photo is used in his blog.

The surprising fact is that this theoretical workflow can't be realized in reality. The problem is that Creative Commons is not the same as freeing the content from copyright; it will start a conflict between the original author of the photo and the user who has reused the image. What does that mean? Suppose the uninformed user has put the modified photo on his homepage. Then the original author will take a look at it and he will check carefully whether the Creative Commons license is fulfilled. In most cases, the author will come to the conclusion that the secondary user was not allowed to use the photo, specifically in a commercial context. The result is a conflict between both parties. So Creative Commons has not simplified the situation but produces new problems.

Let us define for which purpose Creative Commons and the GNU public license are a great choice. It's the read-only mode. If content is provided under a Creative Commons license, this ensures that no paywall will protect the content. That means CC-BY is equal to “the world has access for free”. What CC-BY and similar licenses (which includes the BSD license) don't solve is the problem of forking content, which means that the world is allowed to take the content and use it for its own purpose.

The problem is not located in the license itself, but in copyright law in general. Creating content which is more free than CC-BY-SA content can be realized with neural networks which produce the content from scratch. Instead of copying existing information, the idea is to use a generative grammar which produces a new kind of work. This new content is not a derivative of the original work; it's produced by a computer program.

What the user has to prove is that the picture on his own website is different from the picture which was licensed under the CC-BY license. If the pictures are different, then copyright law no longer applies. That means the original author has no control over the second picture. The second picture can be used for any purpose, which includes commercial applications, forks or whatever.

Simply put, CC-BY and the GPL license protect content which is the same. If picture1==picture2, then the CC-BY license can be applied. That means the author of picture1 can dictate the rules to the owner of picture2.

Let us describe the situation from a technical perspective. There is a file on the harddrive called “image1.jpg”. The author of the file has tagged the file with a CC-BY-SA 4.0 license and uploaded it to the internet. A second user uses the “wget /remote/image1.jpg” command to download the file. He opens the picture in the GIMP software and puts his individual logo on the picture. The resulting image2.jpg is uploaded to his blog. Such a workflow will produce a lot of work for the lawyers. Even if the original picture was tagged with a CC-BY 4.0 license and even if the blog of user2 has the same license (CC-BY 4.0), it's not clear whether a copyright violation has occurred or not. In case of doubt it was a copyright violation, not because of the CC-BY license but because image1.jpg and image2.jpg are the same.

The reason is that a Unix command like “cp” or “wget” produces an exact copy. That means all the pixel information is the same. And pictures and texts alike are created by someone first. He owns the information no matter which license he has chosen for the content. The answer to the problem is to replace the “cp” command by something which doesn't produce a copy but something else.

A very simple example for creating an image from scratch is to manually redraw the outline of the picture with a pencil. This is what artists do. They take a photograph as a template, draw the lines with a pencil and then put the colors into the drawing. The resulting image2.jpg has nothing in common with the original; it's an image created from scratch. That means it's not important under which license the original content was licensed; image2.jpg can be licensed by the artist himself.

How to overcome the Creative commons license

At first glance, modern licenses like Creative Commons, GPL and the MIT license are here to stay, because they provide more freedom to the user. From the standpoint of spreading information into the world, these licenses have done a great job. Today it's possible to download the latest Linux version for free and get access to lots of Creative Commons pictures on the internet.

The limits of Open Source licenses become visible if someone tries to use the content in his own product. According to the license itself, it's allowed to do so. But in reality nobody tries, because it's treated as a copyright violation. Let us make a simple example. Somebody downloads a photo from Wikipedia and puts the photo on his own homepage. The result is that he has reused a copyright protected image. The first thing that will happen is that Wikimedia will recognize the case. They have a special subpage in which reuse of Wikipedia content is tracked on the Internet. The second thing is that the user perhaps gets an e-mail telling him not to do so in the future.

Somebody may assume that Wikipedia hasn't understood the meaning of the Creative Commons license, but it's a general problem. Suppose somebody ignores Wikipedia completely and tries to re-use software which has an MIT license. That means he downloads the sourcecode and then puts the code onto his own homepage. The result is very similar to the Creative Commons case. In theory it may be allowed, but in the real world it will create a lot of problems.

The real bottleneck is not the GPL, Creative Commons or MIT license; it's the copyright on content. If somebody creates a text, an image or sourcecode from scratch, he owns the copyright on this information. If a second person uses the content for his own purpose, it's a copyright issue. The good news is that it's pretty easy to overcome the problem. In short, a Generative Adversarial Network (GAN) can do the job pretty well.

What does that mean? The first important fact is that the GPL license, CC-BY and the MIT license are not as free as expected. They protect copyright protected information and as a result it's not possible to modify the content free of any rules. The more elaborate way of handling content is to create it from scratch every time. If content1 is different from content2, no copyright law at all is needed to protect it. Let me give an example: somebody creates a Generative Adversarial Network algorithm which takes as input the images from Wikipedia. After the learning process, the algorithm is able to generate lots of new images from scratch. They will look different from the original ones. That means these new images are not protected by the Creative Commons license; the user has created the content from scratch. Therefore he is allowed to use them in any possible way.

The conclusion is that the limits of the Creative Commons license can be overcome easily with modern technology, namely a neural network for generating content from scratch. This will make any copyright regulation (no matter whether classical or modern Creative Commons licenses) obsolete.

January 17, 2020

Forking a Wikipedia article

Instead of putting the fork online, an easier to explain version is to hold only a local copy. The advantage is that in such a case no copyright problems can occur, because downloading information is always allowed; only the upload of information can produce trouble.

In the following tutorial a local fork of the AIMA article https://en.wikipedia.org/wiki/Artificial_Intelligence:_A_Modern_Approach is created. It has to do with Artificial Intelligence and the article is not very long, so it's a great choice for experimenting a bit. The first thing to do is to create a git project in a working directory:

git init

Then the AIMA article from 2004-03-31 (an early version of the stub) is copied into the folder and a commit is created.

cp /remote/AIMA-article2004-03-31 aima.txt

git add --all && git commit -m "init"

The file doesn't have a date as its name; it's simply called aima. The different versions are tracked by the git tool, not by the programmer. Right now, the git repository contains a single file which was downloaded from the Wikipedia server.



It's time to fork the file. This isn't done by creating a new git branch; the fork is managed by the user himself. This provides greater control over the fork and the upstream version:

cp aima.txt aimafork.txt

gedit aimafork.txt

The aimafork.txt file is edited by the user; he adds a new paragraph with criticism of the book. The user writes down that the book is too expensive for the normal student.



In the meantime, the users in the Wikipedia project have updated the AIMA article. They are not aware that a new chapter was added in the local fork; they are following their own strategy. The updated upstream version is copied into the working directory and overwrites the previous version. It's important that before doing so the git commit command was executed, so that it's possible to go back in the timeline to a previous point.

git add --all && git commit -m "add chapter criticism"

cp /remote/AIMA-article2004-03-31 aima.txt

git add --all && git commit -m "new upstream version"

The open question is how to merge the upstream version with the forked version. Merging is equal to concatenating the different files:

cat aima.txt >> aimafork.txt

git add --all && git commit -m "cat upstream to fork"

gedit aimafork.txt

git add --all && git commit -m "clean up the fork"



After the upstream and the fork file are combined and all the sections are ordered, the aimafork.txt file is in a great condition. It contains all the latest information from the upstream plus the extra chapter written in the fork. The procedure is repeated over and over again. Which means:

1. download the upstream version into the local repository

2. cat the upstream to the fork

3. clean up the fork

4. improve the fork with new sections

What the user is doing is maintaining two versions in parallel; he has access to the upstream version and to his local fork at the same time. The self-created fork is more advanced than the upstream version because it contains more information. It's the same article about the AIMA book, but improved by a single user.

How open is the GPL license?

From a theoretical point of view, the GNU public license is well documented and well known to the public. In contrast to a proprietary license, the sourcecode can be redistributed without asking anyone, and it's possible to use the code in one's own project if the new project carries the same GPL license. A possible discussion would read the following way:

user1: Can I copy the Linux kernel and improve it by myself?

user2: Why do you ask?

user1: Because this is not allowed with proprietary code.

user2: I'm not an expert in law, but the GPL license is used by the open source community because it's superior to a proprietary license which prevents code redistribution.

This is – in short – the discussion style used by open source advocates. They are proud of the GPL license because it gives them and the world the freedom to share and reuse written code. It's surprising that reality looks a bit different. The legal aspects of copying software are defined not by the literature but by what happens in reality. How many people have copied the Linux sourcecode to their own computer? Not that many. Around the world fewer than 5 million PCs are running Linux. In contrast, billions of PCs are running Windows. So from a legal aspect, Windows is more legal than Linux. But suppose all the Windows users are simply not informed that they can install Linux too. So we have to ask: how many forks of the Linux kernel were created in the past? A fork is a concrete application of the GPL license; it means using existing sourcecode in one's own project. The answer is that not a single Linux fork is available.

Four years ago, Matthew Garrett (who was involved in the kernel programming team) talked about forking the kernel https://www.zdnet.com/article/matthew-garrett-is-not-forking-linux/ but he hasn't done so. Other projects which are called a fork are mostly not a real fork; they are only a patch, which means extra software which can be installed as a plugin to the kernel, similar to writing a mod for a computer game. So the funny question is: why was the Linux kernel never forked? Are the software developers not informed about the GPL license which explains that this is allowed?

Exactly this is the problem. In theory, the GPL license allows the programmer to fork the kernel. That means copying the sourcecode onto one's own server and inviting other people to modify the content. In reality, nobody is doing so.

In contrast, permissively licensed software, for example BSD Unix, is forked very often; Apple for example is based on the BSD kernel. It seems that the GPL license results in a situation in which nobody will fork the code, even though it's explicitly allowed by the license. But what is the reason behind it?

Suppose somebody takes the full Linux kernel source, puts it into a git repository and makes some changes to it. Then he explains that the fork is licensed under a GPL license and everybody is allowed to download and change the code. The funny thing is that this user would be the first one in the world to do so. That means it's not sure what will happen next. If Linux is licensed under a GPL license, why is no Linux fork available? Open Source activists are explaining to the world that Linux is the future. Ok, which Open Source activist group is powerful enough to fork the Linux kernel and show in reality what will happen?

How open is the GPL license?

Let us explain what the problem with a fork is if it's done with normal copyright protected software. Suppose somebody takes the sourcecode of the Microsoft Windows operating system, copies it into a github folder and invites other people to send commits. The other users will clone the repository, create patches and commit them back to the github folder. From a technical side such a project is a great idea. That means the programmers will improve the code a lot, fix all the bugs and the end user can download the next release.

Everybody who is familiar with the history of the computer industry knows what the real problem is. It's not the technical problem of creating commits and pushing the result to the server; the problem has to do with the license, the copyright law, and the fact that forking copyright protected content is a form of redistribution. It's not very complicated to predict that forking the Windows sourcecode will produce a lot of trouble, in theory and in reality as well. That is the reason why nobody is doing so.

And now let us imagine the same situation for the Linux kernel. Everybody knows that Linux is a different kind of project. So is it allowed to create a git project publicly and commit changes to the server? I don't know. From a theoretical point of view, it's allowed because this is explained in the GPL license. But in reality, nobody has tried to do so in the past, so it's not clear whether it's allowed or not. What we can say for sure is that copying the Linux sourcecode to an online repository and inviting other people to commit patches is equal to forking a project. Forking is something which is covered by copyright law and if it's not allowed it will produce a lot of trouble.

January 16, 2020

Technical experiments with a wiki POV fork



The precondition for a successful fork project is that the fork gets all the patches from the upstream project. Otherwise the branches are out of sync. At the same time, the goal is not to sync the branches completely, so that it makes sense to have a fork which looks different. This sounds a bit complicated, so let us go into the details of using the git tool for merging different branches.

The first attempt, using only 2 branches, was not successful. The idea was that the upstream is copied into the master branch, while the fork is edited in the issue1 branch. The first merge was working fine, but after the second merge some conflicts were the result.

The next attempt was to use three branches: upstream, master and issue1. This works much better and I'd like to explain the idea. The first thing to do is to initialize the git repository in a working directory:

git init

git branch issue1

git branch upstream

git branch

Now there are three branches available in which the user can edit. In the upstream branch the snapshots from the Wikipedia website are stored. The file article.txt holds the current version, which is updated once a month. The upstream is merged into the master branch, and then it's merged into the issue branch. In the issue branch the fork can be edited. Later, the next version of the upstream is stored in the upstream branch.

And now the magic happens: the user switches to the master branch and executes the following statements:

git merge upstream

git merge issue1

What the git software is doing is to combine the latest upstream version with the fork into a new file. The resulting article.txt contains all the improvements from Wikipedia but it also contains the updates from the fork.

I know, the overall procedure is very complicated because the user has to type many commands into the terminal. So the prediction is that some errors will arise. But in general the idea is to use three branches and merge them into the master branch. In contrast, the upstream branch holds only the upstream version history, which is equal to the timeline of Wikipedia provided on the website.

The chart from the beginning may increase the confusion. What the user needs to know is that he has to copy the latest version of the Wikipedia article into the upstream branch and commit the changes with “git commit”. It's also important not to delete the branches after merging, because the issue1 branch is the forked version which looks different from the upstream. If the user wants to edit the encyclopedia, he does so only in the issue1 branch. The master branch is a kind of integration branch in which the two other branches are combined.

January 15, 2020

POV forking of Wikipedia

At first glance the git tool and the Wikipedia project work the same way, because both support a version history. The difference is that the Wikipedia project was never forked in its history; only local copies were created. A fork is a technique used heavily on github to bypass the original community and start developing a new branch. The main feature of a fork is its ability to integrate the updates of the upstream. That means the fork contains the latest information plus extra content.

The subject overall is very complicated. So I have decided to make a simple experiment to test what will happen in reality. For the first step, the fork is created only on the local harddrive, not on the Internet, and it covers not the entire Wikipedia but only a few files. But it is well documented so that other users can reproduce the steps. It starts by creating a new git project in a working directory:

mkdir wiki-fork

git init

touch readme.txt

git add --all && git commit -m "initial commit"

Then the folder is populated with three files from the original Wikipedia. With a copy&paste the latest markup file is created in the directory. What we also need is a branch:

git add --all && git commit -m "create three files"

git branch issue1

The idea is that the fork is maintained in the issue1 branch while the original project (upstream) stays in the master branch. The merge is done with the following commands:

git checkout master

git merge issue1

git branch -d issue1 // delete branch

The idea is that the issue1 branch contains my own individualized Wikipedia version, which only I can edit, similar to a sandbox. The created edits are never sent back to Wikipedia; they are merged on the local harddrive into the master branch. The best visual analogy is a github project in which a fork is created. In theory, this allows the developer to become independent from the original project.

The open question is how it works in reality. I have searched on Google for some information but didn't find anything, so I have to test it out myself. The critical point is that sometimes the upstream will update its content. That means, if the Wikipedia community changes one of the three files online, I have to update the content in the master branch too. The problem is that the information in the issue1 branch is different, so a merge is needed. It's unclear how often this is necessary. The hope is that a merge is needed only once a week, and that it can be done automatically. But in case of doubt it will result in a merge conflict and it's unclear how to solve it.

What we can say is that the git tool is here to stay. It's the most advanced forking / version control system available and was designed with exactly this purpose in mind.

Alternatives to mediawiki

The mediawiki engine was programmed in the PHP language for historical reasons. PHP is more advanced than outdated Perl scripts but it has major performance problems. The question is which programming language fits modern needs better? One idea is to use the king of programming languages, which is C++. C++ is the fastest language available which is supported by independent compilers. It can be used for creating web applications and outperforms PHP easily. Another alternative is Python, which is an easy to use beginner language.

Some wiki systems were written in the C++ language already, but they are not used in practice. And Python as a language is much slower than PHP. A possible third candidate is https://en.wikipedia.org/wiki/Wiki.js which works with node.js in the background. In contrast to C++, the Javascript language is widely accepted for web development. Its only perceived bottleneck is that it doesn't provide object oriented features, so it can't be used for building larger applications. But is this assumption correct from today's standpoint?

https://www.geeksforgeeks.org/prototype-in-javascript/ describes how to create prototypes in Javascript, which are lightweight object oriented templates. Recent versions of Javascript are equipped with full blown OOP features including inheritance. So it makes sense to take a closer look at the Wiki.js rendering engine. The advantage is that the GUI which is shown in the webbrowser and the backend application are written in the same programming language. This provides – in theory – a better user experience than the outdated mediawiki engine written in PHP.
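As a short illustration of the point, modern Javascript supports both the classic prototype style and the newer class syntax with inheritance. The class names here are invented for the example:

// Classic prototype-based style
function Page(title) {
  this.title = title;
}
Page.prototype.render = function () {
  return '<h1>' + this.title + '</h1>';
};

// ES6 class syntax with inheritance
class WikiPage extends Page {
  constructor(title, body) {
    super(title);          // call the Page constructor
    this.body = body;
  }
  render() {
    return super.render() + '<p>' + this.body + '</p>';
  }
}

console.log(new WikiPage('Main Page', 'Welcome to the wiki.').render());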

An online demo for “wiki.js” isn't available. What goes in this direction is the visual editor of Wikipedia, which is based on node.js. The normal mediawiki backend was realized in PHP. It seems that the developers are happy with this choice. One reason is that PHP was designed as a backend language, while Javascript is famous for its ability to embed textboxes and forms into the page.

How to fix the rm -rf problem

Linux users are confronted with a serious problem. The operating system doesn't ask many questions but simply executes the rm -rf command. The problem is that all the data is deleted and it's not possible to recover it. To prevent such a mishap, it's a good idea to reconfigure the rm command. Perhaps it would make sense if all Linux distributions did so by default. What the average user wants is to delete files only with the file manager, not with the rm command.

gedit ~/.bashrc

alias rm='rm -i'

alias cp='cp -i'

alias mv='mv -i'

source ~/.bashrc

January 13, 2020

Hindi as a world language

A map from the Wikimedia Foundation shows the readership of the famous encyclopedia for each country in the world, https://stats.wikimedia.org/wikimedia/animations/wivivi/wivivi.html The Hindi language has a wide distribution all over the world. It is used in India to retrieve information, but the language is also spoken in the US and China. The overall population which speaks Hindi is 560 million according to the info box, but in reality it's much more. It's only a conservative estimation and the number is growing. Hindi has a good chance to become the world's most famous language. In India alone, over 50 million pageviews each month are generated by Hindi speaking users. That means they are using Wikipedia in their mother tongue to retrieve and write information about their own country and the world in general. The overall population of India is 1300 million, which means that in the future the number of pageviews in the Hindi language will grow.

January 10, 2020

The german wikipedia debates how to find new authors ...

Under the URL https://de.wikipedia.org/wiki/Wikipedia:Kurier the German version of the Signpost informs the Wikipedia community and the public about the project status. The main topic in January 2020 is that the number of active contributors is too low, and the community argues about the reasons. The idea is to increase the number of authors, but nobody knows exactly how to do this.

Even though my German language skills are good, it's hard to follow the debate. Not because of the vocabulary, but because I have the opposite fear. The problem with Wikipedia is that the number of authors is too high and could explode in the future. What does that mean? At first glance, the Wikipedia project is protected against chatbots, because a chatbot is not able to create a valuable edit. This is true for a complex article which contains lots of natural language and is equipped with extensive references at the end. The danger in Wikipedia is that chatbots are utilized to generate a certain sort of Wikipedia content which is highly structured. This is called a stub. A stub is a small article which consists of two sentences and can be generated from an RDF ontology by a knowledge-to-text system.

The resulting stub will read like a normal article, except that it was generated by a computer program. In contrast to humans, it's very easy to make a copy of the computer program. A simple Unix command like “cp chatbot.py chatbot2.py” is enough to create as many Wikipedia authors as needed. The funny thing is that, according to published papers on Google Scholar, simple biography stubs are generated with bots already. That means the number of Wikipedia authors is higher than the German chapter is aware of.
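
To make the knowledge-to-text idea concrete, here is a minimal sketch in Javascript. The data record and its field names are invented for illustration; real stub bots work with far larger ontologies.

// Minimal sketch: a template turns a structured record into a two-sentence stub.
// The record below is invented; a real bot would query an RDF ontology instead.
const person = {
  name: "Jane Doe",
  birthYear: 1901,
  nationality: "German",
  occupation: "botanist",
  knownFor: "her research on alpine mosses"
};

function biographyStub(p) {
  return `${p.name} (born ${p.birthYear}) was a ${p.nationality} ${p.occupation}. ` +
         `${p.name} is known for ${p.knownFor}.`;
}

console.log(biographyStub(person));
// Jane Doe (born 1901) was a German botanist. Jane Doe is known for her research on alpine mosses.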

January 06, 2020

The pros and cons of Wikiprojects

An interesting meta-section in the Wikipedia encyclopedia is the Wikiproject. In contrast to the portals, most Wikiprojects are still available today. That means there is a certain barrier to putting a Wikiproject into the deletion discussion. On the other hand, the probability is high that this will happen in the future, because a Wikiproject has the same problems as a portal: a low number of users, and low benefit for the overall Wikipedia.

Instead of arguing for Wikiproject deletion, it makes sense to use the time to hear what the experienced users from a Wikiproject have to say about why the project makes sense. The interesting point is that Wikiprojects are the key for reaching an audience in the university. The Wikiproject Medicine, for example, was sometimes used by medical students in courses. It's a low-barrier option to become familiar with Wikipedia.

Let us investigate what medical students are interested in: they are not motivated to learn something about physics or computer science, but they stay within their own subject. If somebody studies medicine he likes to read books about it, and that means only books about medicine. So it's a natural choice to create a subpage in Wikipedia to coordinate a team of medical students who are interested in improving existing articles. That is basically the idea behind a Wikiproject and the reason why they were founded in the past.

Suppose the idea is to delete all Wikiprojects: what is the future participation of medical students in Wikipedia? At first glance, a deletion would block all efforts to contribute to Wikipedia, especially if one's own domain is focussed on a single subject. Perhaps it makes sense to go a step back and describe what science in general is about. The idea behind science is to become a specialist on a single subject. The idea is to reduce the scope. The question is, where is the right place in the Wikipedia project for doing so?

The answer is very simple: reducing the scope is done with keywords. A keyword like “Immune System” is more specialized than the general term “Medicine”. Contributing to the Wikipedia project is possible with article requests, maintenance requests, deletion requests, peer review requests, and photo requests. That means the user has to open the page and put the term into that page. For example, the article request page contains all the domains, like physics, literature and medicine. What the user is allowed to do is to put his specialized keyword into the section “medicine” on the article request page.

This sounds a bit complicated, but it's equivalent to tagging a request. That means all the requests are handled as requests, but they are tagged with domains like physics, medicine and literature. This kind of interaction provides the same feature as a Wikiproject but is addressed to a broader audience. The advantage is that all the users only have to watch a single article request page. That means even non-medical experts are allowed to enter new article requests.

The Wikiproject concept is equal to decentralized project coordination. The idea is to split the Wikipedia into sub-sections which are domain specific. This produces a lot of inefficiency. The better approach is to centralize the requests and tag each request with its domain, as sketched below. That means, if a user likes to improve the project, there is only one page available, which is the request directory. And all the issues are put into this single page.
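
A minimal sketch in Javascript of this tagging idea follows; the request list and its entries are invented examples, not real Wikipedia data.

// One centralized request list; every entry is tagged with a domain.
const requests = [
  { type: "article", domain: "medicine",         term: "Marshall protocol" },
  { type: "article", domain: "computer science", term: "Wiki.js" },
  { type: "photo",   domain: "medicine",         term: "immune system diagram" }
];

// Everybody watches the same list; a domain expert simply filters by tag.
function byDomain(list, domain) {
  return list.filter(r => r.domain === domain);
}

console.log(byDomain(requests, "medicine"));
// returns the two medicine entries without needing a separate Wikiproject page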

The effect is that the page view count is higher, more people will monitor this page, and the productivity is better. Wikipedia is on a path towards this goal. Since mid-2019 many portals have been deleted already, and in the future the Wikiprojects will follow. The switch from decentralized requests for writing new articles to a centralized request page will make Wikipedia more professional.

Let us imagine how a medical student can contribute to Wikipedia. What the user has to do is to find a very complicated medical term which has no Wikipedia article right now. This complicated term is put into the article request page, because Wikipedia should be explored in that direction. The same user can write the article for the term, and when he is done, he deletes the term from the list. The trick is to put only terms on the list which are specialized. That means, it is not directed towards a mainstream audience and the number of papers about the term is very low. The resulting article will attract very few readers. At the same time, the reputation for creating such an article is high. What researchers are doing is to aggregate knowledge about a complicated, seldom used term from a specialized domain.

Let us take a look at reality to determine if the proposed workflow makes sense. https://en.wikipedia.org/wiki/Wikipedia:Requested_articles/Medicine is the request page for all medical terms. It has a page view count of 2 per day, which is very low. It provides links to other language versions. And it has some keywords like “Marshall protocol”, which is a specialized subject within immune system diagnosis. According to the changelog, the page isn't edited very often.

In contrast, the Wikiproject Medicine has a daily page view count of 86. And in the talk section, lots of domain-specific discussion is available. It seems that today the decentralized Wikiproject Medicine is more attractive to the users than the centralized version. The problem with the Wikiproject Medicine is that even this specialized portal doesn't fulfill the needs of the users. So they have created many subprojects: Wikiproject Anatomy, Wikiproject Physiology and so on. The result is some kind of Wikiproject spam, in which the number of projects is growing but the team behind them is doing nothing.

On the other hand, I think the Wikiproject idea is a good opportunity to learn, because it shows what the users are interested in. In most cases the idea is to specialize on a single subject, which is a good idea because this will bring science forward. Let us click on the item Wikiproject Anatomy and observe what comes next. In the anatomy section a new subfolder waits for the user, called Category:Anatomy articles by topic. That means the user can decide if he likes to read articles about a subsection of anatomy.

January 05, 2020

Wikipedia is restructuring its portals

https://en.wikipedia.org/wiki/Wikipedia:Miscellany_for_deletion/Archived_debates/September_2019

Wikipedia consists of the main page, which has a large amount of traffic, and from the main page the users are directed to subpages, called portals. There are portals about art, mathematics, physics and sports. Around August 2019 a discussion was started to delete all portals. Or to be more specific, all 500 portals are discussed individually to decide whether to keep them or delete them.

The reason for deletion is mostly the same: the page views of the portals are low, the last edit was made 5 years ago, and the interaction on the portal page is low. A similar concept to portals are Wikiprojects, which are also subpages to coordinate the efforts around the same topic, for example computer science. Some of the Wikiprojects were deleted too, but it seems that the deletion energy is focussed first on portals.

The discussion can be described from a more abstract point of view. If the portals and Wikiprojects are gone, where is the place to discuss new articles, maintenance and peer review requests? There is a place, called the “request directory”. There, all the domains (art, sport, science, film) are combined under a single page. That means, in the article request page, the different subjects are combined: mathematics has a subsection, literature has one and so forth. According to the page views, such maintenance pages are used very often by the users.

The deletion of the former portal pages is a major step in the development of Wikipedia. At first glance a portal makes sense. It helps to combine different articles under a single page. The concept is comparable to a specialized library. That means, under the term “computing” only computer experts are discussing how to write content from that domain. So it's surprising that this concept has failed.

In the deletion debate, the clear majority of users is pro deletion. In contrast to normal deletion debates, there are very few opposing voices. So the prediction is that in 2020 all the other portals, and perhaps the Wikiprojects too, will get deleted.

To understand how the meta section is working, we have to take a look at the remaining Wikiproject Computer science, https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Computer_science

Right now, this Wikiproject hasn't been deleted, so it's a good time to observe the idea behind it. Similar to a portal, a Wikiproject aggregates the efforts for a single domain. It describes how many high quality and low quality articles are available in the domain of computer science, and it has a to-do list.

The to-do list contains sections for article requests, cleanup, expanding existing articles, infoboxes, photo requests and stubs. There is also a list of participants who have put their username into a list as an indicator that they are motivated to contribute to the Wikiproject.

Perhaps the most important part of the Wikiproject Computer science is the to-do list, which includes article requests, cleanup and so on. The interesting point is that every Wikiproject has such a list. But the list is filtered by the domain. The more general idea is to take a look at the Wikipedia-wide general to-do list, which contains the same categories and combines all the domains on a single page.

So we can say that portals and Wikiprojects are obsolete and will be replaced by the general to-do list to maintain all the domains like computer science, sports, films and so on.

Specialized library

How can it be that in the year 2005 many hundreds of portals were established in Wikipedia and now the same community is motivated to delete all of them? It's about how knowledge is organized outside the Wikipedia project. Suppose there is no Internet available and a classical library is used to create an article about a subject. In the university domain, a specialized library was the normal way of doing so. The advantage of a specialized library is that it provides only a small number of books in a single building. This reduces the costs. It's possible to create a specialized library with printed books about a single topic, for example computer science. The number of journals, books and dissertations about this subject is small. It's possible to collect all of them and put them into the library.

This was the working mode before the Internet was invented. If somebody was interested in getting an overview of a topic or wanted to create a new paper, he went to a specialized library. This was without any doubt the motivation in 2005 to establish portal pages in the Wikipedia. It copied a well working principle from the offline world.

In 2019 the situation is different. Most information is stored online, and specialized printed libraries are under pressure. What the expert users in the university are doing today is to visit a general library and use the Internet for getting access to specialized papers. The same is true for Wikipedia users. Most of them are working with a fulltext search engine to get the information they need. The most used entry page is not a specialized entry page for a certain domain, but a search engine which works by entering the needed keyword. The same search engine allows browsing in different subjects. This makes a domain-specific portal obsolete.

Expanding Wikipedia

In the self-description, a portal provides an entry page for a domain, which is addressed to the readers, and a Wikiproject is part of a portal to coordinate the effort of editors to improve the articles. The idea is that it's not possible to coordinate the maintenance of all the 5 million articles in the Wikipedia, so the task is split into domains like art, history, science and music.

A large scale project is Wikiproject History, https://en.wikipedia.org/wiki/Wikipedia:WikiProject_History Similar to other Wikiprojects, it looks a bit inactive. But we are ignoring the low traffic and take a look at what the self-understanding of the project is. The project goals are to improve the history articles in the Wikipedia by creating new ones, expanding old ones and improving the quality if needed. Another goal is to serve as a central discussion point and to answer queries from the reference desk.

The interesting question is which kind of topic is offtopic at the history Wikiproject: everything outside the domain of history. That means, if somebody likes to expand an article about robotics, he won't get help in the history project. This is logical, but it shows a possible bottleneck. But let us go back to the goals. Creating new articles and improving the quality of existing ones is an important task in Wikipedia. It's not possible to ignore this goal, because that would be equal to a failure of Wikipedia in general. So the question is how to do this task more efficiently.

The best way of doing so is by formulating requests from the environment. That means, a user tries to find an article about a subject, is disappointed because the article is missing, and then he formulates a request like “I need an article about topic abc. Please create one or explain to me why the topic isn't available in Wikipedia.” There are two options for handling such requests: focus on the subject, or focus on the request in general. A Wikiproject focusses on the subject. That means, if a user formulates a request about a missing article from the subject of history he has to ask the history section, and if he likes to read something about computer science, he has to go to a different Wikiproject.

The more efficient way of interaction is a centralized request page in which all the domains are combined. For doing so, the existing portals and Wikiprojects have to be deleted, while the request directory should be improved and become more user friendly. A centralized request desk allows improving all the articles in the Wikipedia.