April 27, 2020

How to reduce the traffic in a blog to zero

The first thing to do is to avoid larger blogging platforms like WordPress and Medium and select Google Blogspot as the blogging site. Google Blogspot doesn't carry commercial advertising, which makes the space unattractive for a larger audience. Additionally, Google web search barely indexes the blogspot subdomain, so the newly created blog is nearly invisible on the internet. After creating some blog posts, the user has to rename the blog. This can be done in the settings menu. The result is that even if the blog was discovered by external search engines like Bing, the URL becomes obsolete and the new URL has to be discovered again. This will drop the daily pageview counter down to zero for sure.

The open question is what to do with a blog that has no audience. This is outside the scope of this tutorial. Here it was only explained how to create such a C- weblog. The only thing that is certain is that no comments will be written by external internet users and that no search engine will find the blog.

First look into Debian 10

Debian 10 was released in mid-2019. Installing it in a virtual machine makes sense, but the more interesting idea is to install it on a physical notebook. I have done so and it works reasonably well. The biggest obstacle was the installation procedure itself. After booting the USB stick, the user has the choice between a Gnome based installer, a graphical installer and a text installer. I chose the graphical installer, but it was a bit complicated to browse through the options. After reading the manual a bit, an installation was possible, but newbies especially will find the installer difficult to use.

The more elaborate way is the modern Gnome based installer, which works better but needs more system resources. For the next installation it is the better choice. After the system was installed on the PC, the first bootup used the wrong graphics settings. The user has to install the non-free driver which fits the graphics card manually. Selecting the correct package also has to be done by hand. That means the user has to know that the resolution problem can be solved with a non-free driver, and then he has to read the wiki section to identify which driver needs to be installed. Similar to the installation software, this step is a bit hard for newbies.

After mastering that step the system runs great. All the programs are available, which means that Firefox, spreadsheet programs, Python 3 and all the other open source software run out of the box. The system requirements are on the same level as Fedora and Arch Linux, which means that an idle PC will need around 2 GB of RAM and the Debian OS occupies around 15 GB on the hard drive. Compared to early Linux systems, for example Slackware, the hardware requirements are high, but in comparison with Windows 10 it's a midsize system with moderate requirements.

What is important to know is that before the user will accept the Debian philosophy, he has to understand the advantages of a stable system over a rolling release distribution. If the user isn't familiar with the details of git branches and how a stable branch is monitored for security issues, he won't like the Debian philosophy very much. The reason is that a short look into the version history of all the software will show that Debian is outdated. Firefox is six months behind, the Linux kernel is an older one and Python is not the current version.

The main advantage of Debian over other Linux distributions is that all Debian users have installed the same software. On 2020-04-27 it is Debian 10.3, which means that the user needs a handbook, security patches and updates for exactly this version. This makes it more likely that after installing an update the system will run without interruption. In contrast, the situation with Gentoo Linux and Fedora is that every user has installed a slightly different Linux system, which makes it hard to trace errors back. Therefore, Debian has much in common with Windows 10, in which all users have installed the same version. This is important for blaming the right opponent. That means, if the user has installed Debian 10.3 already and something isn't working, it is the fault of the Debian project and not of the individual user.

April 25, 2020

Debian Release management

A short look into the Debian release schedule https://wiki.debian.org/DebianReleases provides helpful information for end users. In general, each version is maintained for two and a half years, which means the user has to install the operating system once and can use it for a very long time. When the 2.5 years are over, the user can update to the next release with a simple command line. This puts the user in a very comfortable situation.

What is not answered in the release chart is how all the software is programmed. The Linux operating system consists of hundreds of programs, and it's unclear how these programs work together. From the user's perspective this is a minor problem, but it's up to the developers to program and test the code.

Perhaps this is the most obvious difference between Debian and other operating systems like Arch Linux. In Debian there are two conflicting social roles: normal users vs. developers. The social role of the user is to install the software and then use all the code without doing anything in return, while the social role of a programmer has to do with fixing issues, compiling source code and monitoring security issues.

The simple explanation is that two conflicting roles amount to a professional Linux distribution. If a Linux distribution has only one role, which is the developer, it can't be called a serious distribution. A wrong assumption is that the conflicting roles are only available in a closed source ecosystem, in which the user has to pay 100 US$ for the operating system while the other side takes the money and programs the code. The surprising situation is that the same social roles can be imitated in the Open Source world as well. The difference is that Debian users aren't paying money. The result is that the quality of the software is lower. The latest Debian 10 software has weaker quality than the latest Windows 10 version. In Windows 10 the graphics card works better, the PC needs less energy and the installation works more smoothly. These disadvantages have to be accepted by Debian users because they get the .iso file for free.

The advantage of Debian over other Linux distributions like Gentoo, OpenSuse and Fedora is that in Debian it's possible to be a normal end user. An end user is somebody who neither writes source code nor files bug reports, but simply uses the software. This is very similar to what most users do with Wikipedia: they type the address into the URL bar and read content which was written by others.

Most Debian experts argue that their operating system has greater stability than other Linux distributions. What are they talking about? A naive assumption is that stable means the software is secure or has no bugs. This is only partly true. The current Debian 10 stable has a lot of security issues, and some minor bugs too. What stable means more concretely is that development is done in a stable branch. That means Debian consists of an unstable branch and a stable branch. The existence of a stable branch makes it possible to publish long-term versions which run for 2.5 years on the computer. In contrast, Linux distributions which are not stable, like Arch Linux or Fedora, need to be updated once a week, and in case of doubt the system won't boot after the update.

The term stable refers to a publishing schedule in which a new release is available every 2.5 years, and during this period the user runs the same version on the computer. The stable branch is a technique to provide this release schedule.

Some arguments for Arch Linux

Arch Linux plays a unique role among all Linux distributions, because the distribution can be explained very easily: the latest version of each piece of software is compiled and installed on the user's PC. The Arch Linux wiki and the pacman package manager support this workflow very well. Most users understand the idea behind Arch Linux, so they use it at least for playing around.

From a more abstract perspective, Arch Linux is a developer friendly distribution. It supports the idea of agile software development. If a certain subsystem has a problem, a bug report is created, the source code is improved, and with a delay of less than 24 hours the updated binary version can be downloaded from the server. No matter whether the source code of Firefox, the Linux kernel, a text editor or a game was improved, the source code gets compiled into binary packages and the user can download them from the server.

Unfortunately, the Arch Linux project has some limits. It is not used very often in production environments. In theory it's possible to do so: an Arch Linux system can be installed on a vserver and on a laptop as well, but only few people do it. The exact reason is not clearly defined. Sometimes the explanation provided is that the Arch Linux system occasionally won't boot after an update. But with the recent improvements to pacman this is seldom the case. In most cases the boot process works fine, and if not, the manual intervention is minimal. Another explanation why Arch Linux isn't used in reality is that the concept is too new. That means the concept of agile development and a rolling release doesn't fit the well known waterfall software cycle, so it's hard to convince a larger audience to use the software in reality.

The more realistic explanation why Arch Linux isn't used in production environments has to do with the conflicting needs of developers and normal users. Arch Linux was developed by coders for coders. The project is located in the upstream and declares that the upstream is equal to the downstream. Everybody is a programmer, and in exchange he gets the most secure software ever programmed. This story doesn't fit reality. The first thing is that most users are not interested in creating software; they want to use it as normal users. Arch Linux ignores the idea of software quality checks.

Let us describe the preconditions behind the Arch Linux workflow. The idea is that the upstream never makes a mistake. If the Linux kernel was improved from version 1.0 to 1.1, this improvement makes sense and there is no time to argue about the reason why. The problem is that most software is written by amateurs, and they are not programming the software for normal users but for other reasons. Especially in the Open Source ecosystem, most software projects are started because the developer team wants to try something out. For example, somebody wants to learn how the C language works and therefore starts a gaming project in which the C language is used.

The average user assumes that the upstream has programmed malware which spies on the user's data. In contrast, the upstream assumes that the normal user has no experience with computers at all and therefore needs pre-defined settings. The consequence is that there is no trust at all between upstream and downstream. This problem is ignored by Arch Linux, which assumes that there is no conflict between upstream and downstream.

Update over the Internet

Rolling release distributions like Arch Linux have become successful since the advent of fast internet connections. If users are equipped with a stable internet connection, it's possible to update the software every week. This narrative reduces the comparison between rolling release and stable release to a file transfer problem. The more elaborate comparison focuses on the development process. The bottleneck is located at the upstream level. Before a piece of software can be installed, somebody has to write the code. Software development is done with the git version control system, in which a team of programmers writes lines of code. The software development process has to be organized in a certain way. The management of writing code can be realized with either a rolling release or a stable release.

Rolling release is equal to a single branch model, which is the trunk. It is the same principle used in a Wikipedia article: there is only one current version of the article and everybody is allowed to modify it. It is surprising that in reality most software projects don't work with a single branch model, because software development is more complicated than creating a Wikipedia article.

The first reason is that the number of commits is higher. The average Wikipedia article gets only 20 edits over a timespan of one year, while the average software project contains thousands of commits. The second problem in software development is that different tasks have to be solved in parallel. It's possible to create new features, improve security, update the documentation and fix existing bugs. The best practice method for doing so is to use two or more branches.

The problem with models of two or more branches is that there is no longer a single current version. A current version would mean that all the branches are merged into one, which is not the case. Instead, the average software project has many current versions at the same time:

• a current testing version

• a current security version

• a current stable version

• a current bugfix version

• and so on

The additional problem is that these versions are improved independently from each other. This is the major advantage but also the major disadvantage of the git version control system. A rolling release only makes sense if the development model is based on a single trunk branch.

Let us describe a common three branch software development model. If a one man project or a small team starts a new project at GitHub, they will create three branches: stable-branch, issue-branch, testing-branch. If a developer wants to fix an issue from the bug tracker, he commits to the issue-branch; if the maintainer of the project wants to aggregate different bugfixes into the testing version, he merges the issue-branch into the testing-branch; and if a new stable version should be created, the testing-branch is copied into the stable-branch.
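
A minimal sketch of this three branch flow, modeled in Python with branches as plain commit lists (the branch names follow the example above; everything else, including the commit message, is a simplification for illustration only):

```python
# Minimal model of the three branch workflow described above.
# Branches are plain lists of commit messages; merging appends the
# commits that the target branch does not have yet.

branches = {"issue": [], "testing": [], "stable": []}

def commit(branch, message):
    """A developer submits a change to a branch."""
    branches[branch].append(message)

def merge(source, target):
    """The maintainer aggregates changes from source into target."""
    for change in branches[source]:
        if change not in branches[target]:
            branches[target].append(change)

# A developer fixes an issue from the bug tracker.
commit("issue", "fix: crash when opening an empty file")

# The maintainer collects the bugfix into the testing version.
merge("issue", "testing")

# A new stable version is created by copying the testing branch.
merge("testing", "stable")

print(branches["stable"])   # ['fix: crash when opening an empty file']
```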

This three branch model is some sort of best practice method in software development. The surprising information is that it is not a rolling release. Instead, the new versions in the stable branch are produced with a time lag. That means the bugfix was created in January 2019, the testing branch was updated in March 2019, and the new stable branch version was created in June 2019. In this example it took six months until the bugfix was available in the stable version. This time lag can't be reduced. The reason is that the amount of resources in a project is limited. For example, if the GitHub project is run by 2 programmers who each write 10 lines per day, the maximum output is only 10x2=20 lines of code per day.

Let us make a small example. Suppose the team wants to improve the software with 3000 additional lines of code. According to the math, they will need 3000/20 = 150 days for the task. If they start today, they are finished in roughly five months. This delay produces the time lag in the release workflow. The only way to reduce the time between the occurrence of a bug and its fix arriving in the stable version is to increase the number of programmers. If the team has access to 200 programmers, they can reduce the time lag drastically.
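
A small back-of-the-envelope calculation in Python, using the assumed figures from above (10 lines of code per programmer per day), shows how the team size drives the time lag:

```python
# Back-of-the-envelope estimate of the release time lag, based on the
# assumed productivity of 10 lines of code per programmer per day.

LINES_PER_PROGRAMMER_PER_DAY = 10

def days_needed(total_lines, programmers):
    """Days until the change set is written, given the team size."""
    daily_output = programmers * LINES_PER_PROGRAMMER_PER_DAY
    return total_lines / daily_output

print(days_needed(3000, 2))    # 150.0 days, roughly five months
print(days_needed(3000, 200))  # 1.5 days, the lag nearly disappears
```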

Freezing the upstream

In the first example, a rolling release software project is described. It consists of a trunk branch which is updated once a day. The normal user is asked to always install the latest version, because it contains all the improvements and security fixes.

In the second example, a stable release is described. It is created by freezing the trunk branch. That means that at a certain point in the past, a copy of the source code is created in a different folder, and then this copy is improved to fulfill the needs of the normal user. Freezing the upstream is done in addition to the normal upstream development. At the same time, the upstream trunk branch keeps being improved without interruption. That means the stable team is able to create the freeze independently of the upstream developers.

How complicated it is to freeze the trunk branch depends on the concrete software project. In most cases, the point release is created together with a handbook, security updates and bug reports against the stable version. The only thing that is sure is that an additional stable branch needs more effort than improving only the trunk branch. A trunk branch is about the software project itself, focused on the source code and its improvements, while a stable branch is about the needs of the normal users.
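
The freeze idea can be illustrated with a few lines of Python: the stable branch is a snapshot copy of the trunk taken at one point in time, and afterwards the two copies evolve independently (a simplified model with invented entries, not any project's real tooling):

```python
# Simplified illustration of freezing the upstream: the stable release
# is a snapshot of the trunk, and afterwards both evolve independently.

trunk = ["feature A", "feature B", "security fix 1"]

# Freeze: copy the trunk at this point in time into a stable branch.
stable = list(trunk)

# Upstream development continues without interruption...
trunk.append("experimental feature C")

# ...while the stable team only adds what the end user needs.
stable.append("handbook")
stable.append("backported security fix 2")

print(trunk)   # trunk keeps moving forward
print(stable)  # stable only receives documentation and backports
```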

Wikipedia edits made easy

Creating Wikipedia edits is a very complex task. Lots of papers have been written about the topic in the past. The good news is that Wikipedia edits can be formalized and repeated over and over again. The common Wikipedia edit consists of two steps: creating keypoints and writing prose text.

Unfortunately, most long-term Wikipedia authors combine these steps into a single edit. They submit the changes to an article without further comment. This behavior makes it harder for newbies to create edits on their own. The better idea is to assume that the newbie has no experience with Wikipedia at all and wants to contribute to the project in a predictable way, that means without producing conflicts and without getting banned because of bad edits.

An easy to follow edit strategy consists of the described two step pipeline. In step 1 the newbie posts keypoints for an article to the discussion page, and in step 2 he converts these keypoints into full sentences. The good news is that this strategy has been described for decades under the term “creating a PowerPoint presentation”. A PowerPoint presentation consists of the same steps. In step 1 the author prepares the presentation at home and writes down the keypoints on the slides. In step 2 he holds the presentation, which means converting the self-created keypoints into natural speech consisting of full sentences. The second step is done in front of the audience. That means the lecturer not only reads the keypoints aloud, but uses the keypoints to talk about the subject.

It's interesting to know that the two step pipeline for creating PowerPoint presentations is the international de facto standard. All presentations in the world consist of written keypoints stored in the slides plus the oral presentation in which the speaker formulates full sentences. It's not possible to make a presentation in a different way.

It's a bit surprising that in the Wikipedia ecosystem this two step workflow is not known. In the official help section the steps are not mentioned. Instead, the official tutorials assume that an edit is the smallest item, which can't be divided into subtasks. This assumption is wrong. A Wikipedia edit means submitting prose text to Wikipedia which is annotated with bibliographic references. Before this prose text can be added to an article, the author needs a preparation step. He has to read through the existing information and make some keypoints about what he has read in the papers.

Most authors either store this preparation step on their local hard drive, or they are trained well enough not to need such a step. For newbies the recommendation is to submit the created keypoints to the talk page, because this helps to get a better overview. Newbies are allowed to make mistakes, which can be located in one of the two steps: either the newbie struggles with making notes while reading the existing information, or the newbie isn't able to formulate the self-created keypoints into prose text. Getting feedback about the step in which exactly the error was introduced helps a lot.

That means it is not enough to judge that a certain edit is wrong. The more elaborate question is whether the problem was the creation of the keypoints or the transfer of the keypoints into prose text.

Well written articles

Let us analyze the existing articles in Wikipedia. What they have in common is that they are written for the end user. They are formulated in prose text and equipped with bibliographic references. A Wikipedia article and a recorded PowerPoint presentation have much in common: they can be read or listened to from start to end, and in most cases the text makes sense.

What the average Wikipedia article doesn't show are the preparation steps taken until the article was created. A naive assumption is that an article is created from smaller edits. But this definition hides the fact that the individual authors use their local hard drive to prepare the edits. The preparation work on the local hard drive is never uploaded to Wikipedia, therefore it's much harder for newbies to reproduce the steps for creating articles on their own.

The interesting point is that in the normal tutorials about creating academic texts, the preparatory step of note taking is described in detail. In nearly 100% of the manuals in which the process of creating academic presentations and academic papers is described, the user is asked to create the keypoints first and then formulate the prose text. It's not very difficult to transfer this tutorial to creating Wikipedia articles. The reason why this is not done in existing Wikipedia tutorials is that the average long-term Wikipedia author is already familiar with academic note taking. For the Wikipedia expert there is no need to talk about creating notes, because this step is taken for granted. This untold assumption makes it harder for newbies to do the same thing that Wikipedia experts are doing. What the newbies do is not making notes; instead they think it's possible to create on-the-fly edits.

Let us describe the imaginary on-the-fly edit in detail. On the fly means that an edit can't be divided into substeps. Somebody reads a fact in a book and adds this fact to Wikipedia by submitting an edit. This workflow is described in existing Wikipedia tutorials. The problem is that in reality it won't work, especially not for newbies. The reason is that the subjects are too complicated, the quality standards in Wikipedia are too high, and the newbie isn't familiar with academic writing. The result is that an on-the-fly edit becomes a reason why the newbie gets banned.

Two step edit pipeline

The recommended edit workflow is much easier to master. Creating keypoints from existing academic papers is not very complicated. The user has to write down the important facts and note which paper the information comes from. Training this behavior is not very demanding. The second step in the overall pipeline is also easy to master. Taking existing keypoints and converting them into prose text is a matter of academic writing. The facts and the literature are given in advance, and what the user has to do is formulate the facts in an easy to read paragraph.
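
As a toy illustration of the data flow in the two step pipeline, a minimal Python sketch could store the keypoints together with their sources and then assemble a rough prose skeleton from them (the facts and source names below are invented placeholders; the actual formulation work is of course done by the human author, not by a script):

```python
# Toy model of the two step edit pipeline: step 1 collects keypoints
# with their sources, step 2 turns them into a draft paragraph.
# The facts and references below are placeholders for illustration only.

keypoints = [
    {"fact": "the algorithm was introduced in 1998", "source": "Miller 1998"},
    {"fact": "it is widely used in routing software", "source": "Smith 2005"},
]

def draft_paragraph(points):
    """Step 2: convert keypoints into prose sentences with references."""
    sentences = []
    for point in points:
        sentences.append(f"{point['fact'].capitalize()} [{point['source']}].")
    return " ".join(sentences)

# Step 1 output would be posted to the talk page; step 2 output is the edit.
print(draft_paragraph(keypoints))
```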

The interesting point is that after combining both steps the result is a high quality Wikipedia edit which is accepted by the admins. A win-win situation means that such an edit makes sense for the newbie and the admins at the same time. The newbie is proud because he has edited Wikipedia, while the admin is happy because existing articles were updated.

A good starting exercise for Wikipedia newbies is to focus only on making notes. The task is to take 10 existing articles and post keypoints to the talk pages. That means the newbie isn't creating real edits but is only posting keypoints to the talk page. The most interesting effect is that none of these keypoints gets deleted from the talk page, because the talk page is the perfect place for storing the preparation notes.

April 24, 2020

Understanding the concept of long-term releases

In the open source software development model there are long-term stable versions available, for example Firefox ESR, Debian stable, Ubuntu LTS and the Linux kernel LTS. What most users don't know is what the difference to the normal version is. In most cases, the long-term version is older, which means it is not the latest update but outdated. If the aim is to update the system to the latest source code, LTS versions don't make much sense. On the other hand, Firefox ESR and Linux LTS are offered as an option, so why are LTS versions available if they are a poor choice?

To understand the situation we have to take a look at closed source software development. All commercial software is published in long-term versions. The Windows XP operating system was available only as an LTS version, and the same is true for Windows 10, Mac OS X and so on. The most obvious difference between a rolling release and a long-term release is that in the LTS version some components are missing. That means the Firefox ESR browser is a feature reduced version of the latest Firefox browser. If the normal Firefox software has a menu with 10 menu entries, the ESR version is equipped with only 8.

The decision for or against an LTS version has to do with the missing features. It's the difference between 8 menu entries and 10 menu entries. From the development perspective the interesting question is why the missing 2 menu entries are not available in the LTS version. It has to do with stability. Improving software with additional features needs a lot of work. If the manpower in terms of programmers is small, it will take some time until the missing 2 menu entries are added.

All long-term versions have in common that with a time delay all the missing features are added. If the end user waits a bit, the imaginary Firefox ESR version will have 10 menu entries too. The only question is how long it takes. In most projects the time gap is 1-2 years. That means the developers have programmed all 10 menu entries already, but they are only available in the trunk branch and not in the production version. If a piece of software offers a long-term version plus a trunk version at the same time, it's a positive situation. It means that the development team is improving the software and another team is testing the changes for stability and security. On the other hand, if a software project consists only of a trunk branch and doesn't offer an LTS version, it's a toy project. That means the software is programmed just for fun but isn't used in reality.

Let us analyze this hypothesis in detail. Suppose a development team is trying to improve the security, the stability and the number of features of a piece of software. The only way of doing so is by creating additional branches: one feature branch, one security branch, one stable branch and so on. After these branches are created, a time lag is the result. That means the security team will analyze version 1 of the software while the feature team is implementing version 2 of the software. The security team is testing something different from what the development team is building.

Let us imagine the software project consists of a single branch, which is called trunk. It is not possible to establish quality control in a single branch development model. The reason is that all the latest updates are committed into the trunk branch, and it makes no sense to ask whether a certain version is secure, because the version changes twice a day. The result is that a trunk-only repository is tested neither for stability nor for security. Instead, it's a development version which isn't recommended for productive usage.

Let us investigate why some software projects have no long-term version. The reason is that the software developers want to minimize their effort. It's a one man or two man team which is programming the source code, and no quality control is available. Such a software development cycle isn't used for commercial software, because most customers pay money for the software and in return expect high quality. Only open source developers are bold enough to publish the trunk version of the software.

Point releases

Software development consists of two conflicting roles. The end user of a piece of software wants to install the program on the PC, needs documentation and is interested in a bug free system. The needs of the software developers are the opposite. A software developer wants to modify the code twice a day, tests out new modules and doesn't like to waste his time writing documentation.

Two conflicting roles means that if the end user is happy, the developer isn't, and vice versa. Only one social role in the game can win. The reason why long-term versions were invented is to fulfill both needs at the same time. Let us first describe the end user's perspective. The end user downloads the LTS version of the software. This version is bug free, is well documented and runs out of the box.

From the developer's perspective the LTS version is useless. The developer has no need for documentation because he has written the source code himself, and he has no need for the software to be easy to install, because he compiles the source code from scratch. What the developer prefers is a trunk branch to which new commits can be applied. Because of the different needs, there are two versions of the same software available: the LTS version, which is addressed to the end user, and the trunk version, which is the playground for the developers. Between both versions there is a time lag, which means they are out of sync. This allows both social roles to be happy at the same time.

Switching to Debian isn't easy

After experimenting with the Debian 10 operating system in a qemu environment, I decided to install the software on a physical machine. Unfortunately, the installation menu was a bit complicated. The first thing to mention is that the touchpad wasn't recognized, so all the settings had to be made with the keyboard only.

The second problem was that after the first boot the display resolution was wrong. Only a VESA mode was shown, which was below the normal resolution. After experimenting with different grub settings (none of them worked), the answer was hidden in the Debian wiki https://wiki.debian.org/AtiHowTo What the user has to do is install a non-free AMD graphics driver. After the next bootup the normal display resolution is shown.

It's unclear how other Linux distributions handle the situation. Perhaps they install the non-free package in the background without asking. Anyway, the resolution is now fixed.

One thing which isn't working yet in Debian is the edge scrolling of the touchpad. It seems that with the new Wayland display server the settings aren't recognized. Perhaps it's possible to find a workaround, or in the worst case the touchpad will keep working below its optimal quality.

Let us describe a unique feature of Debian which has to do with installing outdated software. The standard web browser in Debian 10 is Firefox 68.7 ESR. In contrast to the normal Firefox software, this version was released in 2019 and then improved only slightly. So what is the difference? The interesting situation is that for most users the ESR version makes more sense. The story told by the version history looks predictable: a year ago the software was programmed, and since then it has been improved by security updates.

Now it is possible to compare this story with the trunk branch of Firefox and Chrome. In the trunk branch the story is that the user has to check for an update twice a week, and if an improvement is available he has to install the latest version of Chrome within 24 hours, otherwise the system becomes vulnerable to attacks. Or let me tell the story a bit differently. Suppose a user has installed the latest Chrome browser and hasn't updated the software for a week. From the perspective of the Chrome development team, the user has done something wrong. He was advised to check for updates twice a week, he wasn't doing so, and as a result the user has made a mistake.

Rolling release web browsers blame the user if the system becomes vulnerable. In contrast, long-term versions like Firefox ESR blame the upstream. That means if something is wrong with Firefox 68 ESR, it's up to Mozilla to fix the issue. And if Mozilla isn't able to fix the problem, the next question is why a certain component which needs updates so frequently was introduced into the ESR version in the first place.

Touchpad in XFCE4

After playing around with a different desktop environment, the problem with the edge scrolling has been solved. In XFCE4 the touchpad can be configured differently than in Gnome, which even allows scrolling the content on the screen with the touchpad itself.

April 18, 2020

Creating a peer reviewed academic journal from scratch

There are some peer reviewed academic journals available. They have in common that the published information has a high quality, and most of them have a long tradition in the classical university ecosystem. Since the advent of Open Access there is a need to start new academic journals. The open question is how to combine the Open Access philosophy with a peer review pipeline.

The technical side of an Open Access journal is very easy. In most cases it's enough to upload a pdf document to a webserver and the paper can be read by a worldwide audience. We can discuss the details, that is, which software produces the pdf file and which sort of webspace is the right one for hosting a journal, but in general this kind of pipeline will result in a high quality journal. That means the document can be displayed on any device, and the webserver will deliver the information to any reader in the world within seconds.

The more advanced and seldom discussed issue is how to create a peer reviewed journal. A plain Open Access journal doesn't have peer review; it's some sort of pdf hosting website. That means the admin of the journal uploads the pdf file, but the paper was never read by anyone before publication. In the classical academic publication system there is some sort of pre-publication peer review available which allows the quality to be increased, but it's unclear how to reproduce this workflow in an Open Access journal.

The current situation is that some journals are experimenting with overlay journals, open peer review systems and community driven peer review. One option is that somebody is only allowed to upload a new paper if he has peer reviewed an existing one. Another option is to skip peer review entirely and allow the normal reader to comment on newly published information. This will result in some sort of arxiv website which is extended with a comment section.

A true peer review system works a bit differently. The framework for explaining the details is located in the software industry. The git version control system has a built-in peer review feature: it can be activated with a dual branch workflow.

But let us go a step back. Software development with git usually works with a single branch model. In the trunk branch the changes are submitted to a remote server. A single branch workflow doesn't have peer review. Peer review has to do with creating two branches which are out of sync. A stable branch and an unstable branch are needed as the minimum requirement.

Peer review and merging two out-of-sync branches are the same thing. The amazing feature of merging two branches is that it produces a conflict in any case. This kind of conflict creates a need for the stakeholders to negotiate about the issue. This negotiation is equal to a peer review. It is very different from commenting on a paper from the reader's perspective, because a branch merge is done in the pre-publication step.

Now it makes sense to transfer this philosophy to an Open Access journal. A minimal peer reviewed Open Access journal consists of two sections: unstable upstream and stable downstream. In the upstream section the incoming papers are stored, very similar to the arxiv repository. In the stable downstream section, the next issue of the journal is created. The interesting point is that the stable section doesn't reference the upstream section; instead a complete copy is created. It's the same principle as in the git version control system. The stable branch and the unstable branch can be edited independently of each other. That means a paper in the upstream section can be modified without affecting the paper in the stable section.

Peer review can be realized with a dual branch model which is out of sync. To sync the branches, a negotiation is required. Negotiation means discussing the next issue of the journal with colleagues. An interesting side effect is that the social roles in each branch are different. That means an author is allowed to upload a paper to the upstream section, but this doesn't mean that this paper gets published in the downstream section.

Let us create a single example. The author uploads paper1 to the upstream section of the journal. The journal editor reads the paper and comes to the conclusion that the quality is too low. He decides not to publish this paper in the next issue. It is available in the upstream section, but it doesn't get copied into the downstream section. This produces a communication conflict, because the journal editor sends a rejection notice to the original author. This sort of communication is typical for all peer reviewed journals. What exists is a conflict between the different social roles of the journal. These conflicting roles are attractive for normal readers because they make the publication system more robust against wrong information.

From a technical point of view, there are many options for realizing a dual branch system. One option is to use a GitHub project for hosting the academic journal. An easier system to realize is to ignore the git tool entirely and store the branches in different sections of a wiki. That means the upstream branch is section 1 and the downstream branch is section 2. This allows a peer reviewed academic journal to be created on a single wiki page.
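
To make the dual branch idea concrete, here is a minimal Python sketch of such a journal: the upstream section collects submissions, and only an explicit editorial decision copies a paper into the downstream section (the paper titles and the accept/reject decisions are invented for the example):

```python
# Minimal model of a dual branch journal: papers land in the upstream
# section, and an explicit decision copies accepted papers downstream.
# Titles and review decisions are placeholders for illustration.

journal = {"upstream": [], "downstream": []}

def submit(title):
    """An author uploads a paper to the unstable upstream section."""
    journal["upstream"].append(title)

def review(title, accepted):
    """The editor (or a peer reviewer) decides about one upstream paper."""
    if accepted and title in journal["upstream"]:
        # Accepted papers are copied, not moved: both sections stay independent.
        journal["downstream"].append(title)

submit("paper1")
submit("paper2")

review("paper1", accepted=False)  # a rejection notice goes to the author
review("paper2", accepted=True)   # paper2 appears in the next issue

print(journal["downstream"])  # ['paper2']
```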

Peer review

Let us describe, in the context of the branch model, what peer review is about. If the Open Access journal was created with the described workflow, it consists of two sections: unstable upstream and stable downstream. The interesting point is that this outline doesn't solve problems; instead it creates many new tasks. One of them is the question of which papers should be copied into the downstream section. That means, from the perspective of the journal editor, that some papers are available in the upstream section, but it's not clear which of them will fit into the next issue of the journal.

There is more than a single option to address this question. A naive attempt is to use dice and decide with a random generator which of the papers fulfill the quality standards of the journal. A second, more elaborate decision making strategy is that the journal editor decides by himself which of the upstream papers are well suited. And the best practice method is that the journal editor delegates this question to a group of peer reviewers.

Because this point is equal to peer review, it makes sense to describe the process in detail. The starting point is that the journal has two sections (upstream and downstream). To copy a paper into the downstream section, a decision about its quality is needed. This decision is delegated to a group of people. What the group can do in response is peer review the paper, or not. In the worst case the journal editor doesn't find an external peer reviewer, so he has to decide by himself whether the paper fulfills the needs of the readers.

But even in this case it's a peer reviewed journal, because a decision was taken. The decision of whether a paper fulfills the standards or not is only needed in a two branch model. In a normal repository there is no need to judge a paper.

In the software industry the principle has a long history. In the git tool a so-called branch can be created easily. Creating a branch means copying a folder, and then it's possible to edit the copy without altering the original folder. Sometimes it's called a fork, because the same source code is available in two places at the same time. The interesting point is that after creating a branch, both branches will get out of sync. That means a user can edit branch1 and branch2 isn't affected. This principle is very powerful and allows software development tasks to be divided into subproblems.

Branches are used in the Open Source world for many things. There are feature branches to fix a problem, and there are stable branches to update complete operating systems. What comes very close to a peer reviewed academic journal is a stable release Linux distribution like Debian. It's the same principle: Debian is peer reviewed software, which means the Debian ISO file is different from the Debian upstream branch.

Freezing the upstream

Every open source project starts with an upstream branch. The upstream is a repository which stores the source code on a server. In most cases the upstream is equal to a GitHub repository, but it can also be located on an SVN server or an FTP server. The upstream repository allows the creator of the software to update the project. He can upload new files and alter existing ones. In the case of content, the upstream is equal to a WordPress blog: it's a place on the internet in which information is stored.

The interesting point in open source projects is that apart from the upstream repository a second action is needed, which is called freezing. Freezing means converting the source code in the upstream into a release which can be delivered to the normal user. The interesting point about freezing is that from the programmer's perspective this step has a low priority. What the software author is trying to do is improve the software with new updates. He isn't interested in stopping this update cycle. The only one who needs a frozen stable release is the end user.

In the Linux ecosystem there is a long-running discussion about whether the normal user needs a frozen version or whether he can use a rolling release version. Rolling release means that no freeze is available and the normal user installs the same version as provided by the upstream. The interesting point is that rolling release has never been a success for real projects. All the major software systems like Debian, Windows 10, Red Hat, Apple Mac OS and Android are delivered as a release version which is frozen. So-called nightly build versions are only available as an alternative, but they are not installed on productive systems.

The reason why it makes sense to analyze the Open Source development model is that the concept of freezing the upstream has been around for many years and is discussed in the literature. It's the best practice method in open source software development. The same concept can be adapted to scholarly paper writing. Freezing a paper is equal to creating a peer review. The shared principle is that the original author of a paper isn't interested in freezing a paper, because this is equal to losing control over the content.

In reality a peer review is something which works against a paper. A peer review is desired by the readers. A peer reviewed journal communicates between authors and readers as an intermediary.

April 15, 2020

Building a modern robot from scratch

The main reason why Artificial Intelligence has failed in real robotics projects in the past is that it focused on computer science but not on the underlying domain. The untold assumption is that NP-hard problems have to be solved with a certain algorithm which is implemented in a programming language. After executing the program it will solve a certain AI problem, for example grasping an object with a dexterous hand.

The reason this assumption won't result in a grasping robot is that nobody knows how the algorithm can solve the task. In contrast to sorting an array, so-called AI tasks have nothing to do with computing itself; they have to do with driving a car, the shape of objects and communicating in natural language.

The better idea for realizing AI systems is to start with a teleoperated robot which is later extended with a database of trajectories. In the first step the human operator controls the robot arm with a joystick. This allows him to grasp an object. In step 2 the pipeline is extended with grounded natural language and a learning-from-demonstration motion database. Neither module is located in classical computer science or mathematics; both have to do with applications of Artificial Intelligence.

Perhaps it makes sense to go into the details. Suppose the human operator is able to grasp an object with a joystick. In theory he can do so many hundreds of times, but the goal is to transfer the task into software for higher productivity. One important step in this direction is to repeat the same action and record the trajectory. The result is a motion capture database. If the scene in front of the robot fits a recorded scene, the recorded action is reproduced in playback mode. An interpolation between different trajectories will increase the accuracy.

The next step towards advanced robotics is to tag the trajectory database with grounded language. That means the database is annotated with labels like “open gripper”, “close gripper” and “push object”. This makes it easier to search the database. For example, if the next task is about “push object”, an SQL query to the motion database will return all the trajectories from this domain. Then the solver selects some of them and creates the interpolated trajectory which is executed on the robot.
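
A minimal sketch of such a labeled motion database, written in Python with sqlite3 and a naive pointwise-averaging interpolation (the table layout, labels and one-dimensional waypoints are invented for illustration; a real system would store full joint states and scene context):

```python
import json
import sqlite3

# Toy motion database: each recorded demonstration is stored with a
# grounded-language label and its waypoints (here just 1-D positions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trajectories (label TEXT, waypoints TEXT)")

demos = [
    ("push object", [0.0, 0.2, 0.5, 0.9]),
    ("push object", [0.0, 0.3, 0.6, 1.0]),
    ("open gripper", [1.0, 0.5, 0.0]),
]
for label, waypoints in demos:
    conn.execute("INSERT INTO trajectories VALUES (?, ?)",
                 (label, json.dumps(waypoints)))

def query(label):
    """Return all recorded trajectories for a grounded-language label."""
    rows = conn.execute("SELECT waypoints FROM trajectories WHERE label = ?",
                        (label,))
    return [json.loads(row[0]) for row in rows]

def interpolate(trajectories):
    """Naive interpolation: average the matching demonstrations pointwise."""
    return [sum(points) / len(points) for points in zip(*trajectories)]

# The solver asks for the "push object" skill and builds one trajectory
# out of the recorded demonstrations, ready for playback on the robot.
print(interpolate(query("push object")))  # [0.0, 0.25, 0.55, 0.95]
```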

The combination of teleoperated robotics, learning from demonstration and natural language grounding is a powerful technique for realizing robotics projects which can be used in reality. That means the system is not only an academic project to teach students how they should do something; the robot can be used for solving practical tasks.

The reason why this approach is nearly unknown in mainstream robotics and AI is that it's easy and very complex at the same time. The described method combines artifacts from different domains. It has to do with motion capture (which is used in movie production), with grounded language (which is used in natural language processing) and with spline interpolation, which is located in regression analysis. Combining all these subjects into a single project is not common in normal computer science. What computer scientists have done in the past is solve a single problem. For example, they want to search a database for a value. This limited problem is analyzed in depth, and the algorithm is created in a high level programming language. Unfortunately, this problem solving strategy fails in AI domains.

A good starting point for all sorts of AI applications are teleoperated robots. Teleoperation means that the machine has human level capabilities by default. The idea is that a human operator is in charge of the system all the time. He is not allowed to leave the joystick, because then the robot will fail to solve the task. If this teleoperated paradigm works, the next step is to think about how to reduce the workload of the operator, so that he can control the robot hand more easily and relax a bit.

Trajectory replay

The interesting effect of a trajectory replay is that on the first trial it won't work. If the robot repeats the prerecorded trajectory in a new situation, it isn't able to reach the goal. But this failure doesn't show that the idea is wrong; it shows that trajectory replay isn't the answer to the problem but the problem itself. The question is how to program a trajectory replay system which can adapt to different situations. Learning from demonstration is the sort of challenge which has to be addressed with modern algorithms.

What is the current status in commercial robotics?

Artificial Intelligence is, from a technical point of view, a complex domain. There are lots of papers with theories about neural networks and all sorts of robot control systems. Even experts are not sure which of them are well written papers and which provide boring information. A better idea is to judge the status of robotics by taking a look at commercially available products. Some brand names are Agility Robotics, Boston Dynamics, Waymo and Moley Robotics. They have in common that more than a single YouTube video is available in which the engineers demonstrate what is possible today.

The latest robot from Agility Robotics is able to walk on two legs, can climb stairs and is able to hold a box in its hands. The Waymo car is able to drive on the street alone, while the latest robot from Moley is able to cook a meal. All of the videos have something in common: they are very new, which means the videos were uploaded in the last two years and all of them show robotics not available before. That means the technology has made big progress, and it seems that all the difficulties are solved.

What will happen if only the robots from the mentioned companies are used in reality? Lots of human work can be done with these machines. There is no need for human truck drivers, for human cooking chefs or for a human postal service. The main problem with this development is that it is much faster than even experts have assumed. And what will happen two years or even four years from now? Under the assumption that the trend holds, the robots get improved, and the result is that the robots in reality have more skills than the robots from movies.

The interesting situation is that current society isn't prepared for this kind of technology. What is expected by science fiction authors is that around 30 years from now some small progress will have been made towards fully autonomous production. But if the revolution arrives within two years, it will be too fast for the world. The only hope for critics of technology is that the videos from Boston Dynamics and Agility Robotics are fake, that means the robots perform great in a staged scenario but struggle on real tasks. This would make it impossible to use robots in real life conditions.

Analyzing whether robots are useful in practical applications can be done by monitoring two parameters: first, the price of goods, and second, how important human work is for a society. If a robot revolution is there, the price of goods will drop towards zero and human work becomes a low priority, because all the work is done by robots working at lower cost. If the engineers struggle to introduce robots into the real world, the price of goods remains constant and human labor can't be replaced by machines.

What is a fully automated factory?

There is a myth about the so-called perpetual motion machine. That is a wonder machine which will work without interruption after pressing the on button. Most engineers have come to the conclusion that such a machine would violate the laws of physics, or to be more specific the laws of thermodynamics. What the engineers are not aware of is that perpetual motion machines were not invented as real machines; they are the subject of stories about automation.

The concept is about a technology which runs without human labor. The question is not how to build such a machine within the laws of physics; rather, such a machine has to fit into an economic context. To be more specific, a perpetual motion machine is something which the owner of a factory would like to buy to reduce costs. What the owner of a factory is trying to achieve is to produce maximum output with a minimum amount of work. This will increase his profit.

What is available in reality are examples of factory automation. The typical machine needs a lot of energy and repeats the same task over and over again. A printing machine is an example, but a pizza making assembly line is also a good one. From a physical standpoint these machines are the opposite of a perpetual motion machine, but from an economic standpoint they come close to one. What automated factories have in common is that the costs for the factory owner are low, and at the same time the factory produces lots of pizzas, each of which costs nearly nothing. It's surprising at what a low price it's possible to produce goods if all the steps are fully automated.

Suppose a fully automated pizza line is installed in a factory and the raw materials are available. From a user's perspective such a device is a miracle. The user can enter how many pizzas he would like to eat, and after entering the number of 100k the start button is pressed. The machine won't stop until all the units are created. No further interaction is needed. The most interesting point is that such a workflow is available in reality. That means real pizza making factories can be visited, and they are used to produce food for the population.

To understand fully autonomous machines better, we have to ask for the potential bottleneck. What all these devices have in common is that they need something as input: electricity plus raw materials. If no energy is available and no cheese is there, the machine won't work. The interesting point is that these input materials are nearly endless. Producing energy at low cost is an easy task, and producing tons of cheese is also a solved task. If the production of raw materials is combined with fully autonomous assembly lines, the result is a fully automated economy. Figuratively speaking, such a system can produce an endless amount of goods at zero cost.

April 12, 2020

OpenRA servers hit a new record in number of players

The OpenRA game has been available for many years. It's an open source clone of the famous Command & Conquer series. What makes the software interesting is that it runs under Windows as well as Linux. In contrast to other real time strategy games, the users are not charged anything. For two weeks the number of players on the servers has been hitting new records. In the past it was difficult to find enough players to fill a map, but for a while now this has not been a problem.

Today there are more than 300 players on the servers at the same time, and new games are starting all the time. Technically this was possible in the past as well; what was missing in the year 2019 was a larger number of players. It seems that some newbies have discovered the game and are participating in the matches. On the website there is a statistics page https://www.openra.net/players/ which shows the increase in active players since April 2020.

Perhaps it makes sense to introduce the game itself. What the user has to do is manage a large number of units at the same time. In contrast to a simple jump'n'run game, there is not only a single character on the screen; the user is in charge of 50 and more units at the same time. It's some sort of chess, but much faster. In a typical 4 vs. 4 match the situation becomes chaotic very soon. That means each player is in control of 50 and more units, and the map shows hundreds of sprites at the same time which are doing something or not.

The OpenRA game has much in common with World of Warcraft. The difference is that the graphics are only 2d and no background story is told. The result is that the game engine itself is very small. The .exe file needs only 18 MB on the hard drive, which includes all the graphics, the multiplayer mode and even a replay mode to analyze games played in the past.

April 08, 2020

From a burnout society to an open society

In the sociology literature it has been observed that modern societies are affected by the burnout problem. If a burnout doesn't affect a single individual but a larger group, it's called a burnout epidemic. That means the situation is out of control, and the question is how to handle the stress level of the group.

A naive assumption is that the problem of stress, burnout and burnout epidemics will disappear without external intervention, and that in the future the tasks for the individual will become easier to solve, not more complicated. A look at reality shows that since the advent of the Internet and the demanding complexity of the macro economy, the problems become bigger but never smaller. That means future societies will be affected more by the stress problem, not less.

The good news is that an answer to the situation is available. It was first introduced in the software industry but can be adapted to other domains as well. The answer is to transfer existing processes into open processes. Instead of creating proprietary software the idea is to write open source software. Instead of creating paywall-protected academic papers the idea is to publish a paper as Open Access. And instead of managing a society as a closed society, the better idea is to establish an open society culture.

But what does the term open mean in reality? First and foremost it has to do with a different role model between the consumer and the producer of a good. From an economic standpoint, open source software is an example of a consumer-first ideology. The consumer, which is the end user of the software, gets the latest security updates and the most advanced software, and he doesn't have to pay anything for it. All the Debian users who have installed the software on their PC never pay anything in return. They get the latest Linux kernel, the LaTeX tool and the powerful GIMP graphics program, and they have to provide nothing in exchange. The same is true for the Open Access ecosystem, in which the end user can read as many high quality pdf papers in Google Scholar as he likes without paying a single cent.

On the other hand, somebody has to produce all these goods. The open source software has to be written, and a pdf paper too. The interesting point is that this problem is left to the individual producer and isn't managed by a company.

Today, the software industry and academic content creation are the only domains in which the term open was introduced. In all other domains of the economy, for example in logistics, the retail industry and the medical sector, the paradigm is focused on the classical closed economy model. Closed economy means that the consumer of a good has to pay the price, and the producer of the good gets a monthly salary. There is a reason why open source has been available since the 1980s, while open logistics has not: software can be distributed over the Internet, but logistics services cannot. With so-called telerobotics this can be changed. Suppose there is a truck which can be controlled remotely. There is no need for the driver to be physically located in the truck; everybody who has access to the internet is able to control the truck. Under such constraints it's possible to manage the task of cargo transportation as a game. Very similar to writing software, it can be handled with the open paradigm. From the consumer's perspective the situation is pleasant. Suppose a consumer needs a load transported from A to B. Similar to all open services, he never pays a price for the task, but asks if somebody is able to do it for free.

This sounds a bit uncommon, so let us go a step back into the domain of the software industry. The current situation for end users who are familiar with Linux is that they are searching for a piece of software, for example a file manager, with the additional criterion that this software needs a GPL license. All proprietary file managers are ignored by the consumer. He defines the GNU license as mandatory.

Now it's easier to imagine what future consumers of logistics services will do. They define as a criterion that the transportation has to be handled without any costs. It's up to the counterpart how he reduces his costs down to zero. If somebody isn't able to do so, he doesn't get the task.

Today's economy isn't powerful enough for the open society ideology. If somebody asks for a truck for free, he won't find a single example. But with advanced technology, and especially with remote control of machines, it's possible to realize this ideology in the future. A possible toy example is a cheap drone which is controlled over the internet, where the advanced feature is that the consumer doesn't have to pay for the service because it's financed with advertisement or something else, similar to how Open Source and Open Access are financed.

Open society means extending the terms Open Source, Open Access and Open Science to society in general. That means all services which are offered are provided to anybody without costs. This sounds a bit like socialism, but it's the opposite. It can be described as an advanced form of capitalism in which the stress level is increased.

Language patterns in crisis communication

In role playing games, video games and online forums a special sort of communication pattern is available. It's a confrontational language which is escalated by individuals. The interesting point is that conflicts and crisis communication are the gold standard for mastering the game, and preventing such a language style won't help in solving real problems.

Let us imagine what will happen if Stackoverflow prevents downvoting of existing answers, if Wikipedia stops banning newbies, and if the participants of a video game are only allowed to send greetings to their team players but are not allowed to criticize each other. This is equal to anti-crisis communication. That means no problems are there and no conflicts have to be solved. Such a situation is equal to not playing the game at all.

All the issues on Stackoverflow, all the edits in Wikipedia and all the existing multi-player games on the internet have to do with solving problems. That means at first there is some sort of issue, and different users have to interact to solve it. They do so with a crisis communication which is equal to a panic mode. Players who have learned to use such a language are able to become successful in such a game, while players who are not able to cope with the stress get excluded from the game or resign on their own.

The best example is perhaps the Wikipedia game. It's a website in which the users are creating an encyclopedia. Everybody who is familiar with Wikipedia will describe the situation in the talk sections as stressful. Wikipedia-internal conflicts are created and solved with a panic-based natural language. In the easiest case an admin comes to the conclusion that the edit of a newbie doesn't make sense. But long-term Wikipedia editors criticize each other in the same tone. Does this crisis communication show that Wikipedia has failed and the project will become obsolete within two months? No, it's the opposite, because at the same time the Wikipedia articles which are presented as the frontend to the reader have a higher quality than ever.

That means a stable communication system consists of crisis communication and a relaxed appearance at the same time. Let us observe a conflict-laden computer game from the outside. The different players in the game are arguing with each other. They are carrying out conflicts and are not motivated to lower their voices. At the same time, the game is running great. That means the experts are playing the game and success is guaranteed. This sort of mixed impression is available for all complex group-oriented games. For example, on Stackoverflow thousands of downvotes and negative comments are posted each day. At the same time, the answer quality of the website is high. That means, if somebody has a problem with programming in a certain language, he will very likely find the answer on this single website.

The reason why conflicts and complex problem solving belong together has to do with asymmetric information. The typical situation in Wikipedia is that user1 is an expert in a domain, while user2 is not. From a technical point of view, both users are not able to work together because their knowledge doesn't fit together. The same is true for most multi-player online games on the internet. Player1 is a newbie, player2 is an expert, and they have never played this game together before. The result is that they don't understand each other. The interesting point is that the game starts despite this bad situation. The result is that during the game the users communicate poorly and make mistakes. After recognizing the mistakes, they blame each other for not playing well enough.

This situation isn't caused by a certain player; it's the general pattern for all online games and for all online forums. The starting situation is that, from an objective point of view, the newly created group isn't prepared and shouldn't work together. But this is never a barrier. No matter which players join a game server, the game will start in every case.

The reason why so many conflicts arise is that the players are different. The conflicts become greater if the backgrounds of the players don't fit together. The users argue about the same subject from very different points of view. And the conflict is a clearing mechanism for negotiating with each other, especially in a complex domain.

To speed things up, it makes sense to assume that in every multi-player video game a conflict will become visible, and the only question is how the group will solve these conflicts. Solving the conflict means that the individual needs are matched to the needs of the group. For example, a successful interaction with Wikipedia means that an individual is allowed to post something and at the same time the Wikipedia project profits from it. It's some sort of win-win situation.

If a group or an individual struggles to solve issues, it becomes a lose-lose situation. That means the user's edit gets rejected and at the same time Wikipedia loses an important contributor who won't contribute anymore.

April 01, 2020

Recent developments in the SE.AI website

The SE.AI website https://ai.stackexchange.com/ is the dominant AI-related online forum on the internet. It contains 6k questions and is part of the larger Stack Exchange network. For a while now major changes have been taking place on the website. The situation in the past was that the moderator played a minor role. He didn't post many comments or answers; his duties were of an administrative nature. He was some kind of technical administrator but wasn't involved in running the website.

Since the year 2019 the situation has changed drastically. The new agenda in SE.AI is that the moderator is the top rated user in the forum. That means the current moderator has posted the most answers and has earned the most reputation points. In other words, the moderator knows the most about Artificial Intelligence and at the same time he is solving conflicts in the community.

Before we can judge this development it makes sense to describe this management style from an abstract point of view. It's equal to electing the best player in a soccer team as the team leader. The result is that the team leader is the weak point in the overall system. Let us construct an example. Suppose a difficult question is asked in the forum. Only the moderator is able to provide the answer because he has the most experience of all the users. The other users in the community have a weaker position and not enough knowledge about Artificial Intelligence. That means knowledge and power are distributed unequally.

This management style has some advantages but also disadvantages. The advantage is that it minimizes the conflicts in the team. The moderator is accepted for two reasons: first his social role is strong, and secondly his knowledge is strong. The disadvantage is that a moderator-driven community is vulnerable to a takeover. If the single point of failure makes a mistake, the entire group gets into trouble. The second problem is that the stress level for the moderator is higher. To defend his strong position he has to create the most postings and has to know everything. The danger is that the moderator isn't able to do this in the long term.

In management theory two different principles are discussed: top-down moderation and bottom-up moderation. Top-down moderation is what SE.AI uses. It's the classical form of group organization. The group has a strong internal cohesion but fails to adapt to the environment. The team is fixed, new members aren't welcome, and the moderator is not allowed to leave the group.

It's a bit difficult to predict the future development of SE.AI. One option is that the moderator is able to handle the disadvantages of the top-down management style and remains in an active position for the next 10 years. The second option is that the group isn't able to adapt to future needs, for example that a request from the outside isn't answered correctly. Basically spoken, SE.AI is running an experiment to investigate whether top-down leadership works in reality.

March 24, 2020

Freezing an academic paper

With the rise of the Open Access movement it was heavily discussed in the literature what an academic journal is and what it is not. The most widely accepted definition was created around the term predatory journal. The term was invented to make clear what the difference is between a serious peer-reviewed journal and a joke non-peer-reviewed journal.

The term predatory journal is widely accepted not because the definition is correct, but because it was used in many thousands of papers by Open Access experts. The term predatory journal is equal to a low-cost journal. What predatory publishers have in common is that the article processing charge is lower. The typical predatory journal is published online-only, has a reduced fee of 100 US$ per paper, and no peer review takes place. Over a long time span this definition was an ideal way to sort the existing journals into two groups. But it fails to explain what a peer review is.

A more elaborate definition divides academic journals into two groups: peer-reviewed and non-peer-reviewed. The problem is that it's much harder to define the peer review process. Even Wikipedia has no clear understanding of what peer review is about. The working hypothesis is that peer review is realized with a stable branch in which a frozen upstream gets evaluated.

Freezing the upstream is something more complicated than a normal preprint server. A preprint server is a location to which authors submit their manuscripts. For example, Arxiv is a preprint server, but Academia.edu and a github folder are as well. What all preprint servers have in common is that no peer review takes place. Somebody uploads a document and the reader doesn't know if the document has a high or a low quality.

A naive assumption is that a preprint is transformed into a journal by peer reviewing the preprint. That means somebody sends the manuscript to an expert and the expert gives a quality judgment. This understanding describes only the surface. What is missing is the reason why somebody should peer review a paper. A typical assumption from the past was that peer review is equal to paid peer review. All the existing academic journals work with money in the loop. So the assumption is that a serious journal is equal to a high-price journal.

The surprising information is that this definition also doesn't describe the complete picture. It's possible to combine high quality peer review with a non-commercial journal. The important underlying process has to do with freezing the upstream. Freezing is a term used by open source advocates who use the git version control system. A freeze is equal to creating a branch. On the command line it's done with a simple “git branch stable”. This creates a new branch, called stable, which holds a snapshot of the master branch. A freeze is equal to a point-in-time copy of the existing files.
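
As a minimal sketch, the freeze and the resulting divergence can be reproduced in a throwaway git repository. The file name paper.html and the commit messages are only placeholders for illustration:

git init journal-demo && cd journal-demo
git checkout -b master          # make sure the branch name matches the text above
echo "draft v1" > paper.html
git add paper.html && git commit -m "upstream draft"
git branch stable               # the freeze: a snapshot of master at this point
echo "draft v2" >> paper.html   # the upstream keeps editing on master
git commit -am "upstream revision"
git checkout stable             # the frozen copy is untouched and can now be
cat paper.html                  # reviewed or edited independently of master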

Let me give an example. Suppose an author has uploaded an HTML document to a preprint server. The HTML document contains an 8-page paper with 20 bibliographic references at the end. Now somebody else creates a copy of the document. He is freezing the upstream document. The result is that both documents, original.html and copy.html, can be edited independently from each other. The files are located in different folders. The ability to edit a document independently produces a version conflict. To overcome the version conflict some sort of communication is required.

The communication actions to overcome a version conflict are equal to the peer review process of an academic journal. There are different options for doing so:

- dedicated peer review by external experts

- decision making by the journal editor

- negotiating on a mailing list

- overwriting the version by technical means, because user1 has admin rights while user2 does not

A dedicated peer review is only one option to solve a version conflict in a two-branch project. The peer review process doesn't stand at the beginning; it's an answer to a version conflict. The underlying reason is the out-of-sync behavior of two branches which hold the same file. Branch1 is maintained by the upstream on the preprint server, while branch2 is maintained by the journal in the downstream. Let us investigate what a potential alternative to an upstream freeze is.

Suppose the idea is not to create a stable branch but to contribute to the original.html file in a different fashion. The workflow would work in the following way. First, the author uploads the original.html file to a preprint server. Now a second user likes the article but would like to add something. He sends an e-mail to the author with an additional paragraph. The original author accepts the modification and a new file, original-improved.html, is uploaded to the preprint server.

That means the modifications to the file take place in a single branch. The original author plus user2 communicate back and forth, and if they have found a shared position, the file gets updated. Conflicts are not possible, because if the original author doesn't accept a modification the user has no option to modify the file.

The major difference between a single-branch development model and a two-branch model is that in the two-branch model a conflict is possible. This conflict produces a certain communication style. Perhaps it makes sense to provide an example. In the normal single-branch model the original author owns the file and the second user is a subordinate. In a two-branch model the second user owns the stable branch and the original author is a subordinate. This kind of flipped social relationship is present in all peer-reviewed journals. The original author sends a manuscript to a serious peer-reviewed journal for the sole reason of becoming the subordinate of the journal editor. Not the author but the journal decides if the submission has a high quality. The advantage of this flipped role model is that the reader of the journal benefits from it. The reader trusts a journal if not the authors but the journal editor takes the decisions.

March 23, 2020

Creating academic journals as a Linux distribution



The best role model for an academic journal is the Debian Linux distribution. Debian works with two sections: upstream and downstream. A minimal academic journal will consist of a wiki page which contains two sections, one for upstream and one for downstream.

The main feature is that the upstream and downstream sections run out of sync. In the downstream, the same papers are available as in the upstream section, but in a different version. In terms of the git version control system, the downstream section is a fork of the upstream. The result is that both sections can be edited independently from each other. This produces a lot of chaos and creates the need for an intermediate maintainer. His obligation is to sync the downstream with the upstream, and for doing so, some decisions have to be made.

The result is a working journal editing pipeline. The overall system accepts incoming manuscripts provided by authors in the upstream section, and it generates stable releases which are consumed by the audience. The idea is not completely new. The upstream section is sometimes called a preprint server, while the downstream section is equal to an overlay journal. What was missing in the past is a clear, minimalist description of how to build such a pipeline.

The easiest system to realize holds all the sections in a single wiki file. That means the upstream and downstream sections are not branches in a github project, but sections in a text file. Then the changes of the text file have to be tracked. How well the system works depends only on the amount of edits. If more authors and maintainers are able to participate, the journal becomes more efficient.
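
A minimal sketch of such a single-file journal, tracked with git; the file name journal.txt and the section layout are only assumptions for illustration:

mkdir journal && cd journal && git init
cat > journal.txt <<'EOF'
== upstream ==
paper1, version 3, submitted by author A
paper2, version 1, submitted by author B

== downstream ==
issue #1: paper1 (frozen at version 3)
EOF
git add journal.txt && git commit -m "initial upstream and downstream sections"
# every later change to journal.txt is then a tracked edit, e.g.:
# git commit -am "author A uploads paper1 version 4"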

Perhaps it makes sense to describe each part. The upstream section is equal to a classical submission system. Authors are invited to upload their manuscripts to a server. They can edit a document, which produces a new version. Every author can upload more than a single paper. This kind of preprint server makes sense for authors because it's a storage place for their manuscripts, but the normal reader has no need to read through the documents. The upstream section is equal to the Arch Linux project. There is a machine-generated trunk version which contains the latest version of each document, but this trunk version has no value for the reader.

In the “downstream” section the existing content gets aggregated. The first decision to take is which of the papers fit the academic journal. In the diagram the papers #1 and #2 are selected for the first issue of the journal. Issue #1 of the journal is a copy of a certain version from the upstream. It can be edited separately from the upstream version. This produces a conflict. Instead of providing a single trunk branch which holds all the papers, two branches are available which run out of sync. This two-branch model has a large impact:

- first, it generates a role model for author, reader and journal maintainer. They are located at different positions in the workflow.

- secondly, it produces unsolved questions. The maintainer has to decide which papers are the right ones and in which version they are accepted into the journal. The reader has the obligation to give feedback to the maintainer, and the author has to think about why a certain paper was rejected.

- third, the newly generated role model in combination with the unsolved questions results in a communication pipeline. A mailing list, a forum and an issue tracker are needed to coordinate all the stakeholders and requests.

Peer review made easy

Existing academic journals are equipped with a peer review. This is the main advantage over a normal preprint server. A preprint server is only online storage for a document, comparable to an individual blog, but a peer-reviewed journal provides a trust layer on top of a paper which makes it more likely that the paper gets referenced by others.

So what is the secret behind the peer review process? Does it have to do with sending a manuscript to experts? Yes and no. Peer review is the result of a two-branch development model, very similar to what Linux distributions are doing. The Arch Linux distribution can be compared with a preprint server; it doesn't have a peer review. Only Debian consists of a stable and an unstable branch, and the result is some sort of moderation. Perhaps it makes sense to describe the overall workflow for a software project.

In the easiest case a single programmer creates a new project at github and uploads his self-written source code. By default a github project consists of a single branch, the master branch. Master is equal to the development aka trunk branch. If the software author has created a new version of the software, he sends the commit to this branch.

A more elaborate workflow contains at least two branches: one development and one stable branch. By creating a stable branch, a point-in-time snapshot of the development branch is created. After creating the branch, both branches become out of sync. That means the same file helloworld.py can be edited in the development and in the stable branch independently from each other. The result is a conflict. The conflict surfaces when both branches are to be merged, because during the merge process the maintainer has to answer which of the versions is the right one.

Basically spoken, a second stable branch is created for the single purpose of creating a conflict during the merge process. Every conflict has to be resolved. This can be realized with a mailing list or with a peer review. If only a single branch (the development branch) is available, no conflict is there and no peer review is needed. The conflict can be explained with social roles. In the example with the two-branch github model there are two conflicting roles: one programmer is responsible for the development branch and the other for the stable branch. The role conflict produces a higher quality of the project. That's the reason why the Debian Linux distribution is recommended for production servers, while Arch Linux isn't recommended for such a purpose. And for exactly the same reason, a peer-reviewed paper gets referenced by others while a non-peer-reviewed paper won't.
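
A minimal sketch of the described conflict, reproducible in a throwaway repository; the file helloworld.py is taken from the paragraph above, the contents and commit messages are placeholders:

git init demo && cd demo
git checkout -b master                           # keep the branch name used in the text
echo 'print("hello world, v1")' > helloworld.py
git add helloworld.py && git commit -m "initial version"
git branch stable                                # the point-in-time snapshot
echo 'print("hello world, v2")' > helloworld.py  # development goes on
git commit -am "development edit"
git checkout stable
echo 'print("hello world, v1 reviewed")' > helloworld.py
git commit -am "stable branch edit"
git merge master   # -> CONFLICT in helloworld.py: the maintainer must decide which version wins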

Let us go back to the inner workings of an academic journal. Suppose a journal consists of a development branch and a stable branch. The result is that in the stable branch some decisions have to be taken. The major decision is whether a paper in the development branch should be published in the next issue. Solving this problem can be done in many ways: either a random generator is asked, a formalized rule book is consulted, or in the best case an external peer reviewer is asked for a quality judgment. That means the maintainer of the stable branch of an academic journal makes his life easier if he sends an unpublished manuscript to external experts and asks them to review the content.

If the stable branch maintainer doesn't do so, he can't make a decision about whether the paper should be published. The consequence is that the next issue can't go online. The same situation exists for the Debian distribution. Before the next major release is published, the maintainers have to answer the question of whether a certain upstream version should be included in the distribution or not. This kind of decision is only needed for stable-release Linux distributions. In the Arch Linux project there is no need for such a decision, because the upstream dictates which version is the correct one, which is always the latest, no matter if it's an improvement or not.

Academic journal from scratch

Creating a peer-reviewed journal from scratch is pretty easy. All that is needed is a two-branch development model which runs out of sync. In the unstable branch the authors upload their manuscripts, and in the stable branch the next release of the journal is prepared. Everything else, for example in which file format the manuscript is accepted or which persons are allowed to peer review a paper, are minor decisions. The same principle of a two-branch model works in very different situations. It can be realized for a printed journal, for a predatory journal, for a serious journal, for a nonsense journal, for an amateur journal, for a journal which is based on MS Word, or one which is based on LaTeX.

The social mechanic of peer reviewing is the result of a conflict between upstream and downstream branches. That means a journal which works with a single branch doesn't provide a peer review, while in the case of two branches a peer review is possible.

March 22, 2020

Building an academic journal with stable releases

From a technical perspective all the tools are available to create an academic journal from scratch. Webspace is available in a blog which allows pdf files to be uploaded easily, the pdf file can be created with most document processors like LibreOffice or LaTeX, and the version history during writing can be tracked with the git tool. Suppose a single author combines these tools and creates some papers: are these papers the same as an academic journal? No, they are not; something is missing, because the readers won't trust the journal. The reader understands what traditional journals like Elsevier and Wiley are doing, but he isn't interested in reading self-created pdf papers, especially not if the content is provided for free.

It's possible to formalize the missing part better. It's called an Open Access downstream. The term downstream was invented in the domain of Linux distributions. For example, the Debian distribution is the downstream, while the original source code of the software authors is called the upstream. The workflow from the beginning, which includes the pdf file created in LaTeX, is located in the upstream. It covers what the single author has to do to create the content. The missing part, called the downstream, makes sure that the content is forwarded to the normal user. It's a layer between the upstream and the normal user.

Let us describe what Debian is doing. Technically, Debian is an additional branch in the version control system. A branch is a copy of the original content. This idea can be simplified a bit for better understanding. Suppose two folders are available on the hard drive. In folder A the incoming files from the upstream are stored, which are the pdf documents of the authors containing the papers. In folder B the stable branch is stored, which can be read by the normal reader. The question the downstream has to answer is what exactly should be copied into the stable branch.
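
As a minimal sketch (the folder and file names are invented placeholders), the decision step could look like this on the command line:

# folder A = upstream inbox, folder B = stable branch for the reader
mkdir -p folderA folderB
touch folderA/paper1-v3.pdf folderA/paper2-v1.pdf   # stand-ins for author uploads
# the downstream decision: only the accepted paper and version go into folder B
cp folderA/paper1-v3.pdf folderB/
ls folderB/                                         # the reader only sees the curated copy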



The diagram explains the idea visually. Without the downstream branch, the reader has direct access to the upstream version of the documents. It's some kind of Arch Linux for academic publication. The authors upload the pdf files to a server, and the reader can read the information. The interesting point is that in reality such a direct connection between author and reader doesn't work. To make the information from the upstream easier to read, the users expect a layer in between. This is called a journal. The journal is the downstream. It does the same thing the Debian project is about. The journal forks the content from the upstream into its own branch, and for doing so, some decisions have to be made. In the given example, the decision was made to accept pdf file 1 and also pdf file 2. The second decision was which version of the manuscripts was accepted. The interesting result is that for the reader it's easier to consume the downstream information than the upstream one.

It's important to know that in the journal branch no content is created; the existing content is aggregated. The role model is again the Debian ecosystem. A Debian maintainer hasn't programmed the code himself, but he talks with the upstream developer on a mailing list. If somebody would like to create an online academic journal, he needs such a workflow. It's the only option to create trust.

It's interesting to know that an academic journal doesn't need to have a printed edition. In the example diagram all the information is organized online only. What is important instead is that in the version control system the upstream branch is forked into the downstream branch. The concrete decision of how to do so is made by the journal editor. The result is twofold: first, it's easier for upstream authors to communicate with the downstream section, and secondly it's easier for the reader to communicate with the downstream section.

How do the two parties communicate?

The diagram looks a bit complicated; there are so many circles and arrows. Why don't the authors simply copy the files to a server and let the reader browse through the content? This is a good question. The good news is that it was researched in detail in the context of creating Linux distributions. It's the old question of whether Arch Linux or Debian is the better development model. What the picture shows is the complicated workflow of Debian. According to the Debian community, it's not enough that the normal user gets the latest software from the upstream; he needs a hand-curated distribution which is different from a testing repository. The result is that software developers and end users are separated from each other. The author of a piece of software checks the latest changes into the upstream repository, while the user of the software only has access to the downstream version. The layer in between, called the downstream, is used for communicating back and forth. That means, if the reader of a pdf paper has found a mistake, he doesn't contact the original author; he opens a thread on the mailing list of the downstream community.

In the debate around Open Access this principle is sometimes called an overlay journal. An overlay journal takes existing pdf papers hosted in a repository, creates a copy of them and redistributes them to the user. Technically, an overlay journal can be realized as a branch in the version control system. Let us make a practical example.

Suppose the idea is to build an academic journal on github. First, we need two authors who have uploaded a paper to their individual git repositories. In these repositories the authors are allowed to maintain their individual version history. That means the initial upload gets updated to correct spelling mistakes.

Then an additional git repository is created which is a copy of pdf file 1 and pdf file 2. Doing so is called forking. Forking means taking a snapshot of a github folder and copying the content into a new one. Then the fork is improved a bit; for example, a cover letter is created and a foreword is written by the journal. And voilà, the new academic journal is ready and can publish its first volume.
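
A hypothetical sketch of this workflow; the repository URLs and pdf file names below are invented purely for illustration:

git clone https://github.com/authorA/paper1.git
git clone https://github.com/authorB/paper2.git
mkdir journal-issue1 && cd journal-issue1 && git init
cp ../paper1/paper1.pdf ../paper2/paper2.pdf .       # the snapshot of both papers
echo "Foreword to issue #1" > foreword.txt            # the added value of the journal
git add . && git commit -m "issue #1: paper1 + paper2 plus foreword"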

And now comes the interesting part. Such a pipeline will produce a lot of stress. The first thing that will happen is that both upstream authors recognize that their content was forked. They will open a new ticket in the journal repository and ask for the reason. Secondly, the first readers are not happy with the content and they will open a ticket as well. That means, in the github repository of the journal a lot of traffic is created in which both sides are creating unsolved tickets. And this is equal to the journal being accepted by third parties. If somebody creates an issue against a github project, he has a need to communicate with this project.

Perhaps it makes sense to simplify the creation of an academic journal to a minimum. From a bottom-up perspective an academic journal is created with the unix command:

cp -r upstream/ downstream/

This unix command copies the existing upstream/ folder into a new one. It's not a soft link or a redirect but a copy. This copy creates a new branch from scratch and can be updated separately. That means, if somebody edits file1.txt, both folders will get out of sync. This produces stress which is compensated by communication on the mailing list. Basically spoken, an academic journal is a fork of existing pdf files.
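
A minimal sketch that makes the out-of-sync effect visible; file1.txt is the placeholder name from the paragraph above:

mkdir -p upstream && echo "version 1" > upstream/file1.txt
cp -r upstream/ downstream/                  # the fork
echo "version 2" > upstream/file1.txt        # the author keeps editing the upstream
diff -u downstream/file1.txt upstream/file1.txt
# the diff output is exactly the conflict the mailing list has to resolve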

Can Wikipedia be forked?

The entire Wikipedia is too large to fork. The project has over 20k users, and building a second encyclopedia from scratch would take too much manpower. But if the aim is to fork only a single category, for example the articles about Artificial Intelligence, a fork isn't very hard.

Suppose a single user creates 30 edits per month with a size of 1000 bytes each, and the fork consists of 10 users who work in parallel. After 5 years the project has generated the equivalent of 9000 articles with 2000 bytes each. And after 10 years the small team of 10 users has produced the same amount of content that is available in the AI category of the real Wikipedia.
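
A back-of-the-envelope check of these numbers on the shell, using only the assumptions from the paragraph above:

# 10 users * 30 edits per month * 1000 bytes, over 60 months (5 years)
echo $(( 10 * 30 * 1000 * 60 ))   # 18000000 bytes in total
echo $(( 18000000 / 2000 ))       # equals 9000 articles of 2000 bytes each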

A good starting point for a Wikipedia fork is to submit new articles no longer to Wikipedia itself, but only to the fork. The list of requested articles about AI is located at https://en.wikipedia.org/wiki/Wikipedia:Requested_articles/Applied_arts_and_sciences/Computer_science,_computing,_and_Internet#Artificial_Intelligence The content isn't written yet, but it can be created from scratch, and then the article gets uploaded to the fork wiki. The bottleneck for the project is motivating users to participate. In most cases the users are only interested in uploading content to Wikipedia and not to a fork, because the clone has a smaller amount of pageviews and no working copy-editing team which corrects spelling mistakes and moderates the process.

On the other hand, the content of the original Wikipedia is overestimated. The AI section consists of around 50 flagship articles with 50k bytes each, and the rest has a poor quality. It's possible to build something that works better from scratch. That means not taking the existing content as a starting point but creating everything from scratch, which will result in the lowest possible copyright conflict.

The only thing that is harder to fork is Google Scholar. Google Scholar and the underlying full-text repositories contain 50 million academic papers. The AI section in Google Scholar has around 1 million papers written by scholarly authors. Writing this content from scratch is very complicated and would take a large amount of time and manpower. In contrast, the Wikipedia project is some kind of slideshow community. The users create overview snippets for existing academic full-text papers in the hope that this is attractive for a larger audience.

The reason why academic publishers are not motivated to engage in Wikipedia is simple: the project is trivial. Trivial means that the amount of resources required to build an encyclopedia is low. The entire Wikipedia, which contains all articles, can be run with around 10k people. If the aim is to build only a subpart of the project about a single academic topic, for example Artificial Intelligence, the needed resources are around 10 persons who create the content from scratch. That means academic authors are able to build their own encyclopedia from scratch without copying and pasting a single sentence. They can write all the articles from scratch with fewer than 100 users in a short amount of time.