March 23, 2020

Creating academic journals as a Linux distribution



The best role model for an academic journal is the Debian Linux distribution. Debian works with two sections: upstream and downstream. A minimal academic journal consists of a wiki page that contains two sections, one for upstream and one for downstream.

The main feature is that the upstream and downstream sections run out of sync. The downstream contains the same papers as the upstream, but in a different version. In terms of the git version control system, the downstream section is a fork of the upstream. As a result, both sections can be edited independently of each other. This produces a lot of chaos, so an intermediate maintainer is needed. The maintainer's obligation is to sync the downstream with the upstream, and doing so requires some decisions to be made.

The result is a working journal editing pipeline. The overall system accepts incoming manuscripts from authors in the upstream section and generates stable releases which are consumed by the audience. The idea is not completely new. The upstream section is sometimes called a preprint server, while the downstream section corresponds to an overlay journal. What was missing in the past is a clear, minimalist description of how to build such a pipeline.

The system that is easiest to realize holds all the sections in a single wiki file. That means the upstream and downstream sections are not branches in a GitHub project but sections in a text file. The changes to the text file then have to be tracked. How well the system works depends only on the number of edits. If more authors and maintainers are able to participate, the journal becomes more efficient.
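As a rough illustration of the single-file idea, the following Python sketch splits one wiki page into its two sections and tracks a change between two revisions. The MediaWiki-style "== Heading ==" markers, the page content and the function name split_sections are assumptions made for this example. The post does not prescribe a particular wiki syntax.

```python
import difflib

# Hypothetical wiki page: one text file, two sections.
WIKI_PAGE = """\
== Upstream ==
paper1 v3
paper2 v1
== Downstream ==
paper1 v1
"""


def split_sections(page):
    """Return a dict mapping a lowercased section name to its entry lines."""
    sections = {}
    current = None
    for line in page.splitlines():
        if line.startswith("== ") and line.endswith(" =="):
            # A heading line opens a new section.
            current = line[3:-3].strip().lower()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections


sections = split_sections(WIKI_PAGE)

# Tracking changes: compare two revisions of the same text file.
new_revision = WIKI_PAGE.replace("paper2 v1", "paper2 v2")
diff = list(
    difflib.unified_diff(
        WIKI_PAGE.splitlines(),
        new_revision.splitlines(),
        fromfile="revision 1",
        tofile="revision 2",
        lineterm="",
    )
)
print("\n".join(diff))
```

A real wiki engine already stores the revision history, so the diff step is only there to show what "tracking the changes" means for a plain text file.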

Perhaps it makes sense to describe each part. The upstream section is equal to a classical submission system. Authors are invited to upload their manuscripts to a server. They can edit a document, which produces a new version. Every author can upload more than a single paper. This kind of preprint server makes sense for authors because it's a storage for their manuscripts, but the normal reader has no need to read through the documents. The upstream section is comparable to the Arch Linux project. There is a machine-generated trunk version which consists of the latest version of each document. But this trunk version has no value for the reader.
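The upstream behaviour described here can be sketched in a few lines of Python. The class name Upstream, its methods and the sample manuscripts are hypothetical and only meant to show how a version history and a machine-generated trunk relate to each other.

```python
from dataclasses import dataclass, field


@dataclass
class Upstream:
    """Hypothetical preprint store: every upload appends a new version."""

    versions: dict = field(default_factory=dict)  # paper id -> list of texts

    def upload(self, paper_id, text):
        """Store a new version and return its 1-based version number."""
        history = self.versions.setdefault(paper_id, [])
        history.append(text)
        return len(history)

    def trunk(self):
        """Machine-generated trunk: the latest version of each paper."""
        return {pid: history[-1] for pid, history in self.versions.items()}


upstream = Upstream()
upstream.upload("paper1", "first draft")
upstream.upload("paper1", "second draft")
upstream.upload("paper2", "another manuscript")
print(upstream.trunk())
```

The trunk is always just "whatever is newest", which is why it carries no quality signal for a reader.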

In the section “downstream” the existing content gets aggregated. The first decision to take is which of the papers fit the academic journal. In the diagram, papers #1 and #2 are selected for the first issue of the journal. Issue #1 of the journal is a copy of a certain version from the upstream. It can be edited separately from the upstream version. This produces a conflict. Instead of a single trunk branch which holds all the papers, two branches are available which run out of sync. This two-branch model has a large impact:

- first, it generates a role model for author, reader and journal maintainer. They are located at different positions in the workflow.

- secondly, it produces unsolved questions. The maintainer has to decide which papers are the right ones and in which version they are accepted into the journal. The reader has the obligation to give feedback to the maintainer, and the author has to think about why a certain paper was rejected.

- third, the newly generated role model in combination with the unsolved questions results in a communication pipeline. A mailing list, a forum and an issue tracker are needed to coordinate all the stakeholders and requests.
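The downstream step, copying a chosen version of selected papers into an issue and then letting both sides drift apart, could look roughly like this in Python. The data layout and the function names cut_issue and out_of_sync are assumptions for illustration only.

```python
# Hypothetical upstream: paper id -> list of versions (oldest first).
upstream = {
    "paper1": ["paper1 v1 text", "paper1 v2 text"],
    "paper2": ["paper2 v1 text"],
    "paper3": ["paper3 v1 text"],
}


def cut_issue(upstream, selection):
    """Copy the chosen version of each selected paper into a new issue.

    `selection` maps a paper id to a 1-based version number.
    """
    return {pid: upstream[pid][version - 1] for pid, version in selection.items()}


def out_of_sync(upstream, issue):
    """List papers whose issue copy differs from the latest upstream version."""
    return sorted(pid for pid, text in issue.items() if upstream[pid][-1] != text)


# The maintainer selects papers #1 and #2 for the first issue.
issue1 = cut_issue(upstream, {"paper1": 1, "paper2": 1})

# From here on, both sides are edited independently.
upstream["paper1"].append("paper1 v3 text")
issue1["paper1"] = "paper1 v1 text, copy-edited by the maintainer"

print(out_of_sync(upstream, issue1))
```

Every entry returned by out_of_sync is a question the maintainer has to answer, which is where the mailing list, forum or issue tracker comes in.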

Peer review made easy

Existing academic journals are equipped with peer review. This is their main advantage over a normal preprint server. A preprint server is only an online storage for a document, comparable to an individual blog, but a peer-reviewed journal provides a trust layer on top of a paper, which makes it more likely that the paper gets referenced by others.

So what is the secret behind the peer review process? Does it have to do with sending a manuscript to experts? Yes and no. Peer review is the result of a two-branch development model, very similar to what Linux distributions are doing. The Arch Linux distribution can be compared to a preprint server; it doesn't have a peer review. Only Debian consists of a stable and an unstable branch, and the result is some sort of moderation. Perhaps it makes sense to describe the overall workflow for a software project.

In the easiest case a single programmer creates a new project on GitHub and uploads the self-written source code. By default a GitHub project consists of a single branch, the master branch. Master is equal to the development branch, also known as trunk. When the software author has created a new version of the software, he sends the commit to this branch.

A more elaborate workflow consists of at least two branches: one development and one stable branch. Creating a stable branch takes a point-in-time snapshot of the development branch. After the branch is created, both branches get out of sync. That means the same file helloworld.py can be edited in the development and in the stable branch independently of each other. The result is a conflict. The conflict shows up when both branches are to be merged, because during the merge process the maintainer has to answer which of the versions is the right one.
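To make the merge conflict concrete, here is a deliberately simplified Python sketch of a three-way merge for a single file. It compares whole file contents, whereas git merges line by line, so this is an assumption-laden toy and not git's algorithm. The function name three_way_merge and the three file contents are made up for the example.

```python
def three_way_merge(base, development, stable):
    """Merge one file given the snapshot (base) and both branch versions.

    Returns a tuple (merged_text, conflict). When both branches changed the
    file in different ways, merged_text is None and conflict is True.
    """
    if development == stable:
        return development, False  # both sides agree
    if development == base:
        return stable, False  # only the stable branch changed the file
    if stable == base:
        return development, False  # only the development branch changed it
    return None, True  # both changed it differently: a human has to decide


# helloworld.py at the moment the stable branch was created.
base = 'print("hello world")\n'
# Independent edits on each branch afterwards.
development = 'print("hello, world!")\n'
stable = 'print("Hello World")\n'

merged, conflict = three_way_merge(base, development, stable)
print("conflict:", conflict)
```

As long as only one branch touches the file, the merge is automatic. The maintainer's decision is only needed in the last case.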

Basically speaking, a second stable branch is created for the single purpose of creating a conflict during the merge process. Every conflict has to be resolved. This can be realized with a mailing list or with a peer review. If only a single branch (the development branch) is available, there is no conflict and no peer review is needed. The conflict can be explained with social roles. In the example with the two-branch GitHub model there are two conflicting roles: one programmer is responsible for the development branch and the other for the stable branch. The role conflict produces a higher quality of the project. That's the reason why the Debian Linux distribution is recommended for production servers, while Arch Linux isn't recommended for such a purpose. And for exactly the same reason, a peer-reviewed paper gets referenced by others while a non-peer-reviewed paper won't.

Let us go back to the inner workings of an academic journal. Suppose a journal consists of a development branch and a stable branch. The result is that some decisions have to be taken in the stable branch. The major decision is whether a paper in the development branch should be published in the next issue. This problem can be solved in many ways. Either a random generator is asked, a formalized rule book is consulted or, in the best case, an external peer reviewer is asked for a quality judgment. That means the maintainer of the stable branch of an academic journal makes his life easier if he sends an unpublished manuscript to external experts and asks them to review the content.
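The three ways of reaching a publication decision can be written down as interchangeable functions. The following Python sketch is only an illustration: the rule book criteria, the score scale from 1 to 5, the threshold and all names are assumptions that do not come from any real journal.

```python
import random
from statistics import mean


def decide_randomly(rng):
    """Random generator: accept with a probability of 50 percent."""
    return rng.random() < 0.5


def decide_by_rule_book(paper):
    """Formalized rule book (hypothetical rules): length and references."""
    return paper["word_count"] >= 2000 and paper["has_references"]


def decide_by_peer_review(scores, threshold=3.0):
    """External reviewers score the paper from 1 to 5; accept on the average."""
    return mean(scores) >= threshold


# Hypothetical manuscript waiting in the development branch.
paper = {"title": "Example manuscript", "word_count": 3500, "has_references": True}
reviewer_scores = [4, 3, 5]

print("random:", decide_randomly(random.Random()))
print("rule book:", decide_by_rule_book(paper))
print("peer review:", decide_by_peer_review(reviewer_scores))
```

All three functions answer the same yes-or-no question for the stable branch. They differ only in how much outside judgment flows into the answer.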

If the stable branch maintainer doesn't do so, he can't decide whether the paper should be published. The consequence is that the next issue can't go online. The same situation exists for the Debian distribution. Before the next major release is published, the maintainers have to answer the question of whether a certain upstream version should be included in the distribution or not. This kind of decision is only needed for a stable-release Linux distribution. In the Arch Linux project there is no need for such a decision, because the upstream dictates which version is the correct one, which is always the latest, no matter if it's an improvement or not.

Academic journal from scratch

Creating a peer-reviewed journal from scratch is pretty easy. All that is needed is a two-branch development model whose branches run out of sync. In the unstable branch the authors upload their manuscripts, and in the stable branch the next release of the journal is prepared. Everything else, for example the file format in which manuscripts are accepted or which persons are allowed to peer review a paper, is a minor decision. The same principle of a two-branch model works in very different situations. It can be realized for a printed journal, for a predatory journal, for a serious journal, for a nonsense journal, for an amateur journal, for a journal which is based on MS-Word, or one which is based on LaTeX.

The social mechanic of peer reviewing is the result of a conflict between upstream and downstream branches. That means a journal which works with a single branch doesn't provide a peer review, while with two branches a peer review is possible.