March 22, 2020

Building an academic journal with stable releases

From a technical perspective, all the tools needed to create an academic journal from scratch are available. Webspace is provided by a blog that allows PDF files to be uploaded easily, the PDF file can be created with most document processors such as LibreOffice or LaTeX, and the version history of the document can be tracked with git. Suppose a single author combines these tools and creates some papers. Are these papers the same as an academic journal? No, they are not. Something is missing, because readers won't trust the journal. The reader understands what traditional publishers such as Elsevier and Wiley are doing, but he isn't interested in reading self-created PDF papers, especially not if the content is provided for free.

It's possible to describe the missing part more formally. It's called an Open Access downstream. The term downstream comes from the world of Linux distributions. For example, the Debian distribution is the downstream, while the original source code released by the software's developers is called the upstream. The workflow described at the beginning, which includes the PDF file created in LaTeX, is located in the upstream. It covers what the single author has to do to create the content. The missing part, called the downstream, makes sure that the content is forwarded to the normal user. It's a layer between the upstream and the normal user.

Let us describe what Debian is doing. Technically, Debian can be seen as an additional branch in a version control system. A branch is a copy of the original content. This idea can be simplified a bit for better understanding. Suppose there are two folders on the hard drive. Folder A stores the incoming files from the upstream, which are the PDF documents containing the authors' papers. Folder B stores the stable branch, which can be read by the normal reader. The question the downstream has to answer is what exactly should be copied into the stable branch.
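The two-folder idea can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the folder names folder_a and folder_b, the file names and the list of accepted papers are made up for the example, and the "PDF" files are plain text placeholders.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing on the real disk is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Folder A: everything that arrives from the upstream authors.
# (Placeholder text files stand in for real PDF documents.)
mkdir folder_a folder_b
echo "paper 1, final manuscript" > folder_a/paper1.pdf
echo "paper 2, final manuscript" > folder_a/paper2.pdf
echo "paper 3, rough draft"      > folder_a/paper3.pdf

# The downstream decision: only these files enter the stable branch.
# (Hypothetical selection, chosen for the example.)
accepted="paper1.pdf paper2.pdf"

# Folder B: the stable branch that the normal reader sees.
for name in $accepted; do
    cp "folder_a/$name" "folder_b/$name"
done

ls folder_b
```

The copy itself is trivial. The work of the downstream sits in the one line that defines which files are accepted.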



The diagram illustrates the idea visually. Without the downstream branch, the reader has direct access to the upstream version of the documents. It's some kind of Arch Linux for academic publication. The authors upload the PDF files to a server, and the reader can read the information. The interesting point is that in reality such a direct connection between author and reader doesn't work. To make the information from the upstream easier to read, the users expect a layer in between. This is called a journal. The journal is the downstream. It does the same thing the Debian project does. The journal forks the content from the upstream into its own branch, and for doing so, some decisions have to be made. In the given example, the first decision was to accept PDF file 1 and also PDF file 2. The second decision was which version of each manuscript was accepted. The interesting result is that for the reader it's easier to consume the downstream information than the upstream one.

It's important to know that no content is created in the journal branch; the existing content is only aggregated. The role model is again the Debian ecosystem. A Debian maintainer hasn't programmed the code himself, but he talks with the upstream developer on a mailing list. If somebody wants to create an online academic journal, he needs such a workflow. It's the only option to create trust.

It's interesting to know that an academic journal doesn't need to have a printed edition. In the example diagram, all the information is organized online only. What is important instead is that the upstream branch is forked into the downstream branch in the version control system. The concrete decision how to do so is made by the journal editor. The result is twofold. First, it is easier for upstream authors to communicate with the downstream section, and second, it's easier for the reader to communicate with the downstream section.

How to communicate between two parties?

The diagram looks a bit complicated. There are so many circles and arrows. Why don't the authors simply copy the files to a server and let the reader browse through the content? This is a good question. The good news is that it was researched in detail for the creation of Linux distributions. It's the old question of whether Arch Linux or Debian is the better development model. What the picture shows is the complicated workflow of Debian. According to the Debian community, it's not enough that the normal user gets the latest software from the upstream; he needs a hand-curated distribution which is different from a testing repository. The result is that software developers and end users are separated from each other. The author of a piece of software checks the latest changes into the upstream repository, while the user of the software only has access to the downstream version. The layer in between, called the downstream, is used for communicating back and forth. That means that if the reader of a PDF paper has found a mistake, he doesn't contact the original author but opens a thread on the mailing list of the downstream community.

In the debate around Open Access, this principle is sometimes called an overlay journal. An overlay journal takes existing PDF papers hosted in a repository, creates a copy of them and redistributes it to the user. Technically, an overlay journal can be realized as a branch in the version control system. Let us look at a practical example.

Suppose the idea is to build an academic journal on GitHub. First, we need two authors who have each uploaded a paper to their individual git repository. In this repository the authors are allowed to maintain their individual version history. That means the initial paper gets updated, for example to correct spelling mistakes.

Then an additional git repository is created which contains a copy of PDF file 1 and PDF file 2. Doing so is called forking. Forking means taking a snapshot of a GitHub repository and copying the content into a new one. Then the fork is improved a bit, for example a cover letter is created and a foreword is written by the journal. And voilà, the new academic journal is ready and can publish its first volume.
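The snapshot step can be tried locally with plain git, without GitHub. The following is a minimal sketch under assumptions: the repository names author1, author2 and journal, the file names, the commit messages and the helper function g are all hypothetical, and the papers are plain text placeholders. It copies a snapshot of one accepted revision per paper, as described above. A real GitHub fork would carry over the full version history instead.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical helper: run git with a dummy identity so that
# commits also work on a machine without a global git config.
g() { git -c user.name="Example" -c user.email="example@example.org" "$@"; }

# Upstream: two authors, each with an individual repository.
for n in 1 2; do
    g init -q "author$n"
    echo "paper $n, first draft" > "author$n/paper$n.pdf"
    g -C "author$n" add "paper$n.pdf"
    g -C "author$n" commit -q -m "First draft of paper $n"
done

# The editor accepts the current revision of each paper ...
accepted1=$(g -C author1 rev-parse HEAD)
accepted2=$(g -C author2 rev-parse HEAD)

# ... while author 1 keeps working in the upstream repository.
echo "paper 1, later revision" > author1/paper1.pdf
g -C author1 commit -q -am "Later revision of paper 1"

# Downstream: the journal repository stores a snapshot of the
# accepted revisions and adds its own foreword.
g init -q journal
g -C author1 show "$accepted1:paper1.pdf" > journal/paper1.pdf
g -C author2 show "$accepted2:paper2.pdf" > journal/paper2.pdf
echo "Foreword to volume 1" > journal/foreword.txt
g -C journal add .
g -C journal commit -q -m "Volume 1"

g -C journal log --oneline
```

After the script has run, journal/paper1.pdf still contains the accepted first draft, while author1/paper1.pdf has already moved on. The two repositories now have separate histories.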

And now comes the interesting part. Such a pipeline will produce a lot of stress. The first thing that will happen is that both upstream authors notice that their content was forked. They will open a new ticket in the journal repository and ask for the reason. Second, the first readers are not happy with the content and will open a ticket as well. That means that lots of traffic is created in the GitHub repository of the journal, in which both sides open unresolved tickets. And this is the same as the journal being accepted by a third party. If somebody opens an issue against a GitHub project, he has a need to communicate with this project.

Perhaps it makes sense to simplify the creation of an academic journal to a minimum. From a bottom-up perspective, an academic journal is created with the Unix command:

cp -r upstream/ downstream/

This Unix command copies the existing upstream/ folder into a new one. It's not a soft link or a redirect but a copy. This copy creates a new branch from scratch and can be updated separately. That means that if somebody edits file1.txt in one of the folders, both folders will get out of sync. This produces stress, which is compensated by communication on the mailing list. Basically speaking, an academic journal is a fork of existing PDF files.
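How the two folders drift apart can be checked with diff. The sketch below runs in a temporary directory, and the file content is invented for the example. It performs the copy while downstream/ does not exist yet, then edits the upstream file only.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

mkdir upstream
echo "first version" > upstream/file1.txt

# Create the journal: a real copy, not a link.
# downstream does not exist yet, so it becomes a copy of upstream.
cp -r upstream downstream

# Right after the copy both folders are identical.
if diff -r upstream downstream > /dev/null; then
    state_before="in sync"
else
    state_before="out of sync"
fi

# The author keeps editing in the upstream folder only.
echo "corrected version" > upstream/file1.txt

if diff -r upstream downstream > /dev/null; then
    state_after="in sync"
else
    state_after="out of sync"
fi

echo "after copy: $state_before"   # prints "after copy: in sync"
echo "after edit: $state_after"    # prints "after edit: out of sync"
```

One detail is worth knowing: if downstream/ already exists, GNU cp places the copy inside it as downstream/upstream instead of overwriting the existing content.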