Robotics and Artificial Intelligence: Some arguments for Arch Linux

Arch Linux plays a unique role among Linux distributions because its concept can be explained very easily: the latest version of each piece of software is packaged and installed on the user's PC. The Arch Linux wiki and the pacman package manager support this workflow very well. Most users understand the idea behind Arch Linux, so they use it at least for experimenting.

From a more abstract perspective, Arch Linux is a developer-friendly distribution. It supports the idea of agile software development. If a certain subsystem has a problem, a bug report is created, the source code is improved, and often within a day or so the updated binary version can be downloaded from the server. No matter whether the source code of Firefox, the Linux kernel, a text editor or a game was improved, it gets compiled into a binary package and the user can download it from the server.

Unfortunately, the Arch Linux project has some limits. It is not used very often in production environments. In theory this is possible: an Arch Linux system can be installed on a vserver and on a laptop as well, but only a few people do so. The exact reason is not clearly defined. One explanation that is sometimes given is that an Arch Linux system occasionally won't boot after an update. But with the recent improvements to pacman this is seldom the case. In most cases the boot process works fine, and if it doesn't, the manual intervention is minimal. Another explanation is that the concept is too new. Agile development and a rolling release don't fit the well-known waterfall software cycle, so it's hard to convince a larger audience to use the software in practice.

The more realistic explanation why Arch Linux isn't used in production environments has to do with the conflicting needs of developers and normal users. Arch Linux was developed by coders for coders. The project positions itself in the upstream and claims that the upstream is equal to the downstream. Everybody is a programmer, and in exchange he gets the most secure software ever programmed. This story doesn't fit reality. First, most users are not interested in creating software; they want to use it as normal users. Second, Arch Linux adds hardly any quality check of its own between what the upstream delivers and what the user installs.

Let us describe the preconditions behind the Arch Linux workflow. The idea is that the upstream never makes a mistake. If the Linux kernel was improved from version 1.0 to 1.1, this improvement makes sense and there is no time to argue about the reason why. The problem is that most software is written by amateurs who are not programming for normal users but for other reasons. Especially in the Open Source ecosystem, many software projects are started because the developer team likes to try something out. For example, somebody wants to learn how the C language works and therefore starts a game project written in C.

The average user suspects that the upstream has programmed malware which spies on the user's data. In contrast, the upstream assumes that the normal user has no experience with computers at all and therefore needs predefined settings. The consequence is that there is no trust between upstream and downstream. Arch Linux ignores this problem and assumes that no conflict between upstream and downstream exists.

Update over the Internet

Rolling release distributions like Arch Linux have become successful since the advent of fast internet connections. If the users are equipped with a stable internet connection, it's possible to update the software every week. This narrative reduces the comparison between rolling release and stable release to a file transfer problem. A more elaborate comparison focuses on the development process. The bottleneck is located at the upstream level. Before software can be installed, somebody has to write the code. Software development is done with the git version control system, in which a team of programmers writes lines of code. This process has to be organized in a certain way, and it can be organized either as a rolling release or as a stable release.

A rolling release is equal to a single-branch model in which the only branch is the trunk. It is the same principle used in a Wikipedia article. There is only one current version of the article and everybody is allowed to modify it. It is surprising that in reality most software projects don't work with a single-branch model. The reason is that software development is more complicated than creating a Wikipedia article.

The first reason is that the number of commits is higher. The average Wikipedia article consists of only about 20 commits over a timespan of one year, while the average software project consists of thousands of commits. The second reason is that different tasks have to be solved in parallel: creating new features, improving security, updating the documentation, and fixing existing bugs. The best-practice method for doing so is to use two or more branches.

The problem with models of two or more branches is that a single current version is no longer available. A current version would mean that all the branches are merged into one, which is not the case. Instead, the average software project has many current versions at the same time:

• a current testing version

• a current security version

• a current stable version

• a current bugfix version

• and so on

An additional problem is that these versions are improved independently of each other. This is the major advantage but also the major disadvantage of the git version control system. A rolling release only makes sense if the development model is based on a single trunk branch.

Let us describe a common three-branch software development model. If a one-man project or a small team starts a new project on GitHub, they will create three branches: stable-branch, issue-branch and testing-branch. If a developer wants to fix an issue from the bug tracker, he commits to the issue-branch. If the maintainer of the project wants to aggregate different bugfixes into the testing version, he merges the issue-branch into the testing-branch. And if a new stable version should be created, the testing-branch is copied into the stable-branch.
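The flow of a bugfix through the three branches can be sketched in a few lines of Python. This is only a toy model under the assumption that a branch is a plain list of commit messages; the class and method names are made up for illustration and have nothing to do with the real git or GitHub API.

```python
from dataclasses import dataclass, field


@dataclass
class ThreeBranchRepo:
    """Toy model: each branch is just a list of commit messages."""
    issue_branch: list = field(default_factory=list)
    testing_branch: list = field(default_factory=list)
    stable_branch: list = field(default_factory=list)

    def fix_issue(self, commit: str) -> None:
        # A developer commits a bugfix to the issue-branch.
        self.issue_branch.append(commit)

    def merge_issue_into_testing(self) -> None:
        # The maintainer aggregates all bugfixes that testing doesn't have yet.
        for commit in self.issue_branch:
            if commit not in self.testing_branch:
                self.testing_branch.append(commit)

    def release_stable(self) -> None:
        # A new stable version is a copy of the current testing-branch.
        self.stable_branch = list(self.testing_branch)


repo = ThreeBranchRepo()
repo.fix_issue("fix crash on startup")
print(repo.stable_branch)   # the fix has not reached stable yet

repo.merge_issue_into_testing()
repo.release_stable()
print(repo.stable_branch)   # now the fix is part of the stable version
```

The point of the sketch is that a fix is invisible to users of the stable branch until two separate manual steps have happened.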

This three-branch model is a kind of best-practice method in software development. The surprising thing is that it is not a rolling release. Instead, the new versions in the stable branch are produced with a time lag. For example, the bugfix was created in January 2019, the testing branch was updated in March 2019, and the new stable version was created in June 2019. In this example, it took roughly six months until the bugfix was available in the stable version. With a fixed team, this time lag can't be reduced. The reason is that the resources of a project are limited. For example, if the GitHub project was created by 2 programmers and each one writes 10 lines of code per day, the maximum output per day is only 10 x 2 = 20 lines of code.

Let us make a small example. Suppose the team wants to improve the software with 3000 additional lines of code. According to the math, they will need 3000 / 20 = 150 days for the task. If they start today, they will be finished in roughly half a year. This delay produces the time lag in the release workflow. The only way to reduce the time between the occurrence of a bug and its fix in the stable version is to increase the number of programmers. If the team has access to 200 programmers, they can reduce the time lag drastically.
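The arithmetic of this example can be written down as a small Python function. It is a rough sketch that simply takes over the assumption from the text, namely that every programmer writes 10 lines per day and that the output grows linearly with the team size; the function name is made up for illustration.

```python
import math


def days_to_finish(total_lines: int, programmers: int,
                   lines_per_programmer_per_day: int = 10) -> int:
    """Days needed if output scales linearly with the number of programmers."""
    daily_output = programmers * lines_per_programmer_per_day
    return math.ceil(total_lines / daily_output)


print(days_to_finish(3000, programmers=2))    # 150
print(days_to_finish(3000, programmers=200))  # 2
```

In real projects the output does not scale linearly with the number of people, so the second number is an optimistic lower bound and not a prediction.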

Freezing the upstream

In the first example, a rolling release software project is described. It consists of a trunk branch which is updated once a day. The normal user is asked to always install the latest version, because it contains all the improvements and security fixes.

In the second example, a stable release is described. It is created by freezing the trunk branch. That means that at a certain point in time a copy of the source code is created in a different folder, and then the copy gets improved to fulfill the needs of the normal user. Freezing the upstream is done in addition to normal upstream development. At the same time, the upstream trunk branch gets improved without interruption. That means the stable team is able to create the freeze independently of the upstream developers.
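The idea of a freeze can be illustrated with a short Python sketch. It assumes, purely for illustration, that the trunk is a list of commit messages and that a stable release is a copy of that list plus its own list of fixes; the function and field names are hypothetical.

```python
def freeze(trunk_commits: list, name: str) -> dict:
    """Create a stable release as an independent copy of the trunk."""
    return {
        "name": name,
        "commits": list(trunk_commits),  # copy, so later trunk changes don't leak in
        "fixes": [],                     # improvements made only for the stable users
    }


trunk = ["commit 1", "commit 2", "commit 3"]
stable = freeze(trunk, "stable-1.0")

# Upstream development continues without interruption.
trunk.append("commit 4 (new feature)")

# The stable team improves the frozen copy independently.
stable["fixes"].append("security fix for commit 2")

print(len(trunk))              # 4
print(len(stable["commits"]))  # 3
```

After the freeze both sides move on separately, which is exactly why the stable branch means additional work on top of the trunk.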

How complicated it is to freeze the trunk branch depends on the concrete software project. In most cases, the point release is created together with a handbook, security updates and bug reports against the stable version. The only thing that is certain is that an additional stable branch needs more effort than only improving the trunk branch. A trunk branch has to do with the software project itself, which is focused on the source code and its improvements, while a stable branch has to do with the needs of the normal users.

April 25, 2020
