July 01, 2018

Is the Linux kernel a fake?


A careful observation of the development in the Linux community shows that roughly every two months a new version of the Linux kernel is released. Each release brings lots of improvements, security fixes and a bit of tuning. The updates come regularly and frequently, and they have a high quality. Can this story be true? What is Linux hiding from us?
At first glance, the development process is suspicious. It does not look like it is driven by humans; it seems that Linux is developed by an artificial intelligence. If real programmers released the code, the cadence would be much slower and the quality weaker. What would a computer program look like that is able to generate the Linux source code? I'm not talking about the C compiler, I'm talking about the source code itself.
But let us take a step back. Can high-quality software without any serious bugs be programmed by humans? Yes and no at the same time. Software engineering is hard; on the other hand, the workflow which results in the Linux kernel can be reproduced under clean conditions. Suppose we want to make our own “nearly perfect” software project. The first thing we can do is create a new git repository with the magic command-line expression “git init”. Then we commit some edits. The result is twofold. First, the development process becomes transparent, and second, the result is nearly perfect source code. That means, even if we are not good at coding software, the git workflow allows us to take back wrong actions and to evaluate edits according to their quality. Git acts as a kind of quality-evaluation system, comparable to what is done in modern automotive production.
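To make this concrete, here is a minimal sketch of such a session. It assumes a Unix shell with git installed; the project and file names are only placeholders:

    # create a fresh repository; from now on every step is recorded
    git init myproject
    cd myproject
    # commit an edit
    echo "int main(void) { return 0; }" > hello.c
    git add hello.c
    git commit -m "add first draft of hello.c"
    # a wrong action can be taken back without losing the history
    git revert --no-edit HEAD

Because the full history is stored, a weak edit can be identified, discussed and undone later, and that is exactly the quality-evaluation property described above.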
In car production there is the same problem. To begin with, the ordinary worker is not very accurate. He has a limited understanding of the internals of a motor, and his motivation for producing high quality is low. If ten such workers build the new car on their own, it will fail. The workers will argue, they will leave out important steps, they will forget to install something, and so on. And here comes the magic, called management. Even if the workers are not very good, it is possible to produce many cars which are 99.9% perfect. That means every car is the same, and no serious issues remain. In reality this is achieved with a mixture of process monitoring, quality control and communication between the workers. Such a management system transforms an array of inexpensive workers ;-) into a high-quality production line. They act nearly perfectly, and the car looks as if it was produced by robots.
The surprising truth is that neither in car manufacturing nor in programming the Linux kernel is any kind of automation in the loop. Most parts of the process (90% and more) are done by hand. Sure, the workers have some mechanical tools, and Greg Kroah-Hartman has some profiling tools for analyzing the code, but in general it is a manual task which is done by humans.
What I want to say is that the development of the Linux kernel, or better, the quality of the Linux kernel, is the result of management decisions. Such a system transforms weak and moderately capable programmers into robot-like programming machines which generate 99.9% top quality. The interesting news is that the process can be reproduced in any other software project too. All the programmers need is a git-like version control system and a shared mission, for example to program software like Google Chrome, GIMP or whatever. And even if the single programmer has absolutely no understanding of computing, the project will be a success.
Mastering robot-like quality
In a modern software engineering pipeline there are several techniques available which improve the source code up to a near-perfect quality. Perfect means that from an outsider's perspective it looks as if not humans but an automatic artificial intelligence had written the code. In short, the techniques are:
- a git version control system
- a mailing list
- searching Stackoverflow for similar problems
- C as a high-level language
- the repository available in fulltext to every team member
- a group behavior which judges commits by quality, not by personal status
All of these techniques are used in modern software projects, in the Linux kernel for example, but also in many other projects. In the end this results in near-perfect source code which is indistinguishable from automatically generated source code. The surprising information is that so-called automatically generated code is not used in modern software development. That means Linus Torvalds has no UML model which is transformed into executable source code. Instead, the quality is based on practices like using git and searching Stackoverflow. That means the human programmer is always in the loop. First he reads the mailing list, then he browses the source code, then he searches Stackoverflow for a similar problem, then he fixes the bug, then he pushes out the commit. The human programmer is in the centre, and all the other tools are grouped around him.
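For the Linux kernel that loop, written down as git commands, looks roughly like the following sketch. The file name, commit message and mailing-list address are invented placeholders, not the real submission details:

    # get the latest state of the project
    git pull
    # fix the bug by hand in the editor, e.g. in drivers/foo.c, then record it
    git add drivers/foo.c
    git commit -s -m "foo: fix null pointer dereference"
    # kernel patches are sent as e-mails to a mailing list for review
    git format-patch -1
    git send-email --to=somelist@example.org 0001-*.patch

Notice that the only automation here is bookkeeping; the actual fix is still typed in by a human.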
With modern internet communication this workflow is transformed into a game-like system. That means the source code can be seen as a text adventure, and the players who are contributing to the project are trying to maximize their score. Again, the workflow is light-years away from an automatic / autonomous workflow. I would guess that today fewer code generators are used than 30 years ago. That is interesting, because the academic literature about how to write high-quality software is based on models and code generators. The idea is that an abstract UML-like model is created first, and in a top-down process this model is transformed into source code. That is the way software engineering is taught at university, and it has nothing to do with how software is programmed in reality.
What we see in real life is that the programmers have a source code editor open, for example Eclipse, Visual Studio or whatever, and they are editing every piece of code by hand. And that is not a malfunction in the system, it is the best-practice method. Every single byte in the Linux source code was entered with a keyboard by a programmer who pressed the keys manually. And if we observe the current trend on Stackoverflow, it is probable that source code will become even more important. That means a good Stackoverflow question about how to use a C pointer right is asked by posting some lines of code, and the perfect answer also contains some lines of code. The workflow isn't based on code generators but on teaching programming. This is sometimes called social coding, because the beginner has to learn it, the advanced user gives advice, and the communication process between them is stored in fulltext, so that an outsider can recognize a flame war on the mailing list without knowing the details ...
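Such a question-and-answer pair could look like the following sketch. It is an invented example of the pattern, not a real Stackoverflow post; the question shows broken code, the answer shows the corrected version:

    /* question: why does this program crash? */
    #include <stdio.h>

    int main(void) {
        int *p;             /* bug: p is uninitialized and points nowhere */
        *p = 42;            /* writing through an invalid pointer: undefined behavior */
        printf("%d\n", *p);
        return 0;
    }

    /* answer: make the pointer point to real memory first */
    #include <stdio.h>

    int main(void) {
        int value = 0;
        int *p = &value;    /* p now points to a valid variable */
        *p = 42;
        printf("%d\n", *p); /* prints 42 */
        return 0;
    }

The knowledge transfer happens through the code itself; that is what the paragraph above calls teaching programming.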