May 29, 2018

The bottleneck in AI research is Open Access repositories


With the success of neural networks, some researchers believe that deep learning is the best-practice method for implementing strong AI, and that all they need is faster GPU hardware to build more powerful neural networks. They locate the problem in technology, mostly in the combination of current hardware and software that limits today's Artificial Intelligence.
But the real bottleneck is somewhere else. Neural networks are only one option for realizing autonomous driving. The more general approach is to use heuristics and algorithms. Writing this kind of code is not the hard part, because with enough manual effort nearly everything can be programmed. The bottleneck is how fast a human can write code. Before a new software system can be built from scratch, preliminary research is needed. Surprisingly, the workflow of programming robotics and Artificial Intelligence is similar to the work mode of the humanities and social sciences, namely sociology, philosophy, history and language studies. These fields depend on sources: a paper consists of references to other papers. And this is the bottleneck in AI research too.
To make the point clearer: today, there are around 50 million academic papers from all disciplines. Better AI systems need more and better papers as a precondition. We can only speculate how many papers have to be written before autonomous driving, biped robots or working image recognition become possible. That means, without Open Access repositories it is not possible to realize Artificial Intelligence.
The reason is simple: everything in computing has to be realized manually. Every piece of source code has to be typed by hand, and every idea is copied from somewhere else. If the environment contains lots of information in millions of papers, it is easy to get new ideas; if only a small amount of information is available, it is much harder to create something new.
The pathway to intelligent machines has surprisingly little to do with traditional Artificial Intelligence research or neural networks. Instead, the preconditions can be summarized as follows:
- a robot competition like RoboCup Rescue
- a working preprint server
- Google Scholar
All these preconditions are important, because only humans can program machines. That means, instead of making the robots intelligent, the first step is to make the people intelligent. One example: if 100 papers are available which explain how a path-planning algorithm works, plus some GitHub repositories with working code, it is easy to build yet another path planner from scratch; the sketch below gives an impression of how small such a reimplementation can be. In contrast, if no paper and no code are available as inspiration, it is a very hard task.
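To make this concrete, here is a minimal sketch of such a from-scratch path planner, using breadth-first search on a 2D occupancy grid, one of the standard textbook techniques. The grid, start and goal values are invented for illustration; a real robot would plug in its own map.

```python
from collections import deque

def plan_path(grid, start, goal):
    """Breadth-first search on a 2D occupancy grid.
    grid: list of lists, 0 = free cell, 1 = obstacle
    start, goal: (row, col) tuples
    Returns the list of cells from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    came_from = {start: None}          # predecessor map for path reconstruction
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []                  # walk backwards to the start
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for neighbor in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = neighbor
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and neighbor not in came_from):
                came_from[neighbor] = cell
                frontier.append(neighbor)
    return None                        # goal is unreachable

# toy example: a 4x4 map with one wall
grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
print(plan_path(grid, (0, 0), (3, 3)))
```

Twenty lines are enough for a prototype, precisely because the underlying algorithm is documented in countless papers and textbooks; the hard part was already done by the authors of those sources.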
Let's have a look at the current situation in Open Science and scientific search engines. In one word, it is a mess. Classical publishers block most submissions, papers which are already written are not allowed to be read, and it is not possible to upload new information to Google Scholar. Open Science is not working yet; it is only a vision. Under such conditions it is not possible to implement any useful robots or Artificial Intelligence software. Before we can talk about robot control systems, deep learning, AI planning and symbolic reasoning, the Open Science issue must be fixed first. If the information flow is restricted and only few people can participate in the workflow, it makes no sense to start thinking about Artificial Intelligence in detail.
I do not think that the problems of Artificial Intelligence have to do with the discipline itself, for example a lack of ideas or a misunderstanding of what thinking is. The main bottleneck is the science system itself: a lack of preprint servers, a non-working Open Science community, and missing search engines for scientific literature.
Improvement
There is no single reason why Open Science has failed. Instead, there is a huge number of smaller problems which together prevent success:
- no broadband internet available at the universities
- outdated operating systems like Windows XP plus MS-Word
- a library which is organized around printed books and interlibrary loans which take weeks
- missing English skills among professors and students, especially for writing manuscripts
- outdated publishing workflows which take 3 years and cost a lot of money
- absurdly high quality standards which result in continuous proofreading and delayed publications
- government-sponsored research without any need for evaluation or cost reduction
- a general neo-Luddite attitude which prevents online recording of lectures and electronic storage of information
- a very low number of papers published per year; some universities publish no more than 12 papers a year while employing 1000 people, who are doing what?
Any one of the above problems alone would be harmless. For example, even with Windows XP and MS-Word it is possible to write a science paper productively. But if many of these points occur at the same time, Open Science becomes impossible. If all of them are true at once, the result is a disaster for research.
Again, before it is possible to talk about robotics, the precondition of Open Science must be fulfilled. It makes no sense to explain how machine learning works if it is unclear how a student can submit his paper to a server.
Let me explain what the bottleneck is not. The problem is not that a student misunderstood a topic or published a paper which is nonsense. Publishing a SCIgen-like paper only shows that the production workflow works. It is a kind of test paper that says “hello world” to the academic community. Such a test paper checks whether the PDF export from LaTeX works, whether the upload to a repository works, whether the internet connection is stable, whether the peer-review system detects the paper as spam, and whether any blogger out there notices the case. Bringing a SCIgen-like paper online is not a failure; it proves that Open Science is working. The real problem is the opposite: no spam-like papers get published at all, because the researcher has no internet connection or has never written a paper.
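As a closing illustration, the mechanical part of such a “hello world” test can even be scripted. The following sketch only checks the first step of the pipeline, the PDF export from LaTeX; it assumes pdflatex is installed locally, and the repository upload is left out because every server has its own interface.

```python
import pathlib
import subprocess

# a minimal test paper, just enough to exercise the toolchain
TEST_PAPER = r"""
\documentclass{article}
\title{Hello World}
\author{A. Student}
\begin{document}
\maketitle
This test paper only verifies the publishing pipeline.
\end{document}
"""

def check_pdf_export(workdir="/tmp/testpaper"):
    """Write a minimal LaTeX file and try to compile it to PDF.
    Returns True if pdflatex produced a PDF, False otherwise."""
    path = pathlib.Path(workdir)
    path.mkdir(parents=True, exist_ok=True)
    (path / "paper.tex").write_text(TEST_PAPER)
    result = subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", "paper.tex"],
        cwd=path, capture_output=True)
    return result.returncode == 0 and (path / "paper.pdf").exists()

if __name__ == "__main__":
    print("PDF export works:", check_pdf_export())
```

If this script prints True, at least the local half of the publishing workflow is in order; the remaining steps, upload, spam detection and public reaction, can only be tested against a live repository.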