May 31, 2018

How long does it take to write a PhD thesis?


In the domain of computer programming there is a well-known benchmark for the productivity of an average author: no matter how good somebody is, or which programming language he is using, he will produce only about 10 lines of code per day. In the domain of thesis writing there is a similar benchmark. Here the question is how many hours somebody needs to write a single page of his PhD thesis. The average value is 4 hours per page. That means, if the aim is to create a 150 page long thesis, the average author needs 600 hours.
The interesting aspect is that we can discuss this benchmark. Perhaps it is too demanding and in reality the author will need 8 hours per page. But according to different discussions on the internet, the number of 4 hours per page describes the current situation fairly well. It predicts quite well what is possible and what is not. Let us make a thought experiment. Somebody needs a 10 page long article for a journal submission. According to the formula cited above he will need 10 pages * 4 hours = 40 hours to write the paper. Is it possible to reduce the effort dramatically? No; like in the example of programming productivity, it is a value which is surprisingly constant over long periods. What is possible is to improve it a little, for example with better word-processing software which allows faster formatting of a paper. But in the end, the bottleneck is the author himself. His ability to read the given literature, do experiments and understand a domain is limited.
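As a minimal sketch of this arithmetic in Python (the 4 hours per page are just the rough average discussed above, not a measured constant), the estimate looks like this:

# Rough estimate of writing effort, based on the hours-per-page benchmark
# discussed above. The numbers are assumptions, not measurements.
HOURS_PER_PAGE = 4

def writing_hours(pages, hours_per_page=HOURS_PER_PAGE):
    """Estimated writing effort in hours for a manuscript of the given length."""
    return pages * hours_per_page

print(writing_hours(150))  # 150-page thesis -> 600 hours
print(writing_hours(10))   # 10-page journal article -> 40 hours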
To make the situation a bit more realistic we can search for any dissertation on the internet. Perhaps the PDF document consists of 150 pages. We can download and read the dissertation in under 5 minutes, but for creating it the author had to invest lots of energy. In most cases a PhD thesis was written over 2 years. It is the result of an ongoing research effort which takes not only weeks but months until it is finished. And this is perhaps the most important reason for the Open Access movement. If a single PhD thesis needs many months to write, it is a waste of energy to hide it from the public. The best time-saving advice is to not write an academic paper at all. That is only possible if papers written in the past are available on the internet, so there is no need to write the same information again.

How many researchers are enough to bring AI forward?


The ideology behind Open Science is to transform the masses into a scientific workforce and motivate millions of people to work together. They are distributing content like the dancers at the Love Parade in Berlin, which had over 1 million people collaborating at the same time. But do we need such numbers, or wouldn't it be enough to mobilize only a small number of people?
The famous Artificial Intelligence forum https://ai.stackexchange.com/ has only a limited number of regular users. According to the latest statistics, in the last year only 50 people have posted content to the forum. Surprisingly this small number was enough. The forum contains lots of answers on different subjects, and incoming new questions get an answer reasonably fast. The hypothesis is that 50 serious users who contribute to a forum or an academic journal are enough to bring a discipline forward.
Suppose we have 50 users who are writing academic papers on the subject of Artificial Intelligence. Each user is able to write one paper a month. After one year the group has produced 600 papers. It is not a second Arxiv.org repository, but it is enough to bring the subject forward. So perhaps the ideal Open Science community consists not of a million scientists, but of only 50? The question is not how to attract the whole planet to an online forum, the question is what the upper limit is. I would guess that https://ai.stackexchange.com/ can handle perhaps 200 users who post questions and answers there. More wouldn't improve the situation but result in chaos. Today's 50 users are on the low side, but it is not advisable to increase the number to 500 or even more.
I would guess that the situation is similar in other disciplines like economics, literature, biology and medicine. A single researcher is not able to run a forum and a journal, but 50 people who work together are more than enough, and increasing the number of scientists into the range of 1000 or more would not help to improve the situation.

May 30, 2018

Bug: Fedora 28 upgrade didn't work


The new Fedora 28 operating system doesn't work very well. The update itself was installed, but it took many hours to download the 3 GB of data from the internet. The better idea is called delta upgrades and is used in the Android environment, but it seems that Red Hat doesn't like the idea very much. Another problem is that after the update was installed the Chrome browser uses too much CPU time. Usually the program needs 100% of the processing power and forces the fan to spin up. The next problem is the outdated version of JabRef in the Fedora repository. It causes trouble with updates, and the installed version is outdated by 3 years or so.
The next (minor) problem is that the LyX document processor is also not amused by the new Fedora 28, because it is no longer possible to change the size of the outline window on the left of the screen. I'm sure there are many other problems, for example with the Java runtime environment, but I'm not going to list all the details. To summarize the bug report a bit: Fedora 28 is not the greatest iteration, there is room for improvement.

May 29, 2018

The bottleneck in AI research is Open Access repositories


With the success of neural networks, some researchers believe that deep learning is the best-practice method for implementing strong AI and that they only need faster GPU hardware to implement more powerful neural networks. They locate the problem in technology, mostly in the combination of current hardware and software which limits today's Artificial Intelligence.
But the real bottleneck is somewhere else. Neural networks are only one choice for realizing autonomous driving. The more general idea is to use heuristics and algorithms. Programming this type of computer code is easy, because with enough manual effort it is always possible to program nearly everything. The bottleneck is how fast a human can program code. Before a new software system can be realized from scratch, preliminary research is needed. Surprisingly, the workflow of programming robotics and Artificial Intelligence is similar to the work mode used in non-technical disciplines, namely sociology, philosophy, history and language studies. These fields depend on sources. That means a paper consists of references to other papers. And this is the bottleneck in AI research too.
To make the point clearer: today there are around 50 million academic papers from all disciplines out there. Better AI systems need more and better papers as a precondition. We can speculate how many papers have to be written until autonomous driving, biped robots or working image recognition is possible. That means, without Open Access repositories it is not possible to realize Artificial Intelligence.
The reason is simple: everything in computing has to be realized manually. Every piece of source code has to be formatted by hand, and every idea is copied from somewhere else. If the environment contains lots of information in millions of papers, it is easy to get new ideas; if only a small amount of information is available, it is harder to realize something new.
The pathway to intelligent machines has surprisingly little to do with traditional Artificial Intelligence research or neural networks. Instead the preconditions can be summarized as follows:
- a robot competition like RoboCup Rescue
- a working preprint server
- Google Scholar
All these preconditions are important, because only humans can program machines. That means, instead of making robots intelligent, at first it is important to make people intelligent. Perhaps one example: if 100 papers are available which explain how a pathplanning algorithm works, and additionally some GitHub repositories with working code, it is very easy to realize yet another pathplanner from scratch. In contrast, if no paper and no code is available as inspiration, it is a very hard task.
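To make this concrete, here is a minimal sketch of what such a "yet another pathplanner" can look like: a plain breadth-first search on a 2D grid. It is not taken from any particular paper or repository, only a generic textbook routine, and the example grid is made up.

# Minimal pathplanner sketch: breadth-first search on a 2D grid.
# grid[y][x] == 1 marks a blocked cell; the example grid is made up.
from collections import deque

def plan_path(grid, start, goal):
    """Return a list of (x, y) cells from start to goal, or None if unreachable."""
    height, width = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        x, y = cell
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < width and 0 <= ny < height and grid[ny][nx] == 0 and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(plan_path(grid, (0, 0), (0, 2)))  # routes around the blocked middle row

Writing these few lines is trivial when surveys and reference implementations exist; without them, even such a small routine has to be reinvented.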
Let's have a look at the situation in the area of Open Science and search engines. In one word: it is a mess. That means, classical publishers block most submissions, the papers which are already written are not allowed to be read, and it is not possible to upload new information to Google Scholar. That means Open Science is not working yet, it is only a vision. As a consequence it is not possible to implement any useful robots or Artificial Intelligence software under such conditions. Before we can talk about robot control systems, deep learning, AI planning and symbolic reasoning it is important to fix the Open Science issue first. That means, if the information flow is restricted and the number of people who can participate in the workflow is low, it makes no sense to start thinking about Artificial Intelligence in detail.
I do not think that the problems in Artificial Intelligence have to do with the discipline itself, for example a lack of ideas or a misunderstanding of what thinking is. The main bottleneck is the science system itself, that means a lack of preprint servers, a non-working Open Science community, and missing search engines for scientific literature.
Improvement
There is no single reason why Open Science has failed. Instead there is a huge number of detail problems which prevent success:
- no broadband internet available at the universities
- outdated operating systems like Windows XP plus MS Word
- a library which is organized around printed books and interlibrary loans which take weeks
- missing English skills of professors and students, especially for writing manuscripts
- outdated publishing workflows which take 3 years and cost a lot of money
- the goal of absurdly high quality standards, which results in continuous proofreading and delayed publications
- government-sponsored research without the need for evaluation or cost reduction
- a general neo-luddite attitude which prevents online recording of lectures and storing information electronically
- a very low number of papers published per year; some universities have published not more than 12 papers in one year, while at the same time they have 1000 employees who are doing what, exactly?
If only one of the characteristics cited above applies, that is no problem. For example, even with Windows XP and MS Word it is possible to write a science paper productively. But if many of these points occur at the same time, the result is that Open Science is no longer possible. If all of them are true at the same time, it amounts to a disaster for research.
Again, before it is possible to talk about robotics, the precondition of Open Science must be fulfilled. That means it makes no sense to explain how machine learning works if it is unclear how a student can submit his paper to a server.
Let me explain what the bottleneck is not. The problem is not that a student didn't understand a topic or published a paper which is nonsense. Publishing a SCIgen-like paper only shows that the production workflow works. It is some kind of test paper to say "hello world" to the academic community. Such a test paper evaluates whether the PDF export of LaTeX works, whether the upload to a repository works, whether the internet connection is stable, whether the peer-review system detects the paper as spam and whether any blogger out there notices the case. Bringing a SCIgen-like paper online is not a failure, it proves that Open Science is working. The problem is the opposite, that means if no spam-like papers get published, because the researcher has no internet connection or never wrote a paper.

May 28, 2018

Education in Africa works with the Chinese language


It may sound surprising, but for 10 years or so there has been an ongoing trend visible in education. The new favorite foreign language on the African continent is not English but Chinese. At first glance it makes absolutely no sense, because the continents are around 10000 kilometers apart, but it seems that on both sides (teachers and students) there is a high demand.
Why is the Chinese language so attractive for African students? I don't know. Chinese is very complicated, uses a non-Latin script and it takes many years to master it. In contrast to Basic English (which has no more than 1000 words) it is very hard to learn the language. But it seems that most students are highly motivated. That means, teaching Chinese in Africa is some kind of win-win situation.
The funny thing is that in African-Chinese business relationships the preferred language is not French, English or an African dialect, it is Mandarin. I would guess that Chinese is used more often than even English. And in the future we will see an increasing trend. That means, the typical African student will learn his local dialect as a mother tongue, French as a second language, Chinese as a foreign language and English as an international language. Whether Chinese will become so dominant that it replaces French is unclear. But it seems that French was the second language of the past, while Chinese is the language of the future.
What is the fastest and cheapest way to learn Mandarin? I would guess that electronic books and recorded video lectures in Chinese will play an important role. Compared to a human teacher they have the advantage that the workflow is cheaper and that a huge number of students can be taught at the same time.
The interesting aspect of language learning is that a language is strongly connected to education. It is not possible to learn the language alone, it is taught with a cultural background.

https://www.youtube.com/watch?v=dAe5kX4xSxw

In the video we can see a classical teaching situation. A single teacher educates around 30 students at the same time. The problem is that this workflow doesn't scale very well. If the aim is to teach many millions of people, some kind of technology is needed which works better. That is, for example, a Chinese online course in a smartphone app, which reduces the costs drastically. The introductory lecture, which explains the origin of the Chinese language and contains the basic vocabulary for a hello-world sentence, needs to be recorded only once by the best teacher available and can be played back a hundred million times without further cost. Such language learning technology is currently not used in Africa, and yes, this is a criticism.
Student as a customer
Education has nothing to do with Chinese superiority and shouldn't be organized top-down. The better approach is to put the African student in the position of a customer and to see Chinese teaching as a business. The aim is to grow the market with discount prices for the masses, so that every African student can buy such a course.

May 25, 2018

Student-as-a-customer, the redefinition in education


In the modern debate around Open Access an older idea is often no longer seen as valid: the idea of seeing a student as a customer. But in reality both ideologies are strongly connected, because they have something to do with productivity. But let us tell the story from the beginning.
At first we have to go back to a time in which the world was well understood. Education before the internet age and before the rise of the 1980s with its business yuppies was small and easy to understand. There were traditional institutions, teachers and students. Education in that era was something for the elite, that means a very small percentage of each year's cohort was able to attend higher education. Why? It was mostly the result of the high costs of education. Perhaps a small example. If the only technology available to copy a book is a handwritten copy, then the price of the book is high. If television is not available, the only way to attend a school is to travel physically to the location. Until the 1970s, higher education was seen as a long-term investment; that means the costs were so high that 99% of the people couldn't pay the bill, and that was the reason why they didn't attend a school.
Now let us describe some technologies which were used in the time after the 1970s to reduce the costs of distributing knowledge. At first, the classical printing press became cheaper and the production of books cost less money; with the internet the costs of copying a book became nearly zero. Secondly, it became possible to store and play back lectures from Stanford and MIT, and third, the number of books and papers which are available is higher. All of this combined changed the price of higher education. It became cheaper. It is no longer necessary to see education as a long-term investment; the student can decide on the fly if he wants to watch a video from Stanford, MIT or Harvard. He is no longer forced to travel physically to the location, all he needs to do is type the URL into his web browser, and he can bring the teacher to the monitor.
Because the costs of education became lower, it is possible to change the definition of what education is. In the more modern form, the student can see himself as a customer. Getting access to higher education has to be planned like getting access to a movie. A little bit of planning is always necessary, because at first the student needs an end device, for example a Linux notebook or an iPad. And he needs an internet flatrate and so on. But the amount of time which has to be invested until the personal experience with higher education can start is relatively small. It is smaller than in the 1970s. If an individual in that era was planning to attend a course at Harvard, he had to prepare very much. And for 99% of the people it was impossible to ever visit Harvard in their lifetime, because it was too far away and the costs were too high.
Now I want to go back to the Open Access movement which was cited in the introduction. Open Access basically means reducing the costs for the consumer further. Instead of paying 20 US$ for each paper (which is a low price compared to an interlibrary loan of the 1970s) the idea is to deliver a current science paper for free. The only price which a student has to pay to get access to Google Scholar is the computer hardware he uses, the internet connection and the power provider. Open Access delivers two things: at first it is a technology revolution, and secondly it fulfills the idea of "student as a customer".
Often this mentality is described as a political ideology called consumerism and market-driven economy. But in reality it is not possible to start this movement without a preceding technology innovation. For example, if internet technology is not available it is not possible to define the role of a student as a customer. That means, if the costs for a student to get access to higher education are extremely high, it is not possible for most people to see education like shopping. Even if somebody explains to an ordinary man that he should buy a course at Harvard, he will not follow the advice, because he doesn't have 100k US$ for the fee. Only after the price tag was reduced down to 0.99 US$ for downloading the latest lecture of a Harvard course did it become possible for the student to see himself as a customer, and attending the course is like watching a clip in the iTunes store.
A look back at the intention of the Gutenberg printing press shows us that the development is not completely new. In the 15th century the technical innovation of mechanized printing changed society. At first Gutenberg came up with a special machine, and some time later his books were distributed widely and pushed society towards the Age of Enlightenment. This was done with a non-religious act, the selling of the Bible. Gutenberg was a businessman, he used his machine for printing and selling books. The Gutenberg press forced the other side into the role of a customer. What Gutenberg did was simply to sell out the Christian religion. He took the Old Testament, copied the work and sold it on the market. Perhaps in Gutenberg's time this was regarded as heretical.

The Koenig&Bauer printing press of 1811


Most people are aware of Gutenberg's invention from the 15th century, the first machine able to print books. A major improvement of the original Gutenberg press was made in the year 1811 by Koenig&Bauer. They did not invent book printing itself, but they invented low-cost newspaper printing. The earlier Gutenberg printing machines worked manually, that means there was no motor to drive the machine. The Koenig&Bauer machine of 1811 used a steam engine. The first thing they printed was not the Bible (a book) but The Times, a newspaper. The difference is that a newspaper is sold at a lower price and on cheap paper.
Limitations
In the history of newspapers, the Koenig&Bauer machine was a milestone. But the device had some major bottlenecks which couldn't be bypassed because of the printing technology itself. The first one is that even with modern developments in printing machinery the costs for a single copy are relatively high. That means it is physically not possible to print an issue of a newspaper for 0.10 US$. In any case, paper and ink are needed. A second bottleneck is that after printing the newspaper it has to be distributed to the customer, mostly by car and sometimes by bicycle. The area in which a newspaper can circulate is locally restricted. Perhaps 100 miles around the printing house are possible, not more.
Everybody who is familiar with media technology knows how to overcome these problems. The technology is called "the internet" and became widely available in the early 1990s. From the perspective of media technology the history is very compact:
1500: Gutenberg invents the printing press
1811: Koenig&Bauer invent the steam-driven printing press
1990: Tim Berners-Lee invents the World Wide Web

The Gutenberg printing press and beyond

The invention of the Gutenberg printing press took place in the 15th century. From a technical point of view the Gutenberg press was a machine, and it was later improved into the high-speed press by Koenig&Bauer. The economic impact was that the bourgeoisie could rise. These were doctors, lawyers and scholars who depended on knowledge and used it to earn money.
Most of this is history. It was a development from 1500-2000. But what comes after the Gutenberg press? The answer was given by the Xerox company in the 1970s. They called their invention the digital distribution of knowledge. Today it is called Open Access and means distributing PDF papers over the internet. This development took place around the year 2000 and even today only a minority is familiar with it. The result will be as dramatic as the Gutenberg press. Like in former times, the invention consists on the one hand of the technical idea: mostly the PDF format, internet routers and computers for showing the information on the screen. The more exciting development will be the economic impact.
Let us first go back to the Gutenberg age. The rise of the bourgeoisie was possible because with the Gutenberg press the reproduction costs of books were lower. Over the years the price decreased to today's 30 US$ for a single book and around 10000 US$ per year for an academic journal. Compared to the prices paid before the year 1500 this is very low. The bourgeoisie and their jobs were only possible because of this low price. The outlook of the electronic Open Access movement is to reduce the price of information further. Not 30 US$ per book but 0 US$ per book will be the new price tag. What are the effects for the bourgeoisie? The simple answer is that former institutions like libraries, universities, book publishers and a caste system of lawyers, doctors and so forth are no longer needed. How this will transform society is unclear, but perhaps the development will be as strong as the rise of the bourgeoisie in the 15th century.
What we can say about the past is that the bourgeoisie was mainly grouped around the Gutenberg press. The advantage of that machine was that it provided knowledge at a lower price; the disadvantage was that the price was not zero. That means, in the Gutenberg age institutions like universities, libraries and publishers were needed. There was no option to bypass the system. In the post-Gutenberg age (which is called the internet) it is possible to be no longer dependent on printed books.
Today's society is mostly organized around printed books. They are printed mechanically, and around printed books a high-price education system was established which is visible in schools, universities and the careers in that domain. The Gutenberg society is the result of a certain technology, the Gutenberg press. If this machine is used actively and for doing business, it makes sense to see the university as the most important structure in society. If a new, better technology is available, like the internet, the old system, which consists of the Gutenberg press and its economic impact, will lose its power. If printed books are no longer needed, a university is no longer needed. And if a university is not important, the university needs no taxpayer money, and so forth.
This is well known from the rise of the bourgeoisie in the 15th century. The first thing they asked was whether the church was perhaps no longer important. And they were right. If the world is run by normal people who work as doctors, lawyers and teachers, the role of the church becomes marginal. Today, in the internet age, we can ask a similar question. If it is possible to distribute papers over the internet for free, perhaps we no longer need a classical library?
The answer is open right now. Like I mentioned in the introduction, the Open Access movement is very young, and most people are unsure whether this development is right. But what we can see is that the development is mostly driven by economic questions: what does a book cost, what does an electronic document cost, and how does this change the world?
Between the Gutenberg age and the internet age there is a huge similarity. Not on a technical level, the Gutenberg press works very differently from an internet backbone router, but on the economic level. The main idea behind the printing press is to reduce the costs per page. Modern machines can print faster and at lower costs than the machines before. The same goal is visible in the Open Access movement. The idea is to reduce the costs of reading a paper and writing a paper to zero. This is not easy, because a paper has to be created first, and this takes time. So it is not possible to reduce the costs really to zero, because there is always a need for an author, and before the author can write a paper he needs a lot of knowledge. So the question is how to train new authors and how to motivate them to write papers. But the general direction is to reduce the costs. And this works surprisingly well.
How can we describe the past of education? The best idea is to go back to a time in which the internet was not available and printed books were the only medium for the distribution of knowledge. In that imaginary world, all bourgeois institutions make sense. That means, in the pre-internet age it was not possible to reduce the costs of a newspaper below a certain level, because on the hardware side the paper has to be printed. And if 1 million people want to read a book which costs 30 US$ each, then the overall costs are 30 million US$. So it makes sense to use libraries for sharing the books and to use centralized universities in which the people who are familiar with the knowledge teach. What does that mean? If books are expensive, education is expensive. And if education is expensive, not all people can be educated. This results in a society of specialists. One person is an expert only for law questions, the next has knowledge about medicine and the third person has no knowledge because he is poor. In the pre-internet age there was no option to change the situation, because the costs of printing books were fixed.
If we take the internet out of the equation, the period from 1500-2000 makes sense, and the society structure of universities, experts and 90% non-educated people makes sense. It is some kind of normal development. That means, with the technology of 1800 it was not possible to distribute knowledge to everybody. And now it becomes clear what will happen if we put the internet into the equation. The first thing the internet will change is that the logic of the Gutenberg age no longer makes sense. If the cost of knowledge is zero, everybody can get knowledge; there is no need to focus only on medicine, one person can be an expert in all subjects from math and biology up to law.
Again, what this future will look like is unclear; what we can say for sure is only how the past looked in which the internet wasn't available. It was the classical Gutenberg age from 1500-2000 in which printed books were the dominant form of knowledge distribution.

Misunderstanding in information filtering


A while ago the question was asked in an online forum whether it is OK to publish a paper in a so-called "predatory journal": https://academia.stackexchange.com/questions/110029/should-i-submit-my-paper-to-a-journal-that-accepts-papers-quickly Most, even all, answers go in the direction of not recommending it, either because of detail questions or because a fast and cheap publication is not seen as good for future science. But what is the story behind this argument? Why is it not recommended to publish such a paper? The answer has to do with a potential information overload. If too much (low-quality) content is in the wild, it is harder to identify the important information. And not publishing too much, especially not low-quality content, is their answer to the filtering problem.
But the problem of information overload and too much content is a typical problem of the pre-internet age. Nowadays there is a technical solution for it, called an academic search engine. I want to give a concrete example of how to use such a tool. Suppose the idea is to find useful information among all the papers available. The first thing the reader can do is restrict the results to the year 2018. This filters out around 90% of the content. The second thing he can do is type a keyword into the search box, and last but not least he can sort the entries by the number of citations. In nearly all cases this results in a very compact list of potential papers. That means, with a bit of understanding of how a search engine works, it is possible to find the useful content even in a huge amount of information. So it is no longer useful to block content on the publisher side. And that is why predatory publishing is right. The idea is to publish first and ask about quality later. The publisher does not know what his readers need, and especially he is not aware of what his readers will need in 5 years. Apart from serious mistakes like plagiarism, every paper should be published.
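A minimal sketch of this filtering step in Python, assuming the search results are already exported as a small local list of records (the titles, years and citation counts below are made up):

# Year filter, keyword filter and citation sort, as described above.
# The record list is a made-up stand-in for an exported search result.
records = [
    {"title": "Pathplanning for mobile robots", "year": 2018, "citations": 12},
    {"title": "Deep learning survey", "year": 2015, "citations": 350},
    {"title": "Pathplanning with RRT", "year": 2018, "citations": 3},
]

def filter_papers(records, year, keyword):
    """Keep papers from the given year whose title contains the keyword,
    sorted by citation count in descending order."""
    hits = [r for r in records
            if r["year"] == year and keyword.lower() in r["title"].lower()]
    return sorted(hits, key=lambda r: r["citations"], reverse=True)

for paper in filter_papers(records, 2018, "pathplanning"):
    print(paper["title"], paper["citations"])

The point is that the filtering happens on the reader's side, after publication, instead of on the publisher's side before publication.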
The funny thing is that in the online thread cited above this background assumption about information retrieval is not discussed. Instead most answers assume that information overload is a major problem, and the shared assumption is that no search engine is available to solve it. The answers are not wrong, but the preconditions under which they are formulated are outdated.

Why the Brockhaus encyclopedia is dead


Some people argue that Brockhaus is dead because it was a printed book, which is no longer relevant in the internet age. But that was never the problem. The printed version of the Brockhaus is superior to the digital version, because a normal computer monitor has only a low-resolution display, while the printed book has 600 dpi images, and holding real paper in the hand is a great feeling. The reason why the Brockhaus is dead has to do with its high price. The last printed version was sold for around 3000 US$. Who wants to pay such an amount only to take a look into it? Libraries are the only institutions that can pay such an amount of money. Normal people will not.
Now let us compare the price of the Brockhaus with the price of the Nature journal. If somebody wants to buy the complete collection of Nature over the last 100 years, he must pay a huge amount of money. He gets real value, but who has that much money? Right, libraries are again the only institutions that can pay the price for all the scientific journals. Without libraries all the high-quality journals from Elsevier and Springer would be dead too. Nobody except a billionaire has enough money to build his own academic library. For example, if the idea is to read only 20 journals, the price will be higher than that of a sports car.
That means there is no real market for selling high-priced academic journals. And no end user is a customer of the Nature magazine. The reason why this magazine can be sold has to do with the current structure, namely the academic library complex and taxpayer-funded public education. To make the point clear: if the libraries no longer pay the money for Springer and Elsevier, both companies go bankrupt.
Let us compare Elsevier with the Brockhaus case. Have they ignored the internet age? No, Brockhaus had a great website and Elsevier too. What they ignored is reducing their prices. Brockhaus never reduced the price of its books and Elsevier made the same decision. In an environment in which the price of a product is important, both companies are in trouble.
Now we can better explain what predatory publishing is. If we locate predatory publishers by country, it is striking that nearly all of them are based in the BRICS countries. These are countries which are price-sensitive; they do not have enough money. The main idea behind predatory publishing is simply to reduce the price of academic publishing.
Under the assumption that the outcome for academic publishing is the same as for the encyclopedia market, the price is the most important factor. Only an organization or company which is able to reduce the price of its product down to zero will survive. Wikipedia won the battle with Brockhaus because Wikipedia is free. On the quality side, Wikipedia had surprisingly bad content. Especially in the beginning, the number of articles was low and they were written by amateurs. But in the long run this was never a problem. The customers ignored the quality problems, and the details were fixed in later iterations. My prediction is that we will see the same effect for predatory publishers. Today predatory publishers have a very low quality, sometimes no peer review is done and the quality of the submissions is below average. At the same time, the APC for publishing a paper there is excellent, that means lower than for publishing a paper in a classical journal, and in the long run this will make the difference.
Price list
Without any doubt, Elsevier produces high-quality, serious academic journals. On the quality scale they are superior to so-called predatory publishers and have the highest standard available. The papers in the journals are often breakthrough research, are formatted perfectly and in 50 years have never contained any spelling mistakes.
But the Elsevier price list is very high. The average journal costs around 1000 US$ per year in subscription fees, and if an author wants to submit a paper, an additional article processing charge of around 3000 US$ per manuscript applies. Only wealthy libraries and universities with billions of US$ of income are able to pay the price. The consequence is that Elsevier is no option for universities with a small budget, for online-only universities, or for private citizens who are interested in science in general. What Elsevier is doing is the same business model Brockhaus used: they sell their product only to university libraries, because the two are historically strongly connected and the library pays any price. The trend for the future is positive. That means Elsevier will increase its quality further, which results in better papers and higher prices. Perhaps they will also increase the paper quality with golden ink and use a leather cover. For example, the former Brockhaus publisher also had a luxury edition in sheepskin. The price was not 3000 US$ but 7000 US$.
Market dominance
It makes no sense to criticize Elsevier for the role it is playing. Elsevier is, according to its self-definition, a high-quality publisher and the market leader. It is not up to them to change their status. It is up to the libraries to redefine the situation. They can for example decide to no longer buy Elsevier journals, and they can also decide not to pay the APC for authors who want to publish there. Not Elsevier but the libraries have the obligation to move forward.

May 18, 2018

Insights about scientific productivity


The good news is that some information is available about the productivity of researchers. The topic is discussed under the term bibliometric studies and basically means analyzing a BibTeX database, for example with Matlab, to get information like: "How many papers has a certain author published?", "How many papers are published per year on average?". The surprising thing is that such studies all come to the same conclusion. The mean productivity today is around 2 papers per year, while 100 years ago it was around 0.5 papers per year.
Additionally it is possible to speculate about the reasons behind this. In most cases a productivity of 2 papers per year is the result of co-authorship. That means it is measured because, say, 200 authors are in the database who have co-authored 100 papers in one year, each paper with around four names on it; counting each paper once per author, a bit of number crunching results in a value of 2 papers per year per author. The value is hypothetical, because in real life hardly any paper is written by one person alone. The average bibliographic entry on Google Scholar contains at least 3 authors, sometimes more. The reason why different authors work together is to speed up science. A single author would need 3 years to write and reformat a paper, while 3 authors need only 1 year. It is the same principle well known from assembly-line production in the automotive sector, in which a task is split over many persons. Or perhaps a more realistic example is movie production: a film is usually produced by more than 100 people, because otherwise a Robinson Crusoe-like author would need hundreds of years until the product is ready.
What is not given in the studies, but would be very interesting, is a correlation between productivity and media technology. For example, I would guess that using computer software for typing the manuscript is faster than using a simple mechanical typewriter. It is literally unknown how a modern researcher types in his manuscript, or whether he uses a classical library or online-only databases. The only comparison available is Wikipedia. Here some studies exist about the writing technology the authors are using. And perhaps academic writing for a journal works by similar principles. Since Wikipedia has been mentioned: it is perhaps the best-researched community available. Even though all the authors are anonymous by default, Wikipedia is investigated heavily for questions about the article count per author, the number of edits, the tools used and so on. On the content level Wikipedia is not as advanced as a scientific journal, but there are more studies available which reflect on the workflow itself. Wikipedia can be called transparent by default and perhaps it is at the forefront of the Open Access movement.
What is normal in Wikipedia today (getting the number of articles somebody has written with a simple SQL query) will become normal in the academic community in the future too. Today such features are missing: it is not possible to ask Google Scholar what the productivity was in the last year; the raw data are stored in the database, but there is no query for getting the results. And it is also very uncommon to ask for this information in online forums.
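As a sketch of what such a query could look like, here is a hypothetical example against a small local SQLite database; the table layout and the entries are invented for illustration only, Google Scholar offers no such interface:

# Hypothetical papers-per-author query for one year. The database,
# table layout and rows are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE papers (author TEXT, title TEXT, year INTEGER);
    INSERT INTO papers VALUES
        ('Miller', 'Pathplanning survey', 2017),
        ('Miller', 'Grid-based planning', 2017),
        ('Smith', 'Neural networks revisited', 2017);
""")

query = """
    SELECT author, COUNT(*) AS papers_per_year
    FROM papers
    WHERE year = 2017
    GROUP BY author
    ORDER BY papers_per_year DESC;
"""
for author, count in conn.execute(query):
    print(author, count)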
In http://www.csiic.ca/PDF/NSERC.pdf a nice figure is given on page 28 which shows the paper production per year. Animal biology, for example, has a value of 2.35 up to 4.80 papers per year.
I think the most dominant reason why it is unusual to talk about these values is that historically academic publishing was done outside the internet. If a printed journal is the normal medium, it is not possible from a technical point of view to measure productivity. In contrast, for a simple online forum which is stored in an SQL database it is very easy to analyze the number of postings and the number of bytes every user has uploaded, so for online-only media it is quite normal to get such statistics. The same is true for Wikipedia. Wikipedia was founded as an online-only publication medium, so from day 1 all kinds of useful statistics were available and they were discussed in detail. Or, to explain the situation the other way around: if no machine-readable database of all scientific publications is available, it is not possible to ask how many papers were published at all. If the scientific community has only printed journals and a paper card catalog, it is not possible to count exactly how many publications are new in the last year and how many authors have worked together to create the content. Not talking about scientific productivity is a sign of a pre-online publication system in which basic information like a database of bibliometric data isn't available.
Sure, in theory it is possible to visit the Library of Congress and count in its catalog how many papers get published per year and note down every author name that was active. But in reality such a research project would fail, because the amount of information is too high. That means, it is not possible to run an SQL query against a classical card catalog.
Were classical journals invented for bibliometrics?
The assumption so far was that classical academic publishing has to adapt its behavior to a modern form of Open Science. That means researchers who have published in printed journals must accept that their productivity is measured, because the internet is a must-have. But what if the other side is right and measuring the details was never the intention? Let us go back to the era in which classical printed journals were founded. In the 1970s a printed academic journal was the only option researchers had to publish their papers. If they submitted a manuscript in that era, they had certain assumptions about how a journal works. That means they submitted the content not with the intention of distributing it worldwide in electronic form; the idea was that a small number of other researchers would get access to it.
Nowadays search engines like Google Scholar and JSTOR have indexed the old journals, created machine-readable bibliographies and distribute the full text worldwide without any costs. But that was not the deal, at least not in the 1970s in which the authors published their manuscripts. What we see today is that the old printed journals are used for a different purpose than they were created for. Perhaps this is the wrong way?
Let us imagine what the alternative is. The alternative is a new academic journal, for example PLOS ONE or the Wikiversity Journal. Both were created as internet-only media by default. If an author submits a manuscript there, he is aware that the content gets published online and that other researchers will analyze the productivity in detail. The hypothesis is that nothing is wrong with the old traditional printed journals. It is OK if the full text isn't available on the internet; it is valid if the productivity of the authors never gets evaluated. Perhaps we need new journals which are created for the web from the beginning?
The famous example Wikipedia was mentioned earlier. From the beginning it was clear to every author that the data are stored in an SQL database, that 7 billion people can read it, and that the productivity of any author gets measured. That was the deal, and every Wikipedia author knows it. If they submit an article or an update, they are aware of the consequences. Because Wikipedia was established as an internet-only project from the beginning, it is accepted to send an SQL query against the database to get detailed information about the authors.

Is it possible to transform classical journals with Open Access?


The usual hypothesis is yes, and most of the discussion is about the question of how exactly Springer, Wiley and the other publishing companies can become available on the internet. But perhaps they were never made for electronic publication? Let us make a simple example. Suppose a classical printed journal was established in the 1970s. This journal has a certain number of readers (mostly libraries from all over the world), it has a certain authorship (professors from universities) and it has a business model which works. There is no need for this journal to change anything.
Sure, from a technical point of view it is possible to scan in all the old issues, it is possible to catalog the articles in a BibTeX database and, even more, it is possible to run an SQL query against the database to get information about the productivity of the authors. But I would assume that none of these ideas will help the journal. In most cases the journal works well and the authors need no higher productivity.
Let us examine some of the techniques which are usually discussed in the context of Open Science. The main ideas are to make academic writing transparent, to make the full text available for free, to open the science process to non-scientists and to increase the number of articles. To be honest: all of these goals contradict the idea of a journal founded in the 1970s. It is something different, and it is not possible to transform such a journal into a new one.
The lesson is well known from the Brockhaus encyclopedia. Suppose in the 1990s a visionary thinker inside the Brockhaus encyclopedia had argued that the company has to change everything that was done over the last 100 years. Suppose the employee argues that Brockhaus should become digital, that the work should be done by anonymous authors distributed over the internet and that all the content has to become free to read. Does it make sense to realize such ambitious goals within Brockhaus? No, the communication friction would become too high. That means, explaining to the old readers and members of the Brockhaus universe that everything they have is wrong and that they must learn everything anew will not work. The better approach is to start from scratch outside of the old Brockhaus, and this project is called Wikipedia.
In my opinion the situation in the Open Access world is the same. It makes no sense to explain to classical publishers that they have to switch to online only, it is not the right choice to explain to classical authors that nowadays their productivity is measured, and it makes no sense to explain to the libraries that it is no longer relevant to pay money for printed journals. In most cases such arguments will not convince the other side; they contradict everything the other side believes in.
The better approach is to start from scratch. This reduces communication friction. If a new academic journal is founded from scratch which is available online for free from day 1, no one has to be convinced that this is a good idea. That means the journal has no 50-year-old print history, but starts with fresh ideas from the beginning.
I do not think that Open Access and Open Science are a development which is valid for every publisher. It is a criterion to separate old publishing from new publishing. Both can exist together. The best perspective is to let the fans of printed journals and the classical paid model do what they have done for the last 100 years. That means to leave the business of Elsevier and Springer unchanged, and at the same time to start new academic journals for online-only and Open-Science-only purposes with a different approach. In such newly founded journals the general attitude is internet-oriented, the peer review is done openly, the productivity of the authors gets measured and modern technology is used by default. I do not see that Open Access will replace the old model; I only see that there is a communication friction between people who like Open Access and people who don't.
The answer is not to negotiate about a shared future, the answer is to work separately from each other. Like in the past, it is not possible to run Wikipedia under the Brockhaus umbrella, because Brockhaus is too small for that plan. It is necessary to make a cut and start from scratch with a new company or organization.
Opposites
The main feature of Open Access is that it is completely different from what was done in the years before. Old-school academic publishing works slowly, in printed journals, with a small number of authors, with support from universities and at high prices. Open Access / Open Science is radically different: it is cheap, works with internet technology, is open to everyone and needs no library. The differences are too big to speak about a shared future. That means these are more than detail questions; it is something that doesn't fit together.
Open Access and classical publishing do not have much in common. And it is not possible to argue for Open Access in a way that convinces old publishers to change their mindset. It will not work; they will change nothing. The logical consequence is to give up. That means it makes no sense to measure the productivity of those authors or to convince these people that using a mechanical typewriter is out of date. The better alternative is to give up, that means to let them work how they want and not argue against printed journals.

May 17, 2018

Touch typing tutorial for Linux



Welcome to this introduction to touch typing. The first thing you need is a fresh installation of Fedora Linux. It can be downloaded here: https://getfedora.org The correct ISO file, which can be downloaded for free, is "Fedora Workstation 28, 64bit, 1.7 GB, live image". After installing the software (which takes around 5 hours even for Linux experts), a typing trainer can be installed with:
sudo dnf install klavaro
Until a newbie gets familiar with the software it can take around a year. The good news is that it is not necessary to run the course to the end. Complicated keys like the numbers 0, 1, 2, 3 do not need to be touch-typed. In reality it is enough to learn the basic 26 characters from A to Z in lower and upper case, plus the space bar, of course.
After the user is familiar with ten-finger touch typing, the next challenge is to become familiar with the LyX software. It can be installed with "sudo dnf install lyx". After he is an expert in this tool, he can try to improve his English skills. The best way of doing so is to write academic papers in English and to use a bilingual dictionary for unknown words.
Other typing tutorials
The Klavaro software is not very advanced; for example, a measurement of the typing speed is not available. Other programs like KTouch and online tutorials which run in a browser window have such a feature, but they are more complicated to use. The main problem with all typing trainers is that using them is a repetitive, boring task. The progress of the learner is slow and in the worst case he gets a serious hand injury because he types too fast.
My advice is to use the simplest typing software, which is Klavaro, to type in the examples at the slowest possible speed and to take many breaks between training sessions. If something feels wrong in the hand, stop the training and reduce the speed further. In most cases a writing speed of 100 characters per minute is enough. If somebody is able to type an essay blind at that speed, he has finished 2000 characters after 20 minutes, which is very fast compared to longhand writing.
How can somebody type large amounts of text without using touch typing, without LyX, without the English language, without Fedora Linux and without Google Scholar? Right, there is no way. If somebody is not familiar with these skills, he is not able to write anything or to gain knowledge about any subject. I would call these capabilities the basic skills every student should master, otherwise he is not a student but a gamer ... That means, if somebody argues that touch typing is not important, that mastering the LyX software is not important and that he has never heard of Google Scholar, he can do so, but then he is not a real student.
Choosing the right keyboard
I can not recommend any of the ergonomic keyboards offered by Microsoft and Logitech, because the keys break down after some weeks of typing and the consumer is forced to buy a new model. But the technology is still better than using pen & paper or a mechanical typewriter, because both have no USB connection.
Typing speed
Again, I want to explain how to prevent injuries to the hand. The most important aspect is to reduce the typing speed. That means, even if a newbie can type very fast, for example 200 characters per minute, it is important to artificially slow the speed down to 100 and below. If somebody types at a speed of 300 characters per minute for a while, he may be fast, but after some minutes his fingers can get damaged. The better approach is to type slowly but constantly over longer periods. That means the user should take around one hour for typing in a small text, and when he is done, he will have no physical problems with his fingers.
In reality the most interesting feature of touch typing isn't the raw speed, but the ability to type the words blind. Usually many errors happen while typing; somebody wants to write a word like "Hello" but types "jekko". The trick is to recognize typing errors early and correct them live, and this is only possible if the eyes are focused on the monitor and the fingers stay on the keyboard. In the example word, the user types the first letter "j", recognizes that the character is wrong, presses the backspace key and types the correct "H". Even if somebody reaches only a slow speed in touch typing, he is able to do such corrections on the fly, which increases the overall speed.

What is realistic with medical robots?


In sci-fi stories like Star Trek, nanorobots are sometimes presented as a tool for healing people. It is hard for a normal audience to say whether this technology is realistic or not. Will such nanobots ever become reality, and if so, in 10 years, 100 years or 500 years?
Giving an answer is not as complicated as it looks. At first let us describe something which is possible in the near future, or even today. A step into modern medicine wouldn't start directly with nanorobots, but with household robots which help elderly people. The classical example, which has been available for decades, is a simple electric wheelchair which allows people to get around even if their legs are no longer working. A more advanced form of such a tool would be a walking robot which travels together with the patient. Such a device was shown in the Hollywood movie "I, Robot" (2004), where it brings the inhaler to an old lady, and the same sort of technology can be seen in the series "Real Humans", in which a household robot makes lasagna for an older guy. My prediction is that such helper robots will become commercially available in the next 20 years, perhaps earlier. From a technical point of view such technology is realistic, and it will improve the situation a little bit.
Nanorobots are more difficult to evaluate. Right now there is ongoing research in the direction of DNA nanorobots. These are machines built not from metal but out of DNA, and they are being tested to see whether they can repair a body from the inside and take measurements. The promise is that such technology can really improve the medical condition of people and extend their lifespan. But it is hard to say if and when a DNA nanorobot will become reality. As far as I know, it is mainly a research project without an application. The researchers are working in that direction because they hope that one day they will be successful. DNA nanorobots can be called the most advanced form of robotics, because they are not simply a mechanical tool like an autonomous car or an automatic wheelchair; the idea is to bring medicine forward.
As a summary, we can say that well-known medical devices like the electric wheelchair are products which are available today. With a horizon of 20 years, walking robots which can help elderly people will become available, and beyond this horizon DNA nanorobots are an option for further development.

May 15, 2018

Will a hydraulic printing press destroy the Amish culture?


From the perspective of rejecting technology it is a good idea not to use any device which runs on electricity. That means, for the Amish it is not allowed to use television. The problem is that a certain type of technology doesn't violate the Amish way of life but is at the same time pure evil. I'm talking about the hydraulic printing press which is used by many Amish people. At first glance everything is fine with the machine. It doesn't need any kind of electricity, and not even a battery is needed to drive the apparatus. But is the machine compatible with the belief in God? I'm in doubt, because a hydraulic printing press can produce many hundreds of printed documents per day, and what is written in those documents can be against the Bible.
The danger is high that if the printing press is used too often, it can destroy a society, especially if the printed words are not the holy book but something else, for example comics, prose and modern journals. The hydraulically powered printing press is the weak point in the Amish village. It is important to restrict access, because the distribution of knowledge is dangerous. The problem with the printed word is that it can be read alone, and this isolates the individual from society. He can form his own thoughts and question whether the other people in the village are right. Books form the so-called Gutenberg galaxy, which is an anti-Amish doctrine to manipulate the minds of the younger generation. Every Amish person should know what a typewriter and a hydraulic printing press look like so that he can avoid using them. Both need no electricity, but they are dangerous devices. In a direct comparison, an electric fridge is harmless, because this machine doesn't isolate people from the community. A cooler can be used together with other people. But a mechanical printing press has the dedicated goal of destroying the community. The more books are available, the lonelier the people become.
Let me describe the worst-case scenario of what can happen if technology is used too much. Suppose the hydraulic printing press is instrumentalized to print all kinds of pamphlets and church-critical books. They are distributed between the Amish villages and the people start to read books in their leisure time. From a formal perspective, no rule was broken, because a horse-drawn transport vehicle is allowed and a manually driven press too. But after a while the community will lose something important. If the people have read the books alone, they will have no need to talk to each other. Everything they want to know they have already read in the so-called newspaper. They will lose their ability to talk to each other in conversation, and this is equal to losing faith in each other.
Teaching Amish people in a school is no problem. The time in school is helpful for forming the character of the youth. But what is a no-go is using self-printed books in the education. Even if the books contain the Bible, it is a sign of decadence. That means a school with books can't be called Amish anymore.

Future of libraries and universities


To make it short: in the future both institutions will become obsolete. Sometimes they are called obsolete today, but that is perhaps too pessimistic. But what happened to make this revolution so big? To answer this we must go back into history and define what a library and a university were in former times.
Let us imagine a time in which both were useful and an important part of society. This can be dated back to the 1980s, before the invention of the mainstream internet. In that time, the first home computers, for example the IBM-compatible 286 machines, were available, but their capabilities were limited and it wasn't possible to use them for data interchange. Other media for knowledge transportation were important in that era, mainly the book, printed journals and the oral presentation of a professor. The library and the university are grouped around these media. A library is a place in which the medium book is stored, while a university is a building in which professors speak to their students.
Before the invention of the internet, the library and the university building were the only places for doing so. They had a monopoly on knowledge distribution. Now let us compare the situation with today's technology. If in the year 2018 somebody wants access to academic papers, he can ask Google Scholar; if he wants to hear a professor, he can watch a video on OpenCourseWare. Like in the 1980s he can also take a normal book from the library and he can also join a real university, but it is no longer a must-have. In most cases it is a bad choice to do so, because libraries are far away, and joining the real Stanford university is too expensive.
But if universities and libraries disappear, what is the new modern environment in which knowledge is distributed? The classical institutions are not simply gone; they are replaced by something which works better. That is an online repository for storing papers in digital format, an academic search engine, a video platform for hosting lectures, the computer hardware which allows playback (smartphone and notebook), and a product which is sold by AT&T, namely a broadband internet connection. That means in the future money will still be spent and time will still be invested in bringing academia forward, but the recipients are different. A modern form of education, for example, can be described by the availability of a fiber-optic internet connection. Building such infrastructure costs billions of US dollars. The idea is that instead of spending money on building a university, the money is spent on optical switches and datacenters. Instead of spending 30 US$ per month on buying books, the students will spend the same amount of money on the internet connection.
There is no need to discuss university and education as if they were the same thing. Instead, higher education is based on media technology. Without a medium it is not possible to transport knowledge. A medium can be a book, a newspaper, a video, a website, a PDF file, a human teacher and so on. Some of these media correspond to classical institutions, for example the book and the human professor. Others, like a video recording of a lecture or a PDF file, are modern media which can be created and transported at lower cost. A discussion about the future role of the university on the internet makes no sense. From a technical point of view it might be interesting to create a website for a university, but that effort ignores the real capabilities of the internet. Using the internet right means uploading fulltexts to an online repository and uploading lectures to YouTube. After doing so, the university no longer needs a website, because the internet is the university.
Even today, technical problems that are far from minor make it difficult to realize such a goal. For example, broadband internet access is not the standard worldwide, and the amount of content available online is still low. But these are only detail problems; the general pathway is that with more invested energy these problems can be solved. Classical universities and libraries will not simply disappear, they will be transformed into museums. That means a museum is built around the former building to explain to future generations what a university was. This museum is not always physically visible; at first it will only be recognized as a changed debate about the role of the university in society. That means we will see some kind of debate between people who believe that a university is useful and others who want to describe the institution as obsolete.
The best example of such a debate is the Open Access discussion from the years 2012-2018. In most cases the idea is how to develop existing universities further and make them internet-ready. The debate can be seen as a defense against declaring the university obsolete. Most Open Access advocates are motivated to describe the situation optimistically; their opinion is that universities and libraries are in general the right idea for future higher education. But the Open Access advocates lost the debate, because they ignore the current situation. What they describe is not the reality; they describe dreams and visions. The reality is that current libraries are grouped around printed journals, while in the debate the talk is about electronic journals which are not there. Such a gap in the perception of reality can easily be exploited by a counter-movement which wants to describe universities and libraries as obsolete and tries to replace the idea in general.
The basic idea behind a library and a university is simple: it is a place in real life, a kind of monument around which people are grouped. Such a monument makes no sense in the internet world, because the internet does not depend on physical locations. Physical instances of information exist, for example the Google datacenter or the AT&T cable, but these places are not called universities; they are called datacenter and fiber-optic line.
The result is a misunderstanding. If someone asks for money to build a library, the other side will not understand him. But if somebody asks for money to build a fiber-optic connection, he will be understood very well. What the Open Access debate is about are words. The idea behind Open Access is to make clear to the public what a library is. The danger is high that this will be forgotten.
Museum
Sometimes the youth are accused of not being interested in books and schools. That is totally wrong. Especially the youth are very interested in printed books and in teaching environments in which they must be quiet for 45 minutes. The difference is that they visit museums to get more information about them. Digital natives who are familiar with tablets and internet access are highly motivated to visit a place in which they can see the Koenig & Bauer press from 1811, or to attend a play in which a classroom has a classical whiteboard and a physical teacher. Taking a look backward is very helpful for understanding the world. The only thing the digital natives do not understand is when somebody is not aware that the past is over, that things have changed over the years. Watching a Gutenberg printing press in a museum might be fun, but using it to print a real book is a waste of time.

Reason why English is better


Only a fraction of papers is published in English. Large parts of the EU, China, the US and India use their own local language for university content. For example, in Germany nearly all classes at the university are German-only classes, and the same is true for other EU countries. But why should everybody speak English if he is fluent in his own language and doesn't want to do anything else?
The answer is given in the book “Tucker, R., Mathematical and scientific library of the late Charles Babbage, C. F. Hodgson and Son, London (1872)”. It is a very historic book, published nearly 150 years ago, which lists all the books and journal papers Charles Babbage was aware of. Apart from the content itself, which is mostly about astronomy, mathematics and optics, one thing is remarkable: the literature list contains many different languages. There are books written in French, German, English and Latin, and sometimes the language is Italian. It seems that Babbage was familiar with all these languages, as if his first profession had been not computing but philology. Sure, all of these languages, German, French and so forth, are very nice languages, and in theory it is possible to express everything in them. But are these languages superior to English? No, they are not. English is, according to many experiments, the easiest language for foreign speakers to learn. A basic vocabulary consists of no more than 1000 words, which is enough to travel through an English-speaking country and get into conversation with the locals. Sure, nobody in the world likes English very much, but for the purpose of using a language as a tool it is the best.
The funny thing about English is that the language is very tolerant of errors. That means, even if a beginner makes lots of spelling and grammar errors, he will still be understood by a native speaker. In contrast, if somebody speaks French at anything less than an expert level, it is very hard to understand him. It is not necessary to give English a warm welcome, because there is no real alternative to it. But what is possible is to make clear how a world would look which contains 100 languages and more. The result is that people do not understand each other. If someone is interested in hiding knowledge and encrypting information, the only thing he needs to do is write down his academic paper in a perfect form of the German language, and around 6 billion people worldwide are no longer able to access it.

Publishing academic content as HTML or PDF?


It seems that both options are possible. Some papers can be downloaded in PDF format, others are formatted in HTML. Sometimes we see wiki syntax too. But which of the formats is right?
The first idea might be that HTML is superior to PDF, because PDF is a book format and can't be modified. That is the reason why Wikipedia is not using PDF but wiki syntax. The question is: does this make sense for research papers too? Let us suppose that HTML advocates are trying to overcome the PDF format and force authors to upload HTML content which can easily be shown in any browser. That is equal to a scientific blog which can be edited later and annotated with comments. There is no need for publishing papers anymore, because we have a blog.
But are academic blogs able to replace academic papers? The problem with blogs is that they were not designed with archiving in mind. Instead, the idea is to write something down, link to it, and after some months the content gets lost because the website is no longer available. It is very complicated to export a complete blog into a different format. Even the author of a blog has problems doing so, and archiving an external blog is often not possible.
There is a good reason why PDF is widely used for academic publication: it is the superior file format. Even if the file is never printed out, it makes sense to store a paper in such a format. PDF can hold not only the text itself, but also the pictures and the tables. Today the standard format for academic papers is PDF or PostScript; that means nearly every one of the roughly 50 million available papers is provided as PDF. I would guess that this will not change in the future. The only thing that may happen is a new PDF version, for example with better compression or something like that. But replacing PDF with HTML is not possible.
Between an academic blog and an academic paper there is a difference. An academic paper has an in-depth focus. That means it contains lots of literature references and is aimed at an expert audience. Above an academic paper there is no higher form of writing. That means, if the PDF format together with 200 bibliographic references is not enough to explain quantum computing and neural networks, then no other format can do so. A blog, in contrast, is a colloquial form of text. It is often created for entertainment, discusses lighter topics, and the energy invested by the author is small. It makes sense to use PDF for heavy content and HTML for lighter content.
The main idea behind PDF is that the paper gets a unique identification. In the simplest form it is a URL, but sometimes a DOI and a BibTeX entry too. A PDF file is a closed system which is created with a timestamp and then gets archived. It is not a living project or a discussion forum; it is written for eternity.
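To illustrate what such an identification record can look like, here is a minimal sketch in Python that assembles a BibTeX entry for an archived paper. The citation key, authors, DOI and URL are invented placeholders, not a real reference.

```python
# Minimal sketch: render a BibTeX entry for an archived PDF.
# All field values below are invented placeholders.

paper = {
    "key": "doe2018example",
    "title": "An Example Paper on Neural Networks",
    "author": "Doe, Jane and Roe, Richard",
    "year": "2018",
    "doi": "10.1234/example.5678",           # placeholder DOI
    "url": "https://example.org/paper.pdf",  # placeholder URL of the PDF
}

def to_bibtex(entry: dict) -> str:
    """Format a dictionary as a BibTeX @article entry."""
    fields = "\n".join(
        f"  {name} = {{{value}}}," for name, value in entry.items() if name != "key"
    )
    return f"@article{{{entry['key']},\n{fields}\n}}"

if __name__ == "__main__":
    print(to_bibtex(paper))
```

Appending the output to a .bib file makes the paper citable with a single key, which is exactly the kind of stable identification the PDF workflow relies on.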
Some journals like PLOS ONE provide both formats in parallel. The user sees the rendered HTML text on the screen, but he can also download the PDF version. In my opinion it is enough to provide only the PDF file. If a researcher is not interested in reading a PDF file, he is not interested in the article at all. It is not possible to lower the barrier and make the content easily accessible; a good academic paper is hard to read.

May 14, 2018

Banning Bitcoin and PGP in one's own intranet


Some crypto-anarchists argue that the government is not powerful enough to ban Bitcoin, because any restriction can be bypassed. That is simply not true. Banning Bitcoin is as easy as banning filesharing. Everything the provider has to do is block certain ports and search the remaining traffic for patterns. Good routers have such monitoring capabilities integrated out of the box. How this is handled by real internet service providers is unclear, but at least in a local area network the admin has the obligation to do so. From a technical point of view it is a bit problematic to block a complete port range, because sometimes this will affect useful services too. The better idea is to monitor the traffic on an IP basis, so that the admin can answer the question which of the users have used the Bitcoin protocol, a BitTorrent tracker or PGP-encrypted e-mail in the last month. Identification of the packets can be done either by the port number or by the content itself; for example, a PGP message starts with a certain header.
What the admin does with the collected data depends on the current company policy. If the company wants to be more restrictive, it can search for users who have violated some of the rules, and this is a good reason for a detailed talk with the user. The funny thing is that with network monitoring it is also possible to observe potential bypassing methods; that means if somebody has developed his own protocol, it can be detected too. The idea is to create a fingerprint for every user and classify his activities according to their security relevance. Again, it is not necessary to block the traffic technically; it is enough to monitor the activities silently, so that for later purposes enough data about a user is available to call him a terrorist.
Here is a tutorial on how to detect P2P traffic with Wireshark: https://www.howtogeek.com/107945/how-to-identify-network-abuse-with-wireshark/ Using Wireshark to detect e-mail encryption is also possible. Here https://serverfault.com/questions/693814/gpg-receive-keys-times-out-but-wireshark-confirms-the-http-response-is-receive is an example of recognizing a request to a so-called keyserver. That means, if one of the users asks a remote keyserver for a public key with the intention of encrypting a message, this can be detected. What Wireshark can't do is decrypt the message, because it doesn't know the key. That means it is not possible to see what the user has written in his message (that is the general idea behind encryption). What is possible to say is whether the user has sent an encrypted message and who the receiver was. Because sending encrypted messages can be seen as a terrorist act, it is enough to expose the user.
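As a rough illustration of the pattern matching described above, here is a minimal sketch in Python using the scapy library (an assumption; Wireshark or any other capture tool works just as well). It flags TCP traffic to the common default ports of the Bitcoin protocol (8333) and of HKP keyservers (11371), and it looks for the ASCII-armor header that PGP messages carry. It reports only metadata and cannot read encrypted content.

```python
# Minimal sketch of traffic flagging, assuming scapy is installed (pip install scapy)
# and the script runs with sufficient privileges to sniff packets.
from scapy.all import sniff, TCP, IP, Raw

BITCOIN_PORT = 8333      # common default port of the Bitcoin P2P protocol
KEYSERVER_PORT = 11371   # common default port of HKP keyservers
PGP_HEADER = b"-----BEGIN PGP MESSAGE-----"  # ASCII armor of a PGP message

def inspect(pkt):
    """Report packets that look like Bitcoin, keyserver or PGP traffic."""
    if not (pkt.haslayer(IP) and pkt.haslayer(TCP)):
        return
    src, dst, dport = pkt[IP].src, pkt[IP].dst, pkt[TCP].dport
    if dport == BITCOIN_PORT:
        print(f"possible Bitcoin traffic: {src} -> {dst}")
    elif dport == KEYSERVER_PORT:
        print(f"possible keyserver lookup: {src} -> {dst}")
    elif pkt.haslayer(Raw) and PGP_HEADER in pkt[Raw].load:
        print(f"PGP-armored content: {src} -> {dst}")

if __name__ == "__main__":
    # Sniff TCP traffic on the local interface; only metadata is logged,
    # the payload of encrypted messages stays unreadable.
    sniff(filter="tcp", prn=inspect, store=False)
```

A real deployment would aggregate these events per IP address over time, which corresponds to the per-user fingerprint mentioned above.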

May 13, 2018

Some questions to Marc Raibert


1. I've read your PhD thesis from 1977. It is an amazing piece of work. As far as I can see, you worked on a quantum computer to simulate the universe. How did you come up with the idea of using cellular automata?
2. According to your latest speech at the TechCrunch summit, the aim of Boston Dynamics is to educate the public, like Sebastian Thrun with Udacity. How many lectures have you produced so far?
3. I've heard you're planning a collaboration with Linus Torvalds to bring the git ecosystem forward. Tell me more about this amazing idea.
4. Is it true that one version of the SpotMini is driven by a nuclear reactor with a lethal dose of radiation?
5. Why are you not interested in the subject of computer animation? There would be many similarities to robotics.
6. Many of your lectures at MIT are publicly available through the OpenCourseWare project. Do you get many questions from students who want to know how to build robots?


7. Is it possible to use robots from Boston Dynamics for paranormal detection like in the movie “Ghostbusters (1984)”?

May 12, 2018

Using Google Scholar effectively


Google Scholar gives a detailed look into current research papers. The fulltext search engine is simply amazing. The only problem is how to navigate the huge amount of content. In most cases there are more papers visible than a single person can read. There is a need for a strategy to browse through the information, find the right papers, and improve one's own knowledge about a subject at the same time.
I've tested several possibilities, e.g. creating mindmaps, but the most effective way of working through Google Scholar papers is taking textual notes. The screenshot shows an example for a topic in Artificial Intelligence, but the strategy works for any other topic as well.

The notes contain in most cases a certain query to Google Scholar, for example “pddl task model”. Such a query produces a selective set of papers. Sometimes one of the papers is interesting; then I write down the exact URL, or I write a short note about what is inside the paper. The main idea is to make the search history visible, so that I can see what I searched for 1 day ago, 2 days ago and so on. It is mainly a bookmark-like list which shows what I searched and found in the past.
After a while the list of notes grows. It is possible to create special categories, for example only notes about Artificial Intelligence, or only notes about the Forth programming language. I've found that this is an effective way to investigate a complex topic which was unknown before. Publishing the notes in a blog or in a PDF paper makes no sense; they are more a help to memorize the subject. For example the written-down vocabulary “teleoperation, pddl, action model, keyframes” and so on can be seen as a path through a topic. And yes, it is important to memorize the terms, because this will help to find further papers about the subject.
Google Scholar needs at least some keywords from the human user. If the user is able to give the right keywords in the right combination, Google Scholar will show the best information it has. I would guess that most of the time I'm figuring out which keywords I have to ask for. At the beginning this is not clear; most keywords appear only in the fulltext. They can be called the words which are used by the researchers who write about a subject.
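As a small illustration of this note-keeping routine, here is a minimal Python sketch that appends timestamped search notes to a plain text file and filters them by category. The file name, categories and example queries are invented for the illustration; a plain text editor serves the same purpose.

```python
# Minimal sketch of a search log for Google Scholar sessions.
# File name, categories and queries are placeholders; adapt them freely.
from datetime import date

NOTES_FILE = "scholar-notes.txt"

def add_note(category: str, query: str, remark: str = "") -> None:
    """Append one line per search: date, category, query, optional remark or URL."""
    with open(NOTES_FILE, "a", encoding="utf-8") as f:
        f.write(f"{date.today()}\t{category}\t{query}\t{remark}\n")

def show_notes(category: str) -> None:
    """Print all stored notes of one category, oldest first."""
    with open(NOTES_FILE, encoding="utf-8") as f:
        for line in f:
            if f"\t{category}\t" in line:
                print(line.rstrip())

if __name__ == "__main__":
    add_note("AI", "pddl task model", "good survey, wrote down the URL")
    add_note("Forth", "forth stack machine compiler")
    show_notes("AI")
```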

Why Lyx is a milestone in modern academic publishing


On the web there is an early screenshot of the Lyx software: https://www.linuxjournal.com/files/linuxjournal.com/linuxjournal/articles/013/1355/1355f3.jpg It shows version 0.12 from 1999, which looks very strange. The main question a beginner in academic publishing probably has is why he needs Lyx if MS-Word works great. To answer the question we must focus on the needs of a scientific document.
The average academic paper which gets published at Arxiv and other repositories has some elements which are important: citations of external literature, images, tables, a hierarchical outline, footnotes. To make clear why these elements matter, we can imagine a paper without these features; it will not look very elegant.
The problem is how to create such features in a PDF document. On a formal level the features have something to do with positioning text on the page. For example, a hierarchical outline results in a table of contents and a big heading over each chapter, while an image is a pixel pattern placed in or on top of the page. Producing these typographic features can be called an advanced science. A normal text editor is not able to do so. In the history of book printing many efforts were undertaken to make this possible. For example, if someone tries to use a simple typewriter to produce a mathematical formula or a table, he will run into problems.
In theory MS-Word can handle all these needs, but only in theory. In reality, most papers published at Elsevier and Arxiv are not written in MS-Word. MS-Word is used only for the first draft; then a professional layout program is used to generate the PDF. It makes no sense to compare Lyx with MS-Word; the better approach is to compare Lyx with Adobe FrameMaker and InDesign. The general question is not “Do we need Lyx?”, the question is: do we need FrameMaker?
Let us take a look at the current book publishing industry. FrameMaker and InDesign are used by the vast majority of typesetters. The only exceptions are self-publishers, where most customers deliver a PDF made with MS-Word, but in professional typesetting the usual choice is an Adobe product. Now we can compare Lyx with these tools. Lyx is free and has a semantic tagging feature; that means the formatting is done by the LaTeX backend. In my opinion, Lyx is superior to Adobe FrameMaker and similar software.
Is Lyx able to handle citations, footnotes, hierarchical outlines, tables and images? Yes, very well. The interesting aspect is that the software is very powerful while at the same time the usage is easy. Critics say that with Lyx / LaTeX every document looks the same. But that is not true. The general idea behind Lyx is that it works like a pure ASCII editor. The text can be exported as plain text, and afterwards it can be used in any other graphics program. That means, even if the author doesn't like the LaTeX backend, he can use Lyx for entering structured text, export it to plain text and then paste it into a graphics program. Under Linux, the Scribus software is a hot candidate for making fancy flyers.
Scribus is a bad choice for creating an academic-looking paper, and it takes a very long time until one page is ready, because the user has to adjust everything manually. But if the idea is to position text and images freely on the page, it is a good choice.
Monster
Sometimes Lyx is called a text-layout monster, because in early Linux distributions it was notoriously difficult to install. Together with LaTeX, some PDF tools and a vector program, the overall amount of disk space Lyx consumes can easily reach 1000 MB. And the number of possibilities the user has is endless. But over time Lyx has become more user-friendly. Today the interface looks clean and the software can be installed with a single command like “dnf install lyx”. I would call the software easier to use than pure LaTeX, MS-Word or Adobe FrameMaker. Everybody who has ever tried out the software will never switch back to something else. If somebody uses Lyx and nevertheless has problems with academic writing, it has nothing to do with layout questions, but with the content of the text.