May 25, 2018

Misunderstanding in information filtering


A while ago, a question was asked in an online forum whether it is okay to publish a paper in a so-called “predatory publishing journal”: https://academia.stackexchange.com/questions/110029/should-i-submit-my-paper-to-a-journal-that-accepts-papers-quickly Most answers, in fact all of them, recommend against it, either because of detail questions or because fast and cheap publication is seen as harmful to future science. But what is the story behind this argument? Why is it not recommended to publish such a paper? The answer has to do with potential information overload. If too much (low-quality) content is in the wild, it becomes harder to identify the important information. And not publishing too much, especially not low-quality content, is the proposed answer to the filtering problem.
But the problem of information overload and too much content is a typical problem of the pre-internet age. Nowadays there is a technical solution for it, called an academic search engine. I want to give a concrete example of how to use such a tool. Suppose the goal is to find useful information among all the papers available. The first thing the reader can do is restrict the results to the year 2018, which filters out around 90% of the content. The second thing he can do is type a keyword into the search box, and last but not least he can sort the entries by number of citations. In nearly all cases this results in a very compact list of potential papers. That means, with a bit of understanding of how a search engine works, it is possible to find the useful content even in a huge amount of information. So it is no longer necessary to block content on the publisher's side. And that is why predatory publishing is right: the idea is to publish first and ask about quality later. The publisher does not know what his readers need, and in particular he is not aware of what his readers will need in 5 years. Apart from serious misconduct such as plagiarism, every paper should be published.
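The three filtering steps described above can be sketched in a few lines of code. This is a minimal illustration, not how any real academic search engine is implemented: the paper records, field names and citation counts are invented for the example, and a real engine would work over an index of millions of entries rather than a small in-memory list.

```python
# Toy paper database; the entries are hypothetical examples.
papers = [
    {"title": "Deep learning for robotics", "year": 2018, "citations": 120},
    {"title": "A survey of motion planning", "year": 2015, "citations": 800},
    {"title": "Robotics in agriculture", "year": 2018, "citations": 15},
    {"title": "Quantum error correction", "year": 2018, "citations": 60},
]

def search(papers, keyword, year):
    # Step 1: keep only papers from the given year.
    # Step 2: keep only papers whose title contains the keyword.
    hits = [p for p in papers
            if p["year"] == year and keyword.lower() in p["title"].lower()]
    # Step 3: sort the remaining entries by citation count, most cited first.
    return sorted(hits, key=lambda p: p["citations"], reverse=True)

for p in search(papers, "robotics", 2018):
    print(p["title"], "-", p["citations"], "citations")
```

Even with thousands of entries, the combination of a year filter, a keyword match and a citation sort reduces the result to a short, ranked list, which is the point of the argument: the filtering happens on the reader's side, not the publisher's.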
The funny thing is that in the online thread cited above, this background assumption about information retrieval is never discussed. Instead, most answers assume that information overload is a major problem, and the shared belief is that no search engine is available to solve it. The answers are not wrong, but the preconditions under which they are formulated are outdated.