May 18, 2018

Insights into scientific productivity


The good news is that some information is available about the productivity of researchers. The topic is discussed under the term bibliometric studies, which basically means analyzing a BibTeX database, for example with Matlab, to answer questions like: "How many papers has a certain author published?" or "How many papers are published per year on average?". The surprising thing is that such studies all come to the same conclusion: the mean productivity today is around 2 papers per year, while 100 years ago it was around 0.5 papers per year.
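As an illustration, here is a minimal sketch of such an analysis, written in Python instead of Matlab. The file name refs.bib is made up, and the regex-based parsing is deliberately naive; real BibTeX files need a proper parser.

import re
from collections import Counter

# Read a BibTeX database (the file name is hypothetical).
with open("refs.bib") as f:
    bib = f.read()

papers_per_author = Counter()
papers_per_year = Counter()

# One chunk per entry; assumes well-formed "author" and "year" fields.
for entry in re.split(r"@\w+\s*\{", bib)[1:]:
    authors = re.search(r'author\s*=\s*[{"](.+?)[}"]', entry, re.S)
    year = re.search(r'year\s*=\s*[{"]?(\d{4})', entry)
    if not authors or not year:
        continue
    papers_per_year[year.group(1)] += 1
    for name in authors.group(1).split(" and "):
        papers_per_author[name.strip()] += 1

print("papers per year:", dict(papers_per_year))
print("most productive authors:", papers_per_author.most_common(5))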
Additionally, it is possible to speculate about the reasons behind this. In most cases a productivity of 2 papers per year is the result of co-authorship. That means it is measured because, for example, 200 authors are in the database who have written 100 papers together in one year. If each paper lists on average 4 co-authors, the database contains 400 authorships, and a bit of number crunching (400 / 200) results in a value of 2 papers per year and author. The value is hypothetical, because in real life hardly any paper is written by one person alone. The average bibliographic entry on Google Scholar contains at least 3 authors, sometimes more. The reason different authors work together is to speed up science: a single author would need 3 years to write and reformat a paper, while 3 authors need only 1 year. It is the same principle known from assembly-line manufacturing in the automotive sector, in which a task is split over many persons. Perhaps a more realistic comparison is movie production: a film is usually produced by more than 100 people, because otherwise a Robinson Crusoe-like author would take hundreds of years until the product is ready.
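To make the number crunching explicit, here is the toy calculation. All numbers are invented for illustration, including the assumed 4 co-authors per paper.

# Toy numbers, invented for illustration.
papers = 100            # papers published in one year
authors = 200           # distinct authors in the database
authors_per_paper = 4   # assumed average number of co-authors

# Whole counting: every co-author gets full credit for a paper.
whole = papers * authors_per_paper / authors   # -> 2.0 papers per author

# Fractional counting: each paper is split among its co-authors.
fractional = papers / authors                  # -> 0.5 papers per author

print(whole, fractional)

The gap between 2.0 and 0.5 is exactly the co-authorship effect described above: the measured productivity depends on the counting method, not only on the output.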
What is not given in the studies, but would be very interesting, is a correlation between productivity and media technology. For example, I would guess that typing the manuscript with computer software is faster than using a simple mechanical typewriter. It is literally unknown how a modern researcher types his manuscript, or whether he uses a classical library or online-only databases. The only comparison available is Wikipedia. There, some studies exist about the writing technology the authors use, and perhaps academic writing for a journal works on similar principles. Now that the name Wikipedia has been mentioned: it is perhaps the best-researched community available. Even though all authors are anonymous by default, Wikipedia is investigated heavily regarding article count per author, number of edits, tools used and so on. On the content level, Wikipedia is not as advanced as a scientific journal, but there are more studies available that reflect on the workflow itself. Wikipedia can be called transparent by default, and perhaps it is at the forefront of the Open Access movement.
What is normal in Wikipedia today (getting the number of articles somebody has written with a simple SQL query) will become normal in the academic community in the future too. Today such features are missing: it is not possible to ask Google Scholar what the productivity was in the last year. The raw data are stored in the database, but there is no query for getting the results. And it is also very uncommon to ask for this information in online forums.
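To show what such a missing query would look like, here is a sketch against a made-up papers table. The schema and data are pure assumptions, since Google Scholar's real backend is not public.

import sqlite3

# Invented toy schema; the real backend is not public.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE papers (author TEXT, title TEXT, year INTEGER)")
db.executemany("INSERT INTO papers VALUES (?, ?, ?)", [
    ("Doe",   "Paper A", 2017), ("Smith", "Paper A", 2017),
    ("Doe",   "Paper B", 2017), ("Smith", "Paper C", 2018),
])

# Articles per author: routine for Wikipedia, missing in Google Scholar.
for row in db.execute(
        "SELECT author, COUNT(DISTINCT title) FROM papers GROUP BY author"):
    print(row)

# Productivity per year: papers divided by active authors.
for row in db.execute(
        "SELECT year, COUNT(DISTINCT title) * 1.0 / COUNT(DISTINCT author) "
        "FROM papers GROUP BY year"):
    print(row)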
In http://www.csiic.ca/PDF/NSERC.pdf, a nice figure on page 28 shows the paper production per year. Animal biology, for example, has values from 2.35 up to 4.80 papers per year.
I think the most dominant reason why it is unusual to talk about these values is that historically, academic publishing was done outside the internet. If a printed journal is the normal medium, it is not possible from a technical point of view to measure productivity. In contrast, for a simple online forum stored in a SQL database it is very easy to analyze the number of postings and the number of bytes every user has uploaded, so for online-only media such statistics are quite normal. The same is true for Wikipedia: it was founded as an online-only publication medium, so from day 1 all kinds of useful statistics were available and discussed in detail. Or, to explain the situation the other way around: if no machine-readable database of all scientific publications is available, it is not possible to ask how many papers were published at all. If the scientific community has only printed journals and a paper card catalog, there is no way to count exactly how many publications appeared in the last year and how many authors worked together to create the content. The silence about scientific productivity is a sign of a pre-online publication system in which basic information, like a database of bibliometric data, isn't available.
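For comparison, this is how cheap such a statistic is once the medium is a SQL database. The forum schema here is invented for illustration.

import sqlite3

# Toy forum schema; table and column names are invented.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (username TEXT, body TEXT)")
db.executemany("INSERT INTO posts VALUES (?, ?)", [
    ("alice", "First post"),
    ("alice", "A much longer reply with many more bytes in it"),
    ("bob",   "Short"),
])

# Postings and uploaded bytes per user, as one aggregate query.
for row in db.execute(
        "SELECT username, COUNT(*) AS postings, SUM(LENGTH(body)) AS bytes "
        "FROM posts GROUP BY username ORDER BY bytes DESC"):
    print(row)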
Sure, in theory it is possible to visit the Library of Congress, count in their catalog how many papers get published per year, and note down every author name that was active. But in reality such a research project would fail, because the amount of information is too high. That means: it is not possible to run a SQL query against a classical card catalog.
Were classical journals invented for bibliometrics?
The assumption so far was that classical academic publishing has to adapt to a modern form of Open Science. That means researchers who have published in printed journals must accept that their productivity is measured, because the internet is a must-have. But what if the other side is right, and measuring the details was never the intention? Let us go back to the era in which classical printed journals were founded. In the 1970s a printed academic journal was the only option researchers had to publish their papers. If they submitted a manuscript in that era, they had certain assumptions about how a journal works. That means they submitted the content not with the intention of distributing it worldwide in electronic form; the idea was that a small number of other researchers would get access to it.
Nowadays, search engines like Google Scholar and JSTOR have indexed the old journals, created machine-readable bibliographies, and distribute the fulltext worldwide without any costs. But that was not the deal, at least not in the 1970s in which the authors published their manuscripts. What we see today is that the old printed journals are used for a different purpose than the one they were created for. Perhaps this is the wrong way?
Let us imagine what the alternative is. The alternative is a new academic journal, for example PLOS ONE or the Wikiversity Journal. Both were created as internet-only media by default. If an author submits a manuscript there, he is aware that the content gets published online and that other researchers will analyze his productivity in detail. The hypothesis is that nothing is wrong with the old traditional printed journals: it is okay if the fulltext isn't available on the internet, and it is valid if the productivity of the authors never gets evaluated. Perhaps we need new journals which are created from the beginning for the web?
The famous example Wikipedia was mentioned earlier. From the beginning it was clear to every author that the data are stored in a SQL database, that 7 billion people can read it, and that the productivity of any author gets measured. That was the deal, and every Wikipedia author knows it. Whoever submits an article or an update is aware of the consequences. Because Wikipedia was established as an internet-only project from the beginning, it is accepted to send a SQL query against the database to get detailed information about the authors.
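A sketch of such a query, loosely modeled on MediaWiki's revision table, which stores one row per edit. The schema below is heavily simplified, and newer MediaWiki versions record the author differently, so take it as an approximation rather than the real table layout.

import sqlite3

# Heavily simplified stand-in for MediaWiki's revision table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE revision (rev_user_text TEXT, rev_page INTEGER)")
db.executemany("INSERT INTO revision VALUES (?, ?)", [
    ("UserA", 1), ("UserA", 2), ("UserB", 1),
])

# Edits per author: the statistic every Wikipedia author expects to exist.
for row in db.execute(
        "SELECT rev_user_text AS author, COUNT(*) AS edits "
        "FROM revision GROUP BY rev_user_text ORDER BY edits DESC"):
    print(row)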