May 06, 2019

Blog aggregators in science


Creating a science blog is not very complicated. Any amateur writer with access to an internet-connected PC can do it. Once some content is put on the blog, the Google crawler bot will scan it, and the world can find the information. But what determines how much traffic is directed to the blog?
The good news is that, outside of science blogs, this question has already been answered on the mainstream internet. The number of websites explaining how to rank one's own blog higher in the Google results is endless, and the concepts can be adapted to science blogs as well. A well-known technique is the blog aggregator. Typical examples from the science community are The Early Modern Commons, http://researchblogging.org/ and http://cstheory-feed.org/ What these websites have in common is that they are meta-blogs: they do not contain information themselves but collect URLs pointing to other blogs. In the case of The Early Modern Commons, the topic is early modern history. The advantage of a blog aggregator over a single-authored blog is that many news items are shown each day, similar to a large-scale newspaper, while the articles themselves are created by a distributed authorship of individual bloggers. The CSTheory feed does the same, but for theoretical computer science instead of history. And ResearchBlogging contains a collection of miscellaneous science topics from all domains, such as biology, physics, medicine, and computer science.
The basic idea of a blog aggregator is the driving force behind large-scale websites like Reddit. Reddit is not a scientific website but has its roots in mainstream entertainment. Reddit sees itself as an aggregated internet: similar to a search engine, but with hand-collected links. In contrast, the CSTheory website contains a smaller number of hand-selected posts; instead, an automatic feed aggregator is used to combine all the RSS feeds. It is not possible to upvote or downvote articles, and the amount of interaction on the website is smaller.
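To make the idea concrete, here is a minimal sketch of such an automatic feed aggregator in Python, in the spirit of planet-style software. The feed URLs are hypothetical placeholders, and the script assumes the third-party feedparser package; real aggregator software would add caching, HTML templating, and error handling on top of this.

import feedparser

# Hypothetical member blogs; a real aggregator reads these from a config file.
FEEDS = [
    "https://example-blog-1.org/rss.xml",
    "https://example-blog-2.org/atom.xml",
]

def aggregate(feed_urls):
    # Collect the entries of all feeds into one list and sort them newest-first.
    entries = []
    for url in feed_urls:
        parsed = feedparser.parse(url)
        for entry in parsed.entries:
            entries.append({
                "title": entry.get("title", "(untitled)"),
                "link": entry.get("link", ""),
                "published": entry.get("published_parsed"),  # may be missing
            })
    # Entries without a date are sorted to the end.
    entries.sort(key=lambda e: e["published"] or (0,), reverse=True)
    return entries

if __name__ == "__main__":
    for item in aggregate(FEEDS)[:20]:  # print the 20 newest posts
        print(item["title"], "->", item["link"])

This merged, time-ordered list is essentially what a planet-style aggregator renders as its front page.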
Full-text search engines like Google and blog aggregators like ResearchBlogging play an important role in making existing content visible. The internet consists of two separate parts. The first is the information itself: the texts, images, videos, and forum discussions. This kind of data needs a lot of disk space and is growing fast. The second part of the internet consists of the infrastructure to search and catalog the information, which works automatically with search engines and semi-automatically with blog aggregators. A blog aggregator needs less storage capacity but generates more traffic. For example, the ResearchBlogging aggregator is visited much more often by users, while a single blog within its network gets only a small amount of traffic.
Blog aggregators play the role of a gatekeeper. They are the first point of information for users who do not yet know which information they need; from there, the user can click on one of the links to read the full text.
Reddit
Perhaps the most famous blog aggregator in the world is Reddit. The website provides some unique features which explain its high amount of traffic. The first is that Reddit is handcrafted: instead of using planet software to combine RSS feeds automatically, the users have to post the URLs manually. Second, Reddit is not compiled by a single person but by a group of people, who moderate the activity on the site by upvoting and downvoting submissions. And third, Reddit has a strong focus on mainstream topics like entertainment, computer games, and fashion, which makes it attractive for a large audience to participate.
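To illustrate the voting mechanism, here is a deliberately simplified sketch of time-decayed vote ranking. It is not Reddit's actual algorithm, only a toy model of the general idea that a post needs ever more upvotes to hold its position as it ages; the posts and the gravity constant are invented for the example.

import math
import time

def hot_score(upvotes, downvotes, posted_at, gravity=45000.0):
    # Combine the vote balance with the posting time. The logarithm means
    # the first votes count more than the thousandth; the time term lets
    # fresh posts overtake stale ones.
    score = upvotes - downvotes
    order = math.log10(max(abs(score), 1))
    sign = 1 if score > 0 else -1 if score < 0 else 0
    return sign * order + posted_at / gravity

now = time.time()
posts = [  # (title, upvotes, downvotes, unix timestamp) -- invented data
    ("Old but popular", 500, 20, now - 86400),
    ("Fresh and rising", 40, 5, now - 3600),
    ("Controversial", 100, 95, now - 7200),
]

for title, up, down, ts in sorted(posts, key=lambda p: hot_score(p[1], p[2], p[3]), reverse=True):
    print(title)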
Reddit shouldn't be misinterpreted as a blogging website. A blog can be referenced from Reddit, but the content itself is hosted elsewhere. Reddit should rather be understood as an alternative to Google. A website which came close was dmoz.org, with the difference that Reddit works with time-based URLs.
Possible pitfalls
Around blog aggregators, some legal questions are obvious. In the case of the German blog aggregator “Planet History”, the project was stopped because of copyright issues: http://carta.info/leistungsschutzrecht-der-zweite-streich-steht-im-koalitionsvertrag/ The case was that a copyright-protected picture embedded in the RSS stream was delivered. The same problem is visible with Google News, because the original authors of the content were not asked whether a snippet of the information may be shown on the internet.


How the legal discussion will end is unclear. What we can say for sure is that copyright issues are handled comfortably if the blog aggregator lists only content that carries a Creative Commons license. Even if the original content is distributed through the RSS feed to a larger audience, this is not a problem, because a Creative Commons license means the author has made clear that he is interested in distribution over the internet.
In contrast, creating a blog aggregator for copyright-protected material, especially if it is located behind paywalls, can be problematic. The lawyers' debate revolves around the question of what the difference is between a normal text link, a nofollow link, an embedded picture, and shared content.
The good news is that Google's advanced search has a filter to restrict the hits to content under Creative Commons licenses, and it is a good idea to prefer such content in a blog aggregator. A Creative Commons license allows not only linking to the material, but also copying the entire full text, aggregating it into a larger corpus, and even presenting the information at a meeting, as long as the license terms such as attribution are respected.
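As a small illustration, the following sketch keeps only those feed entries that declare a Creative Commons license. The feed URLs are hypothetical, and whether the license shows up in the parsed data depends on the publisher: feedparser exposes the creativeCommons:license extension as a license field only when the feed actually ships it, so many feeds will simply yield nothing here.

import feedparser

FEED_URLS = [  # hypothetical feeds
    "https://example-blog-1.org/rss.xml",
    "https://example-blog-2.org/atom.xml",
]

def cc_entries(feed_urls):
    # Yield (title, link) for entries whose license URL points to
    # creativecommons.org; fall back to the feed-level license.
    for url in feed_urls:
        parsed = feedparser.parse(url)
        for entry in parsed.entries:
            license_url = entry.get("license") or parsed.feed.get("license", "")
            if "creativecommons.org" in license_url:
                yield entry.get("title", "(untitled)"), entry.get("link", "")

for title, link in cc_entries(FEED_URLS):
    print(title, "->", link)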