May 09, 2019

The planet/pluto news aggregator


The number of published books about Facebook is endless. They all share a common problem: they explain the website from the user's perspective. The user is told to create an account, to post something to a group, to update his profile and so on. The underlying assumption is that the user is unsure how to use the website the right way and doesn't understand why he needs to create an account first or what the admin of a group is doing.
The more elaborate way of teaching Facebook is to explain social networks from the other perspective. The question is how to build a social network website which allows other users to interact with it. The number of tutorials in this direction is smaller, but some are available. A good entry point into the topic “Facebook from the backend” is the Planet software, https://en.wikipedia.org/wiki/Planet_(software) The Wikipedia article is very short and needs much improvement, but it explains the general idea. It also mentions some other software projects like Pluto and Venus. Let us take a deeper look into the Pluto project.
According to its GitHub page, Pluto is:
“pluto gems - planet (static) site generator - auto-build web pages from published web feeds”
Like the Wikipedia page, this explanation is short, but it makes sense. The interesting feature of the documentation around Pluto is that some example websites are presented which show the tool in action.
If we click on one of the demonstration websites, we will see a page which looks similar to the Hacker News website, which is some kind of Facebook for programmers. Whether Hacker News was realized with Pluto is unclear, but at least the visual impression is very similar.
Installing the Pluto website generator on one's own server is a bit complicated, but for the newbie it is sufficient to read only the documentation to get an impression of what the idea is. In contrast to a Facebook tutorial, it does not explain how to log in to a website; instead, the documentation for Pluto explains what feed aggregation is from a neutral perspective and gives an answer to how this can be realized with the Pluto software.
A typical use case is that the input stream consists of 10 RSS feeds and the Pluto software is used to produce a Hacker News like website on the fly. The task is a bit complicated, because all the RSS feeds are combined into a single one, and a comment field is also added. The advantage of the Pluto software is that it is not yet another social network on the internet, but a piece of software for building a social network from scratch. From a technical point of view, Pluto is a static website generator. That means it generates HTML source code which is displayed in the browser. Let me give an example:
The input for the Pluto software is a playlist of URLs:
URL1
URL2
URL3
and Pluto converts the text file into an HTML file. This doesn't sound very advanced; every Python programmer or AWK expert can do the same. What makes the task more challenging is not the technical side itself, but the context in which such a program is used. Planet-like programs are used for creating social networks. And social networks are the most interesting website category on the internet. They work differently from forums or blogs. Their goal is to aggregate existing content for a certain audience.
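To make the idea concrete, here is a minimal sketch in Python of what such a converter could look like. It is not the Pluto software itself, just a rough stand-in: it fetches every feed from a hard-coded list of made-up URLs, takes the title and link of each item and writes one combined planet.html page.
import urllib.request
import xml.etree.ElementTree as ET
# hypothetical feed list, the "playlist of URLs" from above
FEEDS = [
    "https://example.org/blog1/rss.xml",
    "https://example.org/blog2/rss.xml",
    "https://example.org/blog3/rss.xml",
]
def fetch_items(url):
    # download one plain RSS 2.0 feed and yield (title, link) pairs
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    for item in tree.iter("item"):
        title = item.findtext("title", default="untitled")
        link = item.findtext("link", default="#")
        yield title, link
def build_page(feeds):
    # combine all feeds into one static HTML page
    # (no HTML escaping here, a real generator would escape the titles)
    rows = []
    for url in feeds:
        for title, link in fetch_items(url):
            rows.append('<li><a href="%s">%s</a></li>' % (link, title))
    return "<html><body><h1>Planet</h1><ul>\n%s\n</ul></body></html>" % "\n".join(rows)
if __name__ == "__main__":
    with open("planet.html", "w") as f:
        f.write(build_page(FEEDS))
The real Pluto gem does this job in Ruby, presumably with proper templates, caching and error handling on top, but the core transformation is the same: a list of feeds in, one static page out.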
Let us take a step back and ask which meta-websites are available on the internet. The well-known Google search engine is a meta-website. Google is not a simple homepage of a company; Google indexes other homepages in one central place. Without Google it is not possible to take notice of a URL. And without the URL the user cannot see the content. The technical side of a Google-like search engine is a large database which is stored on servers.
A social network works differently from a search engine. Social networks are focused on news, which means on information which is fresh. This is similar to all the content which was added to Google in the last week. Different from the Google start page, a social network doesn't provide a clean interface; it is crowded by default. If the user opens the social network website he sees by default a timeline of events and news. He is not asked to enter a keyword, but he sees clickable links and clickable categories. A social network is very similar to a newspaper. Or to be more specific, it is the table of contents of the newspaper.
The question is: which technology is helpful for building a social network from scratch? The mentioned Planet/Pluto software is a good starting point, but it is not very advanced. The more sophisticated way is to use a wiki generator. This is a tool which can generate MediaWiki articles on the fly from an RSS feed. The details of such a tool are not explored yet. I have recently posted a question to Stack Overflow, but nobody has answered it: https://stackoverflow.com/questions/56060377/automatic-mediawiki-article-generation-with-a-script
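Since no ready-made tool is known to me, the following is only a guess of what such a wiki generator could look like, again as a small Python sketch. It reads one feed and writes the entries as MediaWiki markup into a text file; the feed URL and the file name are made up, and the step of actually uploading the text through the MediaWiki API is left out.
import urllib.request
import xml.etree.ElementTree as ET
FEED = "https://example.org/blog/rss.xml"   # hypothetical feed
def feed_to_wikitext(url):
    # turn the items of one RSS feed into a MediaWiki bullet list
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    lines = ["== News from the feed =="]
    for item in tree.iter("item"):
        title = item.findtext("title", default="untitled")
        link = item.findtext("link", default="")
        # external link syntax in MediaWiki is [url label]
        lines.append("* [%s %s]" % (link, title))
    return "\n".join(lines)
if __name__ == "__main__":
    with open("article.wiki", "w") as f:
        f.write(feed_to_wikitext(FEED))
The resulting article.wiki file could then be pasted into a wiki page by hand or sent through the MediaWiki edit API, but that part is out of scope here.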
Let us take a look at existing projects which have already been realized. The site http://planetruby.herokuapp.com/ is an example website which was realized with the Pluto software. What is displayed is the rendered static website. The user sees on the website categories like “planet ruby”, “planet open source” and “podcast” and is able to click on the links. It is important to know that the website is not a normal weblog. It is a front page for existing weblogs. If the user clicks on the links he gets directed to the linked content. So the overall project can be understood as some kind of meta-website.
More about the Pluto project
Let us take a look into the manual of the software, https://feedreader.github.io/ The script takes an RSS feed as input and produces HTML content as output. This kind of converter is not very common on the internet. HTML itself is the normal standard and is used by all websites. But in most cases the HTML pages are produced by programmers (in the early days of the internet) or by SQL-to-HTML generators. The WordPress software, for example, takes the text from the SQL database and renders the HTML page. The HTML file is then displayed by the browser.
In contrast, the input for the Pluto project is an RSS feed which comes from an external website. That means, if the content of the external blog is updated, the Pluto website will look different after the next rebuild. The reason why Pluto and the older Planet software were programmed is to combine different blogs into one. A typical scenario is that a front page is created for 100 different blogs about a subject. The resulting front page is similar to a social network.
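Because the output is static, the page only changes when the generator runs again. A crude way to keep the planet fresh, assuming the sketch from above was saved as planet.py, would be a periodic rebuild loop; in practice a cron job would do the same thing.
import time
from planet import build_page, FEEDS   # the hypothetical sketch above, saved as planet.py
while True:
    with open("planet.html", "w") as f:
        f.write(build_page(FEEDS))
    time.sleep(3600)   # rebuild once per hour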
At first glance this description sounds a bit unusual, because a social network is marketed by Facebook as a place in which users can meet each other. But from a technical perspective a social network is equal to a news aggregator. Websites like Hacker News and Planet GNOME are minimalist social networks.
Let us take a step back and describe what exactly Pluto is doing. A given website A is converted into a new website B. Why? Suppose an HTML page is available on the internet. The HTML file is a tutorial about programming a line follower robot, called “linerobot.html”. Content aggregation means copying this file into a new file called “announcementlinerobot.html”. The copied version contains the same content. As a result the user has two options: he can either read the original file or the copied version.
It is important to know that the copied version doesn't contain the full information, for legal and technical reasons. Instead, the copy only stores a URL to the original content:
https://localhost/linerobot.html -> fulltext
https://localhost/announcementlinerobot.html -> text is “https://localhost/linerobot.html”
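As a toy illustration, the following Python lines produce such an announcement file; the file names and the title are the made-up ones from the example above.
# hypothetical original article and its announcement copy
original_url = "https://localhost/linerobot.html"
title = "Programming a line follower robot"
announcement = (
    "<html><body>"
    '<p><a href="%s">%s</a></p>'   # only the link, not the full text
    "</body></html>"
) % (original_url, title)
with open("announcementlinerobot.html", "w") as f:
    f.write(announcement)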
Basically, the Pluto software generates HTML pages which contain URL links. Again we have to ask why. Let us answer this with a practical example. The user behavior of real social networks like Google+ is well researched. The main reason why Google+ is interesting for users is that it allows them to comment on content, to like content and to ban users because they have liked the wrong content. The main idea behind a social network is peer review from a meta-perspective. Peer review is done as independently as possible. We can explain the advantage with an example.
Suppose a user has discovered a blog post on the internet and has left a comment. The admin of the blog is not motivated to cooperate and deletes the comment. The user has no options, because the admin is in full control of his blog. What the user can do instead is visit a reader-friendly website like Google+, share the link of the blog post and write the comment there. That means the original blog admin can only control his own blog but is not able to modify the comment on the remote website. This gives the reader a higher perspective and he becomes a peer reviewer. That means he is not the friend of the original author but can judge according to his own needs.
To understand the idea we have to take a look at what peer review is in the scholarly publishing world. The idea is that an author writes a paper and then submits it to a journal. The journal accepts or rejects the paper. This is a natural conflict situation. Peer review means that both sides work against each other. The author hates the journal, while the journal believes that the author isn't qualified. The reason why the URL of a file is copied into a social network is to simulate a peer review situation. A social network is equal to a peer review instance which can judge the content.
In a minimal example, peer review in a social network has the following structure:
URL1 nice article, I like it
URL2 no comment
URL3 I don't know, but I send the URL to another user
URL4 not the best content on earth
URL5 what is that?
That means the peer review results in a plaintext file which consists of an annotated URL list. The advantage is that this list can be seen by a larger group. That means another user can comment on the peer review. The discussion is held on a meta level. The original author who has written the article behind URL5 is not in control of the situation. He has to fear the peer reviewer.
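Rendering such an annotated list as a web page is again only a few lines. The following sketch assumes a hypothetical plaintext file reviews.txt in which every line starts with a URL followed by the comment, as in the list above, and turns it into a simple HTML page.
def render_reviews(path):
    # read lines of the form "<url> <comment>" and build an HTML list
    rows = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            url, _, comment = line.strip().partition(" ")
            rows.append('<li><a href="%s">%s</a>: %s</li>' % (url, url, comment))
    return "<html><body><ul>\n%s\n</ul></body></html>" % "\n".join(rows)
if __name__ == "__main__":
    with open("reviews.html", "w") as f:
        f.write(render_reviews("reviews.txt"))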
The term meta-website was used quite often. Meta means that the website consists of URLs. It is a place in which links are collected and commented on. The comment is not posted to the comment section of the original website, but on the meta-website. A meta-website is a readers' forum; it is a place in which users can review existing content. The disadvantage of a meta-website is that it doesn't contain content of its own. Apart from a URL list and some like votes, Google+ has nothing to offer.