May 31, 2019

Fedora 30 is the best operating system ever!


The new Fedora 30 operating system runs remarkably stable. For non-Linux users I'd like to give a short overview of what's inside the ISO file which can be downloaded from the project website. A Wayland compositor (with XWayland for legacy X11 applications) displays the GNOME desktop on the screen. If somebody doesn't like GNOME very much, he can switch to another GUI, for example an FLTK-based window manager, E17 or LXDE. But suppose the normal GNOME environment is preferred; then the user has a well-sorted settings menu to change the WLAN configuration, change the colors, add a printer and even switch to a preferred non-English language.
So far so good. The Fedora OS comes with a preinstalled LibreOffice suite which runs great, and additionally the user has the choice to install any database, text editor, programming language or computer game he likes. The only condition is that the software was published under a GNU public license and is available in the repository. For a lot of advanced software this is the case. Some examples are LyX, LMMS, Pitivi, archive managers, backup tools, Java, C++ compilers, Python libraries and all sorts of network tools. By far, Linux is the most advanced operating system ever invented and it comes with the most developed software tools. It has surpassed commercial UNIX operating systems like Solaris, has beaten IBM operating systems like OS/2, and can compete easily with Mac OS X and Windows 10.
The only problem I've found is that the overall system is absolutely boring. After pressing the power button the login menu is there. Because of the daily system updates, the system gets all the patches and improved software automatically, so the chance is high that the same operating system will still work in 1 year, in 2 years and even in 5 years from today. In around a year the next version, v31, gets released, but updating the system to the next step is also not a problem. The user has to press a button, wait a bit, and then he gets the new system. I don't think that the look and feel can get much better in the future; the only visible limit is the underlying hardware. If the screen only has a resolution of 1366x768, not even Fedora can display more pixels.
All in all I can say congratulations to Red Hat and to the entire Open Source movement. The system runs great, no problems found.

Linux as a rescue system


A look at dedicated rescue systems like GRML and Knoppix shows that a recent ISO file is far away from being small and compact. The GRML distribution, which is used as a replacement for a full Linux distribution, needs around 400 MB on a CD-ROM, and systems like Fedora LXDE or Knoppix need much more. The sad news is that it's not possible to create a smaller Linux ISO file, because the kernel alone takes 150 MB, and additionally a stripped-down X Window System, systemd, drivers and a lot of extra software are needed. Even an ultra-compact Linux rescue system takes around 1 GB of space on the USB stick.
The alternative is to take a look at technology away from Linux and the C-compiled ecosystem, and Forth is the answer. A minimal Forth system which fits into 50 KB of RAM is a complete operating system which includes the compiler as well. In the Forth case the systems language is Forth itself, which allows writing much smaller source code.
The only bottleneck in the Forth world is that it is more difficult to describe than a Linux system. Explaining how to boot the GRML Linux system is easy going. But explaining the same for a simple Forth kernel is much harder. The bottleneck of Forth is not the technical side. Forth is by far the most advanced operating system and programming language ever invented. The problem is that it's poorly documented and that most users are not familiar with it.
Let us look at an example. From a technical side it is possible to build a webserver with Forth, https://github.com/urlysses/1991 The code is ultra-small and it does the same thing a Linux system can do. The only problem is: how many users are able to boot the 1991 Forth webserver on their PC? And if they are not familiar with it, which kind of literature can they consult? Exactly this is the bottleneck of Forth. It is an advanced programming language, but it is hard to become familiar with it.
The problem with Forth is that it can be implemented on any computer with any programming style. As a result, the newbie has the problem of finding something which can be learned, taught and reproduced.
Back to the 1980s
In the early 1980s Forth was mentioned in the literature as a normal alternative to other languages like BASIC and C. But slowly in the late 1980s, and especially in the 1990s, Forth was no longer mentioned in the computing industry. The reason why can be explained with the CP/M operating system. CP/M was an early operating system for 8-bit microcomputers and the predecessor of MS-DOS on the IBM PC. Before CP/M became widespread, home computers were programmed directly in languages like C, BASIC and Forth. After CP/M, the importance of Forth became smaller. So, what exactly was CP/M?
The idea behind an operating system was to divide the users into programmers and end users. The programmers create the software, for example a text-processing tool, while the end user runs the software on the computer. A second idea of a CP/M-like operating system is to standardize the API with the help of layers. On the lowest level there are the drivers for the floppy disk and the printers, the next layer provides the OS calls for printing a document and accessing a file, and so on. Both concepts are the opposite of what Forth had in mind. In the Forth ecosystem there are only programmers but no end users, and in Forth nothing is standardized.
It is correct to assume that without the invention of CP/M-like operating systems, the importance of Forth would be greater today. If a computer has no OS, Forth is a powerful technique for programming the machine. Operating systems, and especially graphical operating systems, have reduced the importance of Forth. Most users (and even most programmers) today have never heard of Forth. What they use in their daily life are operating systems and C-like high-level programming languages.
But why exactly was CP/M such a great success? The reason is that the normal user wasn't interested in the computer itself but was fascinated by applications for databases, calculations and games. The combination of an operating system plus a C-like programming language makes it easier for programmers to write large-scale end-user applications like dBASE, Microsoft Office and similar tools. Such full-blown programs needed hundreds of megabytes in the 1990s, and today they need gigabytes on the hard drive. The end user likes such software because it provides all the features he needs. In contrast, Forth software tends to be as minimalistic as possible. Even if somebody writes a text editor in Forth, he will only provide 10 features, not 10,000.
Software for CP/M
In the 1980s a lot of software was written for CP/M, for example SuperCalc, dBASE II, and programming languages like Turbo Pascal, Lisp, Comal, BDS C and so on. From the perspective of Forth, all of these attempts are not helpful. They build an additional layer between the user and the machine and they make the overall system more complicated. But the software market in the 1980s and later had a different opinion. It came to the conclusion that a standardized operating system, lots of application software and different programming languages which compile to assembly language were here to stay.
Again, before the invention of the CP/M operating system, Forth was recognized as a good alternative to existing programming languages like assembly and BASIC. After the invention of CP/M everything changed and nobody was interested in Forth anymore.
What remains unanswered is the question whether the success of CP/M and later operating systems like MS-Windows and Linux was the right way, or whether Forth was the better answer for programming a computer. To answer this question it makes sense to focus not on the operating system itself, but on high-level applications, for example WordStar. WordStar was one of the early large-scale text-processing programs. The old WordStar version for CP/M had a file size of 390 KB and the MS-DOS version was 920 KB in size. A modern text-processing program like MS-Word needs much more disk capacity. Why are these programs so large? Because they provide so many features. Sure, there were alternatives available for CP/M which were little more than a small text editor, but the end users preferred WordStar. Perhaps this helps to explain why CP/M and other operating systems were a success: because the idea was not to write a small, efficient program which consists of 10 Forth words, but to build a full-blown application which takes many megabytes of disk space.
And here is the bottleneck of Forth: it doesn't support very well the idea that an application will take 100 MB of space. Writing such a large application doesn't make much sense for Forth programmers. All the Forth programs available are small or very small. A Forth program larger than 10 KB already counts as a large application written by a team of programmers.
Modern non-Forth ecosystems have the tendency to become large. Both MS-Windows and Linux take 20 GB on the hard drive at the minimum. If the user installs additional games and software he will need much more free disk space. The reason is that an operating system like Linux provides many things which are not needed. Instead of using one programming language for all tasks, Linux comes with preinstalled Python, AWK, C, C++, Fortran, assembly, Ruby and JavaScript toolchains. Instead of providing a single window manager which is the best choice for all needs, the user can decide between hundreds of window managers. All of this increases the amount of occupied disk space. In the MS-Windows ecosystem the situation is the same. For one application domain like databases, the user can install dozens of applications which all do the same thing. Windows and Linux are the opposite of what Forth teaches the user.
Applications for CP/M
Both CP/M and Forth can be understood as minimalistic in design. The difference is that CP/M wasn't a programming language but an operating system. This distinction is important because of the cultural background at that time. The main reason why CP/M was successful was its applications. Programs like WordStar, dBASE, Multiplan, and even AutoCAD were available for CP/M. Somebody may ask what a program like WordStar has to do with an Intel-based microcomputer, and exactly this is the point: there is no connection. The application stands on its own; it works on a higher layer. The software company MicroPro (WordStar) worked independently from Ashton-Tate (dBASE). In each company, different kinds of programmers wrote the code, used different programming languages and made different mistakes.
The funny thing was that dBASE was not written with CP/M in mind. CP/M was only one target platform; later the program was ported to MS-DOS. That means dBASE was its own ecosystem and had nothing to do with the IBM PC, nor with a certain programming language.

May 28, 2019

Utilizing the Wikinews website for advertising


Some days ago a new article was created within the Wikinews project https://en.wikinews.org/wiki/Toronto%27s_Anime_North_brings_thousands_together It is about an anime convention held last week in Toronto. The reason why this article can't be deleted by the Wikinews admins is that the event was covered by larger newspapers. That means, after typing the keyword "anime north" (which is the name of the festival) into Google News, some articles from larger newspapers will give the details. Two of these sources are given at the end of the Wikinews article.
What is interesting is that the authors of the article have invested a lot of energy. For example they have added a lot of photos from the convention to the Wikinews article. Right now the article is in the normal peer review queue. The question is whether the article gets published in Wikinews or not. This is hard to predict, because from an objective point of view an anime fan convention is not important world news. On the other hand the event is important enough to get some Google News articles. So my prediction is that the article gets a Wikinews headline. Another reason which speaks for the article is that in the past some similar articles were published under the category "Anime convention" which also provided lots of photos.
It seems that it's possible to publish something at Wikinews if Google News already has at least 2 articles about the topic and if the authors invest a bit of energy into creating the Wikinews article. As a consequence the article will get past the incoming spam patrol and will also pass the peer review at Wikinews.

Creating a paranoid story the easy way


Philip K. Dick is known as the master of paranoid writing. But what exactly are the ingredients that make a character think he is being persecuted? The explanation is surprisingly easy. There are two important preconditions for telling a paranoid plot. The first one is that the character is unable to reach an important goal. For example, his goal is to travel to a city but he is not able to do so. And the second aspect is that the character tracks a group of people which is called "they". "They" means that the character tries to understand reality, puts persons, places or ideas in his focus and builds a model of their inner workings.
Let me give an example. The main character of a story is at home and wants to travel to Paris. He joins a travel group on a train. Now the train stops halfway and the travel group is doing something. The main character understands only half of what the group is saying and recognizes that the planned destination can't be reached. This results in a fearful situation, which means that the character is overwhelmed by it. As a result he becomes paranoid. That means that, according to his model, the travel group is conspiring against him.
Pretty simple story line, isn't it? The funny point is that it's very easy to invent hundreds of these stories with different characters and goals. The only important elements are a goal which can't be reached and a group which is tracked but misunderstood. On top of these simple elements it is possible to produce realistic dialogue and an overall plot which looks highly believable. It amounts to placing the main character in a trap and letting the environment work against him, and the result is called a paranoid plot.
It is possible to explain the difference between a normal plot and a paranoid one. In a normal plot the character is able to reach his goal. That means he plans to reach Paris and is able to do so. He starts with an optimistic understanding of the world and he is right. The train is on schedule, everything works great and he loves the journey. It is common that this is combined with a successful tracking of the other group. That means the character observes what the other passengers in the train are doing, and they have the same understanding of the world as the character himself. Communication is possible.
In a paranoid plot, both assumptions are violated. That means the ability to reach the planned goal is not there, and the tracking of the other group isn't working. This is perceived by the readers as a stressful situation.

Search algorithm as AI-technique


A search algorithm is used in classical computer science for graph traversal. The same idea can be utilized for all sorts of Artificial Intelligence problems. The only question is how to get better performance. Let me give an example. Solving a game of chess is a search problem, controlling a robot is also a search problem, and recognizing an image can be treated as a search issue too. The reason why controlling a robot is treated as an AI problem is that there is no search algorithm available which can handle the task. A vanilla search algorithm which tests out all possible actions in the state space results in poor runtime. That means there are billions of potential trajectories and only a handful of them solve the problem.
The question isn't "what is AI"; the question is how to improve the performance of a search algorithm. Some highly effective strategies are model-based search, macro actions and hierarchical search. All of them can reduce the search space drastically. They do not realize AI directly, but they speed up the search algorithm. That means a vanilla search algorithm for determining the trajectory of a robot might take 2 hours, while realizing the same idea with a model-based search algorithm takes only 1 second of runtime.
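To make the idea of a vanilla search concrete, here is a minimal sketch in Python. The toy maze, its size and the breadth-first strategy are my own assumptions for illustration; they are not taken from a specific planner.

# Minimal sketch: a vanilla breadth-first search over a toy maze state space.
from collections import deque

MAZE = [
    "S..#",
    ".#.#",
    "...G",
]

def neighbors(pos):
    r, c = pos
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0]) and MAZE[nr][nc] != "#":
            yield (nr, nc)

def bfs(start, goal):
    # Explore the raw state space action by action; no model, no heuristics.
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        pos, path = queue.popleft()
        if pos == goal:
            return path
        for nxt in neighbors(pos):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None

print(bfs((0, 0), (2, 3)))

On a 3x4 toy maze this finishes instantly; on a realistic robot state space the same brute-force idea explodes, which is exactly why the model-based tricks below matter.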
STRIPS
A well known technique for increase the speed of a robot path planner is Strips. The idea is, to model high-level-actions in a pddl file. What strips has in common with macro actions and hierarchical search is to build a model. A model can be understood as a game around the original game. Let us give me an example. A robot should navigate in a maze. This is the description. Solving this game with a normal search algorithm isn't possible, because the state space of this game is too. The answer is to invent a derivative game which has a small state space. This new game is equal to a model. Strips is one possible technique for creating a model. A strips model contains of subactions, which can be executed and which brings the model into a follow-up state.
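As a rough sketch of what such a STRIPS-style model looks like in code, here is a small Python version. The facts, action names, preconditions and effects are invented for this example; a real planner would read them from a PDDL domain file.

# Sketch of a STRIPS-style model: a state is a set of facts,
# an action has preconditions plus add/delete effects.
ACTIONS = {
    "pick-up-key": {"pre": {"at-door", "key-on-floor"},
                    "add": {"has-key"}, "del": {"key-on-floor"}},
    "open-door":   {"pre": {"at-door", "has-key"},
                    "add": {"door-open"}, "del": set()},
    "go-to-goal":  {"pre": {"door-open"},
                    "add": {"at-goal"}, "del": {"at-door"}},
}

def apply_action(state, name):
    action = ACTIONS[name]
    if not action["pre"] <= state:      # preconditions not satisfied
        return None
    return (state - action["del"]) | action["add"]

state = {"at-door", "key-on-floor"}
for step in ["pick-up-key", "open-door", "go-to-goal"]:
    state = apply_action(state, step)
print("at-goal" in state)               # True: the plan works inside the model

A handful of abstract facts and actions replaces thousands of low-level motor states, and this is exactly the state space reduction the paragraph is talking about.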
In the domain of game programming the term model isn't used. Instead the programmers call it a game engine or rule engine. This module determines which actions are possible in the game. According to the game engine the player can move up, down, left and right. If somebody knows the game engine of a game, he can solve it. So the problem is how to invent a game engine which fits reality and which has a small state space. Most AI-related projects discuss this single question. Without a model it is not possible to search the state space.
Search for a plan
If an artificial life agent wants to execute actions it needs a plan. A plan is a sequence of actions. The plan is executed inside a model. The model provides potential actions like up, left, right, and the plan is a sequence of these steps. An optimal AI system is able to determine the plan at runtime, but can also change the model at runtime. Let us describe the idea in detail.
Everything starts with a challenge, for example the robot should reach the goal of a maze. On top of the challenge a model is constructed on the fly. The model contains high-level actions. For the model a plan is generated, and the plan is executed by the robot.
A possible strategy for automatic model learning is "learning from demonstration". A human teacher demonstrates the actions. The robot does not simply repeat the actions; instead it creates a model from the demonstration. This allows the robot to search the model for a plan.
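A minimal sketch of this pipeline might look like the following Python fragment. The demonstrated waypoints and the graph representation of the model are assumptions invented for the example.

# Sketch: build a model from human demonstrations, then search the model for a plan.
from collections import deque

# The teacher demonstrates two runs through the environment (invented waypoints).
demonstrations = [
    ["start", "hall", "door", "goal"],
    ["start", "hall", "window", "door", "goal"],
]

# Model learning: every observed transition becomes an allowed high-level action.
model = {}
for run in demonstrations:
    for a, b in zip(run, run[1:]):
        model.setdefault(a, set()).add(b)

def plan(model, start, goal):
    # The robot does not replay a demonstration; it searches the learned model.
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in model.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(plan(model, "start", "goal"))     # e.g. ['start', 'hall', 'door', 'goal']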

May 24, 2019

Social Wiki for Artificial Intelligence


The newly discovered AI wiki https://ai.fandom.com/ was extended with some smaller articles. To make things more comfortable, I've searched for similar projects from the past and found two existing ones. The first one is active today and is a bibliography wiki for papers about natural language processing, https://natural-language-understanding.fandom.com It contains 300 articles right now. A second, cancelled project from the past was a game AI wiki, https://web.archive.org/web/2010*/http://wiki.aigamedev.com/ which had around 180 articles, also with the idea of collecting URLs of existing resources in a wiki.
Both projects seem technically very advanced. The only problem was that the number of contributors was too small. I think the idea of a social wiki makes sense; the only important thing is that the project should be run for a longer period and with more manpower in the background. Let us make some calculations.
Even without knowing the details of both mentioned wikis, the assumption is that they were realized as one-person wikis. A single enthusiast collects some URLs to existing resources in a single place and uses a wiki for that purpose. To my knowledge this is the cheapest way of doing so. The only thing that was missing was a bit of luck to scale the wiki project up. Suppose not a single person but 5 persons contribute to the wiki, and they do so not only for 6 months but for 3 years. As a result the overall number of articles would be much greater, and as a consequence more people would notice the wiki.

Reducing the costs
The key factor for realizing a social wiki project is to keep the costs down. Most wiki projects in the past were cancelled because the single admin of the project stopped it or the server costs became too high. In the case of a Fandom wiki project, the wiki is free of charge; instead the website is financed by advertising. It is also very useful that new users don't have to register at the wiki itself but can create an account within the Fandom universe. To reduce the costs further, here is a recommendation for adding new URLs to the wiki.



It is the same template I'm using for adding new facts to the wiki. The idea is to put the URL into the tag and to use the current year as the default category. Also, each article gets a date in the title, similar to a Wikinews article. The reason is that, in contrast to the Wikipedia encyclopedia, it is possible to create many articles about the same topic. For example, at least three articles about the topic behavior tree have been created in the system so far.

Example

Let us look at an example. A user has found an interesting URL on the Internet and would like to submit it to the wiki. What he has to do is fill the information into the template and create a new article for it.
I have changed the date field and copy&pasted the URL into the "ref" tag. Once the new article is created in the system it can be categorized further and -- very important -- can be discussed by others, for example on the talk page.
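As a rough sketch of this workflow in Python, the snippet below generates the title and body of such a minimal article. The exact wiki markup of my template is not reproduced in this post, so the ref tag and the year category below are only an approximation, and the URL is a placeholder.

# Sketch: generate a minimal wiki article from a topic, a URL and an optional comment.
import datetime

def make_article(topic, url, comment=""):
    today = datetime.date.today()
    title = f"{today.isoformat()} {topic}"      # date in the title, like a Wikinews article
    body = (
        f"{comment}\n"
        f"<ref>{url}</ref>\n"                   # the URL goes into the ref tag
        f"[[Category:{today.year}]]\n"          # default category is the current year
    )
    return title, body

title, body = make_article("Behavior tree",
                           "https://example.org/behavior-tree-paper",   # placeholder URL
                           "Short introduction paper about behavior trees.")
print(title)
print(body)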

May 22, 2019

Fedora 30 update successful

Today I've migrated to Fedora 30. In contrast to a common misconception the borg-backup tool works great; all the data from the USB stick has been restored into the home folder. What is new in Fedora 30 is that the install procedure no longer adds a root user. Instead only a normal user is created, who can use the "sudo" command to gain more privileges.

It is important to know that after the first login, the Fedora system downloads a lot of data in the background, around 1 GB in total. The best way is to wait until this task is done and do other things in the meantime. The good news is that programs like Google Chrome can be installed easily from the default software repositories. In previous Fedora versions the user had to add a repository manually; this is no longer necessary.

In terms of look and feel there is no improvement over previous versions. The announced faster login and the lower memory consumption are not noticeable. The GNOME desktop takes around 1.5 GB by default for rendering exactly one window. The good news is that all the programs run smoothly. The web browser shows the Internet, the text editor reads files and the file manager can open USB media.

The overall look and feel is very similar to Mac OS X and Windows 10. The average user gets an easy-to-use operating system which he doesn't understand. If the user is interested in learning about computing itself, the better OS is Eulex Forth, http://forth-ev.de/wiki/events:ft2019:start which takes only 50 KB of RAM.

May 20, 2019

Interesting traffic statistics in my own blog



The good news about a newly created blog is that apart from the author himself there is no further traffic. All the pageviews are generated by my own browsing through the website; this gives a nice opportunity to watch the traffic statistics from a neutral point of view.
According to the graphic, the most accessed page is called "social network". On this subpage I'm tracking my own blog. It is the about page in which interesting content is linked and annotated. The surprising fact is that this page was visited more often than all the other pages in the blog. The social network / about page is the informal table of contents; it aggregates the entire blog into a simple list. Right now the page is not very long. It contains 13 entries in the format "date, URL, comment". But it seems that I've viewed and edited this page more than all the other subpages of the blog.
Like I mentioned in the introduction, the statistics are neutral, because apart from my own visits there was no real audience. What can we learn from this information? We can learn that perhaps in any blog the about page is the most interesting one. The author of the blog reads it very often. It is unclear what will happen if serious traffic arrives on the website. Does the random reader also have the motivation to look through the meta-section of a website, or is he more interested in the content itself? I don't know. But my prediction is that the meta-information is more important than the actual content. The best example is the library. What all the users are doing is not populating the shelves with books; instead they occupy the online public access catalog (OPAC), or the room with the index cards if the library is a bit outdated. That means the meta-information of a system generates the most pageviews.
But what exactly is the difference between a normal blog post and a meta blog post? I'd like to answer this for my own blog. The total number of postings is 131; in the meta-section only 13 are mentioned. Additionally, in the meta-section only the URL is provided, not the full text. It seems that a typical sign of a meta-section is that the number of entries is small and that only URLs are provided.

Minimalist Wikinews article was created


https://en.wikinews.org/wiki/Google_discontinues_cooperation_with_Huawei is a recently created Wikinews article. What makes the page interesting is the question whether the article gets deleted by the admins as spam or not. First we have to take a look at the sources at the end. There are two of them. The first one comes from Google News and is an English article from Reuters about the subject, which is business news about Google itself. The second source isn't listed in Google News. It is a news agency from Bosnia and the text is written in Bosnian.
If both sources were listed in Google News, I would guess that the article would become an official Wikinews article with 100% certainty. But in this article only one source comes from Google News, while the other is some kind of alternative source. I'm unsure whether this matches the incoming control of the Wikinews project. The good news is that the following actions of the admins can be tracked. That means the article is in the system and we can observe what will happen next.

Update (12 hours later)
Wikinews came to the conclusion that the Croatian/Bosnian source is not reliable enough. The first reason was that it is not listed at Google News, which is an aggregator for major serious news sources, and the second, more problematic reason was that the source is written entirely in a non-English language, which makes it hard to check the information.

The article about the relationship between Google and Huawei wasn't directly flagged as spam, but it didn't pass the incoming filter.

May 18, 2019

The social network section of this blog explained


This blog has a dedicated social network section. It was created as a static page within the Blogspot editor and is shown in the sidebar permanently. The purpose of such a page is to aggregate existing content. Let me give an example.
Blogs work with the principle in mind that the author creates a new blog post which is categorized according to the date. All the postings of May 2019 are put into the same folder. Additionally, most blogs have a search box and a tag system to find previous posts. But all these properties are not enough; what is needed additionally is a meta-section of the blog.
A meta-section is sometimes called the about section. In most blogs, the author explains to the public what the blog is about. The meta-section can be extended with a curated playlist. A playlist is by definition a list of URLs. The playlist in the about section of a blog contains URLs to existing blog posts. And exactly this is the purpose of the social network section in the nav bar. I've searched through my own blog, identified some of the postings which sound interesting and put them onto the playlist. A new visitor gets at first glance an overview of the blog without browsing through all the postings. The format in the about/meta-section is simple, because the list is formatted as "date, URL, comment".
From a Web 2.0 perspective, the meta-section is equal to reposting existing content. That means in the meta-section the URL of an existing piece of information is given to increase the visibility of that posting. Not all postings are listed in the meta-section, only the important ones. That means some of the blog posts have an entry and others don't.
The surprising fact is that the pageviews of the meta-section are the highest of the entire blog. The reason is that this section gets updated frequently, while a single blog post is no longer touched once it's stored in the system.
The idea behind the meta page / social network section is to track my own blog. That means the blog consists on the lower layer of new posts which are added, and on the meta level all the content in the blog is monitored. The idea is to analyze which content is already there and how often the content is visited.

May 17, 2019

More spam than usual at Wikinews


On normal days, the number of newly created articles at Wikinews is low. Each day around 2-3 spam pages are created by random IP addresses, and it takes around 30 minutes until an admin deletes the content. But today something is different. According to the recent changes tab, around 7 pages which obviously contain spam were created this morning, but the admins don't react. The newly created pages are still in the system. Like most spam articles they have no references at the end and contain one or two paragraphs with a poem or lorem ipsum test messages.

UPDATE
Most of the newly created spam pages were deleted by the admins. In contrast to the normal admin behavior the delay was longer: it took around 5 hours between the creation of a page and its deletion.

May 16, 2019

New AI Wiki discovered


https://ai.fandom.com/ is a wiki project hosted at the Fandom wiki farm. It is about Artificial Intelligence and contains only a small number of articles. According to the changelog the wiki has only a small number of contributors, so it is in an early stage. The main advantage is that it's possible for a normal user to register an account at Fandom and then contribute. I've tried it out and made my first edit yesterday.
The most important question around each wiki project is how to increase the article count. In a classical wiki this is the bottleneck. The first 10 articles are created by enthusiasts, but then the project slows down and after a while nobody writes new content. How to avoid such a development is simple: the wiki has to be realized as a social wiki. The idea is that the minimal contribution is similar to posting a URL link to a social network.
That means the users of the wiki aren't asked to invest their spare time in article writing and content production; they are asked to drop the URLs of their favorite YouTube videos and of their own projects which they want to push forward. The funny thing is that many users out there are searching for a place to post their backlinks. And the AI wiki is the perfect place for doing so.
The more important question is: what is the meaning of a website which collects URLs? Does this website have serious readers? Oh yes, because a manually curated playlist of useful AI resources is something which is not available right now. The problem is that so much great content is out there on the Internet, but what is missing is a peer review hub which collects all the URLs and comments on their quality. Somebody may argue that the Google search engine is the clearing house of the Internet. But Google is only a search engine, it is not a social network.
There is a need for a place in which users can paste their favorite URLs and comment on the URLs of other users. This allows discussing current trends in robotics, machine learning and AGI in a relaxed environment. A minimal newly created wiki article consists of:
date, URL
The next user can add more information, for example a single-sentence comment or perhaps two paragraphs of text. Then the article looks similar to a Wikinews article. A typical Wikinews article consists of two URLs plus two paragraphs of text.
It is important to understand that social networks work in the reverse direction. The idea is not to create an article about neural networks first and then, in a second step, extend the given article with resources. Instead the existing resource stands at the beginning and is extended by comments.
Let us take a step back. The idea of a content wiki is that no other resources are available. The content wiki is the first place in which an author can write an article. So it is a blog, but run by a group. This kind of understanding results in self-blocking groups. As a result the users stay away from the wiki because they fear the judgment of the group. The easier way is to assume that lots of existing resources are already available and that the aim of the wiki is to collect these URLs. A single operation in this wiki is a cheap task. Cheap means that the amount of time invested is less than one minute. This is the time it takes until a user has copy&pasted a link into the wiki and pressed the submit button.
In case of doubt a bot is able to post the URLs to the wiki. The bot takes as input stream an RSS feed which consists of:
Date, URL, title
and creates a new wiki article for each item in the feed. If the RSS feed contains 100 items, this results in 100 newly created articles. Each of them is ultra-short and contains only the URL plus the title. Such an automatic posting bot is able to increase the article count in the AI wiki from 89 now to 200 and more without much effort.
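A sketch of such a posting bot could look like the following Python fragment. It assumes the third-party feedparser package, the feed URL is a placeholder, and it only prints the generated article text instead of actually calling the MediaWiki API.

# Sketch of an RSS-to-wiki posting bot: one ultra-short article per feed item.
import feedparser                     # third-party package, assumed to be installed

FEED_URL = "https://example.org/ai-news.rss"      # placeholder, not a real feed

def feed_to_articles(url):
    feed = feedparser.parse(url)
    articles = []
    for entry in feed.entries:
        title = f"{entry.get('published', 'undated')} {entry.get('title', 'untitled')}"
        body = f"<ref>{entry.get('link', '')}</ref>\n[[Category:2019]]\n"
        articles.append((title, body))
    return articles

for title, body in feed_to_articles(FEED_URL):
    # A real bot would now upload the page through the MediaWiki API;
    # here we only print what would be created.
    print(title)
    print(body)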
My prediction is that after the article counter has increased, some new users will get interested in the project and post their URLs in response to the links which are already in the system. A social network has the natural tendency to become attractive to a large variety of users, especially users who are not motivated to contribute content but only want to have a bit of fun while interacting with other users.
Wiki vs content management system
Wikis are sometimes described as the ultimate content management system. But let us take a step back: which kinds of CMS systems are already available apart from wikis? Their number is huge. There are blogs, forums, PDF repositories, video platforms, static HTML content, PowerPoint presentations, printed books, landing pages of academic journals and even diskmags. In all of these content systems full text is stored. Do we need an additional wiki which can hold more content? No, we don't. The classical content management systems work great. A classical FTP server which contains PDF papers is a great choice as a long-term repository for serious information. There is no need to convert the PDF papers into wiki syntax.
What is not available right now, and what wikis can serve very well, is a peer review system. That is a hub in which a group of people judges the content and compares pieces of content to each other. Such a system can't be realized within a blog nor with an FTP server. The underlying structure that is needed is called a social network, and the technical side is called social network software.

May 14, 2019

Wikinews is a social network


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License .
Abstract
The evolution from single-user blogs, via aggregated blogs, to social networks is a process which takes time. The main idea is to separate the content from the URL of the content. If URLs are posted to a public forum, this results in a social network. Curated URL playlists are an independent way of analyzing existing content, and a peer review becomes possible.
Tags: wikinews, social network, Open Access, content aggregation

Contents

1 Reddit
1.1 Creating reddit posts fast and efficient
1.2 Understanding the idea behind reddit
1.3 Understanding Reddit made easy
1.4 Planet gnome as Reddit light

2 Wikinews
2.1 How to clone Wikinews?
2.2 Analyzing the Wikinews recent changes
2.3 Why Wikinews is better than Facebook
2.4 Building a wiki based blog aggregator

3 Misc
3.1 Definition of a social network
3.2 Academic peer review with social networks
3.3 Setting up a social network from scratch with mediawiki
3.4 Is Python the right choice for creating a dynamic website?
3.5 Reinforcing vs inhibiting cascade in the peer review process
3.6 Creating a link posting bot but why?
3.7 Python script for creating fake news
3.8 Best practice method for an intranet
3.9 Evolution of bookmarking tools

Reddit

1.1 Creating reddit posts fast and efficient

Before a newbie posts something to a famous news aggregator website, it is a good idea to try out the idea in a smaller version. I'm not talking about the Markdown syntax, which shouldn't be a major concern for most authors, but about the idea of posting links to external websites. A good starting point for creating a Reddit-style stream of consciousness is one's own blog. The idea is not to post yet another text which explains how the world works, but to give the overall blogosphere a voice.
In the early days of blogging, a certain style was used which can be seen as a webring frontend. The idea was to collect all the URLs to new content which was posted by the community in the last week. Before a URL can be put onto the list, two conditions have to be fulfilled: first it has to be fresh, and second it must fit the community. A typical template for creating such a news aggregator post is:
1. URL, title, description, date.
2. URL, title, description, date.
3. URL, title, description, date.
Such a syndication feed has the aim of mirroring the overall community. The peers should find themselves on the list. Value is generated for the audience, which gets a collected impression of what is going on in a certain domain, but also for the content contributors, who get lots of links and traffic from the aggregated RSS feed.
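A small sketch of how such a post could be assembled automatically is shown below. It assumes the third-party feedparser package and uses placeholder feed URLs; a handcrafted list would of course be written manually.

# Sketch: build a numbered "URL, title, description, date" list from a handful of blogs.
import feedparser                      # third-party package, assumed to be installed

FEEDS = [
    "https://example.org/robotics-blog.rss",    # placeholder feed URLs
    "https://example.org/ai-blog.rss",
]

def aggregate(feed_urls, limit=10):
    items = []
    for url in feed_urls:
        for entry in feedparser.parse(url).entries:
            items.append((entry.get("link", ""), entry.get("title", ""),
                          entry.get("summary", ""), entry.get("published", "")))
    lines = []
    for i, (link, title, summary, date) in enumerate(items[:limit], start=1):
        lines.append(f"{i}. {link}, {title}, {summary[:80]}, {date}")
    return "\n".join(lines)

print(aggregate(FEEDS))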
After this formal setting has been explained, we can discuss the details. Which URLs should be put on the list? Nobody knows; the list will look different on a subjective basis. It has to do with priorities. If a list is posted to one's own blog, nobody else will modify or comment on the selection. If the same list is posted to Reddit, a feedback mechanism is started, which means that somebody else will argue that the selection is right or wrong. This is the reason why posting something to Reddit is more complicated. It is some kind of meta-meta blog list.
Let us take a step back and assume that the list is not posted to Reddit but only to one's own blog. This makes it easier to explain the principle. In your own blog, the one and only admin is you. That means there are no other users who argue against the list itself. Instead the judgement is made only once. The judgement is always about the community. If the topic is Artificial Intelligence, then the newsfeed has to contain all the relevant content from this domain. But what is relevant? Google knows. Google is able to return all the updates from the last week. The problem with Google is that the number of URLs is too high. So the question is which of the AI content posted in the last week is interesting and why. This can only be answered if someone is familiar with two points of view: the first one is Artificial Intelligence itself, and the second one is the community, which means the blogs which post something about the topic.
The good news is that the number of content sources about AI is limited. We ignore all non-English content, because it is not relevant for a worldwide audience. English-speaking AI programmers can't read German or Japanese, so they have no need to click on weblinks to such content. What we have to answer are two things: first, which kind of demand is there (what does the reader like to read), and second, which kind of supply is available, which means what content was posted. A news aggregator has to combine both needs in a single document.
Let us explain why this middleman function is complicated. The demand for AI content is driven by non-experts. Most of the 7 billion people in the world are interested in robotics and AI but have never read a scientific paper on the subject. At the same time, most AI researchers have an academic background, which means they post papers on arXiv but explain the workings of their theories in a way that is too complicated. Is it possible to bring both sides together? Probably not. I'm very pessimistic that an AI news aggregator would work in reality.
Small tutorial
The basic steps for creating a meta-blog which observes a certain topic are not very complicated. We have to enter the topic into the Google search engine, restrict the results to the last week and go through the result pages with the aim of identifying major developments. The result is written down, and that is the aggregated newsfeed.
The reason why this is not done more frequently is that it takes time to create such content. It is much easier to create a normal blog than a blog aggregation. And sometimes it is argued that there is no need for creating such a list of URLs, because anybody else can use Google in the same way. But most existing watchblogs and RSS combination sites are perceived as useful, so it seems there is a demand for this service.
Banning users
In most Reddit tutorials it is explained what banning means and how to avoid it. Reading such advice is very funny, because if a user gets banned, he is simply no longer allowed to create a watchblog at Reddit. Let us take a step back. Creating a normal blog post or a normal Q&A question is easy. For example, somebody has the idea to post something about his latest Python source code exercise, and a blog is the best way of uploading the text. In contrast, a news aggregation website is harder to create. It follows the steps described above, in which a user searches Google for the latest useful content, and the resulting list of URLs is put either on his own homepage or posted to Reddit.
So what will happen if such an attempt is blocked or banned by the Reddit moderators? Not that much. It only means that the user has created a list of the wrong URLs, or that the idea of creating an aggregated meta watchblog is not desired at Reddit ... Very funny. In both cases the user can upload his RSS newsfeed to his own page. What I want to explain is that banning a watchblog doesn't make much sense, because the number of people who create such aggregated content is small. In most cases the problem is not that too many syndication feeds of hand-selected URLs are available, but too few.
Let us explain the situation from the opposite perspective. Suppose there are 100 bloggers on the Internet who have each created their own "yet another robotics" blog. They create the content on their own; what the community doesn't have is some kind of meta-blog which observes all the content and selects the most useful posts. What will happen if one of the 100 bloggers does this job? Will the other bloggers argue that he is not allowed to do so, or that they don't want to be referenced in the meta-blog? The opposite is true. They will like the idea very much, and they hope that they don't have to take over this part themselves, because it is easier to create one's own content than to aggregate the existing content of others.
The idea of banning a watchblog is some kind of inside joke. The story doesn't make much sense, because banning means that no meta-blog / RSS syndication is available, which reduces the visibility of the content and makes it harder for the community to connect with each other.
The more problematic case is if a new user gets a platform at Reddit, because this means that the community trusts him and is asking for a list of well-sorted content from the last week which includes URLs to external websites.

1.2 Understanding the idea behind reddit

Before we can describe what Reddit is, we first have to understand the problem which is addressed by Reddit. On the Internet, two kinds of websites are available. The first one is the classical newspaper, for example the New York Times, which publishes dozens of postings each day, created by a group of authors. The second type of website is the blog, which provides one or fewer postings a day. Reddit brings both types of websites together. The concept is called content aggregation and is equal to forming a virtual community. But let us go into the details.
A normal blog on the Internet is not able to compete with a newspaper. A single blog doesn't have the same manpower as a large-scale news website. But this problem can be solved with a trick. What if the content of hundreds of blogs is combined into a large mega RSS feed? Right, this would result in a pseudo-newspaper created by different blogs. That means all the blogs on the Internet provide the content, but the information is distributed over many websites. So the question is how to combine the existing content into a handy GUI, and here Reddit comes into the game. The basic idea is that under a topic like /r/linux all the Linux blogs out there are combined. If a blogger has posted something about Linux, it has to be announced at Reddit too.
In reality the principles are a bit different, because the official Reddit rules don't allow self-linking. That means, if a blogger links to his own blog, the Reddit community becomes nervous. But this kind of rule is only a detail, because anybody else can post the link as well. The inner working remains the same: under a single website the content of distributed websites becomes visible. It is important to distinguish between a page like Reddit and a handcrafted URL directory like Yahoo. The idea of Yahoo and DMOZ was to build a hierarchical index. That means, in year 1 all the Linux blogs are submitted to the index and then they stay on the list. In contrast, the idea of Reddit is to provide a newsfeed, which means that individual posts are directed into the Reddit groups.
Let us summarize the information into a do's and don'ts tutorial. What is not allowed on Reddit is that somebody posts at /r/linux a question like "Hello, I need help with the installation of Ubuntu, I've typed in a command line but it doesn't work". The problem with this posting is that Reddit is different from an online forum. What is also not allowed at Reddit is if somebody posts "Here is my blog, please subscribe", because this is treated as offensive.
Now we can answer which sort of information is welcome at Reddit: if somebody posts a link to an online forum in which a topic was recently discussed, or if somebody posts a link to an interesting blog post. In general we can say that Reddit was invented to aggregate the weblogs of the Internet from the last week. In the best case, the postings are accompanied by personal comments and come not from the original bloggers but from independent followers.
Blogs vs. forums
Well-known websites on the Internet are blogs and forums. The idea behind a blog is that anybody can post any text and images he likes. There is no need to submit an article to a newspaper; instead the content can be uploaded to one's own blog. It gets indexed by Google and can be found worldwide. The idea of a forum is quite different. Here the concept is that not a single person but a group of persons interact with each other.
But what stands above all the groups and all the blogs? Right, a fulltext search engine called Google. Google is the place where beginners type in a keyword and get redirected to the websites on the Internet. It's an interesting fact that apart from the Google search engine, other kinds of information filtering mechanisms are available, like Reddit, Facebook and RSS readers. It seems that fulltext search alone doesn't solve the problem of content filtering. To understand the concept of a gatekeeping website we have to analyze the website of a newspaper. A newspaper provides fulltext content, but it also provides a list of articles. What Reddit-like websites do is split a newspaper website into the fulltext and the article feed. Reddit provides only the article feed, which has to be filled by external blogs. A Reddit feed is edited by a group of people, and this makes the concept so interesting, because this group works with a certain principle in mind. The game is not "how to create an article about a subject"; the game is called "how to create a URL list of articles about a topic".
Tutorial
Existing tutorials about Reddit try to analyze the given rules and the current discussions on Reddit itself in order to describe which kind of behavior is right or wrong. The better way of getting in touch with Reddit is to ask what the purpose of a news aggregation website is. We can ignore the official and unofficial Reddit rules and describe from an abstract perspective what the term content syndication means.
The idea of a mega RSS feed is to support two sorts of users. First, the readers of the feed, who have a need for the latest headlines of the Internet; but the content providers should also be supported, because they have a need to put their content into the world. Let us describe the principle first from a technical perspective. The well-known RSS XML format provides the date, title, abstract and URL of a blog or of a website in general. A mega RSS feed combines single feeds into a larger one. A formal property of a mega feed is that it provides information from a certain timespan (for example the last week) and about a certain topic (for example robotics).
The first attempt at news aggregation is to collect all the headlines of the Internet from the last week about robotics in a single RSS stream. This attempt goes in the right direction, but has the disadvantage that the amount of information is too large. So the next question is how to reduce the number of entries. And voila, this is the purpose of Reddit. On Reddit only a small amount of the total information is presented, and additionally the other users can upvote and downvote to modify the selection. Basically, a Reddit subgroup provides the same content that is visible if we enter "robotics" on the Google News webpage, with the exception that no duplicate entries are visible and the content has more quality.
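As a toy illustration of this reduction step, here is a small Python sketch. The sample entries and the keyword filter are invented; real Reddit of course relies on the votes of many users rather than on a keyword match.

# Toy sketch of the reduction step: drop duplicate links, off-topic and outdated entries.
from datetime import date, timedelta

entries = [  # (link, title, published) -- invented sample data
    ("https://example.org/a", "New robotics arm demo", date(2019, 5, 20)),
    ("https://example.org/a", "New robotics arm demo", date(2019, 5, 20)),            # duplicate
    ("https://example.org/b", "My holiday photos", date(2019, 5, 21)),                # off-topic
    ("https://example.org/c", "Robotics path planning tutorial", date(2019, 5, 10)),  # too old
]

def reduce_feed(entries, keyword, today):
    one_week_ago = today - timedelta(days=7)
    seen, result = set(), []
    for link, title, published in entries:
        if link in seen:                            # no duplicate entries
            continue
        if keyword.lower() not in title.lower():    # topic filter
            continue
        if published < one_week_ago:                # only the last week
            continue
        seen.add(link)
        result.append((link, title, published))
    return result

print(reduce_feed(entries, "robotics", date(2019, 5, 22)))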
The question is not whether Reddit is a great website; the question is whether a manually created RSS aggregator makes sense.

1.3 Understanding Reddit made easy

Reddit can seen as a phenomena. Most marketing experts are aware that the website exists, but they are unsure how to utilize the website the right way. A common misconception is to see Reddit as some kind of forum / social media website. A forum is something which can be moderated and which is a about a certain topic. Stackoverflow for example is a forum. The user is posting something and the community can answer it. Reddit is working slightly different. Another misconception is to describe Reddit as a social bookmarking service which allows user to insert new URL to websites they like. Bibsonomy https://www.bibsonomy.org/ is a bookmarking website but Reddit is again something odd.
The best description what the so called social news aggregator is about can't be found in the subreddits nor in the help-section but in the blogosphere. If somebody has understood what the blogosphere is, he gets an impression what reddit is about. The blogosphere is something which is only indirect connected to Reddit, it is the content outside of the website. The blogosphere and Reddit are dependend from each other. If we shut down all the blogs in the Internet, Reddit is closed too. So what is a blog?
The typical blog was created by an individual, is updated not very often and contains amateur content. The total number of blogs in the world can only be estimated. The latest statistics explains, that each day around 3 million blogposts are created worldwide. That is a huge amount of content and it is growing. The blog community is only a small problem. They didn't find readers. The problem with most blogs is, that they are too small to reach a larger audience. In contrast to major newswebsites and portals, the typical amateur blog has only a few or sometimes zero subscribers. Somebody would argue, that all the blogs are indexed by Google and that Google will direct the traffic to the blogs. Technically this is right, the problem is that in reality this principle is not reliable enough.
How to solve the low traffic problem of amateur blogs? The answer is to create blogging communities. That are meta blogs on top of the existing content which are selecting and aggregating information. For example, there are 100 individual blogs about Lego Mindstorms available in the Internet. And now, somebody creates the 101. blog but this time, he doesn't post yet-another tutorial for creating a line following robot, but he creates a meta-blog in which he is monitoring what the other blogs are doing. A typical post has the title “Blog ABC has posted an NXC tutorial”, or “Blog DEF contributed a video from the last mindstorm competition”.
Suppose, a newbie likes to read some news about the Mindstorms community, which blog he would like to subscribe? Blog ABC, DEF or the meta-blog which is aggregating all the information? Right, he prefers the meta-blog. If the meta-blog is able to analyze the content the right way he was able to connect all the subblogs into a virtual community. All the blogs will put a link to the meta-blog and this will help the bloggers to see them as part of a larger group. They can stay within their own blog and the same time they get readers from the meta-blog.
This kind of hypothetical situation results into the Reddit page. Reddit is a meta-blog. It connects all the millions of individual blogs in the world. With this background knowledge in mind it is very easy to describe how the reddit community is working and how not. The first important point is, that community building can be done outside of Reddit. If somebody likes to connect 100 existing Lego Mindstorms blogs into a larger community he can set up a normal meta-blog / linkblog / blog-aggregator website. The only thing what is important is, that the content is handcrafted, because this results into a higher quality. The chance that a meta-blog is subscribed by the readers is much higher than a normal blog. The disadvantage is, that no content can be posted in a metablog, because it's role is to aggregate information which are already there.
The second step is to mirror the own meta-blog to Reddit. This sounds complicated but in reality it means only, that the postings can be upvoted and downvoted by the public. The meta-blog is no longer in control of a single admin, but it is written by everybody. The shared similarity between a meta-blog and reddit is, that both websites are aggregating existing content. They are forming a virtual community which is feed by the underlying blogosphere. To understand Reddit we have to take a look at the referenced blogs outside of Reddit. It is important to know, what the individual 100 Lego Mindstorms blogs are posting each day. If somebody has understand it, he can manipulate the reddit community easily.
Let us go a step backward and assume that Reddit doesn't exist. The task is to monitor the blogosphere. Not the entire one, but only the blogs about Lego Mindstorms. Which tools are available to do so? The first one, that somebody should know the topic itself, which is educational robotics. The second one is, that he has to know which important blogs are available, third he needs a fulltext search engine like Google and perhaps additional tool for analyzing the content in detail. Then he would create a report about the blogosphere acitivity and this report is posted to the internet on a daily basis. The result is called a meta-blog or a monitoring blog. If it's written well, it will attract a larger audience outside the Lego Mindstorms community because it allows the newbies to get an overview over the topic.
The disadvantage is that creating such a monitoring blog is a demanding task. It can be automated only in parts, and this is the reason why the number of such blogs is low. For mainstream topics like politics and music, professional gatekeepers with a TV and newspaper background do this routinely. They use a lot of manpower, for example 100 people at the same time, who monitor all the content in the world and aggregate the information into a newspaper. But for special interest topics like Lego Mindstorms, no dedicated news agency is available.
Overview article
In academic writing there are two different sorts of articles. The first one is a normal paper in which somebody explains how a certain topic works, for example a researcher describes how to realize a path planning algorithm. The second sort of manuscript doesn't provide new content; it is an overview paper which analyzes what the community has published in the domain of path planning during the last 10 years. Overview articles contain lots of references which are compared to each other with the aim of drawing the general picture. Overview articles are more complicated to write. In most cases they aren't created by beginners, and they are ranked higher in the result list.
A typical example: in a period of 10 years around 120 papers about path planning algorithms were published, plus 2 overview articles which analyze these 120 papers. The two overview papers have much in common with Reddit.

1.4 Planet gnome as Reddit light

Reddit is one of the most frequented websites in the world, which makes it hard to explain its inner workings. A smaller and handier mini version is available which has a different look and feel but operates on the same idea. It is called Planet GNOME https://wiki.gnome.org/PlanetGnome and sees itself as a meta-blog about the GNOME community. The GNOME community is not very large; it consists of enthusiasts from the Open Source movement. To understand what Planet GNOME is, we first have to describe that the community organizes itself around blogs. There are some GNOME related blogs on the Internet, and Planet GNOME aggregates these blogs. This is done by human moderators who read the blogs and decide which of the newly posted content is fed into the Planet GNOME RSS feed.
The result is amazing. The reader doesn't need to read the individual GNOME blogs, but can follow only the Planet GNOME website. If he would like to read the entire article he clicks on the link and gets redirected to the original content. The concept is similar to what Google News does: it's a list of headlines which are combined under a single umbrella.
Sure, not everybody is interested in GNOME related information. But how this community aggregates information into a meta-blog can be seen as a best practice which can be transferred to other domains as well. The guidelines describe how exactly an individual blog can become a member of Planet GNOME. The blogger has to contact the moderators and provide the URL of the blog, his name and a link to the feed, and the blog has to fulfill the community rules.
In German the same concept is used for the https://planet.ubuntuusers.de/ website, which is also an aggregator for individual blogs. The idea is that around 50-100 individual blogs provide new content and the higher instance called Planet ubuntuusers creates the headlines for the content. That means the moderation and the content creation are distributed. This makes it easy to remove an existing blog from the list and to add new sources of information.
The Planet GNOME website provides an about section. Quote:
“Planet GNOME plays an important role inside the GNOME community. It is so successful that very few people could imagine GNOME without the Planet now.”
An academic paper about Planet GNOME is also available. Quote: “To select the communities based on the above criteria, we analyzed Planet a popular blog aggregator. Planet is used by 43 open source communities to collect blog posts of their members in a single place.” (page 4, Dennis Pagano and Walid Maalej. How do open source communities blog? Empirical Software Engineering, 18(6):1090–1124, 2013.)
The reason why Planet GNOME and Planet Eclipse carry this name is the underlying software, which is called Planet https://en.wikipedia.org/wiki/Planet_(software). It is an RSS aggregator written in Python. A list of further blog aggregators built with the Planet software is available at https://news.ycombinator.com/item?id=4929490
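The core job of such a Planet-style aggregator is small. The following is a minimal sketch, not the real Planet code; it assumes the feedparser library and merges two hypothetical member feeds into one headline page.

# planet_sketch.py -- not the real Planet software, only a minimal
# illustration of what an RSS aggregator in Python has to do.
import feedparser

FEEDS = [
    "https://developer-one.example.org/rss",   # hypothetical member blog
    "https://developer-two.example.org/rss",   # hypothetical member blog
]

def collect_headlines():
    items = []
    for url in FEEDS:
        for entry in feedparser.parse(url).entries:
            items.append((entry.get("published", ""),
                          entry.get("title", "untitled"),
                          entry.get("link", "")))
    items.sort(reverse=True)   # naive sort on the date string, enough for a sketch
    return items

def render_html(items):
    rows = ["<li><a href='%s'>%s</a></li>" % (link, title)
            for _, title, link in items]
    return "<html><body><ul>%s</ul></body></html>" % "\n".join(rows)

if __name__ == "__main__":
    print(render_html(collect_headlines()))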

Wikinews

2.1 How to clone Wikinews?

The most mature project on the Internet which uses a wiki for curating news articles is Wikinews. The project was described in depth in an earlier blogpost, and this time I would like to explain how to clone the idea, that is, how to start a Wikinews-like website from scratch. Four elements are needed:
1. mediawiki as the wiki engine
2. RSS feedparser: a Python library which reads an XML feed
3. autoposter: the RSS input feed is converted into markdown syntax and posted into the social network (a minimal sketch follows below)
4. pre-moderation: after pressing the submit button the posting of a user is put into an incoming queue, and all messages have to be approved manually
These four elements are enough to build a fully working social network. It gets filled automatically with new URLs from the autoposter. Additionally, external users can create an account and submit a post. The post is queued in a filter and the admin of the wiki has to approve each single posting manually. Then the human posting is visible in the normal wiki frontend.
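How the autoposter from steps 2 and 3 could look like is sketched below. It is a minimal, hypothetical example: it assumes the feedparser library and only prints the generated markdown instead of really submitting it to a MediaWiki instance.

# autoposter_sketch.py -- converts an RSS feed into markdown-like wiki posts.
# Hypothetical sketch: the feed URL is a placeholder and the "posting" step
# only prints the text instead of calling the MediaWiki API.
import feedparser

FEED_URL = "https://example.org/news/rss"    # placeholder input feed

def entry_to_markdown(entry):
    title = entry.get("title", "untitled")
    link = entry.get("link", "")
    summary = entry.get("summary", "")
    return "## %s\n\n%s\n\nSource: %s\n" % (title, summary, link)

def run_autoposter():
    feed = feedparser.parse(FEED_URL)
    for entry in feed.entries:
        post = entry_to_markdown(entry)
        print(post)          # a real bot would submit this to the wiki queue

if __name__ == "__main__":
    run_autoposter()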
Such a social network doesn't ask the user to create content. All the user has to do is to post a URL, and additionally he can add a short description. And he can comment on postings from other users. This allows a continuous communication flow similar to a Facebook group.
All actions in the wiki are logged by the mediawiki version history. The user can track that his posting was pushed to the queue, and he sees whether the posting was published. If the admin wants to ban a user, he can do so with the mediawiki toolset.
What will happen for sure in this wiki is that some of the users will write autoposter-like tools. Either because they have direct access to the API or because they are familiar with the AutoIt programming language, which can scrape information from the browser and autofill the forms. An autoposter is a for-do-loop which posts, say, 10 messages a day into a social network.
It is important to know that autoposters in the context of social networks are not the exception but widespread. Somebody who is familiar with Facebook groups will like the idea of doing the same against a social wiki. The setup described at the beginning doesn't prevent autoposting. A newly posted message is put into the incoming queue and released manually. Whether the post was generated by a human or a bot can't be verified. Especially if the posting was submitted from an unknown IP address and consists only of a URL, it could have been submitted by either a human or a bot. It is not possible to ban bots from sending something to the website.

2.2 Analyzing the Wikinews recent changes

The Wikinews project runs on top of the mediawiki engine. The SQL dump is available online, but it consists of many separate tables which reference each other. Reading through the SQL dump is a bit hard; the better idea is to use the tools which are integrated into the web GUI. I've found two important buttons so far: Recent changes and Logs. The recent changes page doesn't show all actions; they can be made visible by opening the logs menu, in which additional actions are shown.
The good news is that the events can be ordered chronologically. That means if a user creates a page, the exact timestamp is shown in the log file. This makes it easier to trace the different actions and search for regular patterns. Perhaps a simple example would help:
Today, a user under an IP address created a new page early in the morning. The action is shown in the logfile. 30 minutes later on the same day, an admin blocked the user with the comment “advertisement/spam”. One minute later the admin also deleted the newly created page of the IP address.
According to the log file, the total amount of such actions and reactions is small. On this morning it was the only creation and deletion workflow. If we scroll through the timeline we will notice similar patterns. From time to time a new user creates a page, and around 60 minutes later the page gets deleted and the user is blocked by the admin. This seems to be a normal pattern in the timeline. Unfortunately it is not possible to open the deleted page and see what the user put into the article. Somewhere in the system this information is stored, but it can't be reached from the event log itself. The only way to get more details is to monitor the process live before a newly created page is deleted. That means we have to wait until a page is created and open it within the 30 minutes before the admin presses the delete button. The problem is that the action “create a new page” is rare on the Wikinews site. The number of users who do so is tiny.
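Instead of clicking through the web GUI, the same log can also be fetched with the standard MediaWiki API. The following is a small sketch, assuming the requests library; it only prints the latest deletion events so that the described pattern can be spotted without the browser.

# wikinews_log_sketch.py -- fetch recent deletion log events from Wikinews
# via the standard MediaWiki API. Minimal sketch, assuming the requests
# library; error handling is omitted.
import requests

API = "https://en.wikinews.org/w/api.php"

def latest_log_events(letype="delete", limit=20):
    params = {
        "action": "query",
        "list": "logevents",
        "letype": letype,            # e.g. "delete" or "block"
        "lelimit": limit,
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=10).json()
    return data["query"]["logevents"]

if __name__ == "__main__":
    for event in latest_log_events():
        print(event["timestamp"], event["user"], event["title"])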
Somebody may argue why this detailed analysis of the Wikinews event log makes sense. Isn't it enough to investigate the normal Wikipedia? No, it is not. The admin behavior in the normal Wikipedia is different. It is true that in both cases mediawiki is used, but the reason why and the moment when an admin presses the delete button follow a different pattern.

2.3 Why Wikinews is better than Facebook

Wikinews allows users to promote their own political campaigns. It is the perfect hub for advertising ideas and moving a hidden agenda into the mainstream audience. This is realized by advanced rendering engines which support embedded videos, audio, hypertext documents and external links to more information. The Wikinews admins are marketing experts and will explain to the newbies how to use the engine the right way, so that the overall traffic increases.
What Wikinews doesn't provide is a realistic description of the world. Somebody who is interested in a neutral point of view is maybe on the wrong website. The reason is that Wikinews is frequented by so much advertisement that no single point of view can dominate the debate. It is more a marketplace of jokes, cat photos and music videos which is attractive to ordinary PhD students.
The perhaps most attractive feature of Wikinews is that it explains much better what social networking is. In contrast to a common misconception it is not about connecting people but about content aggregation. That means the user page of a Wikinews user is empty. He doesn't provide information about his age, cultural background or interests. It doesn't matter for the working of the social network. The only thing that counts are the posted URLs. If these URLs are the right ones, the user gets upvotes. In contrast to Facebook, Wikinews is very open to bots. These are software programs which curate playlists according to algorithms. They help the user to communicate and to find new websites which are amazing.
A formal procedure to register a Wikinews bot is not needed. It can be created inside Wikinews or with external frameworks. A best practice method for creating a Wikinews bot is the AutoIt software, which generates simulated keypresses and mouse actions within the webbrowser. Additionally the normal textual API is also available, but an AutoIt bot is the more elaborate tool.

2.4 Building a wiki based blog aggregator

The Wikinews project is a super-advanced project which shows in which direction social media has to go. Wikinews has only one small mistake: it is biased. What does that mean? Before a wiki can be realized, the admins have to invent some rules which make sure that everything runs smoothly with the project. In the case of Wikinews the basic rule is that a newly created article needs at least two sources, and both sources must be indexed in Google News.
Sure, a random stranger can technically create a Wikinews article and insert two blogposts: one from Medium and the second from Wordpress. But the Wikinews admin will delete such a posting quickly. That means Wikinews trusts Google News, but Wikinews is sceptical of the blogosphere.
There is nothing wrong with a biased point of view. The Wikinews project is allowed to define what is spam and what is not. But the resulting question is whether a different news hub is realistic which works with a different bias.
Suppose the idea is to create a complement; how would the project look like? It would be nearly the same as Wikinews, with the only exception that newly created articles can contain two sources which are blogposts. The self-understanding of the media in Google News is that they are able to describe the world. Google News doesn't index blogs and Wikinews doesn't accept blogs. To understand why this matters, it is important to analyze the workflow until an article gets published in a newspaper.
Is a normal blogger allowed to create a report and send it to the Guardian newspaper or to any other media company listed in Google News? No, he is not. The sources listed in Google News are closed systems. They are not individual blogs but business companies which want to earn money with the content. Earning money is from the point of view of the companies a great idea, but on the other side there are the readers who are not interested in spending money for news articles. The readers didn't want to pay for listening to radio, watching television or getting access to newspapers. What the reader prefers is open access content. Which means that the readers ask for a service which isn't provided by CNN, the Guardian and others.
Let us go a step backward. Before a news hub can publish a story, at least two sources are needed. And before a resource can be linked, the resource must be created. The cheapest way of creating content are weblogs. The blogosphere is free to read and free to write. The content in the blogosphere can be aggregated in a news hub. Technically a news hub is a social network and can be realized with the mediawiki system. The result is equal to Wikinews, but with a different bias. This new bias prefers blogs under a creative commons license and doesn't reference paid resources which are protected by paywalls.
Perhaps one word about creative commons. The Wikinews project itself has a creative commons license. The content there is free to the world. But the URLs at the end of a news article direct the user to professional content taken from Google News. Content which is hosted at the Guardian newspaper or at CNET is not provided under a creative commons license.
The better idea is to reference only sources which are creative commons too. In the current Wikinews project this is not possible. As mentioned before, if somebody creates a Wikinews article and references two blogs, the article gets deleted by the admin. It makes no sense to argue with the admin or discuss the issue inside the Wikinews project, because this rule is fixed. The user has only the option to follow the rule, or he can't participate in the Wikinews project at all. I don't think that Wikinews is broken, but there is a need for something which is more open to creative commons sources from the Internet.

Misc

3.1 Definition of a social network

Social networks can be defined by their formal structure. A small scale social network which is restricted to a single website looks like:
2018-03-04 [internalURL] comment
2018-03-04 [internalURL]
2018-03-05 [internalURL] comment
2018-03-07 [internalURL] comment
While a large scale social network which is open to the entire Internet looks like:
2018-03-04 [URL]
2018-03-04 [URL]
2018-03-04 [URL] comment
2018-03-04 [URL] comment
This is the most basic structure available, and it is used in all social networks. The basic structure can be extended with groups, which make the table more readable, and with upvotes and a reposting ability. To explain things more clearly, only the basic structure is used here, which consists of “date, URL, comment”.
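The same structure can be written down as a tiny data model. The sketch below is only an illustration of the “date, URL, comment” triple; the field names are my own choice and not taken from any real software.

# social_post_sketch.py -- the minimal "date, URL, comment" record which
# every social network post can be reduced to. Field names are illustrative.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Post:
    posted_on: date                # e.g. 2018-03-04
    url: str                       # the referenced external or internal URL
    comment: Optional[str] = None  # an optional one-sentence remark

timeline = [
    Post(date(2018, 3, 4), "https://example.org/article-1", "worth reading"),
    Post(date(2018, 3, 4), "https://example.org/article-2"),
    Post(date(2018, 3, 5), "https://example.org/article-3", "related video"),
]

for post in timeline:
    print(post.posted_on, post.url, post.comment or "")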
The most obvious property of social networks is that no content is presented; it looks like a curated playlist. The URL references content stored elsewhere. The social network is the place in which this content is evaluated and made accessible. Social networks have much in common with a search engine. A search engine is also able to index existing content. The difference is that a search engine is created by a machine, and the only thing a search engine can provide is a complete searchable index of all the content.
In contrast, a social network is curated by humans and sometimes by autoposting tools, and it can look very different. What is important to know is that social networks are not the same as a forum. A forum can work without posting any URL. In some forums it is forbidden to post URLs, and the quality of the forum is still high or very high. In social networks it makes no sense to forbid posting URLs, because this is the core feature.
From the perspective of server software, social networks can be realized with a variety of programs. It is possible to build them with wikis, inside a normal forum software, with dedicated social network software or with the help of email messages. The most interesting part of social networks is that the amount of work for posting a URL plus a short comment is low. This makes social networks interesting for a large number of people. Copy-and-pasting a URL and adding a sentence can be done in under 10 seconds. As a consequence, the amount of traffic in social networks is high or very high. That means the number of daily new posts is high and the number of people who post is high. If this posting activity is supported by autoposting software, the physical load on the server will become the bottleneck.
Automatic peer review
Most people are unsure what peer review is and they have no idea what social networks are. Sure, there is a help section somewhere, but the idea can be explained much better by an automatic teacher. The easiest way of doing so is to write a small computer program which automatically posts URLs into a social network. If the bot is not too active, it gets tolerated by the admins.
Such a bot generates a practical example of what a useful post is. It shows that a posting in the form “URL, comment” makes sense. What the human users are asked for is to emulate the behavior of an autoposting bot. In most cases they can imitate the behavior much better. Let me give an example.
An autoposter bot creates one post a day in an AI group. It posts links to the latest papers in the Arxiv directory. The bot itself is not very advanced, but it is a good starting point. What the humans can do is to work together with the bot. They can comment on an existing post, which will help other people in the forum, or they can imitate the behavior and post links to papers not found in the Arxiv directory but which are useful too.
The autoposter works as an icebreaker and the humans can swim behind it and do the same. If the autoposter bot becomes more advanced, the humans will become more advanced as well. Another advantage of social network autoposter bots is the manual around the software. Suppose the bot itself is deactivated but the manual about its inner workings remains available. Reading such manuals is very interesting because they explain explicitly what a social network group is about. Let me give an example.
In most tutorials about autoposting bots it is explained that the posting frequency should be low. A good value is to post not more than 2 messages a day. This hint makes sense for bots but for humans as well. That means, if a human tries to post 30 messages a day, the danger is high that he gets banned by the admin, because this frequency doesn't look natural. In contrast, if the human imitates the autoposting bot, everything works fine.
Bot1 is posting URLs from Arxiv (one message a day). Human1 is posting manually selected URLs from Sciencedirect and human2 is imitating human1 but posts URLs from CNET. The result is a healthy looking social network group which provides an ideal environment for peer reviewing all the content. A stranger can write a comment because he doesn't understand why he should pay for the Elsevier paper while nearly the same content is available from Arxiv. A second human reads the comment, has the same opinion and upvotes the last posting of the Arxiv posting bot. And so on.
Now we observe an interesting case. What will happen if the admin of the social network group monitors the situation and comes to the conclusion that bot1 is not a human but a bot? He decides to delete its post. This deletes also the upvote of user2. As a result, user2 gets angry. That means deactivating bot1 is not the best idea the admin can have. Or, to explain the situation the other way around: the best way to manipulate a social network is to deactivate some of the bots.

3.2 Academic peer review with social networks

In the Open Science movement there is a big problem left open called peer review. Peer review is something which is done after a paper was created. It is a judgment about the quality of a pdf file. A problem which is already solved in the community is how to put a paper online. It is possible to upload an academic paper to a variety of hosting websites, put it into one's own blog or copy it to a github repository. From a technical point of view the pdf file is then available under a worldwide URL and the content can be downloaded everywhere. If the paper has a creative commons license, the distribution is even easier and fulfills the open access guidelines nicely.
The only problem is how to get traffic to the file, how to get readers for the content. Academic papers have the general problem that nobody apart from the author is motivated to take a look into them. Especially if it's a paper which wasn't peer reviewed before. The reason is that if a paper wasn't peer reviewed, it won't be visible in Google Scholar, and if the paper isn't searchable in Google Scholar, other authors can't find it and they won't cite it.
Overcoming this chicken-and-egg problem is easy, and the answer is called social networks plus autoposting software. The idea is to post the URL of the paper to a social network, for example Google+, and then wait what happens next. Somebody who is not familiar with the topic will reply that nothing will happen, because the Google+ group is empty, has no readers and nobody will click on the link. That is technically not correct. A rarely mentioned feature of social networks is that they are heavily populated by posting bots and by moderation bots. These bots produce the standard noise in the group.
Let me give an example. Bot1 is used by the original author to post the URLs of his papers to a Google+ group. Bot2 was installed by the admin of the group to determine if a spam post was sent. Bot2 goes through every post, opens the URL, follows the link and checks if the content is spam or not. If it's spam, then the post gets flagged for further investigation. Bot3 was programmed by a third party user who likes to generate some traffic in the group. Bot3 selects a random paper and presses the upvote button.
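How such a Bot2 could roughly work is sketched below. It is a hypothetical example, assuming the requests library and a very naive keyword heuristic; a real moderation bot would of course be smarter.

# bot2_sketch.py -- hypothetical moderation bot which opens a posted URL
# and flags it with a naive keyword heuristic. Assumes the requests library.
import requests

SPAM_KEYWORDS = ["casino", "viagra", "buy now"]   # illustrative word list

def looks_like_spam(url):
    try:
        text = requests.get(url, timeout=10).text.lower()
    except requests.RequestException:
        return True                 # unreachable links get flagged as well
    return any(word in text for word in SPAM_KEYWORDS)

def moderate(posts):
    """posts is a list of (user, url) tuples taken from the group."""
    for user, url in posts:
        if looks_like_spam(url):
            print("FLAGGED:", user, url)
        else:
            print("ok:     ", user, url)

if __name__ == "__main__":
    moderate([("bot1", "https://example.org/paper.pdf")])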
What I want to explain is that if somebody posts a URL to a social network, he activates a cascade of preprogrammed bots which increase the traffic. On top of the bot traffic the human generated traffic will follow; for example, the group admin has to check the flagged paper himself and this is counted as a new page view. That means the overall system has a tendency to do exactly what its purpose is: to make a URL more visible on the Internet and to allow a peer review process.
If we take a look into existing social networks, we will find that not all papers get commented, but some of them do. It is hard to predict how many comments and how much traffic a paper gets, but it is sure that a social network is the place in which academic papers get read and get a peer review.
Cynical experts who are familiar with the do's and don'ts will anticipate what comes next. Right, the fully bot controlled academic peer review. Sci-gen (greetings to Jeremy Stribling) generates the pdf file and uploads it to a blog. A Google+ autoposter copies the URL to a group. The Google+ auto moderator bot opens the paper to check if it's valid. A random stranger in the group finds the post relevant and presses the share button. At the end the Sci-gen paper has received 10 comments, 2 likes and the top position in the Wikinews headline of the day.
On the Internet I've found a reddit post from the past in which the original Jeremy Stribling posted a link to his famous “Rooter: A Methodology for the Typical Unification of Access Points and Redundancy” paper. The paper was given as a URL to the pdf, and the assumption is that a lot of people clicked on that link. To be fair, the post mentioned the scigen project, so I would guess the post was an experiment in how to increase the traffic of the well known Rooter paper. And it seems that the users of the social network liked the post very much; the upvote counter is at 191 points.
What Jeremy Stribling and his co-authors didn't do is to post their paper to Wikinews. This kind of experiment would be a bit more advanced, because Wikinews is powerful enough to make the Reddit hub obsolete. The problem with Reddit is that on the right side in the sidebar a large advertisement is visible and that the underlying rendering engine is not available as open source.

3.3 Setting up a social network from scratch with mediawiki

Before I explain how to moderate a social network, a more common example of using a wiki is described. This kind of wiki is called a content wiki because the idea is that the users create text pages. Wikipedia is a good example of a content wiki, and it is possible to set up a Wikipedia clone from scratch.
The good news is that such a system is highly predictable. The amount of traffic on such a website will be zero. The reason is that a freshly installed mediawiki doesn't contain any information. A random stranger from the Internet who discovers the page will quickly recognize that he or she has to create a longer article (50 kB of well formatted text) to fill the wiki with information. This kind of task takes many hours and nobody will do so. That means, even if the wiki is free and anybody can create a user account, it is certain that the total number of articles after one year remains at 0.
Even the large Wikipedia project, which is a success story, has problems finding volunteers who want to write articles or improve existing content. One reason is that the Wikipedia-internal conflict escalation is dreaded. The only thing the world likes is to read content which is already there. The number of Wikipedia readers goes into the billions of page views, while the number of regular authors is below 20k. In a fresh mediawiki the situation would be the same. The only author will be the admin who uploads new articles, and with a bit of luck he finds some readers, but he will never find new authors.
The description of the content wiki was only given as an introduction, and now comes the more interesting part, which is about a social wiki. A social wiki has a different guideline. The idea is not that content is created; instead the user is asked to submit a URL and maybe a comment on the submitted URLs of others. Because posting a URL to a self-selected website is a lot of fun, the prediction is that after a short delay some users will try the wiki out and become regulars. They will create a fake account and post some links to the wiki because this increases their backlink score. Other users will write an autoposter for the wiki to fill a category not with a single backlink but with a list of 100 of them, to try out under which conditions the admin becomes angry.
In contrast to the content wiki described in the introduction, the users won't stay away from the wiki; they will try to take over the system and present their own marketing information in it. The prediction is that such a system becomes chaotic and will generate a lot of traffic. So the interesting question is how to moderate a social wiki.
Let us go a step back to the content wiki. If the number of users in a system is low and the admin is the only registered user, the wiki is protected against all sorts of spam. If no stranger is interested in creating an account, nobody can post anything; that means the wiki operates like a blog. If the admin posts something it is visible, but nobody else will post because wikis are not attractive to the audience. If no postings are in the system, the moderation task is simple. That means the admin won't get the problem that somebody floods the system, and he doesn't have to flag edits as spam.
In a social wiki the situation is the opposite. A social wiki is perceived as an attractive hub, which means that a lot of users create an account and post URLs as much as they can. What can the admin do against this behavior? If he defines the wiki as a social wiki and asks the users to post URLs, then he can't say that this kind of posting is spam. If the guideline says:
Everybody can create an account, every user can post a URL, commenting on existing URLs is desired.
then these guidelines have to be followed by the admin too. To make things easier, the short and quick answer for moderating a social wiki is to use a combination of pre-moderation and auto-moderation. This allows handling the short URL posts.
Pre-moderation means that incoming messages are not put online immediately but cached in a queue. The admin goes through the queue once a day and marks desired posts as valid. Then the URL post is put online. If the admin doesn't monitor the queue, no posting gets published and the wiki remains cleaned up. The users can log in and submit URLs, but nothing is shown on the website.
The other technique is technically more advanced. The term autoposting is well known from the Facebook world; automoderation means the opposite. Automated moderation bots check incoming posts against a set of rules and decide on their own whether a post is spam, gets an upvote or should be answered. The best practice method is to install in the first step a pre-moderation system in which the admin has to release the postings manually, and in step 2 an automoderation bot is used to simplify this procedure and release normal posts automatically.
All the admin has to do is to monitor his own auto-moderation bot, and if the bot makes a mistake it is taken offline and all the posts are held back in the queue. With such a pipeline a social wiki is well protected against spam messages.
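A very small sketch of such a two-step pipeline is given below. It is hypothetical: the queue is only an in-memory list, and the rule says that short URL-only posts pass while everything else stays in the queue for the admin.

# automoderation_sketch.py -- hypothetical two-step moderation pipeline:
# a rule bot releases harmless URL-only posts, everything else stays in
# the pre-moderation queue for the human admin.
import re

URL_ONLY = re.compile(r"^https?://\S+$")

def auto_moderate(queue):
    released, held_back = [], []
    for post in queue:
        text = post.strip()
        # rule: a single URL with nothing else attached is released
        if URL_ONLY.match(text):
            released.append(text)
        else:
            held_back.append(text)      # the admin decides manually
    return released, held_back

if __name__ == "__main__":
    incoming = [
        "https://example.org/robotics-post",
        "Buy cheap watches!!! http://spam.example.com",
    ]
    ok, pending = auto_moderate(incoming)
    print("released:", ok)
    print("waiting for admin:", pending)

Note that exactly this rule reproduces the botwar problem described below: a well written fulltext post from a human would stay in the queue, while the URL bots pass without friction.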
Botwar
Suppose a social wiki is running great. The average user takes advantage of an autoposting tool which posts once a day a URL from a list to the group. And at the same time the admin is using an automoderation tool which tries to identify spam. The moderation bot thinks that the URL posts are valid, so all the messages can pass.
On a nice summer day, a new human user creates an account and posts his first message. He doesn't post a URL but a longer text, because he thinks this is the correct behavior. The moderation bot, which was trained to detect spam, doesn't understand the fulltext and flags the posting as spam. That means the autoposter bots can drop their URLs into the social network, but the human is not allowed to post a well written text.
Is this the future? Is this already happening in social networks? We don't know. But one thing is sure: humans are not the driving force behind all the traffic on the Internet. In many cases the bots are among themselves.

3.4 Is Python the right choice for creating a dynamic website?

A web framework is first and foremost a programming task, similar to creating a game or an office application. It consists of two steps: first the prototype is written, and secondly the production code is created. Python is the best language for writing a prototype. This is especially true for a web framework. Compared to other languages like C++, PHP or Java, Python is less restrictive in its syntax and provides more high level commands.
After the Python code has been written, the web framework can be tested in the intranet. The user has to check the basic features like logging in, typing in text and updating the SQL backend. If the prototype passes the minimum standard, it can be converted into C++. Why C++? Because C++ is the world's best language for writing a production ready program. It compiles to fast machine code, works on all platforms and is very fast for multithreading tasks. Compared to PHP or Java, C++ can be several times faster, and the language is an open standard.
Converting an existing Python web framework into a C++ one is easier than it looks. The good news is that the only problem is to create the C++ code itself: handling the pointers, initializing the classes and optimizing the performance. What the software does and how the visual design looks was already fixed by the Python prototype.
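To illustrate how small such a Python prototype can be, here is a minimal sketch using only the standard library http.server module. It serves a single page and stands in for the kind of quick-and-dirty prototype that would later be ported to C++.

# prototype_sketch.py -- minimal web prototype with the Python standard
# library only. It serves one page; login and the SQL backend would be
# added in the same quick-and-dirty style before porting anything to C++.
from http.server import BaseHTTPRequestHandler, HTTPServer

class PrototypeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body><h1>Prototype is running</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # open http://localhost:8000 in the intranet browser to see the page
    HTTPServer(("", 8000), PrototypeHandler).serve_forever()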

3.5 Reinforcing vs inhibiting cascade in the peer review process

Academic publishing consists of two stages: content production and content evaluation. Content production is a single user activity. A human sits in front of a computer and types in a manuscript. Then he uploads the pdf document to a hosting service and starts with the next paper.
Peer review is the opposite of content production. It works only in a group and has the aim to produce resistance against individuals, which could be called mobbing. If a group comes to the conclusion that the paper of an individual is wrong, this is, from the perspective of the individual, similar to a conspiracy against him.
Sometimes Facebook and other social networks are called mobbing infrastructure. Their main goal is to hold other people down and to make jokes about the losers outside the own circle. This behavior is the normal result. If a group doesn't produce mobbing and doesn't hold other people down, something is wrong with the group.
From a cybernetics perspective, the combination of reinforcing the authors to create more content and preventing them from doing so is very attractive to research in detail. The best practice method for realizing such a cascade is to separate both instances. What does this mean? It means that a paper can be available on the Internet without having received a peer review. A scientific document can be located on the Web while the researcher is not part of a larger group. This is realized by a divided infrastructure. The first type of website is only created for file hosting. A normal weblog or a dedicated fileserver is used for this purpose. If somebody has an account on the website, he is allowed to upload his document. Then it is available under a worldwide URL.
The second sort of website is called a social network. These websites work independently from fileservers. The task of a social network is to produce a community. If the group members communicate with each other, the social network was successful.
The remaining question is how to connect both instances. Building the first type of technology is easy. A file can be hosted on the Internet with a normal Apache webserver. If the file is copied into the /var/www directory it is available worldwide. The technology is standardized and it is very cheap to build such systems.
In contrast, the problem of building a social network is much harder. The number of examples is low, the software is not standardized and there are different opinions about what a social network is. What we can say is that the Facebook website is attractive for many millions of people. But whether Facebook is the right choice for doing a peer review is unclear.
The question is: what does a social network look like that fulfills the needs of scientists? The good news is that all the social networks have something in common. They work with a special kind of software: Facebook is based on dynamically generated websites and Google+ operates with the same feature. So it can be measured in detail what the software does and how the people interact with each other. In theory, this allows us to say what the perfect social network will look like.
My personal working hypothesis is that the perfect social network is equal to Wikinews. The RSS feeds of existing papers are fed into the Wikinews system and there they get evaluated by the community. For doing so, the content has to be converted from a paper into a news article. Sometimes this is called academic storytelling. But the inner working can be explained more precisely. The idea is that the community of authors who have written papers has an interest in their content being aggregated and annotated with comments. This is equal to group building. A mediawiki installation in which the users post URLs to existing content realizes this ideal.
The hypothesis has a lot of weaknesses. The major one is that it wasn't tested in reality. There is no known case in which a wiki system was used as a social network for evaluating academic content. A case which comes close to the idea is Wikinews itself, but they evaluate only a small amount of content and mostly not academic papers. The idea of posting the URLs of academic papers to a Facebook group has been tried from time to time on the Internet but wasn't researched in detail. The problem is that Facebook works differently from Wikinews. An edit in Facebook is not recorded for later investigation and the software is proprietary.
The only available project which has been researched heavily is Wikipedia. But Wikipedia is not a social network; it is a content wiki. That means the content is created in the wiki itself. As real life examples we have only the following websites:
- Wikipedia
- Wikinews
- academic Facebook groups
Combining the ideas of all of them would result in a Wikinews for academic papers. The pdf files are stored outside the wiki, and in the wiki only URLs plus comments are stored.

3.6 Creating a link posting bot, but why?

From a technical point of view it is easy to create a Python script which posts URLs to Twitter, to Facebook, to Google+ or to Wikinews articles. All the bot has to do is to print the URL from a table into the form and press the submit button. If this action is repeated in a for-loop with a simple delay, the bot will run 24/7 without interruption.
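Such a loop is only a few lines long. The sketch below is hypothetical: the post_url() function is a placeholder for whatever API or form submission the target network expects, and the delay keeps the posting frequency low.

# linkbot_sketch.py -- the for-loop with delay described above.
# post_url() is a placeholder; a real bot would call the API of Twitter,
# a MediaWiki instance or whatever network the links should go to.
import time

URL_TABLE = [
    "https://example.org/robotics-article-1",
    "https://example.org/robotics-article-2",
    "https://example.org/robotics-article-3",
]

def post_url(url):
    print("posting", url)      # placeholder for the real submit step

def run_bot(delay_seconds=12 * 60 * 60):   # two posts a day
    for url in URL_TABLE:
        post_url(url)
        time.sleep(delay_seconds)

if __name__ == "__main__":
    run_bot(delay_seconds=1)   # short delay only for testing the loop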
The more interesting question is why somebody should do so. What is the purpose of posting random URLs to Wikinews or into a Facebook group? Somebody may argue that there is no meaning, and the only reason is that it is a bot and bots do such things. Perhaps the overall script is entirely useless? The answer is a bit more complicated, because this won't explain why so many people are fascinated by Twitter bots which post images to the Internet.
The answer is not located within the individual bot but in how social networks work. We have to abstract from the motivation of a single user and describe the motivation of larger groups. What is the reason why blog aggregators like Planet GNOME or the CSL-theory feed were founded? Not because a single user was interested, but because the community of all bloggers asked for uniting the content. If a community of 100 bloggers has produced a large amount of content, they are interested in two things: first, they would like the public to visit their websites, and secondly, they hope that the bloggers comment on each other under the articles. In one short sentence: a blogging community has a need for a social network which unites the community.
The surprising fact is that Twitter bots and blog aggregators fulfill this need very well. Let me construct an example. Suppose there are 100 blogs about the subject of robotics. Each blog was created by a different author, but the bloggers are not talking to each other and the public isn't very interested in the projects. Now a Twitter bot is created which does the following. First, it generates a combined RSS feed from all the blogs, and then it posts the latest URLs into its bot account. The bot account is called “robotics united”.
Now we have the situation that in one Twitter account a bot posts each day a status update covering all 100 blogs. If one of the bloggers wants to see what the other blogs are doing, he doesn't need to visit all the blogs; he can follow the Twitter bot and is always informed when a new article was published in the community.
Let us go back to the introduction. The initial question was why somebody creates a bot, or why a bot should post a URL list to a Twitter account. It was not possible to explain this behavior on an individual basis; the only explanation was that it is spam and the bot should be deactivated. The better attempt to answer the question was to raise the abstraction level to the entire blog community, which consists of more than a single person. In this context the Twitter bot produces sense. Even if the Python script itself is trivial and can be written in under 20 lines of code, the behavior of the Twitter bot is equal to community building. That means its posting activity results in a united robotics community. They have a single Twitter account on the Internet which is well informed about the updates, and the public can communicate with the Twitter bot. The bot itself won't answer, but one of the individual bloggers will do so.
Spam or not?
By definition, spam is everything which can be deleted without problems. In the case of content websites the separation between spam and non-spam is easy: if information is replicated and consists of random information, then it is spam. This definition doesn't work for social networks. By definition, social networks and the Twitter microblogging service don't have the task of delivering content but of improving communication and allowing group building. The amount of added information in a content aggregator is low or even ultra-low. All the existing content is put into a playlist, but the playlist doesn't contain new content. The same is true for counted upvotes and traffic measurements. These non-content related pieces of information are the backbone of social networks.
With the classical definition, social networks as a whole are equal to spam and should be deleted. A Twitter bot which posts URLs day by day doesn't add value. The URLs are known without the bot and the content can be found with a search engine. The same is true for most Facebook groups. What would happen if we treated social networks and content aggregation in general as spam? It would not make much sense, except if the idea is to prevent people from connecting to each other. The better definition is to assume that spam is only clearly defined for content hosting websites, but is an unsolved question for social networks.
The problem will become more obvious in the future. If social networks are treated as normal and useful while at the same time the capabilities of AI bots increase, Twitter bots and content aggregators will become an important part of social networks. In some Facebook groups this behavior can be seen today. From a pessimistic point of view, an autoposter bot is the most valuable part of a group discussion because it never downvotes users and its behavior is predictable.
But how exactly is the term bot defined for social networks? It is naive to describe a bot by its sourcecode. Even if the source code is known, this is not equal to the bot. The meaning in a social network is defined by the group, and groups have a certain demand for group building. The question is whether a social bot supports this demand or not.
One explanation is that the importance of social networks and Twitter posting bots is exaggerated and in reality a group would work great without a social network. To investigate this question in detail it makes sense to introduce a well known problem in academia called peer review. Somebody may ask what peer review has to do with social networks. The explanation is that peer review and content creation are separated. Peer review is similar to curating content, which means it works outside the content creation mode.
Let me give an example. There are 100 scientists, each of them writes a paper on his own and uploads the content to the Internet. The overall task can be defined as content creation, and the place in which the final pdf file is stored is a document hoster. The interesting question is how the peer review is organized to evaluate the content. In the classical world in which no social network exists, a peer review is not possible. Peer review is by definition a communication process between colleagues. If each of the scientists has written his paper alone, no group discussion took place.
Let us define what the result of peer review is on a technical level. Is it similar to creating another 100 papers? No, that task would be called content creation. Peer review is about sending existing content back and forth, measuring the pageviews and counting the up- and downvotes. That means the peer review process doesn't add additional content but is equal to group building. Now we can answer the question how exactly peer review is done. Peer review is always located in a social network. A social network is the place in which traffic is generated, likes are counted and one sentence comments are written.
Peer review works on the meta level. It consists of the following steps. The first one is content aggregation, which means all the URLs of the pdf papers are combined into a single RSS feed, similar to the Arxiv feeds which are available for all subsections of the repository. The second step is commenting on the content. Not the papers themselves are annotated but the RSS feed. The commenting step is done in the social network software, for example in a Twitter group. A URL posting bot announces the papers, and the users can comment on each paper. The last step is to summarize the data of the social network. It is measured how much traffic each paper has reached and how many comments were posted. This allows publishing a ranking: the top paper has reached the most upvotes. The extra effect of the peer review process which is done in the social network is that the public gets informed. They can follow the bot and they will recognize that a subgroup has done an open peer review.
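The summarizing step at the end can be reduced to a few lines of code. The sketch below is illustrative only: it assumes the comment and upvote counters were already collected from the social network and simply sorts the papers into a ranking.

# ranking_sketch.py -- last step of the open peer review described above:
# turn the collected upvotes and comment counts into a ranking.
# The numbers are made up for illustration.
papers = [
    {"url": "https://example.org/paper-a.pdf", "upvotes": 5, "comments": 2},
    {"url": "https://example.org/paper-b.pdf", "upvotes": 12, "comments": 7},
    {"url": "https://example.org/paper-c.pdf", "upvotes": 1, "comments": 0},
]

def rank(papers):
    # sort by upvotes first, comments second, best paper on top
    return sorted(papers, key=lambda p: (p["upvotes"], p["comments"]),
                  reverse=True)

if __name__ == "__main__":
    for place, paper in enumerate(rank(papers), start=1):
        print(place, paper["url"], paper["upvotes"], "upvotes")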

3.7 Python script for creating fake news

# Hypothetical pipeline; every function is a placeholder for one step of
# the workflow described in the previous sections.
for i in range(4):
    run_scigen()            # SCIgen nonsense generator creates a pdf
    upload_paper()          # the pdf is stored at the Academia.edu repository
    rss_to_markdown()       # takes the RSS feed as input and produces an article
    check_article()         # the article consists of: title, two references, plus abstract
    autoposter_bot()        # posts the article to Wikinews
    wikinews_peerreview()   # checks if the article is formally correct
    wikinews_publish()      # puts the headline on the main page

3.8 Best practice method for an intranet

In terms of infrastructure, an intranet needs LAN cables, a webserver and client PCs. Once all the wires are connected, the next question is how the content is created and how the users work with the content.
Content creation can be realized with the well known industry standards like Microsoft Sharepoint, Linux webservers, wikis, blogs, forums, network file systems, some external hosting services, an outdated FTP server, and additionally the employee is allowed to install an ad-hoc webserver on his “bring your own device”. As a result, the intranet consists of a variety of places in which data is stored, and nobody knows them all.
The next step is to install an intranet fulltext search engine. This kind of tool has a high demand for storage capacity and software features. The robot of the engine traverses all the servers in the network and copies their files into a centralized database. An easy to use interface allows the user to type in a keyword, and he gets in under a second a list of URLs from the intranet.
The last, and maybe most important, step in creating an intranet is called the social intranet. A social intranet is a human powered content aggregation system. Technically it is realized with a mediawiki server. The users get an account on the system and are allowed to post URLs. But only internal URLs from the own intranet are desired, not links to Youtube videos or audio streams in the normal Internet, because the intranet should support the work, and if the employees watch reggae songs on Youtube all day they will become too relaxed. Additionally, some RSS feeds are curated by the admins of the social wiki to announce important URLs from the intranet to the newbies.
Because it's an intranet, pre-moderation is not needed; instead the user types in a new URL, presses submit and the edit is online. It is not allowed to post fulltext information into the social wiki. If somebody has written a pdf document, he has to put it on a fileserver and then insert the link into the social wiki.
Let us summarize the overall setup. On the lowest level there is the hardware, which consists of desktop PCs, servers and LAN cables. On layer 1 there are storage capabilities like blogs, wikis, forums, Sharepoint and fileservers. Layer 2 is a fulltext search engine over the entire intranet, and on layer 3 a social network is available, realized with the mediawiki engine.

Social wiki
Somebody may ask why the users of an intranet need a dedicated social wiki. Isn't it enough if they post content to the normal blogs and the normal wikis which are already there? If somebody wants to write a text he can do so, because it gets indexed by the search engine for sure, and other people can search for the document and read it.
The answer is that the users in an intranet will need a dedicated social wiki for sure. Not because of the users themselves, but because of the content which is stored in the intranet. In the content hosting layer all the documents are available. Each of them has a URL, and some users will bookmark the URL in their local browser. But these bookmarks are not visible for the group. What the content needs is called aggregation. That means a playlist is curated and the playlist is posted to the social wiki. Not because somebody has a concrete question, but because he found something in the intranet which looks interesting and he drops the link into the social wiki group.
The combination of a fulltext search engine plus a curated playlist of URLs is a powerful tool for knowledge management in an intranet. It is predictable that the social wiki will become the most frequently used website in the intranet. Some users will use it more often than even the fulltext search engine. Additionally, the traffic of the linked resources will become much higher if a link is posted into the social wiki. The reason is that the other team members want to know what they should know. Or to make it more clear: if a person explains to another person that he should take a look into a document, the other person will do so. In contrast, if a search engine result list tells the user that one of the documents is very important, the user will ignore the advice without consequences. Because a search engine is a machine, and the communication flow is the other way around.
Somebody may ask why the URLs are stored in a wiki. Wouldn't it be better to use a textfile or a forum for such a task? No, it would not, because a hand-curated list of URLs can be extended into a newspaper, and creating a newspaper is much easier with the markdown syntax. It is possible to reference images and to format the text a bit.
Such an intranet newspaper will be more attractive than hard to read URL lists. This would result in a more efficient intranet. The social newspaper is some kind of meta-meta section: in the social wiki all the URLs from the users are stored, while in the newspaper the social wiki gets summarized.
Users will like the social wiki
A common misunderstanding of wikis and content management systems is that ordinary users don't understand them. The user knows that the company has such a system somewhere, but he has never used it. So why should the user be motivated to become a user of the social wiki?
The reason is simple: the social wiki doesn't ask the user to do complicated tasks. All the user has to do in the wiki is to post a URL. If he likes, he can add a small comment, but this is optional. More is not needed. If the user is already logged in to the wiki, he can do such a task in under 20 seconds. Most users don't enter the URL manually but use the magical Ctrl+V. Because this task is so easy, the users will do it very often. And it will be more than the 1-2 users who are wiki specialists; the wiki will reach 50% of the users of the intranet.
Let us analyze the user interaction with a social wiki in detail. From a formal perspective the user inserts a URL and submits the edit to the wiki. Then the edit is online. From a higher perspective, the user is communicating with other users. The question is: what do you think about a certain topic, do we have the same opinion, and if not, why? From a formal point of view the users are posting URLs. But what they are really doing is group building. They are aggregating existing knowledge and existing content into a conversation, and this is attractive to other people not yet involved in the debate.
Every social wiki which is installed in the intranet of a company will quickly become the top website. On day 1 only one person has an account, after a month 10% have an account, and after 6 months 300% have an account, which means the users are trying to trick the admin with sockpuppets and they will start to manipulate the traffic counter.

3.9 Evolution of bookmarking tools

A bookmarking tool isn't about the content of a website but about its location on the Internet. It is similar to a library catalogue which doesn't provide books but index cards. The easiest to grasp bookmarking tools are the bookmark folders in the webbrowser. Each user has his own bookmarking list. If this idea is made more powerful, the URL collections are managed collaboratively on the Internet.
One option for doing so are blog aggregators. A famous example from within the Python community is Planet Python. Planet Python works with the RSS standard. A general website which isn't online anymore was Technorati. This hub organized bookmarks for very different topics. The next, more advanced technique to organize bookmarks are social networks. The so-called share button creates a bookmark in a publicly browsable group.
Social networks are only one step on the ongoing ladder towards the perfect collaboration tool. The evolutionary next step after a social network is a wiki based social network, for example Wikinews. The users submit bookmarks to Wikinews and other users build a story on top of the bookmarks. In contrast to proprietary social networks, the version control history shows everyone what the admins are doing behind the scenes. This is important in the case of deleted content and reverted actions.
Even Wikinews can be improved much more. What today's Wikinews doesn't provide is a bot-friendly environment. From a technical point of view it is possible to submit a bookmark with a script, and another computer program can evaluate the URL. A simple example for an incoming control bot is a Python script which asks whether a submitted URL is already covered by Google News. If not, then the source is not considered serious.
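Such an incoming control bot could look roughly like the following sketch. It is hypothetical and assumes the feedparser library plus the public Google News RSS search endpoint; it only checks whether the domain of a submitted URL shows up there at all.

# source_check_sketch.py -- hypothetical incoming control bot: check whether
# the domain of a submitted URL appears in the Google News RSS search.
# Assumes the feedparser library and the public news.google.com RSS endpoint.
from urllib.parse import urlparse, quote
import feedparser

def covered_by_google_news(submitted_url):
    domain = urlparse(submitted_url).netloc
    query = quote("site:" + domain)
    feed = feedparser.parse("https://news.google.com/rss/search?q=" + query)
    return len(feed.entries) > 0

if __name__ == "__main__":
    url = "https://example.org/some-blogpost"
    if covered_by_google_news(url):
        print("source is indexed in Google News")
    else:
        print("source is not indexed, flag it for the admins")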
Somebody may ask why bookmarking tools are important if all the available content is stored in a fulltext engine. The problem with fulltext indices is that they don't provide meaning. What is missing is a ranking: which content is valuable and which is not. This can be answered by personal recommendations, the number of links to a resource, traffic analysis, the number of comments and so on.
In today's landscape, the Google search engine tries to integrate some of these annotations into the search engine. What Google can't provide are real humans who click on the links. The ability of an automated search engine is limited. There is a need to extend search engines with social networks.