May 07, 2019

Wikis for Open Access publishing


1 Mediawiki as a Reddit replacement

In the previous section it was explained what Reddit is, how the Planet software aggregates RSS feeds, and what the idea of a blog aggregator is. Now it is time to combine this knowledge and describe a future-ready news aggregator based on the MediaWiki system. MediaWiki was originally invented to create content; the famous Wikipedia and Wikinews projects are doing so. But MediaWiki can be used for creating smaller snippets as well.
The advantage of a wiki over a blog is that a wiki allows many hundreds of users to edit the same page at the same time. All that is missing is the right tutorial. I would call the concept a “wiki based Reddit clone”. The idea is to create a blog aggregator for monitoring the existing content outside the wiki. The users are asked to post short URL links to the wiki, and the admin is in charge of avoiding spam.
The advantage over the Planet software is that the news feed is not created automatically, but by humans. Let us go into the details. The first thing to do is to set up a fresh MediaWiki installation. Then around 5 users can log into the wiki. Their task is to create a metablog. The linkblog monitors a certain topic, for example the Lego Mindstorms community. If one of the 5 users finds a link which is new and sounds interesting, he posts the link plus a short description to the wiki.
He is doing the same as Reddit users. As a result, a chronological timeline of URLs is shown in the wiki. The timeline can be separated into subreddits (oh pardon, into sub-MediaWiki groups). This allows different topics to be monitored at the same time.
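Such a topic-separated timeline can be sketched as a plain data structure. The group names and entries below are invented for illustration; a minimal Python sketch:

```python
from datetime import date

# Hypothetical structure: each sub-group maps to a list of
# (date, url, description) entries -- mirroring one wiki page
# per topic that holds a chronological timeline of links.
timeline = {
    "mindstorms": [
        (date(2019, 5, 6), "https://example.org/ev3-sensor", "New EV3 sensor tutorial"),
        (date(2019, 5, 1), "https://example.org/nxt-retro", "Retrospective on the NXT brick"),
    ],
    "robotics": [
        (date(2019, 5, 4), "https://example.org/ros-intro", "Short introduction to ROS"),
    ],
}

def newest_first(group):
    """Return a sub-group's entries sorted by date, newest first."""
    return sorted(timeline[group], key=lambda entry: entry[0], reverse=True)
```

Each sub-group behaves like a subreddit: appending a new tuple is the equivalent of posting a link.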
And now comes the clever part. Such a wiki will become very useful for the community, because it connects all the single sources which are already there. The bloggers who have created all the Lego Mindstorms posts like the idea of their postings being linked in the wiki, and the readers will subscribe to the wiki too, because they see all the relevant content under one website. The wiki is not able to replace existing blogs, but it acts as a higher authority, which generates traffic and combines the strength of the single blogs. A nice side feature of using the MediaWiki technology is that it is very easy to control the content: a dedicated admin can monitor what the users are doing and ban users who post spam links.
Perhaps some numbers would make the idea easier to understand. Suppose there are 100 blogs available about Lego Mindstorms. One wiki is created as a blog aggregator. In the wiki, 5 normal users plus one admin are registered. The 5 users scan the 100 blogs manually with Google or by subscribing to the RSS feeds, and post interesting links to the wiki. The wiki admin monitors what the 5 users are doing and corrects small spelling mistakes in the posted URL descriptions. The resulting wiki acts as an in-between instance for coordinating all 100 blogs.
I'm not sure if the concept has a name; I would call it a MediaWiki based blog aggregator with the aim to replace Reddit. Such a system would not only provide the URLs, but would also make the admin's behavior transparent. That means, if a certain URL gets deleted, the public can track the reason why. This makes the overall system more reliable than the original Reddit site, which is a bit odd regarding its moderation policy.
Content is separated from headlines
A community consists of two parts. At first there is a place for creating the content itself. The best technology for doing so is an individual blog. The blog can be hosted somewhere on the internet with any blogging software. One example is Wordpress, but there are many alternatives available, for example CppCMS, which is written in C++ and is much more efficient.
The second question to answer is how to aggregate the content of the blogs. A naive approach is to believe that the magical Google robot will traverse the blogs on its own and index the content into the worldwide search engine. Technically this is what the Google bot is doing, but it's not enough to build a community around these blogs. What the content needs is a higher instance which acts as a gatekeeper to the outside world. This is done in a dedicated blog aggregator. Other terms for the same idea are linkblog or metablog. The classical way of realizing such a metablog is the Planet software, but a MediaWiki installation can do the job much better. In contrast to individual blogs, which are created by a single author, the authoritative content aggregator has to be created by a group of people. The group monitors the individual blogs and posts the URL links to the wiki. The wiki is the frontend to the individual blogs. The effects are twofold. First, the outside world has to read only the wiki and is redirected to the individual blogs. And secondly, the individual blogs get a higher authoritative structure, which is equal to becoming part of a larger group.

2 A wiki for collecting links

Producing content was never the problem on the internet. For each topic, thousands of blogs are available. Finding the information and aggregating the content into knowledge is the real challenge. A naive approach from the past was to declare blogs obsolete and replace them with wikis. It is possible to create a wiki about artificial intelligence and collect all the information in that wiki. If somebody wants to add new content, he has to create a new wiki subpage.
The problem with topic based wikis is that on a theoretical level the idea sounds interesting, but in reality most of these wiki projects have only a low number of users and little content. It seems that the same people who like to post content to their own blog avoid posting content to wiki-like structures. This problem can't be overcome with better training or tutorials; the bloggers are right. The problem with a wiki structure is that if somebody posts something, the group will argue against it, so the individual has to provide good reasons for posting anything. As a result, the author is not motivated to fight against the admins of a wiki and gives up the idea of contributing to a wiki in general.
Wikis, and MediaWiki in particular, are powerful tools. Compared to blogs they have many advantages, but wikis have to be used wisely. Their strength is that they can filter information; their weakness is that they suppress the creation of new content. But what if blogs and wikis are combined? Suppose the following situation. There are 100 blogs available which work with Wordpress-like technology, and there is one filter wiki available which aggregates the content. The wiki acts, similar to Reddit, as a frontpage to the individual blogs. It is not allowed to post longer texts to the wiki, but only to provide a URL plus a short description in a timeline. And the decision which URL gets posted is made by a group in an escalating way. This prevents spam URLs from being posted into the wiki.

Figure 1: wiki as frontpage
The wiki doesn't contain content; it provides only links to the blogosphere. If new blogs are added and old blogs are canceled, the frontpage wiki remains the same. The wiki is equal to the community. It acts as a content aggregator to the public and as a structure for the bloggers. It allows the individual blogger to remain independent. That means he has his own blog, his own template, his own admin rules. And at the same time, his recent posts are linked in the wiki next to blog posts from other blogs.
From a technical perspective the wiki is equal to a table which holds URLs plus a description of the articles in the blogs, similar to what the well known Reddit website is doing. The wiki is edited by a group of people at the same time. They have to decide which URL should be added to the wiki and which not. The main idea is to keep the aggregator and the content separated. That means the blogs can be written and read without asking the wiki, and the wiki doesn't need new content, because the wiki can decide on its own which blogs are hot.
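Since the wiki is technically just a table of URLs plus descriptions, such a table can even be generated programmatically. The following sketch renders illustrative rows (the data is made up) into standard MediaWiki table markup:

```python
def to_wikitable(rows):
    """Render (date, url, description) rows as MediaWiki table markup."""
    lines = ['{| class="wikitable"', "! Date !! URL !! Description"]
    for posted, url, description in rows:
        lines.append("|-")                                  # start a new table row
        lines.append(f"| {posted} || {url} || {description}")
    lines.append("|}")                                      # close the table
    return "\n".join(lines)
```

Pasting the returned text into a wiki page produces the URL table; the editors only have to maintain the rows.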
A similar idea was realized under the name DMOZ some years ago. The idea of DMOZ was to build a hierarchical catalog of websites; for example, the category computing is divided into programming, hardware and operating systems. The problem with such an approach is that it does not work on the article level. The better approach is used in the Reddit model, which is a time based news aggregator. It collects all the blog posts from the last week.
Pros and cons of wikis
Wikis are perceived as a complicated and powerful tool at the same time. They are described as unwelcoming if somebody wants to post new content. Bringing an article into a wiki is a long-duration project which fails quite often. Most users give up after the first attempt, and they have understood it right: wikis are not created with the idea of making life easier for authors, but more complicated.
At the same time, wikis are a great tool if collaboration is needed. No other software allows the coordination of a hundred and more users at the same time. A wiki page contains only one possible status, and this status is the current one. All the editors have to discuss this desired state. This is done in the version history, in the talk section and in other discussion groups.
It is important to know the strengths and weaknesses of a wiki to use it for the right purpose. Creating content inside a wiki is an antipattern, because the barrier to posting new information into the wiki is too high. Wikis are designed for evaluating content. Content evaluation is done in a group. Let us observe what will happen if 5 users argue with each other about whether a wiki should contain a URL to an external blog or not. Person 1 thinks the link is useful and has posted it. This was done quickly, in under a minute. Person 2 doesn't like the behavior of person 1 and has pressed the undo button. Person 3 has recognized the action of person 2, follows the original link, finds it useful too, and puts person 2 on the vandalism page. This allows person 2 to post a formal request to person 4, who is the admin of the site, and this will escalate the conflict further.
This kind of interaction pattern sounds to the untrained beginner like an antipattern, but in reality it is the way a healthy wiki community acts. At the end they will produce a group decision whether the posted URL from person 1 remains in the wiki or not. Wikis support group oriented decision making.
Now let us describe a situation in which this feature is not needed. Suppose somebody wants to write content. Is there a need to ask a group whether the content is the right one, whether too many spelling mistakes are in the content, and whether the author is qualified to do so? No, there is no need. Asking the group is not needed for creating content. Each author can decide on his own, and instead of asking somebody he should press the submit button to upload the content as quickly as he can. The best tool for supporting this behavior is not a wiki but a blogging software like Wordpress.
It is important to know the pros and cons of MediaWiki and Wordpress:
MediaWiki: supports group decision making, good for evaluating content, not the right choice for creating content.
Wordpress: supports individual content creation, doesn't support group oriented decision making, not the best choice for creating blog aggregators.

3 Wikinews

The Wikinews project https://en.wikinews.org/wiki/Main_Page has realized the idea of a wiki based frontpage to existing information as well. New articles in Wikinews are surprisingly short. They are written as a teaser and redirect the reader to the reference section. In the references, a link is provided to the original source, which is hosted at CNBC, The Verge, or the New York Post. The idea behind Wikinews is that the content was already created in an external blog and Wikinews has to aggregate the content into a single website.
The teaser text before the URL is a bit longer than known from Reddit. It contains more than a single paragraph, so it introduces the source well. The interesting feature is that each article is tagged within the MediaWiki syntax. It contains, for example, tags for the year 2009, the country India and the topic, which is robotics.
Let us analyze the manpower needed for creating Wikinews articles. It is very small. Because the longer article is written already, Wikinews mostly provides the URL to the content and adds a small introduction. If the teaser text is only two sentences, a complete Wikinews website can be driven by a single individual.
Does the project make sense? Yes, very much. The more complicated thing in Wikinews than writing the article itself is coordinating the users who add new URLs. What every wiki is very good at is allowing different people to interact with each other at the same time. That means 100 users can register in the wiki, and they can argue with each other about which new links should be added and which not. The interesting feature is that the underlying fulltext (the blog post) remains untouched. They only debate the question whether the blog should be linked or not. In theory a wiki based frontpage can be driven by a single admin without inviting other users, but this is not much fun, because there are no conflicts.

4 Building a social network website from scratch

Many people are fascinated by social network websites like Facebook, Twitter and Reddit, and they want to understand how to clone such a website. If somebody is able to clone a product or a website, he has understood its inner workings. Cloning is a kind of reverse engineering with the aim of opening the black box and analyzing what is hidden inside.
The first step in building a social network from scratch is the technical side. What is needed here is software like Elgg or HumHub, which are open source projects with the aim of building a social network in the intranet. That means the users are not dependent on large scale websites like Facebook, but create their own space hosted on their own server.
But is the Elgg software similar to a social network? After installing the program on a server, the system is technically working fine, but something is missing: users and traffic. From an abstract perspective a social networking site allows posting comments, similar to a forum website. But what exactly is the difference between a forum software like phpBB, a blogging software like Wordpress and a social networking software like Elgg? From a technical perspective all of them work similarly. In the backend there is a SQL database, and in the frontend there is a PHP script which generates the HTML form shown on the screen. The user types something in, and the information is stored in the database.
This allows the reduction of a social network tool to its core feature. There is no need to install software like Elgg; a simple Python script which has access to a database can do the same job. It seems that social networks are not based on a certain scripting technology; it's about the users' behavior.
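This reduction can be demonstrated directly. Below is a minimal sketch of the core feature, using only Python's built-in sqlite3 module; the table and column names are illustrative and not taken from Elgg or any real product:

```python
import sqlite3

# The whole "social network backend" reduced to one table of postings.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, url TEXT, description TEXT)"
)

def submit(author, url, description):
    """Store one posting, like a social network form handler would."""
    conn.execute(
        "INSERT INTO posts (author, url, description) VALUES (?, ?, ?)",
        (author, url, description),
    )
    conn.commit()

submit("alice", "https://example.org/lego", "A fresh Lego Mindstorms writeup")
```

Everything else (the HTML frontend, the login) is interchangeable, which supports the point that the behavior of the users, not the software, makes the network.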
Suppose it's possible in an intranet to make one's own rules for how the users should act. What are the soft rules everybody has to follow to build a social network? I would guess this is the more important recipe: to moderate the users in a certain direction. The good news is that in an intranet the technical admin is able to ban users if they ignore the rules. That means the admin is in the comfortable position that his rules have to be fulfilled. The follow-up question is which kind of rules are needed for a social network. If the admin doesn't know, the project won't work successfully.
I have researched the topic a bit. The common social rule shared by Twitter, Facebook and Reddit is that these websites understand themselves as blog aggregators. This self-description allows the formulation of the rules for the Reddit clone:
Rule 1: It is forbidden to post longer content. That means, if somebody tries to upload a 1 MB PDF file or posts a 100 kB textfile to the social network, he gets banned.
Rule 2: What the users are allowed to post are URLs, plus a short description of what is behind the URL.
Rule 3: The posted URL should fit the section, for example Artificial Intelligence, and the URL should be fresh. That means it references content created less than a month ago.
Rule 4: If somebody posts URLs which do not fit the domain, or the URL is outdated, the user gets banned.
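The four rules are simple enough to be checked mechanically. The following is a sketch of such a validator; the byte threshold for "longer content" and the exact one-month window are assumptions, since the rules don't give precise numbers:

```python
from datetime import date, timedelta

MAX_BYTES = 100_000                       # rule 1: anything near a 100 kB text is too long (assumed cutoff)
ALLOWED_TOPIC = "Artificial Intelligence" # rule 3: the section of this hypothetical clone
MAX_AGE = timedelta(days=31)              # rule 3: "fresh" interpreted as about one month

def check_post(url, description, topic, content_date, today):
    """Return a list of rule violations; an empty list means the post is acceptable."""
    violations = []
    if len(description.encode("utf-8")) > MAX_BYTES:
        violations.append("rule 1: content too long, only short descriptions allowed")
    if not url.startswith(("http://", "https://")):
        violations.append("rule 2: posting must be a URL")
    if topic != ALLOWED_TOPIC:
        violations.append("rule 3/4: URL does not fit the section")
    if today - content_date > MAX_AGE:
        violations.append("rule 3/4: URL is outdated")
    return violations
```

An admin (or a bot acting for him) could run every submission through such a check and ban repeat offenders, exactly as the rules demand.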
These simple 4 rules are able to replicate a social network in the intranet. If the admin checks that the rules are respected by the users, the newly created social network will look similar to the large websites (Facebook, Reddit, Twitter). Perhaps the intranet social network will not be exactly the same, because the number of users is smaller and because the real Reddit website is more attractive, but the general idea is the same.
Sometimes social networks are described as social tagging websites in which the users can post links. This description is not sufficient, because a social network doesn't have an internal structure; what the users are doing in the network depends on the content located outside the network. Before somebody can post a URL he has to know the URL first. And before somebody finds a URL on the internet, somebody else has to put content behind the URL. The better description of what social networks are about is a blog aggregator. It is a kind of overview website which monitors the blogs which were updated recently.
Detail features like upvoting, downvoting and commenting on links from others are additional features which make the social network more interesting. Also the feature to preview the content in a smaller window, and a strong focus on entertainment / funny subjects, will result in an improved social networking site.
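Upvoting and downvoting can be reduced to a sorting problem. The sketch below uses the simplest possible scoring, the plain vote difference; real sites additionally weight votes against the age of a posting:

```python
def score(upvotes, downvotes):
    """Net votes: the simplest ranking signal."""
    return upvotes - downvotes

def frontpage(posts):
    """Sort (title, upvotes, downvotes) posts by score, best first."""
    return sorted(posts, key=lambda p: score(p[1], p[2]), reverse=True)

# Invented example postings for illustration.
posts = [("quiet link", 3, 1), ("popular joke", 40, 5), ("spam", 1, 20)]
```

The frontpage is then just the sorted list; everything the users see emerges from their own votes.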
What we can say for sure is that a social network contains URLs to external content, which are posted by a group of people and sorted into domains like “fun, games, party pictures and so on”. In the case of Facebook the situation is not so strict, because the user can upload images directly to Facebook, so it has the features of an image hosting website, while at Twitter the postings are not organized in groups but around the people who have written a message. The smallest common definition is that social networks monitor the blogosphere. It is a kind of realtime search engine created by humans. Usually the amount of interaction on these websites is higher than in a normal forum. A forum collects information posted by the users. A forum is not monitoring external content.
In the academic community the equivalent of a social network is called an overview paper. An overview paper doesn't present new information about a topic, but reviews existing content which is already there. The typical overview paper contains 300 references, but sometimes up to 1000 references. In contrast to a social network, the overview paper can't be commented on in realtime. The shared similarity is that social networks and overview papers try to stand on top of the community. They are in a higher position and make jokes and recommendations about existing information. In most cases, overview papers are written by experts of a domain who have read all the information and are able to give the context which makes it easier for newbies to identify relevant information.
The interesting point is that even in the age of search engines, overview papers remain important. They answer the question which topic is interesting and why. In contrast, a search engine only makes sense if the user is an expert already and knows the keywords to enter.
Cloning Reddit
In the introduction it was mentioned that before a website can be cloned, it has to be understood. Suppose the idea is to create a Reddit-like website in one's own intranet which generates a lot of traffic. The rules for doing so are:
1. The users should post URLs plus small descriptions.
2. The links should be fresh and reference content created less than a month ago.
3. The only allowed topics are “funny”, “jokes” and “computer games”.
4. Many users can log into the site, discuss and upvote the postings of others.
If all these rules are respected strictly, the intranet website will develop in the same direction as the Reddit example. It will become a high traffic hub. The reason why is the mixture of some features. At first, it is not very complicated to post a URL to a website; it can be done in under 2 minutes. This motivates the users to do it very often. Secondly, the topics are jokes, fun and computer games, which are attractive to 100% of the users. And third, the users can post without asking beforehand, so they are in control of the website.
All these features combined make it likely that the website is perceived as useful and will attract new users to participate. The success can be increased if a bit of marketing is done for the project, if the technical infrastructure runs stably, and if some starting posts are available before the first users are invited to become active.
Now it is possible to describe some counter rules which will result in lower traffic on the social network. Counter rules are:
1. Funny topics and jokes are forbidden, computer games too. Only hard scientific topics and programming URLs are welcome.
2. The need for posting fresh URLs is reversed. Instead it is only allowed to post URLs older than 1 year.
Both rules combined make the website less attractive for the community. Perhaps some of the users will participate in the project anyway, because they have no need to talk about funny things or jokes, and perhaps they find the idea interesting that the URL should not be fresh. But in general the number of users will become smaller. The result will look different from Reddit. It will become a low frequency social network which is missing something.