This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Abstract
The evolution from single-user blogs to blogs aggregated into social networks is a process which takes time. The main idea is to separate the content from the URL of the content. If URLs are posted to a public forum, the result is a social network. Curated URL playlists are an independent way of analyzing existing content, and they make peer review possible.
Tags: wikinews, social network, Open Access, content aggregation
Contents
1 Reddit
1.1 Creating reddit posts fast and efficiently
1.2 Understanding the idea behind reddit
1.3 Understanding Reddit made easy
1.4 Planet gnome as Reddit light
2 Wikinews
2.1 How to clone Wikinews?
2.2 Analyzing the Wikinews recent changes
2.3 Why Wikinews is better than Facebook
2.4 Building a wiki based blog aggregator
3 Misc
3.1 Definition of a social network
3.2 Academic peer review with social networks
3.3 Setting up a social network from scratch with mediawiki
3.4 Is Python the right choice for creating a dynamic website?
3.5 Reinforcing vs inhibiting cascade in the peer review process
3.6 Creating a link posting bot but why?
3.7 Python script for creating fake news
3.8 Best practice method for an intranet
3.9 Evolution of bookmarking tools
1 Reddit
1.1 Creating reddit posts fast and efficiently
Before a newbie posts something to a famous news aggregator website, it is a good idea to try out the idea in a smaller setting. I'm not talking about the markdown syntax, which shouldn't be a major concern for most authors, but about the idea of posting links to external websites. A good starting point for creating a reddit-like stream of consciousness is one's own blog. The idea is not to post yet another text which explains how the world works, but to give the overall blogosphere a voice.
In the early days of blogging, a certain style was used which can be seen as a webring frontend. The idea was to collect all the URLs to new content which was posted by the community in the last week. Before a URL can be put onto the list, two conditions have to be fulfilled: first, it has to be fresh, and secondly, it must fit the community. A typical template for creating such a news aggregator post looks like the following list; a small script sketch for filling it follows below:
1. URL, title, description, date.
2. URL, title, description, date.
3. URL, title, description, date.
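As a rough illustration, such a post can be rendered from a small table of entries. The following Python sketch is only illustrative; the entry fields and the markdown-style output format are assumptions, not a fixed standard.

```python
# Illustrative sketch: render a weekly aggregator post from a list of entries.
# The entry fields (url, title, description, date) follow the template above;
# the markdown-style output is an assumption, adjust it to the target platform.
entries = [
    {"url": "https://example.org/post-1", "title": "Post 1",
     "description": "Short summary of post 1", "date": "2018-03-04"},
    {"url": "https://example.org/post-2", "title": "Post 2",
     "description": "Short summary of post 2", "date": "2018-03-05"},
]

def render_post(entries):
    lines = []
    for i, e in enumerate(sorted(entries, key=lambda e: e["date"]), start=1):
        lines.append(f'{i}. [{e["title"]}]({e["url"]}) - {e["description"]} ({e["date"]})')
    return "\n".join(lines)

print(render_post(entries))
```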
Such a syndication feed aims to mirror the overall community. The peers should be found on the list. Value is generated for the audience, which gets a collected impression of what is going on in a certain domain, but also for the content contributors, who get lots of links and traffic from the aggregated RSS feed.
After this formal setting has been explained, we can discuss the details. Which URLs should be put on the list? Nobody knows; the list will look different depending on subjective priorities. If a list is posted on one's own blog, nobody else will modify or comment on the selection. If the same list is posted to reddit, a feedback mechanism starts, which means that somebody else will argue that the selection is right or wrong. This is the reason why posting something to reddit is more complicated. It is some kind of meta-meta blog list.
Let us go a step backward and assume that the list is not posted to reddit but only to one's own blog. This makes it easier to explain the principle. In your own blog, the one and only admin is you. That means there are no other users who argue against the list itself. Instead, the judgement is made only once. The judgement is always about the community. If the topic is called Artificial Intelligence, then the newsfeed has to contain all the relevant content from this domain. But what is relevant? Google knows. Google is able to return all the updates from the last week. The problem with Google is that the amount of URLs is too high. So the question is which of the AI content posted in the last week is interesting and why. This can only be answered by someone who is familiar with two points of view: the first one is Artificial Intelligence itself, and the second one is the community, which means the blogs which are posting something about the topic.
The good news is that the number of content sources about AI is limited. We ignore all non-English content, because it is not relevant for a worldwide audience. English-speaking AI programmers can't read German or Japanese, so they have no need to click on weblinks to such content. What we have to answer are two things: first, which kind of demand is there (what does the reader like to read), and second, which kind of supply is available, which means what content was posted. A news aggregator has to combine both needs into a single document.
Let us explain why this middleman function is complicated. The demand for AI content is driven by non-experts. Most of the 7 billion people in the world are interested in robotics and AI but have never read a science paper on the subject. At the same time, most AI researchers have an academic background, which means they are posting papers to Arxiv but explain the workings of their theories in too complicated a way. Is it possible to bring both sides together? Probably not. I'm very pessimistic that an AI news aggregator will work in reality.
Small tutorial
The basic steps for creating a meta-blog which observes a certain topic are not very complicated. We have to enter the topic into the Google search engine, restrict the results to the last week and go through the result page with the aim of identifying major developments. The result is written down, and that is the aggregated newsfeed.
The reason why this is not done more frequently is that it takes time to create such content. It is easier to create a normal blog than a blog aggregation. And sometimes it is argued that there is no need for creating such a list of URLs, because anybody else can use Google in the same way. But most existing watchblogs and RSS combination sites are perceived as useful, so it seems there is a demand for this service.
Banning users
In most Reddit tutorials it is explained what banning means and how to avoid it. Reading such advice is quite funny, because if a user gets banned, he is no longer allowed to create a watchblog at reddit. Let us go a step backward. Creating a normal blogpost or a normal Q&A question is easy. For example, somebody has the idea to post something about his latest Python sourcecode exercise, and a blog is the best way of uploading the text. In contrast, a news aggregation website is harder to create. It follows the above described steps in which a user searches Google for the latest useful content and the resulting list of URLs is put either on his own homepage or posted to reddit.
So what will happen if such an attempt is blocked or banned by the reddit moderators? Not that much. It means only that the user has created a list of wrong URLs, or that the idea of creating an aggregated meta watchblog is not desired at reddit ... Very funny. In both cases, the user can upload his RSS newsfeed to his own page. What I want to explain is that banning a watchblog doesn't make much sense, because the number of people who are creating such aggregated content is small. In most cases the problem is not that too many syndication feeds of hand-selected URLs are available, but too few.
Let us explain the situation from the opposite perspective. Suppose there are 100 bloggers on the internet who have each created their own "yet another robotics" blog. They are creating content on their own; what the community doesn't have is some kind of meta-blog which observes all the content and selects the most useful posts. What will happen if one of the 100 bloggers takes over this job? Will the other bloggers argue that he is not allowed to do so, or that they don't want to be referenced in the meta-blog? The opposite is true. They will like the idea very much, and they hope that they don't have to take over this part themselves, because it is easier to create one's own content than to aggregate existing content from others.
The idea of banning a watchblog is some kind of inside joke. The story doesn't make much sense, because banning means that no meta-blog / RSS syndication is available, which reduces the visibility of the content and makes it harder for the community to connect to each other.
The more problematic behavior is if a new user gets a platform at reddit, because this means that the community trusts him and is asking for a list of well sorted content from the last week which includes the URLs to external websites.
1.2 Understanding the idea behind reddit
Before we can describe what reddit is, we first have to understand the problem which is addressed by reddit. On the internet, two kinds of websites are available. The first one is the classical newspaper, for example the New York Times, which publishes dozens of postings each day created by a group of authors. The second type of website is the blog, which provides one or fewer postings a day. Reddit brings both types of websites together. The concept is called content aggregation and is equal to forming a virtual community. But let us go into the details.
A normal blog on the internet is not able to compete with a newspaper. A single blog doesn't have the same manpower as a large scale news website. But this problem can be solved with a trick. What if the content of hundreds of blogs is combined into a large mega RSS feed? Right, this would result in a pseudo newspaper created by different blogs. That means all the blogs on the internet are providing the content, but the information is distributed over many websites. So the question is how to mix the existing content into a handy GUI, and here reddit comes into the game. The basic idea is that under a topic like /r/linux all the Linux blogs out there are combined. If a blogger has posted something about Linux, it has to be announced at reddit too.
In reality, the principle is a bit different because the official reddit rules do not allow self-linking. That means, if a blogger likes to link to his own blog, the reddit community becomes nervous. But this kind of rule is only a detail problem, because anybody else can post the link as well. The inner working remains the same: under a single website the content of distributed websites is visible. It is important to distinguish between a page like reddit and a handcrafted URL directory like Yahoo. The idea of Yahoo and dmoz was to build a hierarchical index. That means, in year 1 all the Linux blogs are submitted to the index and then they stay in the list. In contrast, the idea of reddit is to provide a newsfeed, which means that individual posts are redirected into the reddit groups.
Let us summarize the information into a do's and don'ts tutorial. What is not allowed on reddit is that somebody posts at /r/linux a question like "Hello, I need help with the installation of Ubuntu, I've typed in a command line but it doesn't work". The problem with this posting is that reddit is different from an online forum. What is also not allowed at reddit is if somebody posts "Here is my blog, please subscribe", because this is treated as offensive.
Now we can answer which sort of information is welcome at reddit: if somebody posts a link to an online forum in which a topic was recently discussed, or if somebody posts a link to an interesting blog post. In general we can say that Reddit was invented to aggregate the weblogs of the internet from the last week. In the best case, the postings are accompanied by personal comments and come not from the original bloggers but from independent followers.
Blogs vs. forums
Well known websites on the internet are blogs and forums. The idea behind a blog is that anybody can post the texts and images he likes. There is no need to submit an article to a newspaper; instead the content can be uploaded to one's own blog. It gets indexed by Google and can be found worldwide. The idea of a forum is quite different. Here the concept is that not a single person but a group of persons interact with each other.
But what stands above all the groups and all the blogs? Right, a fulltext search engine called Google. Google is the place where beginners type in a keyword and get redirected to websites on the internet. It's an interesting fact that apart from the Google search engine, other kinds of information filtering mechanisms are available, like Reddit, Facebook and RSS readers. It seems that fulltext search alone doesn't solve the problem of content filtering. To understand the concept of a gatekeeping website we have to analyze the website of a newspaper. A newspaper provides fulltext content, but it also provides a list of articles. What reddit-like websites are doing is to split a newspaper website into the fulltext and the article feed. Reddit provides only the article feed, which has to be filled by external blogs. A reddit feed is edited by a group of people, and this makes the concept so interesting, because this group is working with a certain principle in mind. The game is not "how to create an article about a subject"; the game is called "how to create a URL list of articles about a topic".
Tutorial
Existing tutorials about reddit try to analyze the given rules and the current discussion on reddit itself to describe which kind of behavior is right or wrong. The better way of getting in touch with reddit is to ask what the purpose of a news aggregation website is. We can ignore the official and the unofficial reddit rules and describe from an abstract perspective what the term content syndication means.
The idea of a mega RSS feed is to support two sorts of users. First, the readers of the feed, who have a need for the latest headlines of the internet; but the content providers should be supported as well, because they have a need to put their content into the world. Let us describe the principle first from a technical perspective. The well known RSS XML format provides date, title, abstract and URL of a blog or of a website in general. A mega RSS feed combines single feeds into a larger one. A formal property of a mega feed is that it provides information from a certain timespan (for example the last week) and about a certain topic (for example robotics).
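A minimal sketch of such a mega feed, assuming the feedparser library and placeholder feed URLs: merge several feeds, keep only entries from the last week that match a topic keyword, and sort them by date.

```python
# Sketch of a "mega feed": merge several RSS feeds, keep only entries from the
# last 7 days that match a topic keyword. Feed URLs are placeholders; the
# feedparser library (pip install feedparser) is assumed to be installed.
import time
import feedparser

FEEDS = ["https://example.org/blog-a/rss", "https://example.org/blog-b/rss"]
TOPIC = "robotics"
CUTOFF = time.time() - 7 * 24 * 3600  # one week ago

items = []
for url in FEEDS:
    for entry in feedparser.parse(url).entries:
        published = entry.get("published_parsed")
        if published and time.mktime(published) < CUTOFF:
            continue  # older than one week
        text = (entry.get("title", "") + " " + entry.get("summary", "")).lower()
        if TOPIC in text:
            items.append((entry.get("published", ""), entry.get("title", ""), entry.get("link", "")))

# newest first
for date, title, link in sorted(items, reverse=True):
    print(date, title, link)
```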
The first attempt at news aggregation is to collect all the headlines of the internet from the last week about robotics in a single RSS stream. This attempt goes in the right direction but has the disadvantage that the amount of information is too large. So the next question is how to reduce the number of entries. And voila, this is the purpose of reddit. In reddit only a small amount of the total information is presented, and additionally the other users can upvote and downvote to modify the selection. Basically, a reddit subgroup provides the same content that is visible if we enter "robotics" into the Google News webpage, with the exception that no duplicate entries are visible and the content has a higher quality.
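Reddit's real ranking algorithm is more involved; the following is only a toy illustration of how votes can reduce a long list to a short selection and drop duplicate URLs.

```python
# Toy illustration of vote-based selection: not reddit's actual algorithm.
# Each submission carries upvotes and downvotes; the list is reduced to the
# top-scoring entries and duplicate URLs are dropped.
submissions = [
    {"url": "https://example.org/a", "up": 40, "down": 5},
    {"url": "https://example.org/b", "up": 12, "down": 10},
    {"url": "https://example.org/a", "up": 3, "down": 1},   # duplicate URL
    {"url": "https://example.org/c", "up": 25, "down": 2},
]

seen = set()
ranked = []
for s in sorted(submissions, key=lambda s: s["up"] - s["down"], reverse=True):
    if s["url"] in seen:
        continue
    seen.add(s["url"])
    ranked.append(s)

for s in ranked[:3]:  # keep only a short front page
    print(s["up"] - s["down"], s["url"])
```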
The question is not whether reddit is a great website; the question is whether a manually created RSS aggregator makes sense.
1.3 Understanding Reddit made easy
Reddit can be seen as a phenomenon. Most marketing experts are aware that the website exists, but they are unsure how to utilize it the right way. A common misconception is to see Reddit as some kind of forum / social media website. A forum is something which can be moderated and which is about a certain topic. Stackoverflow, for example, is a forum: the user posts something and the community can answer it. Reddit works slightly differently. Another misconception is to describe Reddit as a social bookmarking service which allows users to insert new URLs to websites they like. Bibsonomy https://www.bibsonomy.org/ is a bookmarking website, but Reddit is again something different.
The best description of what the so-called social news aggregator is about can't be found in the subreddits nor in the help section, but in the blogosphere. If somebody has understood what the blogosphere is, he gets an impression of what reddit is about. The blogosphere is something which is only indirectly connected to Reddit; it is the content outside of the website. The blogosphere and Reddit are dependent on each other. If we shut down all the blogs on the Internet, Reddit is closed too. So what is a blog?
The typical blog was created by an individual, is not updated very often and contains amateur content. The total number of blogs in the world can only be estimated. The latest statistics suggest that around 3 million blogposts are created worldwide each day. That is a huge amount of content and it is growing. The blog community has only one small problem: it doesn't find readers. The problem with most blogs is that they are too small to reach a larger audience. In contrast to major news websites and portals, the typical amateur blog has only a few or sometimes zero subscribers. Somebody might argue that all the blogs are indexed by Google and that Google will direct the traffic to the blogs. Technically this is right; the problem is that in reality this principle is not reliable enough.
How to solve the low traffic problem of amateur blogs? The answer is to create blogging communities. These are meta-blogs on top of the existing content which select and aggregate information. For example, there are 100 individual blogs about Lego Mindstorms available on the Internet. And now somebody creates the 101st blog, but this time he doesn't post yet another tutorial for creating a line following robot; instead he creates a meta-blog in which he monitors what the other blogs are doing. A typical post has the title "Blog ABC has posted an NXC tutorial", or "Blog DEF contributed a video from the last Mindstorms competition".
Suppose a newbie likes to read some news about the Mindstorms community. Which blog would he subscribe to? Blog ABC, DEF, or the meta-blog which aggregates all the information? Right, he prefers the meta-blog. If the meta-blog analyzes the content the right way, it is able to connect all the sub-blogs into a virtual community. All the blogs will put a link to the meta-blog, and this will help the bloggers to see themselves as part of a larger group. They can stay within their own blog and at the same time they get readers from the meta-blog.
This kind of hypothetical situation results in the Reddit page. Reddit is a meta-blog. It connects all the millions of individual blogs in the world. With this background knowledge in mind it is very easy to describe how the reddit community works and how it does not. The first important point is that community building can be done outside of Reddit. If somebody likes to connect 100 existing Lego Mindstorms blogs into a larger community, he can set up a normal meta-blog / linkblog / blog aggregator website. The only thing that is important is that the content is handcrafted, because this results in a higher quality. The chance that a meta-blog is subscribed to by the readers is much higher than for a normal blog. The disadvantage is that no content can be posted in a meta-blog, because its role is to aggregate information which is already there.
The second step is to mirror one's own meta-blog to Reddit. This sounds complicated, but in reality it means only that the postings can be upvoted and downvoted by the public. The meta-blog is no longer under the control of a single admin; it is written by everybody. The shared similarity between a meta-blog and reddit is that both websites aggregate existing content. They form a virtual community which is fed by the underlying blogosphere. To understand Reddit we have to take a look at the referenced blogs outside of Reddit. It is important to know what the individual 100 Lego Mindstorms blogs are posting each day. If somebody has understood that, he can manipulate the reddit community easily.
Let us go a step backward and assume that Reddit doesn't exist. The task is to monitor the blogosphere; not the entire one, but only the blogs about Lego Mindstorms. Which tools are available to do so? First, somebody should know the topic itself, which is educational robotics. Second, he has to know which important blogs are available; third, he needs a fulltext search engine like Google and perhaps additional tools for analyzing the content in detail. Then he would create a report about the blogosphere activity, and this report is posted to the internet on a daily basis. The result is called a meta-blog or a monitoring blog. If it's written well, it will attract a larger audience outside the Lego Mindstorms community, because it allows newbies to get an overview of the topic.
The disadvantage is that creating such a monitoring blog is a demanding task. It can be automated only in parts, and this is the reason why the number of such blogs is low. For mainstream topics like politics and music, professional gatekeepers with a TV and newspaper background do this routinely. They use lots of manpower, for example 100 people at the same time, who monitor all the content in the world and aggregate the information into a newspaper. But for special interest topics like Lego Mindstorms, no dedicated news agency is available.
Overview article
In academic writing there are two different sorts of articles. The first one is a normal paper, in which somebody explains how a certain topic works; for example, the researcher has figured out how to realize a path planning algorithm. The second sort of manuscript doesn't provide new content but is an overview paper which analyzes what the community has published in the domain of path planning over the last 10 years. Overview articles contain lots of references, and these are compared to each other with the aim of drawing the general picture. Overview articles are more complicated to write. In most cases, they aren't created by beginners and they are ranked higher in the result list.
A typical example: in a period of 10 years around 120 papers about path planning algorithms were published, and 2 overview articles which analyze these 120 papers. The two overview papers have much in common with reddit.
1.4 Planet gnome as Reddit light
Reddit is one of the most frequented websites in the world, which makes it hard to explain its inner workings. A smaller and handier mini version is available which has a different look and feel but operates on the same idea. It is called Planet GNOME https://wiki.gnome.org/PlanetGnome and sees itself as a meta-blog about the Gnome community. The Gnome community is not very large; it consists of enthusiasts from the Open Source movement. To understand what Planet GNOME is, we first have to describe that the community organizes itself around blogs. There are some Gnome related blogs on the internet, and Planet GNOME aggregates these blogs. It is done by human moderators who read the blogs and decide which of the newly posted content is fed into the Planet GNOME RSS feed.
The result is amazing. The reader doesn't need to read the individual Gnome blogs but can follow only the Planet GNOME website. If he would like to read the entire article, he clicks on the link and gets redirected to the original content. The concept is similar to what Google News is about: it's a list of headlines which are combined under a unique environment.
Sure, not everybody is interested in Gnome-related information. But how this community aggregates information into a meta-blog can be seen as a best practice method which can be transferred to other domains as well. The guidelines describe how exactly an individual blog can become a member of Planet GNOME. The blogger has to contact the moderators and provide the URL of the blog, his name and a link to the feed, and the blog has to fulfill the community rules.
In German, the same concept is used for the https://planet.ubuntuusers.de/ website, which is also an aggregator for single blogs. The idea is that around 50-100 individual blogs provide new content, and the higher instance called Planet ubuntuusers creates the headlines for the content. That means the moderation and the content creation are distributed. This makes it easy to remove an existing blog from the list and to add new sources of information.
The Planet GNOME website provides an about section. Quote:
“Planet GNOME plays an important role inside the GNOME community. It is so successful that very few people could imagine GNOME without the Planet now.”
An academic paper about Planet GNOME is also available. Quote: "To select the communities based on the above criteria, we analyzed Planet, a popular blog aggregator. Planet is used by 43 open source communities to collect blog posts of their members in a single place." (page 4; Dennis Pagano and Walid Maalej. How do open source communities blog? Empirical Software Engineering, 18(6):1090–1124, 2013.)
The reason why Planet GNOME and Planet Eclipse carry this name is the underlying software, which is called Planet. It is an RSS aggregator written in Python. A list of more blog aggregators using the Planet software is available at https://news.ycombinator.com/item?id=4929490
2 Wikinews
2.1 How to clone Wikinews?
The most mature project on the Internet using a wiki for curating news articles is Wikinews. The project was described in depth in an earlier blogpost, and this time I would like to explain how to clone the idea. The idea is to start a Wikinews-like website from scratch. The ingredients are:
1. mediawiki as a wiki engine
2. RSS feedparser: a Python library which reads an XML feed
3. autoposter: the RSS input feed is converted into markdown syntax and posted into the social network
4. pre-moderation: after pressing the submit button, the posting of a user is put into an incoming queue; all messages have to be manually approved
These four elements are enough to build a fully working social network. It gets filled automatically with new URLs from the autoposter. Additionally, external users can create an account and submit a post. The post is held in a queue, and the admin of the wiki has to manually approve each single posting. Then the human posting becomes visible in the normal wiki frontend.
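A minimal sketch of the autoposter component, assuming the feedparser and mwclient libraries; the wiki host, the bot account and the target page name are placeholders and would have to be adapted to the actual installation.

```python
# Sketch of the autoposter component: read an RSS feed and append each new
# link to a wiki page through the MediaWiki API. The wiki host, account and
# page title are placeholders; mwclient and feedparser are assumed installed.
import feedparser
import mwclient

site = mwclient.Site("wiki.example.org", path="/w/")   # placeholder host
site.login("AutoposterBot", "secret-password")          # placeholder account

feed = feedparser.parse("https://example.org/planet/rss")
page = site.pages["Incoming_links"]
text = page.text()

for entry in feed.entries:
    link = entry.get("link", "")
    line = f"* {entry.get('published', '')} [{link} {entry.get('title', '')}]"
    if link and link not in text:   # crude duplicate check
        text += "\n" + line

page.save(text, summary="Autoposter: add new feed items")
```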
Such a social network isn't asking the user to create content. All the user has to do is post a URL, and additionally he can add a short description. And he can comment on postings from other users. This allows a continuous communication flow similar to a Facebook group.
All actions in the wiki are logged by the mediawiki version history. The user can track that his posting has been pushed to the queue, and he sees whether the posting was published. If the admin wants to ban a user, he can do so with the mediawiki toolset.
What will happen for sure in this wiki is that some of the users write autoposter-like tools, either because they have direct access to the API or because they are familiar with the AutoIt programming language, which can scrape information from the browser and autofill the forms. An autoposter is a for loop which posts 10 messages a day into a social network.
It is important to know that autoposters in the context of social networks are not the exception but are in widespread use. If somebody is familiar with Facebook groups, he will like the idea of doing the same against a social wiki. The setup described at the beginning doesn't prevent autoposting. A newly posted message is put into the incoming queue and released manually. Whether the post was generated by a human or a bot can't be verified. Especially if the posting was submitted by an unknown IP address and consists only of a URL, it could have been submitted by either a human or a bot. It is not possible to ban bots from sending something to the website.
2.2 Analyzing the Wikinews recent changes
The Wikinews project works on top of the mediawiki engine. The SQL dump is available online, but it consists of many separate tables which reference each other. Reading through the SQL dump is a bit hard; the better idea is to use the tools which are integrated into the web GUI. I've found two important buttons so far: recent changes and logs. The recent changes page doesn't show all actions. They can be made visible by opening the logs menu, in which additional actions are shown.
The good news is that the events can be ordered chronologically. That means if a user creates a page, the exact timecode is shown in the log file. This makes it easier to trace the different actions back and search for regular patterns. Perhaps a simple example will help:
Today, a user under an IP address created a new page early in the morning. The action is shown in the logfile. 30 minutes later on the same day, an admin blocked the user with the comment "advertisement/spam". One minute later the admin also deleted the newly created page of the IP address.
According to the log file, the total amount of such actions and reactions is small. On this morning it was the only creation and deletion workflow. If we scroll through the timeline we notice similar patterns. From time to time a new user creates a page, and around 60 minutes later the page gets deleted and the user is blocked by the admin. This seems to be a normal pattern in the timeline. Unfortunately it is not possible to open the deleted page and see what the user put into the article. Somewhere in the system this information is stored, but it can't be retrieved from the event log itself. The only way to get more details is to live monitor the process before a newly created page gets deleted. That means we have to wait until a page is created, and before the admin can press the delete button we have to open the page within 30 minutes. The problem is that the action "create a new page" is rare on the Wikinews site. The number of users who do so is tiny.
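Instead of clicking through the web GUI, the same information can be pulled from the MediaWiki API. The following is only a rough sketch against the English Wikinews endpoint, assuming the requests library; field selection and limits are kept minimal.

```python
# Sketch: pull the latest recent changes and deletion log entries from the
# English Wikinews MediaWiki API, to look for the create/delete/block pattern
# described above. Only a few fields are requested; requests is assumed.
import requests

API = "https://en.wikinews.org/w/api.php"

recent = requests.get(API, params={
    "action": "query", "list": "recentchanges",
    "rcprop": "title|timestamp|user|comment", "rclimit": 20, "format": "json",
}).json()["query"]["recentchanges"]

deletions = requests.get(API, params={
    "action": "query", "list": "logevents", "letype": "delete",
    "lelimit": 20, "format": "json",
}).json()["query"]["logevents"]

for rc in recent:
    print(rc["timestamp"], rc.get("user", ""), rc["title"])
print("--- deletions ---")
for ev in deletions:
    print(ev["timestamp"], ev.get("user", ""), ev.get("title", ""))
```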
Somebody may ask why this detailed analysis of the Wikinews event log makes sense. Isn't it enough to investigate the normal Wikipedia? No, it is not. The admin behavior in the normal Wikipedia is different. It is true that in both cases mediawiki is used, but the reason why and the moment when an admin presses the delete button follow a different pattern.
2.3 Why Wikinews is better than Facebook
Wikinews allows users to promote their own political campaign. It is the perfect hub for advertising ideas and moving a hidden agenda into the mainstream audience. This is realized by advanced rendering engines which support embedded videos, audio, hypertext documents and external links to more information. The Wikinews admins are marketing experts and will explain to the newbies how to use the engine the right way, so that the overall traffic will increase.
What Wikinews doesn't provide is a realistic description of the world. If somebody is interested in a neutral point of view, he is maybe on the wrong website. The reason is that Wikinews is frequented by too much advertisement, which prevents a single point of view from dominating the debate. It is more a marketplace of jokes, cat photos and music videos which is attractive to ordinary PhD students.
The perhaps most attractive feature of Wikinews is that it explains much better what social networking is. In contrast to a common misconception, it is not about connecting people but about content aggregation. That means the user page of a Wikinews user is empty. He doesn't provide information about his age, cultural background or interests. It doesn't matter for the working of the social network. The only thing that counts is the posted URLs. If these URLs are the right ones, the user gets upvotes. In contrast to Facebook, Wikinews is very open to bots. These are software programs which curate playlists according to algorithms. They help the user to communicate and to find new websites which are amazing.
A formal procedure to register a Wikinews bot is not needed. It can be created inside Wikinews or with external frameworks. A best practice method for creating a Wikinews bot is the AutoIt software, which generates simulated keypresses and mouse actions within the webbrowser. Additionally the normal textual API is also available, but an AutoIt bot is the more elaborate tool.
2.4 Building a wiki based blog aggregator
The Wikinews project is a super-advanced project which shows in which direction social media has to go. Wikinews has only one small flaw: it is biased. What does that mean? Before a wiki-based news hub can be realized, the admins have to invent some rules which make sure that everything runs smoothly within the project. In the case of Wikinews the basic rule is that a newly created article needs at least two sources, and both sources must be indexed in Google News.
Sure, a random stranger can technically create a Wikinews article and insert two blogposts: one from Medium and the second from Wordpress. But the Wikinews admin will delete such a posting quickly. That means Wikinews trusts Google News, but Wikinews is sceptical of the blogosphere.
There is nothing wrong with a biased point of view. The Wikinews project is able to define what is spam and what is not. But the resulting question is whether a different news hub is realistic which works with a different bias.
Suppose the idea is to create a complement: how would the project look? It would be nearly the same as Wikinews, with the only exception that newly created articles can contain 2 sources which are blogposts. The self-understanding of the media in Google News is that they are able to describe the world. Google News doesn't index blogs, and Wikinews doesn't accept blogs. To understand why this matters, it is important to analyze the workflow until an article gets published in a newspaper.
Is a normal blogger allowed to write a report and send it to the Guardian newspaper or to any other media company listed in Google News? No, he is not. The sources listed in Google News are closed systems. They are not individual blogs but business companies which want to earn money with the content. Earning money is, from the point of view of the companies, a great idea, but on the other side there are the readers, who are not interested in spending money on news articles. The readers don't want to pay for listening to the radio, watching television or getting access to newspapers. What the reader prefers is open access content. Which means that the readers ask for a service which isn't provided by CNN, the Guardian and others.
Let us go a step backward. Before a news hub can publish a story, at least two sources are needed. And before a resource can be linked, the resource must be created. The cheapest way of creating content is weblogs. The blogosphere is free to read and free to write. The content in the blogosphere can be aggregated in a news hub. Technically a news hub is a social network and can be realized with the mediawiki system. The result is equal to Wikinews, only with a different bias. This new bias prefers blogs under a creative commons license and doesn't reference paid resources which are protected by paywalls.
Perhaps one word about creative commons. The Wikinews project itself has a creative commons license. The content there is free to the world. But the URLs at the end of the news articles direct the user to professional content taken from Google News. Content which is hosted at the Guardian newspaper or at CNET is not provided under a creative commons license.
The better idea is to reference only sources which are creative commons too. In the current Wikinews project this is not possible. As I mentioned before, if somebody creates a Wikinews article and references two blogs, the article gets deleted by the admin. It makes no sense to argue with the admin or discuss the issue inside the Wikinews project, because this rule is fixed. The user has only the option to follow the rule, or he can't participate in the Wikinews project at all. I don't think that Wikinews is broken, but there is a need for something which is more open to creative commons sources from the internet.
3 Misc
3.1 Definition of a social network
Social networks can be defined by their formal structure. A small scale social network which is restricted to a single website looks like:
2018-03-04 [internalURL] comment
2018-03-04 [internalURL]
2018-03-05 [internalURL] comment
2018-03-07 [internalURL] comment
While a large scale social network which is open to the entire Internet looks like:
2018-03-04 [URL]
2018-03-04 [URL]
2018-03-04 [URL] comment
2018-03-04 [URL] comment
This is the most basic structure available, and it is used for all social networks. The basic structure can be extended by groups, which make the table more readable, and by upvotes and a reposting ability. To explain things more clearly, only the basic structure is used, which consists of "date, URL, comment".
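The basic structure "date, URL, comment" can be written down as a small data type; the field names in this sketch are of course arbitrary, and groups, upvotes and reposts would be extensions on top of it.

```python
# The basic post structure "date, URL, comment" as a minimal data type.
# Field names are arbitrary; groups, upvotes and reposts would be extensions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Post:
    date: str                      # e.g. "2018-03-04"
    url: str                       # link to content stored elsewhere
    comment: Optional[str] = None  # optional short remark

timeline = [
    Post("2018-03-04", "https://example.org/a", "worth reading"),
    Post("2018-03-04", "https://example.org/b"),
    Post("2018-03-05", "https://example.org/c", "related to the post above"),
]

for p in sorted(timeline, key=lambda p: p.date):
    print(p.date, p.url, p.comment or "")
```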
The most obvious property of social networks is that no content is presented; instead it looks like a curated playlist. The URL references content stored elsewhere. The social network is the place in which this content is evaluated and made accessible. Social networks have much in common with a search engine. A search engine is also able to index existing content. The difference is that a search engine is created by a machine, and the only thing a search engine can provide is a complete searchable index of all the content.
In contrast, a social network is curated by humans and sometimes by autoposting tools, and it can look very different. What is important to know is that social networks are not the same as a forum. A forum can work without posting any URL. In some forums it is forbidden to post URLs, and the quality of the forum is nevertheless high or very high. In social networks it makes no sense to forbid posting of URLs, because this is the core feature.
From the perspective of server software, social networks can be realized with a variety of programs. It is possible to build them with wikis, inside normal forum software, in dedicated social network software or with the help of email messages. The most interesting part of social networks is that the amount of work for posting a URL plus a short comment is low. This makes social networks interesting for a large number of people. Copying and pasting a URL and adding a sentence can be done in under 10 seconds. As a consequence the amount of traffic in social networks is high or very high. That means the number of daily new posts is high and the number of people who post them is high. If this posting activity is supported by autoposting software, the physical load on the server becomes the bottleneck.
Automatic peer review
Most people are unsure what peer review is and they have no idea what social networks are. Sure, there is a help section somewhere, but the idea can be explained much better by an automatic teacher. The easiest way of doing so is to write a small computer program which automatically posts URLs into a social network. If the bot is not too active, it gets tolerated by the admins.
Such a bot generates a practical example of what a useful post is. It shows that a posting in the form URL, comment makes sense. What the human users are asked to do is to emulate the behavior of an autoposting bot. In most cases they can imitate the behavior much better. Let me give an example.
An autoposter bot creates one post a day in an AI group. It posts links to the latest papers in the Arxiv directory. The bot itself is not very advanced, but it is a good starting point. What the humans can do is work together with the bot. They can comment on an existing post, which will help other people in the forum, or they can imitate the behavior and post some links to papers not found in the Arxiv directory but which are useful too.
The autoposter works as an icebreaker, and the humans can swim behind it and do the same. If the autoposter bot is more advanced, the humans will become more advanced as well. Another advantage of social network autoposter bots is the manual around the software. Suppose the bot itself is deactivated but the manual describing its inner working remains available. Reading such manuals is very interesting because it explains explicitly what a social network group is about. Let me give an example.
In most tutorials about autoposting bots it is explained that the posting frequency should be low. A good value is to post not more than 2 messages a day. This hint makes sense for bots, but for humans as well. That means if a human tries to post 30 messages a day, the danger is high that he gets banned by the admin, because this frequency doesn't look natural. In contrast, if the human imitates the autoposting bot, everything works fine.
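The 2-messages-a-day rule can be enforced mechanically. The following sketch is only an illustration; post_message() is a hypothetical stand-in for whatever API the target network actually offers.

```python
# Sketch of a frequency-capped posting loop: at most two messages per day,
# as suggested above. post_message() is a stand-in for the real network API.
import time

MAX_PER_DAY = 2
SECONDS_PER_DAY = 24 * 3600

def post_message(url):
    # placeholder: here the real API call (wiki edit, tweet, ...) would go
    print("posted", url)

queue = ["https://example.org/1", "https://example.org/2", "https://example.org/3"]

while queue:
    for _ in range(min(MAX_PER_DAY, len(queue))):
        post_message(queue.pop(0))
    if queue:
        time.sleep(SECONDS_PER_DAY)  # wait until the next day
```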
bot1 is posting URLs from arxiv (one message a day), human1 is posting manually selected URLs from sciencedirect and human2 is imitating human1 but posts URLs from cnet. The result is a healthy looking social network group which provides an ideal environment for peer reviewing all the content. A stranger can write a comment because he doesn't understand why he should pay for the Elsevier paper while nearly the same content is available from arxiv. A second human reads the comment, has the same opinion and upvotes the last posting of the arxiv posting bot, and so on.
Now we observe an interesting case. What will happen if the admin of the social network group monitors the situation and comes to the conclusion that bot1 is not a human but a bot? He decides to delete its posts. This also deletes the upvote of user2. As a result, user2 gets angry. That means deactivating bot1 is not the best idea the admin can have. Or, to explain the situation the other way around: the best way to manipulate a social network is to deactivate some of the bots.
3.2 Academic peer review with social networks
In the Open Science movement there is a big problem left open called peer review. Peer review is something which is done after a paper was created. It is a judgment about the quality of a pdf file. A problem which is solved in the community is how to put a paper online. It is possible to upload an academic paper to a variety of hosting websites, put it into one's own blog or copy it to a github repository. From a technical point of view the pdf file is available under a worldwide URL and the content can be downloaded worldwide. If the paper has a creative commons license, the distribution is made even easier and fulfills the open access guidelines nicely.
The only problem is how to get traffic to the file, how to get readers for the content. Academic papers have the general problem that nobody apart from the author is motivated to take a look at them, especially if the paper hasn't been peer reviewed before. The reason is that if a paper wasn't peer reviewed, it won't be visible in Google Scholar, and if the paper isn't searchable in Google Scholar, other authors can't find it and they won't cite it.
Overcoming this chicken-and-egg problem is easy, and the answer is called social networks plus autoposting software. The idea is to post the URL of the paper to a social network, for example Google+, and then wait what happens next. Somebody who is not familiar with the topic will object that nothing will happen, because the Google+ group is empty, has no readers and nobody will click on the link. That is technically not correct. A not often mentioned feature of social networks is that they are heavily populated by posting bots and by moderation bots. These bots produce the standard noise in the group.
Let me give an example. Bot1 is used by the original author to post the URLs of his papers to a Google+ group. Bot2 was installed by the admin of the group to determine if a spam post was sent. Bot2 goes through every post, opens the URL, follows the link and checks if the content is spam or not. If it's spam, then the post gets flagged for further investigation. Bot3 was programmed by a third party user who likes to generate some traffic in the group. Bot3 selects a random paper and presses the upvote button.
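What a moderation bot like Bot2 does can be imagined roughly as follows. This is only a sketch: the spam heuristic is a naive keyword and status-code check, the URLs are placeholders, and real moderation bots are far more elaborate.

```python
# Rough sketch of a moderation bot like "Bot2": open the submitted URL and
# apply a naive spam heuristic (HTTP status plus a keyword blacklist).
# Real moderation bots are far more elaborate; this is only illustrative.
import requests

BLACKLIST = ["casino", "viagra", "buy now"]

def looks_like_spam(url):
    try:
        response = requests.get(url, timeout=10)
    except requests.RequestException:
        return True                      # unreachable links are flagged
    if response.status_code != 200:
        return True
    body = response.text.lower()
    return any(word in body for word in BLACKLIST)

for post in ["https://example.org/paper.pdf", "https://example.org/spam"]:
    if looks_like_spam(post):
        print("flag for review:", post)
    else:
        print("ok:", post)
```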
What I want to explain is that if somebody posts a URL to a social network, he will activate a cascade of preprogrammed bots which increase the traffic. On top of the bot traffic the human generated traffic will follow; for example, the group admin has to check the flagged paper by himself, and this is counted as a new page view. That means the overall system has a tendency to do exactly what its purpose is: to make a URL more visible on the internet and to allow a peer review process.
If we take a look into existing social networks we will find that not all papers get commented on, but some of them do. It is hard to predict how many comments and how much traffic a paper gets, but it is sure that a social network is the place in which academic papers get read and get a peer review.
Cynical experts who are familiar with the do's and don'ts will anticipate what comes next. Right, the fully bot controlled academic peer review. Sci-gen (greetings to Jeremy Stribling) generates the pdf file and uploads it to a blog. A Google+ autoposter copies the URL to a group. The Google+ auto moderator bot opens the paper to check if it's valid. A random stranger in the group finds the post relevant as well and presses the share button. At the end the sci-gen paper has received 10 comments, 2 likes and the top position in the Wikinews headline of the day.
On the internet I've found a reddit post from the past in which the original Jeremy Stribling posted a link to his famous "Rooter: A Methodology for the Typical Unification of Access Points and Redundancy" paper. The paper was given by the URL to the pdf, and the assumption is that a lot of people clicked on that link. To be fair, the post mentioned the scigen project, so I would guess the post was an experiment in how to increase the traffic of the well known Rooter paper. And it seems that the users in the social network liked the post very much: the upvote counter is at 191 points.
What Jeremy Stribling and his co-authors didn't do is post their paper to Wikinews. This kind of experiment would be a bit more advanced, because Wikinews is powerful enough to make the Reddit hub obsolete. The problem with Reddit is that on the right side in the sidebar a large advertisement is visible, and that the underlying rendering engine is not available as open source.
3.3 Setting up a social network from scratch with mediawiki
Before I explain how to moderate a social network, a more common example of using a wiki is described. This kind of wiki is called a content wiki, because the idea is that the users create text pages. Wikipedia is a good example of a common wiki, and it is possible to set up a Wikipedia clone from scratch.
The good news is that such a system is highly predictable. The amount of traffic on such a website will be zero. The reason is that a freshly installed mediawiki doesn't contain any information. If a random stranger from the internet discovers the page, he will quickly recognize that he or she has to create a longer article (50 kb of well formatted text) to fill the wiki with information. This kind of task takes many hours and nobody will do so. That means, even if the wiki is free and anybody can create a user account, it is certain that the total number of articles after one year remains at 0.
Even the large Wikipedia project, which is a success story, has problems finding volunteers who want to write articles or improve existing content. The reason is that the Wikipedia internal conflict escalation is dreaded. The only thing the world likes is to read content which is already there. The number of readers of Wikipedia is approximately 7 billion people, while the number of authors is below 20k. In a fresh mediawiki the situation would be the same. The only author will be the admin who uploads new articles, and with a bit of luck he finds some readers, but he will never find new authors.
The description of the content wiki was only given as an introduction, and now comes the more interesting part, which is about a social wiki. A social wiki has a different guideline. The idea is not that content is created; instead the user is asked to submit a URL and maybe a comment on the submitted URLs of others. Because the task of posting a URL to a self-selected website is a lot of fun, the prediction is that after a short delay some users will test the wiki out and become users. They will create a fake account and post some links to the wiki because this increases their backlink score. Other users will write an autoposter for the wiki to fill the category not with a single backlink but with a list of 100 of them, to find out under which conditions the admin will become angry.
In contrast to the content wiki which was described in the introduction, the users won't stay away from the wiki; they will try to take over the system and present their own marketing information in it. The prediction is that such a system becomes chaotic and will generate a lot of traffic. So the interesting question is how to moderate a social wiki.
Let us go a step back to the content wiki. If the number of users in a system is low and the admin is the only registered user, the wiki is protected against all sorts of spam. If no stranger is interested in creating an account, nobody can post anything, which means the wiki will operate like a blog. If the admin posts something it is visible, but nobody else will do so because wikis are not attractive for the audience. If no postings are in the system, the moderation task is simple. That means the admin won't get the problem that somebody is flooding the system, and he doesn't have to flag edits as spam.
In a social wiki the situation is the opposite. A social wiki is perceived as an attractive hub, which means that a lot of users create an account and post URLs as much as they can. What can the admin do against this behavior? If he defines the wiki as a social wiki and asks the users to post URLs, then he can't say that this kind of posting is spam. If the guideline says:
Everybody can create an account, every user can post a URL, commenting on existing URLs is desired.
Then these guidelines have to be followed by the admin too. To make things easier, the short and quick answer for moderating a social wiki is to use a combination of pre-moderation and auto-moderation. This allows handling URL shortposts.
Pre-moderation means that incoming messages are not put online but cached in a queue. The admin goes through the queue once a day and marks desired posts as valid. Then the URL post is put online. If the admin doesn't monitor the queue, no posting gets published and the wiki remains cleaned up. The user can log in and submit URLs, but they are not shown on the website.
The other technique is technically more advanced. The term autoposting is well known from the Facebook world; auto-moderation means the opposite. Automated moderation bots check incoming posts against a rule and decide on their own whether the post is spam, gets an upvote or should be answered. The best practice method is to install in the first step a pre-moderation system in which the admin has to release the postings manually, and in step 2 an auto-moderation bot is used to simplify this procedure and release normal posts automatically.
All the admin has to do is monitor his own auto-moderation bot, and if the bot makes a mistake, it is taken offline and all the posts are held back in the queue. With such a pipeline a social wiki is well protected against spam messages.
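The two-step pipeline described above, a pre-moderation queue plus an auto-moderation rule that releases plain URL posts, can be sketched roughly like this. The release rule here is a deliberately naive regular expression and the example posts are placeholders.

```python
# Sketch of the moderation pipeline described above: new posts land in a
# queue, an auto-moderation rule releases harmless URL-only posts, and
# everything else waits for the human admin. The rule is deliberately simple.
import re

URL_POST = re.compile(r"^https?://\S+(\s+.{0,200})?$")  # URL plus short comment

incoming_queue = [
    "https://example.org/article a short comment",
    "Buy cheap watches now!!!",
    "https://example.org/paper.pdf",
]

published, held_for_admin = [], []

for post in incoming_queue:
    if URL_POST.match(post.strip()):
        published.append(post)        # auto-moderation releases it
    else:
        held_for_admin.append(post)   # pre-moderation: admin decides later

print("published:", published)
print("waiting for admin:", held_for_admin)
```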
Botwar
Suppose a social wiki is running great. The average user takes advantage of an autoposting tool which posts one URL from his list to the group per day. And at the same time, the admin is using an auto-moderation tool which tries to identify spam. The moderation bot thinks that the URL posts are valid, so all the messages can pass.
On a nice day in the summer, a new human user creates an account and posts his first message. He doesn't post a URL but a longer text, because he thinks this is the correct behavior. The moderation bot, which was trained to detect spam, doesn't understand the fulltext and flags the posting as spam. That means the autoposter bots can drop their URLs into the social network, but the human is not allowed to post a well written text.
Is this the future? Is this already happening in the social networks? We don't know. But one thing is sure: humans are not the driving force behind all the traffic on the internet. In most cases the bots are among themselves.
3.4 Is Python the right choice for creating a dynamic website?
A web framework is first and foremost a programming task, similar to creating a game or an office application. It consists of two steps: first the prototype is written, and second, the production code is created. Python is the best language for writing a prototype. This is especially true for a web framework. Compared to other languages like C++, PHP or Java, Python is less restrictive in terms of syntax and provides more high level commands.
After the Python code has been written, the web framework can be tested in the intranet. The user has to check the basic features like login, typing in text and updating the SQL backend. If the prototype passes the minimum standard, it can be converted into C++. Why C++? Because C++ is the world's best language for writing a production ready program. It compiles to fast machine language, works on all platforms and is extremely fast for multithreading tasks. Compared to PHP or Java, C++ is up to 5x faster, and the language is an open standard.
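A minimal sketch of such a Python prototype, assuming Flask and the standard sqlite3 module: a single page where text can be typed in and stored in the SQL backend. The login feature and proper templates are omitted; this is only the skeleton of a prototype, not a production design.

```python
# Minimal prototype sketch in the spirit of the text: a page where text can be
# typed in and stored in an SQL backend. Flask and sqlite3 are used; login and
# proper templates are omitted, this is only the skeleton of a prototype.
import sqlite3
from flask import Flask, request

app = Flask(__name__)
DB = "prototype.db"

def init_db():
    with sqlite3.connect(DB) as con:
        con.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)")

@app.route("/", methods=["GET", "POST"])
def index():
    with sqlite3.connect(DB) as con:
        if request.method == "POST":
            con.execute("INSERT INTO notes (body) VALUES (?)", (request.form["body"],))
        rows = con.execute("SELECT body FROM notes").fetchall()
    items = "".join(f"<li>{body}</li>" for (body,) in rows)
    return f'<form method="post"><input name="body"><button>Save</button></form><ul>{items}</ul>'

if __name__ == "__main__":
    init_db()
    app.run(debug=True)
```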
Converting an existing Python web framework into a C++ one is easier than it looks. The good news is that the only problem is to create the C++ code itself. That means handling the pointers, initializing the classes and optimizing the performance. What the software does and how the optical design looks was fixed by the Python prototype.
3.5 Reinforcing vs inhibiting cascade in the peer review process
Academic publishing consists of two stages: content production and content evaluation. Content production is a single user activity. A human sits in front of a computer and types in a manuscript. Then he uploads the pdf document to a hosting service and starts with the next paper.
Peer review is the opposite of content production. It works only in a group and has the aim of producing resistance against individuals, which is called mobbing. If a group comes to the conclusion that the paper of an individual is wrong, this is equal to a conspiracy against the individual.
Sometimes, Facebook and other social networks are called mobbing infrastructure. Their main goal is to hold other people down and to make jokes about the losers outside the own circle. This behavior is the normal result. If a group doesn't produce mobbing and doesn't hold other people down, something is wrong with the group.
From a cybernetics perspective, the combination of reinforcing the authors to create more content and preventing them from doing so is very attractive to research in detail. The best practice method for realizing such a cascade is to separate both instances. What does this mean? It means that a paper can be available on the internet without having received a peer review. A scientific document can be located on the Web while the researcher is not part of a larger group. This is realized by a divided infrastructure. The first type of website is created only for file hosting. A normal weblog or dedicated fileservers are used for this purpose. If somebody has an account on the website, he is allowed to upload his document. Then it is available under a worldwide URL.
The second sort of website is called a social network. These websites work independently from the fileservers. The task of a social network is to produce a community. If the group members communicate with each other, the social network was successful.
The remaining question is how to connect both instances. Building the first type of technology is easy. A file can be hosted on the internet with a normal Apache webserver. If the file is copied into the /var/www directory, it is available worldwide. The technology is standardized and it is very cheap to build such systems.
In contrast, the problem of building a social network is much harder. The number of examples is low, the software is not standardized and there are different opinions out there about what a social network is. What we can say is that the Facebook website is attractive for many millions of people. But whether Facebook is the right choice for doing a peer review is unclear.
The question is how does a social network look like so that it is fulfill the needs of scientists? The good news is, that all the social networks have something in common. There are working with a special kind of software Facebook is based on dynamic generated website and Google+ is operating with the same feature. So it can be measured in detail what the software is doing and how the people interact with each other. In theory, this allows to say how the perfect social network will look like.
My personal working hypothesis is that the perfect social network is equal to Wikinews. The RSS feeds of existing papers are fed into the Wikinews system and there they get evaluated by the community. For doing so, the content has to be converted from a paper into a news article. Sometimes this is called academic storytelling. But the inner working can be explained more precisely. The idea is that the community of authors who have written a paper have an interest in their content being aggregated and annotated with comments. This is equal to group building. A MediaWiki installation in which the users post URLs to existing content realizes this ideal.
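A rough sketch of the hypothesized pipeline could look as follows, assuming the feedparser library and an invented feed URL; the draft page layout is an illustration, not the actual Wikinews format.

# Sketch: read a feed of new papers and turn each entry into a draft
# wiki article. Feed URL and page layout are assumptions.
import feedparser   # pip install feedparser

FEED = "https://example.org/papers.rss"   # hypothetical feed of new papers

def drafts():
    for entry in feedparser.parse(FEED).entries:
        title = f"Review: {entry.title}"
        body = (f"A new paper is available at {entry.link}.\n\n"
                f"{entry.get('summary', '')}\n\n"
                "== Community comments ==\n")
        yield title, body   # to be submitted to the wiki for evaluation

for title, body in drafts():
    print(title)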
The hypothesis has a lot of weaknesses. The major one is that it wasn't tested in reality. There is no known case in which a wiki system was used as a social network for evaluating academic content. A case which comes close to the idea is Wikinews itself, but it evaluates only a small amount of content and mostly not academic papers. The idea of posting the URLs of academic papers to a Facebook group has been tried from time to time on the internet but wasn't researched in detail. The problem is that Facebook works differently from Wikinews. An edit in Facebook is not recorded for later investigation, and the software is proprietary.
The only project which has been researched heavily is Wikipedia. But Wikipedia is not a social network, it is a content wiki. That means the content is created in the wiki itself. As real life examples we have only the following websites:
- Wikipedia
- Wikinews
- academic Facebook groups
Combining the ideas of all three would result in a Wikinews for academic papers. The pdf files are stored outside the wiki, and in the wiki only URLs plus comments are stored.
3.6 Creating a link posting bot but why?
From a technical point of view it is easy to create a Python script which posts URLs to Twitter, to Facebook, to Google+ or to Wikinews articles. All the bot has to do is to print a URL from a table into the form and press the submit button. If this action is repeated in a for-loop with a simple delay between postings, the bot will run 24/7 without interruption.
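A minimal sketch of such a posting loop, assuming a generic HTTP form endpoint; the endpoint, the field name and the URL table are placeholders, not the API of any real site.

# Sketch of a link posting bot: fill the form, press submit, wait, repeat.
import time
import requests

URLS = ["https://example.org/post1", "https://example.org/post2"]   # the URL table
SUBMIT_ENDPOINT = "https://social.example.org/submit"               # hypothetical form target

while True:                                   # run 24/7 without interruption
    for url in URLS:
        requests.post(SUBMIT_ENDPOINT, data={"link": url})   # fill the form, press submit
        time.sleep(600)                        # simple delay pause between posts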
The more interesting question is why somebody should do so. What is the purpose of posting random URLs to Wikinews or into a Facebook group? Somebody may argue that there is no meaning and the only reason is that it is a bot and bots do such things. Perhaps the overall script is useless? The answer is a bit more complicated, because that view won't explain why so many people are fascinated by Twitter bots which post images to the internet.
The answer is not located within the individual bot but in how social networks work. We have to abstract from the motivation of a single user and describe the motivation of larger groups. What is the reason why blog aggregators like planet gnome or the CSL-theory feed were founded? It was not because a single user was interested, but because the community of all bloggers asked for uniting the content. If a community of 100 bloggers has produced a large amount of content, they are interested in two things: first, they would like the public to visit their websites, and secondly, they hope that the bloggers comment on each other's articles. In a short sentence: a blogging community has a need for a social network which unites the community.
The surprising fact is that Twitter bots and blog aggregators fulfill this need very well. Let me construct an example. Suppose there are 100 blogs available about the subject of robotics. Each of the blogs was created by a different author. But the bloggers are not talking to each other and the public isn't very interested in the projects. Now a Twitter bot is created. The Twitter bot does the following: first, it generates a combined RSS feed from all the blogs, and then it posts the latest URLs to its bot account. The bot account is called "robotics united".
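A sketch of the aggregation part of such a "robotics united" bot, assuming the feedparser library; the blog URLs are invented and the post() function is a placeholder standing in for whatever Twitter client would actually be used.

# Sketch: combine 100 blog feeds and post the newest entries.
import feedparser   # pip install feedparser

BLOG_FEEDS = [f"https://blog{i}.example.org/rss" for i in range(100)]   # 100 robotics blogs (assumed)

def latest_entries():
    entries = []
    for feed_url in BLOG_FEEDS:
        entries.extend(feedparser.parse(feed_url).entries)
    # newest first, across all blogs; entries without a date are skipped
    return sorted(
        (e for e in entries if e.get("published_parsed")),
        key=lambda e: e.published_parsed,
        reverse=True,
    )

def post(text):
    print(text)   # placeholder: a real bot would call the Twitter API here

for entry in latest_entries()[:10]:
    post(f"{entry.title} {entry.link}")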
Now we have the situation that a bot posts into a Twitter account each day a status update covering all 100 blogs. If one of the bloggers likes to see what the other blogs are doing, he doesn't need to visit all the blogs; he can follow the Twitter bot and is always informed when a new article is published in the community.
Let us come back to the introduction. The initial question was why somebody creates a bot, or why a bot should post a URL list to a Twitter account. It was not possible to explain this behavior on an individual basis; the only explanation was that it is spam and the bot should be deactivated. The better attempt to answer the question was to increase the abstraction level to the entire blog community, which consists of more than a single person. In this context the Twitter bot produces sense. Even if the Python script itself looks useless and can be written in under 20 lines of code, the behavior of the Twitter bot is equal to community building. That means its posting activity results in a united robotics community. There is a single Twitter account on the internet which is well informed about the updates, and the public can communicate with the Twitter bot. The bot itself won't answer, but one of the individual bloggers will do so.
Spam or not?
By definition, spam is everything which can be deleted without problems. In the case of content websites the separation between spam and non-spam is easy: if information is replicated and consists of random information, then it is spam. This definition doesn't work for social networks. By definition, social networks and the Twitter microblogging service do not have the task of delivering content but of improving communication and allowing group building. The amount of added information in a content aggregator is low, even ultra-low. All the existing content is put into a playlist, but the playlist doesn't contain new content. The same is true for counted upvotes and traffic measurements. These non-content related pieces of information are the backbone of social networks.
With the classical definition, social networks as a whole are equal to spam and should be deleted. A Twitter bot which posts URLs day by day doesn't add value. The URLs are known without the bot and the content can be found with a search engine. The same is true for most Facebook groups. What would happen if we treated social networks and content aggregation in general as spam? It would not make much sense, except if the idea is to prevent people from connecting to each other. The better definition is to assume that spam behavior is only clearly defined for content hosting websites, but remains an unsolved question for social networks.
The problem will become more obvious in the future. If social networks are treated as normal and useful while at the same time the capabilities of AI bots increase, Twitter bots and content aggregators will become an important part of social networks. In some Facebook groups this behavior can be seen today. From a pessimistic point of view, an autoposter bot is the most valuable part of a group discussion, because it never downvotes users and its behavior is predictable.
But how exactly is the term bot defined for social networks? It is naive to describe a bot by its source code. Even if the source code is known, this is not equal to the bot. The meaning in a social network is defined by the group. And groups have a certain demand for group building. The question is whether a social bot supports this demand or not.
One explanation is that the importance of social networks and Twitter posting bots is exaggerated and that in reality a group would work fine without a social network. To investigate this question in detail it makes sense to introduce a well known problem in academia, called peer review. Somebody may ask what peer review has to do with social networks. The explanation is that peer review and content creation are separated. Peer review is similar to curating content, which means it works outside the content creation mode.
Let me give an example. There are 100 scientists available, each of them writes a paper on his own and uploads the content to the internet. The overall task can be defined as content creation. And the place in which the final pdf file is stored is a document hoster. The interesting question is how the peer review has to be organized to evaluate the content. In the classical world in which no social network exists, a peer review is not possible. Peer review is by definition a communication process between colleagues. If each of the scientists has written his paper alone, no group discussion has taken place.
Let us define what the result of peer review is on a technical level. Is it similar to creating another 100 papers? No, that task would again be called content creation. Peer review is about sending existing content back and forth, measuring the pageviews and counting the up- and downvotes. That means the peer review process doesn't add additional content but is equal to group building. Now we can answer the question how exactly peer review is done. Peer review is always located in a social network. A social network is the place in which traffic is generated, likes are counted and one-sentence comments are written.
Peer review works on the meta level. It consists of the following steps. The first one is content aggregation, which means all the URLs of the pdf papers are combined into a single RSS feed, similar to what is known from the Arxiv feeds which are available for all subsections of the repository. The second step is commenting on the content. Not the papers themselves are annotated but the RSS feed. The commenting step is done in the social network software, for example in a Twitter group. A URL posting bot announces the papers, and the users can comment on each paper. The last step is to summarize the data of the social network. It is measured how much traffic each paper has reached and how many comments were posted. This allows to publish a ranking: the top paper has received the most upvotes. The extra effect of the peer review process which is done in the social network is that the public gets informed. They can follow the bot and they will recognize that a subgroup has done an open peer review.
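The summarizing step can be illustrated with a short sketch. The input numbers are invented, and the additive score is an assumption; in practice the statistics would be exported from the social network or the wiki.

# Sketch: rank papers by upvotes and comments collected in the social network.
papers = {
    "https://example.org/paper1.pdf": {"upvotes": 14, "comments": 3},
    "https://example.org/paper2.pdf": {"upvotes": 5,  "comments": 9},
    "https://example.org/paper3.pdf": {"upvotes": 2,  "comments": 1},
}

def score(stats):
    return stats["upvotes"] + stats["comments"]   # simple additive score (assumption)

ranking = sorted(papers.items(), key=lambda item: score(item[1]), reverse=True)
for rank, (url, stats) in enumerate(ranking, start=1):
    print(rank, url, score(stats))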
3.7 Python script for creating fake news
def runscigen(): ...           # placeholder definitions so that the sketch runs
def uploadpaper(): ...
def RSStomarkdown(): ...
def checkarticle(): ...
def autoposterbot(): ...
def wikinewspeerreview(): ...
def wikinewspublish(): ...

for i in range(4):
    runscigen()                # SCIgen nonsense generator
    uploadpaper()              # the pdf is stored at the Academia.edu repository
    RSStomarkdown()            # takes the RSS feed as input and produces an article
    checkarticle()             # the article contains: title, two references, plus abstract
    autoposterbot()            # posts the article to Wikinews
    wikinewspeerreview()       # checks whether the article is right
    wikinewspublish()          # puts the headline on the main page
3.8 Best practice method for an intranet
In terms of infrastructure, an intranet needs LAN cables, a webserver and client PCs. Once all the wires are connected, the next question is how the content is created and how the users work with the content.
Content creation can be realized with the well known industry standards like Microsoft Sharepoint, Linux webservers, wikis, blogs, forums, network file systems, some external hosting services, an outdated FTP server, and additionally the employee is allowed to install an ad-hoc webserver on his "bring your own device". As a result, the intranet consists of a variety of places in which data is stored, and no one knows them all.
The next step is to install an intranet fulltext search engine. This kind of tool has a high demand for storage capacity and software features. The robot of the engine traverses all the servers in the network and copies their files into a centralized database. An easy to use interface allows the user to type in a keyword, and he gets in under a second a list of URLs from the intranet.
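A toy version of such an indexer can be sketched with SQLite's FTS5 fulltext module, assuming the file shares are mounted locally under /mnt/intranet and that the SQLite build includes FTS5; a production engine would obviously crawl over the network and handle more file formats.

# Sketch: crawl mounted intranet shares into a SQLite fulltext index.
import os
import sqlite3

conn = sqlite3.connect("intranet_index.db")
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(path, body)")

def crawl(root="/mnt/intranet"):
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".txt"):           # plain text only in this sketch
                path = os.path.join(dirpath, name)
                with open(path, errors="ignore") as fh:
                    conn.execute("INSERT INTO docs VALUES (?, ?)", (path, fh.read()))
    conn.commit()

def search(keyword):
    return [row[0] for row in
            conn.execute("SELECT path FROM docs WHERE docs MATCH ?", (keyword,))]

crawl()
print(search("budget"))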
The last, and maybe most important, step in creating an intranet is called the social intranet. A social intranet is a human powered content aggregation system. Technically it is realized with a MediaWiki server. The users get an account on the system and are allowed to post URLs. But only internal URLs from the own intranet are desired, not links to YouTube videos or audio streams on the public internet, because the intranet should support the work, and if the employees watch all the reggae songs on YouTube they will become too relaxed. Additionally, some RSS feeds are curated by the admins of the social wiki to announce important URLs from the intranet to the newbies.
Because it's an intranet, premoderation is not needed; instead the user types in a new URL, presses submit and the edit is online. It is not allowed to post fulltext information into the social wiki. If somebody has written a pdf document, he has to put it on a fileserver and then he inserts the link into the social wiki.
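As a rough sketch of this posting step, using the third-party mwclient library against a hypothetical intranet MediaWiki; the host name, page title, credentials and file URL are invented for illustration, and the exact call names may differ between mwclient versions.

# Sketch: append an internal link to the shared social-wiki page.
import mwclient   # pip install mwclient

site = mwclient.Site("wiki.intranet.local", path="/w/")   # hypothetical wiki host
site.login("alice", "secret")                             # intranet account (assumed)

page = site.pages["Link_feed"]                            # the shared link page (assumed)
new_line = "* https://fileserver.intranet.local/reports/q3.pdf quarterly report\n"
page.save(page.text() + new_line, summary="add internal link")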
Let us summarize the overall setup. On the lowest level there is the hardware, which consists of desktop PCs, servers and LAN cables. On layer 1 there are storage capabilities like blogs, wikis, forums, Sharepoint and fileservers. Layer 2 is a fulltext search engine covering the entire intranet, and on layer 3 a social network is available, realized with the MediaWiki engine.
Social wiki
Somebody may ask why the users of an intranet need a dedicated social wiki. Isn't it enough if they post content to the normal blogs and the normal wikis which are already there? If somebody wants to write a text he can do so, because it gets indexed by the search engine for sure. And other people can search for the document and read it.
The answer is that the users in an intranet will need a dedicated social wiki for sure. Not because of the users themselves, but because of the content which is stored in the intranet. In the content hosting layer all the documents are available. Each of them has a URL. And some users will bookmark the URLs in their local browser. But these bookmarks are not visible to a group. What the content needs is called aggregation. That means a playlist is curated and the playlist is posted to the social wiki. Not because somebody has a concrete question, but because he found something in the intranet which looks interesting, and he drops the link into the social wiki group.
The combination of a fulltext search engine plus a curated playlist of URLs is a powerful tool for knowledge management in an intranet. It is predictable that the social wiki will become the most frequently used website in the intranet. Some users will use it more often than even the fulltext search engine. Additionally, the traffic of the linked resources will become much higher if a link is posted into the social wiki. The reason is that the other team members want to know what they should know. Or to make it more clear: if a person explains to another person that he should take a look into a document, the other person will do so. In contrast, if a search engine result list tells the user that one of the documents is very important, the user can ignore the advice without consequences, because a search engine is a machine, and the communication flow runs the other way around.
Somebody may ask why the URLs are stored in a wiki. Wouldn't it be better to use a textfile or a forum for such a task? No, because a hand-curated list of URLs can be extended into a newspaper, and creating a newspaper is much easier with the wiki syntax. It is possible to reference images and format the text a bit.
Such an intranet newspaper will be more attractive than hard to read URL lists. This would result in a more efficient intranet. The social newspaper is some kind of meta-meta section: in the social wiki all the URLs from the users are stored, while in the newspaper the social wiki gets summarized.
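One possible way to produce such a summary is to pull the latest edits of the social wiki through the standard MediaWiki API and format them as a digest; the wiki URL below is an assumption, and the output format is only a sketch.

# Sketch: build a weekly digest from the social wiki's recent changes.
import requests

API = "https://wiki.intranet.local/w/api.php"   # hypothetical intranet wiki

params = {
    "action": "query",
    "list": "recentchanges",
    "rcprop": "title|timestamp|comment|user",
    "rclimit": 50,
    "format": "json",
}
changes = requests.get(API, params=params).json()["query"]["recentchanges"]

digest = ["== This week in the social wiki =="]
for rc in changes:
    digest.append(f"* {rc['timestamp']} {rc['user']}: [[{rc['title']}]] {rc.get('comment', '')}")
print("\n".join(digest))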
Users will like the social wiki
A common misunderstanding of wikis and content management systems is that the ordinary user doesn't understand them. He knows that the company has such a system somewhere, but he has never used it. So why should he be motivated to become a user of the social wiki?
The reason is simple: the social wiki doesn't ask the user to do complicated tasks. All the user has to do in the wiki is to post a URL. If he likes, he can add a small comment, but this is optional. More is not needed. If the user is already logged in to the wiki he can do such a task in under 20 seconds. Most users don't enter the URL manually but use the magical CTRL+V. Because this task is so easy, the users will do it very often. And it will be more than the 1-2 users who are specialists for wikis; the wiki will reach 50% of the users of the intranet.
Let us analyze the user interaction with a social wiki in detail. From a formal perspective, the user inserts a URL and submits the edit to the wiki. Then the edit is online. From a higher perspective, the user is communicating with other users. The question is: what do you think about a certain topic, do we have the same opinion, and if not, why? From a formal point of view the users are posting URLs. But what they are really doing is group building. They are aggregating existing knowledge and existing content into a conversation, and this is attractive to other people not yet involved in the debate.
Every social wiki which is installed in the intranet of a company will quickly become the top website. On day 1 only one person has an account, after a month 10% have an account, and after 6 months 300% have an account, which means the users are trying to trick the admin with sockpuppets and will start to manipulate the traffic counter.
3.9 Evolution of bookmarking tools
A bookmarking tool isn't about the content of a website but about its location on the internet. It is similar to a library catalogue which doesn't provide books but index cards. The easiest to grasp bookmarking tools are the bookmark folders in the web browser: each user has his own bookmark list. If this idea is made more powerful, the URL collections are managed collaboratively on the internet.
One option for doing so are blog aggregators. A famous example from within the Python community is Planet Python, which works with the RSS standard. A general website which isn't online anymore was Technorati; this hub organized bookmarks from very different topics. The next, more advanced technique to organize bookmarks are social networks. The so-called share button creates a bookmark in a publicly browsable group.
Social networks are only one step on the ongoing ladder towards the perfect collaboration tool. The evolutionary next step after a social network is a wiki based social network, for example Wikinews. The users submit bookmarks to Wikinews and other users build a story on top of the bookmarks. In contrast to proprietary social networks, the version control history shows to everyone what the admins are doing behind the scenes. This is important in the case of deleted content and reverted actions.
Even Wikinews can be improved much more. What today's Wikinews doesn't provide is a bot-friendly environment. From a technical point of view it is possible to submit a bookmark with a script. And another computer program can evaluate the URL. A simple example of an incoming-control bot is a Python script which asks whether a submitted URL is already available in Google News. If not, then the source is not considered serious.
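A sketch of such an incoming-control bot could look as follows. It queries the Google News RSS search for the domain of the submitted URL; the endpoint and the simple "any hit counts" rule are assumptions, and any other news search service would work the same way.

# Sketch: check whether a submitted URL's domain appears in a news search at all.
from urllib.parse import urlparse, quote
import feedparser   # pip install feedparser

def looks_serious(submitted_url):
    domain = urlparse(submitted_url).netloc
    feed = feedparser.parse("https://news.google.com/rss/search?q=" + quote(domain))
    return len(feed.entries) > 0   # no hits: treat the source as not serious

print(looks_serious("https://example.org/article"))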
Somebody may ask why bookmarking tools are important if all the available content is stored in a fulltext engine. The problem with fulltext indices is that they do not provide sense. What is missing is a ranking: which of the content is valuable and which is not. This can be answered by personal recommendations, the number of links to a resource, traffic analysis, the number of comments and so on.
In today's landscape, the Google search engine tries to integrate some of these annotations into the search results. What Google can't provide are real humans who click on the links. The ability of an automated search engine is limited. There is a need to extend search engines with social networks.