May 10, 2019

Wikinews is the weakest website on the internet


The large Wikipedia project is well known to the public. It is referenced as a quality source of information, and many thousands of volunteers are motivated to contribute to Wikipedia. The normal way of doing so is to write articles and extend existing information. Because the amount of traffic is high, Wikipedia is also attractive to spambots which post false information and delete content from Wikipedia. This is prevented by a hierarchy of admins who revert edits, ban users and protect articles.
In general, the amount of spam in Wikipedia is handled without problems. The admins win because it is easy to detect spam and undo it. And writing longer articles which are nonsense is difficult, so spammers have little motivation to attack Wikipedia much harder.
A side project of the Wikimedia Foundation is much more vulnerable to attacks from outside. It is called Wikinews, and most people have never heard of it. Wikinews is a social network which can be flooded with nonsense information and marketing URLs, similar to what an autoposter does on Facebook. The difference to Wikipedia is that spam cannot be detected efficiently on Wikinews. The reason is that writing a nice-looking Wikinews article is much easier than writing a Wikipedia article. It is possible to use bots to generate thousands of Wikinews articles. The important feature of these articles is that they cannot be detected as spam. If a Wikinews article consists of three sources from Google News plus a short abstract, it will look like a valid news article. The admin won't recognize that it was spam.
It is only luck that spammers haven't recognized this weakness yet. They ignore Wikinews or think it is not a valuable target. The problem is that it is only a question of time until automated bots generate Wikinews articles on the fly.
In contrast to Wikipedia, this content can't easily be detected as spam. The problem is the same as with Facebook. If an autoposter submits a URL to Facebook, the admin will not stop the bot, because the contribution is valid. The problem of a news aggregation website is that it doesn't provide content; its main value is the URL to external content. And the external content for Wikinews is already available, and it is valid.
Let me give an example. In the Google News database at least 2000 URLs are available which link to valid sources from the last week. If a bot posts these URLs piecewise to Wikinews and adds a small abstract to each article, the Wikinews admin will think it is a valid contribution. He is not able to stop the bot. If he would like to delete such articles, he has to open a deletion request, which takes too much time.
In contrast to the normal Wikipedia, a spambot can win the fight at Wikinews. Winning means that it is able to flood the system with fake news and won't be stopped by the admins. Similar to an advanced Facebook autoposter, the bot will produce minimal noise, and if it is successful, other bots will copy this behavior.
Let me explain the inner workings of Wikipedia. The current encyclopedia is well protected against bots. If somebody tries to produce automatically generated content, this will be detected quickly by the admins, and they will ban the user. The reason is that an article generated by a bot can be identified as computer generated. The entry barrier for posting something on Wikipedia is that unique content on an academic level has to be produced.
In the case of Wikinews the entry barrier is much lower. Wikinews isn't asking for longer content; Wikinews asks for a valid URL to external content. Generating a URL list with a bot and adding a nice-looking headline is an easy task. And it is not possible to decide whether such a URL was produced by a human or a machine. This makes Wikinews extremely vulnerable to bot attacks. The danger is high that a bot can produce hundreds of Wikinews articles without being detected as a bot.
The reason lies in the idea of Wikinews. Wikinews is, in contrast to Wikipedia, not a content source but a meta website. It collects URLs of content stored somewhere else. It is not possible to upload a full text to Wikinews; the full text has to be available already. Wikinews works only as a content aggregator.
The difference is that aggregated listings can be produced without human intervention. For example, the Planet GNOME project is generated entirely by a script. Wikinews is equivalent to a front page. And this front page can be easily fooled.
I'm sure that the normal Wikipedia project is protected against vandalism. Each day a lot of spam attempts are made, and all of them are detected by the internal system. In the case of Wikinews I'm very pessimistic whether this is possible. In case of doubt we have to assume that Wikinews works on the same principle as a Facebook group. And Facebook groups are permanently spammed by autoposting software. It is not possible to ban such software, because an autoposter is a great content aggregator. The only option to ban a Facebook autoposter is if its posting frequency is too high. But the spam bots have learned this rule quickly, and all of them post in delayed mode to fly below the radar.
In theory the same principle can be used to flood Wikinews. The only difference is that Wikinews needs wiki markup and it needs URLs from large newspapers. If a well-formatted article which contains newspaper links is submitted to Wikinews, the probability is high that this article will pass the incoming control.
The problem can't be overcome. The only reason why Wikinews is not spammed more often is that the project is very small and most people on the internet don't know what news aggregation is. They think that Wikinews is a place in which content is generated. No, it is not. Wikinews aggregates content which was created already. There is no need to write a single sentence to add a news headline.
The only option to avoid this problem is to redefine the purpose of the project. Instead of being a front page to existing content, Wikinews could require that unique content has to be posted. But then it is not a front page anymore, but some kind of newspaper similar to the Guardian. The self-understanding of Wikinews is that it is a hub for different newspapers. Being a hub and hosting unique content are opposites. Wikipedia, for example, is a content host. It is allowed and desired to upload longer texts to Wikipedia. Putting URLs at the end of a Wikipedia article is not mandatory, which means there are lots of articles in the system without a reference. And adding references to existing articles is also not a big problem.
In contrast, Wikinews has a different priority. Before something can be published there, it must be available on the internet. Sure, this sounds a bit confusing. Let us compare a Wikipedia article with a news article.
If somebody creates a Wikipedia article about neural networks, he can write anything in the article. In the first draft he can leave out any resources, because neural networks can be explained without a reference. Before the article is submitted to Wikipedia, it is recommended to add some references. Some references which provide additional information are added to the article, and the article is ready for submission. This workflow is the best-practice method of creating Wikipedia content; all the authors work this way.
In contrast, social networks and especially Wikinews work differently from this mode. The most important part of a posting is the URL, which comes first. From the URL the headline is extracted, and from the full text the abstract is extracted. The final text is equal to a Facebook group post or a Wikinews post.
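To make this URL-first workflow concrete, here is a minimal sketch in Python, assuming a generic news page whose headline lives in the HTML <title> tag. The function name and the deliberately naive parsing are my own illustration, not part of any Wikinews or Facebook tooling.

import urllib.request
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collects the text inside the <title> tag of a page."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.title += data

def headline_from_url(url):
    """Download a page and return its <title> text as the headline."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()

The abstract would be pulled out of the body text in the same way. The point is that everything needed for the posting is derived from the URL itself; no writing is involved.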
In short, that is the main difference between content creation and content aggregation. What is important to know is that the manpower needed for content creation is high. Writing a two-page US-letter article which fulfills academic standards can take up to a week. In contrast, creating a list of 100 URLs with the latest news headlines can be done in under 10 minutes, and with the help of a script in around 0.2 seconds. The amount of needed work is a protection against bots. If it takes one week to create a Wikipedia article, it is not possible to mass-produce or mass-spam such content. On the other side, URLs to existing content can be produced on a mass scale without problems.
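As an illustration of the 0.2-second claim, the following sketch uses only the Python standard library and assumes a standard RSS 2.0 feed; the feed address is a placeholder, not a real endpoint.

import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://example.org/news/rss"  # placeholder address of any public news feed

def fetch_headlines(feed_url):
    """Return (title, link) pairs from a standard RSS 2.0 feed."""
    with urllib.request.urlopen(feed_url) as response:
        tree = ET.parse(response)
    return [(item.findtext("title", ""), item.findtext("link", ""))
            for item in tree.findall(".//item")]

for title, link in fetch_headlines(FEED_URL):
    print(title, link)

Once the feed is downloaded, building the headline list is a fraction of a second of parsing. The week of work goes into writing the articles behind the links, not into collecting them.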
The funny thing is that each day hundreds of spammers try to infiltrate Wikipedia. They do test edits, vandalism edits and produce unwanted content. In contrast, Wikinews is ignored by the spammers. The total amount of articles is low and the amount of daily edits is very low. The paradoxical situation is that a website like Wikipedia, which is protected against spam, is attacked a hundred times a day by spammers, while a website which welcomes any kind of URL spam is not attacked by anyone.
Right now I have no explanation for this phenomenon. One possible explanation is that the fake tutorials within Wikipedia are working great. Which means they teach the public misinformation about how Wikipedia and Wikinews work internally. And the public believes the wrong information, and this results in inefficient attack vectors against Wikipedia. Perhaps the public believes that the Wikipedia website itself can be easily fooled and the admins have no idea what is going on, while on the other hand they have no idea what Wikinews is, or how to utilize this shiny social network for individual purposes.
Content wiki vs. Social wiki
There are two different forms of wiki systems available. Both work with the MediaWiki software. The easier wiki to explain is a content wiki. Wikipedia is a typical example, but the Fandom (Wikia) wikis are good examples of content wikis as well. The idea is that the users upload text and images to the wiki, and the admin has to make sure that the users do so. A content wiki is easy to maintain, because it is easy to say whether a contribution is content or not.
Social wikis operate with a different guideline. The MediaWiki installation is the same, but the guideline for the admin is different. A social wiki is not asking the users for content but for URL links. What the users do in a social wiki is post a link, comment on the link of a different user and write very short abstracts for a link. Social wikis are sometimes called meta-wikis, because their purpose is to aggregate existing content.
It is important to know that it's hard to control a social wiki, because the desired input is usually short, consists of a single URL and can be generated by humans or bots. The only option to detect bots in a social wiki is their posting frequency. If a bot posts 100 URLs in an hour, the bot and the user behind it can be banned. But if the bot is adjusted similarly to a Facebook autopost bot and posts exactly 2 URLs each day, it is not possible to ban the bot. Even if the admin knows that it's a bot, he can't forbid it from doing so.
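As a sketch of how weak this frequency heuristic is, consider the following sliding-window check. This is my own illustration, not MediaWiki's real anti-abuse tooling, and the threshold of 20 postings per hour is an assumed value.

from collections import defaultdict
from datetime import timedelta

WINDOW = timedelta(hours=1)
LIMIT = 20   # assumed threshold: more than 20 postings per hour looks like a bot

def flag_high_frequency_posters(edits, window=WINDOW, limit=LIMIT):
    """edits: iterable of (username, datetime) pairs.
    Returns the usernames which exceed the limit inside any time window."""
    by_user = defaultdict(list)
    for user, ts in edits:
        by_user[user].append(ts)
    flagged = set()
    for user, times in by_user.items():
        times.sort()
        start = 0
        for end, ts in enumerate(times):
            # shrink the window until it spans at most one hour
            while ts - times[start] > window:
                start += 1
            if end - start + 1 > limit:
                flagged.add(user)
                break
    return flagged

A bot which posts 100 URLs per hour is caught immediately, but a bot which posts exactly 2 URLs per day never comes close to the threshold. That is exactly the delayed-mode strategy of the Facebook autoposters described above.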
The only option the admin has is to redefine the entire wiki. He can argue that the wiki is no longer a social wiki but a content wiki, which means that only longer content without URLs is allowed. This would force the bot to stop its actions, because a Python script is technically not able to produce longer content.
Let me summarize the facts:
content wiki: consists of longer text, doesn't need URLs, can only be filled by humans, can easily be protected by admins
social wiki: consists of URLs plus very short comments, can be filled by humans and bots as well, is very hard to protect by admins
Perhaps it makes sense to give some details about social wikis. The important information is that a social wiki can be generated without human intervention. An example is the Planet GNOME project. It is not directly a wiki, but it is a content aggregator. Planet GNOME is the result of a Python script which aggregates RSS streams. Forcing the Planet software not to produce HTML code but wiki markup which can be inserted into a wiki is a simple task. The funny thing is that social wikis don't need a human in the loop. What a social wiki needs is content outside of the wiki. If no content is available, no URL can be added which references the content. The social wiki itself can be generated with a script.
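The conversion step really is simple. Here is a minimal sketch which assumes the feed entries are already parsed into dictionaries with title, link and summary fields; the field names and the output format are my assumptions, not the real Planet codebase.

def entries_to_wikitext(entries):
    """Turn parsed feed entries into a wikitext bullet list, one line per item."""
    lines = []
    for e in entries:
        lines.append("* [%s %s]: %s" % (e["link"], e["title"], e["summary"]))
    return "\n".join(lines)

print(entries_to_wikitext([
    {"title": "Example headline",
     "link": "https://example.org/post",
     "summary": "One-sentence abstract taken from the feed."},
]))

The output is a front page made entirely of links and one-line abstracts, which is exactly the kind of page a social wiki consists of.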
And this explains why social wikis are vulnerable to bot spamming. A spambot which drops URLs into a social wiki is not the exception but the normal case. If a real human, not a bot, posts a URL, this is a case for flagging the post. Let me give an example. In the standard mode, the Planet GNOME website is generated by a script. No human intervention is needed. The content displayed on the site is 100% the same content which was delivered by the RSS streams. What will happen if a posting shows up on the screen which is not available in the RSS stream? Right, this is not possible. If this case happens, then a hacker has attacked Planet GNOME and injected a posting into the system manually. That means the normal behavior of the Planet script is to convert RSS feeds into HTML, and if additional content is posted, a person from the outside has interrupted the Planet script or bypassed the generator.
The same behavior is visible in other social networks like Facebook and Google+. The normal noise in the groups is produced by the bots. They produce one or two messages each day. If a user posts a longer message to the group or comments on one of the bot postings, this is flagged and should be investigated by the admin. The reason is that the behavior of the bots is predictable, while a human user is intelligent, which is dangerous.
In a content wiki all the messages produced by bots are detected as spam, while most messages from humans are treated as valid. In a social wiki the situation is the other way around. All the submissions from bots pass the filter, while postings from users are treated as problematic.