October 29, 2019

How to control online communities like Stackoverflow, Wikipedia and even Arxiv

In robotics there is a term called “underactuated system”. It describes a control problem in which part of the motion is passive: the system has more ways to move than the controller can command directly. Ships illustrate the idea. There are two sorts of ships. The first category is driven by a diesel engine, and the captain determines the direction of the boat. In contrast, a sailboat can only be controlled partly. What it does next depends on the environment, which means that both the wind direction and the action of the captain have to be understood.

One technique for controlling such systems is model predictive control. The idea is that the user can't bring the system directly into a goal state, so the more interesting question is what the system will do on its own, without the user's action. Let me give an example. If the wind isn't blowing, the sailboat won't move forward, and the captain can't simply give the command “move ahead”. The same principle applies to online communities on the internet. What a single user can do is very limited; the more interesting question is what the overall community will do. Controlling such systems depends on a realistic prediction. Somebody who is able to predict the behavior of an online community in advance is able to control it, because he can answer what-if questions.
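A minimal sketch of the sailboat case in Python, assuming a deliberately crude toy model: the boat moves only when the wind pushes it, with speed proportional to the wind and to how closely the heading points downwind. The names `step`, `predict` and `mpc_action` and the cosine-shaped sail are hypothetical simplifications, not a real ship model. The controller tries every sequence of headings over a short horizon, predicts where each sequence leads, and keeps the first action of the best one.

```python
import itertools
import math

# Hypothetical toy model (not a real sailboat): wind_dir is the direction the
# wind blows toward. The boat only gains speed when it points roughly downwind.
def step(state, heading, wind_speed, wind_dir, dt=1.0):
    x, y = state
    speed = wind_speed * max(0.0, math.cos(heading - wind_dir))
    return (x + speed * math.cos(heading) * dt,
            y + speed * math.sin(heading) * dt)

def predict(state, plan, wind_speed, wind_dir):
    """Roll the model forward: where does the boat end up if the captain follows `plan`?"""
    for heading in plan:
        state = step(state, heading, wind_speed, wind_dir)
    return state

def mpc_action(state, goal, wind_speed, wind_dir, headings, horizon=3):
    """Brute-force predictive control: try every heading sequence of length
    `horizon`, score it by predicted distance to the goal, and return the first
    heading of the best sequence together with its predicted cost."""
    best_plan, best_cost = None, math.inf
    for plan in itertools.product(headings, repeat=horizon):
        x, y = predict(state, plan, wind_speed, wind_dir)
        cost = math.hypot(goal[0] - x, goal[1] - y)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan[0], best_cost

headings = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
print(mpc_action((0.0, 0.0), (10.0, 0.0), 2.0, 0.0, headings))  # wind helps
print(mpc_action((0.0, 0.0), (10.0, 0.0), 0.0, 0.0, headings))  # no wind
```

In a full receding-horizon loop, the captain would apply the returned heading, observe the new position and wind, and plan again. With `wind_speed=0.0` every plan predicts the same position, which is the “no wind, no forward motion” situation from the text: no choice of action improves the outcome.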

What makes the situation more complicated is that most users have a certain awareness of what will happen next, but this knowledge isn't made explicit. It's stored in their minds, which makes it hard to reproduce the steps. For realizing Artificial Intelligence, the ability to reproduce knowledge is very important. From an AI perspective, only knowledge that is stored in a computer program can control a system. If the aim is to describe online communities from an AI perspective, there is a need to formalize their inner working.

In the easiest case this is done with a walk-through tutorial, as known from computer games. Such a tutorial is full-text information written for humans, which explains to them what to do next in a game. Writing such a tutorial for online communities like Stackoverflow and others is hard, but possible. One interesting question is who decides about the rules on these websites. The interesting answer is that the existing rules don't describe the inner working of an online community; they are only the help section provided by the website itself. For example, the Wikipedia help section contains a large amount of text which explains to newbies what they should do and what not. All these guidelines are correct in the sense that they are the official Wikipedia help section. But apart from the help section there is a need for an additional tutorial which describes the inner working more realistically. Edit conflicts are a typical example of why this is needed. The Wikipedia help section describes the possibility of an edit conflict only vaguely. According to the help section, users should avoid producing such conflicts, and if they are unsure they are allowed to ask.

But this description doesn't explain the phenomenon from an academic point of view. A better source of information are academic papers with a title like “Edit conflicts in Wikipedia” and an extensive literature list of around 100 entries. Such a paper is the more elaborate way of describing edit conflicts.

To get a realistic understanding of how an online community works, it makes sense to observe the situation from a neutral point of view. Suppose a newbie who isn't yet familiar with the project signs up for a new account and posts his first text. What will happen next? It's the task of the tutorial to explain this. Let me give an easy example. Suppose somebody asks a question about the Java programming language at Stackoverflow, but has chosen the tag C#. What will happen next is practically certain: an admin of the website will change the tag to the correct one.

Let me give a second example. Suppose a programmer from Japan asks a question in Japanese at the normal Stackoverflow website. It's practically certain that the post gets deleted, because the only allowed language there is English. Such what-if scenarios can be described for many domains. What all the use cases have in common is that the fictional newbie doesn't give orders to the admin; he doesn't even know him. Instead, he performs his action and the Stackoverflow admin does his job. That's the reason why such systems are called partly controllable. On the one hand, the user has no direct control over the situation, but at the same time he has full control, because he can predict how the community will react to his own action.
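The two what-if examples can be written down as a tiny rule-based prediction model. This is a hypothetical sketch in Python: the `Post` fields and the function `predict_reaction` are illustrative names, and the two rules are simplified from the examples above, not taken from Stackoverflow's actual moderation policy.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    language: str                  # natural language of the text, e.g. "en" or "ja"
    topic: str                     # what the question is really about, e.g. "java"
    tags: list = field(default_factory=list)

def predict_reaction(post):
    """Hypothetical what-if model: predict the admin's reaction to a new post."""
    # Rule 1 (simplified): the main site only accepts English, so other posts get deleted.
    if post.language != "en":
        return ["deleted: only English is allowed"]
    reactions = []
    # Rule 2 (simplified): a post with a wrong tag gets retagged to its real topic.
    if post.topic not in post.tags:
        reactions.append(f"retagged to {post.topic}")
    return reactions or ["no admin action predicted"]

print(predict_reaction(Post("en", "java", ["c#"])))
print(predict_reaction(Post("ja", "java", ["java"])))
```

The newbie never addresses the admin directly; he only chooses the post, and the model answers the what-if question of how the community will respond. A more realistic model would need many more rules, which is exactly the knowledge that experienced users carry in their heads.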

The amount of control a user can achieve correlates with his understanding of the online community. Expert Wikipedia users know in detail what other editors will do with a certain edit, while newbies have no such prediction model. That's the main reason why some users are successful at Wikipedia while others are not. The reason is not located in a certain user account, in the sense that postings from established long-term users are accepted automatically. Instead, it depends on the user's prediction model.

That means in detail: it's not relevant whether content gets posted from a normal account or by an anonymous user. Wikipedia and other websites will react the same way. What matters is which kinds of edits make sense and which don't.