Robotics and Artificial Intelligence: 2026

February 14, 2026

Language as ghost in the machine

The term "ghost in the machine" usually refers to the artificial intelligence that allows a robot to do useful things. The body is the robot's hardware, and the soul is therefore realized in software. Under this view, artificial intelligence is an advanced computer program based on AI-related algorithms.

But what if this working thesis is wrong? The alternative assumption is that AI can be realized neither in hardware nor in software. Like a ghost, artificial intelligence is very hard to locate. It is basically natural language. In other words, the ghost is an English dictionary. It is not stored in the software itself, because language is a communication pattern, an in-between technology.

Natural language only becomes visible when a human speaks with a robot. The human formulates a request like "move forward" and the robot responds to it. Therefore, the AI isn't stored in the robot itself but is located in the air between human and robot. This would explain why it has much in common with a ghost, which also has no measurable location in reality. A ghost is located in the beyond, which is the environment of reality, or some sort of hyperspace.

Let us try to describe language from a more scientific perspective. A statement like "grasp the apple, please" has no physical size. Words are not part of visible reality; they are abstract symbols located in the oral space. Despite these missing physical borders, language is part of reality because it is used for many purposes. In a mathematical sense, language is a communication technology used in a sender-to-receiver interaction. Such a protocol has much in common with ghost-like behavior. If language is transmitted over radio waves, it bears some similarity to a supernatural phenomenon.

What makes language interesting for artificial intelligence is that without language a robot is not able to think. Without language, a robot is reduced to a pocket calculator that can execute an algorithm but doesn't understand the meaning of real-world objects like a table, an apple or a plate. The ability to parse natural language is equivalent to implementing artificial intelligence.

February 11, 2026

Playing pong videogame with a perception buffer

In addition to the previous post, here is another example of a working AI based on a buffer. The game has two modes: a normal mode, which runs the simulation, and a pause mode, which shows AI information including the perception buffer, the action buffer and the predicted trajectory. The text box shows the information known from the game engine, such as the ball position and its velocity. The action buffer stores what to do next. This information is submitted to the paddle.

Because Pong is so simple, the AI masters the challenge with ease: it moves the paddle towards the correct position.

From a technical perspective, the buffer was realized with a Python dict that stores the information in a key/value syntax. Creating such a dictionary and showing its content on the screen is very simple. The innovation lies in the assumption that such a buffer modulates the communication process. The AI brain isn't imagined as a sophisticated algorithm; it is a database which holds information as natural language. This generates multiple subtasks: a) how to convert the game state into a perception buffer, b) how to translate the perception buffer into the next action, and c) how to submit the content of the action buffer back to the game engine.
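The three subtasks can be sketched in a few lines. This is only a minimal illustration, not the code of the game described above: the field names (ball_y, paddle_y and so on), the sentences in the buffers and the paddle speed are assumptions, and screen coordinates are assumed to grow downward.

```python
# Minimal sketch of the perception buffer / action buffer loop.
# All field names and sentences are hypothetical, chosen for illustration.

def perceive(game_state):
    """a) Convert the raw game state into a perception buffer."""
    ball_y = game_state["ball_y"]
    paddle_y = game_state["paddle_y"]
    # Screen coordinates: a smaller y value means higher on the screen.
    if ball_y < paddle_y:
        relation = "ball above paddle"
    elif ball_y > paddle_y:
        relation = "ball below paddle"
    else:
        relation = "ball aligned with paddle"
    return {
        "ball_pos": (game_state["ball_x"], ball_y),
        "ball_vel": game_state["ball_vel"],
        "relation": relation,
    }

def decide(perception):
    """b) Translate the perception buffer into an action buffer."""
    table = {
        "ball above paddle": "move up",
        "ball below paddle": "move down",
        "ball aligned with paddle": "stay",
    }
    return {"paddle": table[perception["relation"]]}

def act(action, game_state, speed=5):
    """c) Submit the action buffer back to the game engine."""
    step = {"move up": -speed, "move down": speed, "stay": 0}[action["paddle"]]
    game_state["paddle_y"] += step
    return game_state

state = {"ball_x": 120, "ball_y": 40, "ball_vel": (3, -2), "paddle_y": 90}
perception = perceive(state)
action = decide(perception)
state = act(action, state)
print(perception)
print(action)
```

Both buffers are plain dictionaries, so they can be printed in the pause mode without further conversion.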

Computer programming vs. AI programming

Computer programming is the art of software creation. It is about converting a real-world problem into executable program code in a language like Java or C/C++. A typical example is to program a Pong videogame or to improve a database management system.

Modern computer programming since the 2010s doesn't reinvent the wheel but uses existing operating systems, programming languages and libraries. For example, videogames are written with the help of a 2D game library, and database systems are created on top of existing SQL databases.

Programming always has the goal of creating new software or modifying existing software that runs on a computer. All modern technology, such as the Internet, word processors and database software, is the result of well-engineered software applications.

Despite the importance of programming in computer science, the discipline has a blind spot: it is not possible to program AI software or to write software for a robot. Many attempts at writing robot software in C/C++ and Java have been presented in the past, but most of them have to be called failures. It seems that artificial intelligence works differently from classical software engineering principles. It is not possible to reuse existing software libraries or take advantage of existing programming languages. Even the most powerful programming language available, Python in combination with the latest mathematical libraries, is useless for realizing a robot project. The reason is that software programming describes the world in a computer-centric way. The attention is always directed towards the computer and its ability to execute software. For example, the Python interpreter provides a list of commands. Programming means arranging these commands into a fixed structure, a computer program organized in classes and subroutines. Then the program can be executed. The problem is that such a program won't realize artificial intelligence.

There is a single programming exercise which demonstrates the transition from classical software programming towards artificial intelligence: activity recognition in motion capture. This specialized problem has its roots in computer animation and was first mentioned in the 1970s. The task is to annotate the movements of the mocap markers with textual names like sitting, jumping, walking and so forth.

Computer programming is focused on the CPU of a computer. The computer has to solve a problem, e.g. adding two numbers or searching a database with a search algorithm. In contrast, the activity recognition task works with a communication paradigm similar to an internet protocol. The idea is to convert low-level data into high-level data. Such a communication system is an open system, which is seldom described in the programming literature. The reason is that communication refers to external parties located outside of the computer.

Classical programming uses the algorithm paradigm as its theoretical foundation. The algorithm is executed on the machine and solves a problem. In contrast, communication-oriented programming works with the sender-to-receiver paradigm. No algorithm is needed; instead there is a message which is delivered over the network. Programming a robot is similar to implementing a communication protocol: there is a sender, a receiver, a message and a protocol. The robot never runs an algorithm; the robot receives a message.
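The sender-to-receiver paradigm can be illustrated with a small sketch. The class names Message and Robot, the "ack" reply and the operator name are assumptions made for this example; they are not taken from an existing robot framework.

```python
from dataclasses import dataclass

@dataclass
class Message:
    """One message in the hypothetical protocol: who says what to whom."""
    sender: str
    receiver: str
    text: str

class Robot:
    """The receiver: it stores and acknowledges messages addressed to it."""
    def __init__(self, name):
        self.name = name
        self.inbox = []

    def receive(self, message):
        # Messages addressed to another party are ignored.
        if message.receiver != self.name:
            return "ignored"
        self.inbox.append(message)
        return "ack " + message.text

robot = Robot("robot1")
reply = robot.receive(Message("operator", "robot1", "move forward"))
other = robot.receive(Message("operator", "robot2", "stop"))
print(reply)
print(other)
```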

February 09, 2026

Controlling a robot with the DIKW pyramid

Although the DIKW pyramid is frequently discussed in the literature, its application within robotics is only rarely documented. As a motivating introduction, here is an example for a warehouse robot. On the lowest level (data), the following measurements occur:

- Coordinates: "X: 194.5 / Y: 10.2"
- Percentage: "12%"
- Barcode scan: "ID: 00056789"
- Temperature of a servo motor: "42°C"
- Speed: "0.2 m/s"
- Sensor switch state: "Bit 1 = On"
- Timestamp: "12.05.2025 / 10:02:01"

These are raw data as delivered by sensors, for example via GPS triangulation, a barcode reader or a temperature sensor. The data carry no deeper meaning; they are merely logged and stored in a database as numerical values.

More interesting for controlling the robot is the next higher level of the DIKW pyramid: information.

- Battery charge is low, refers to 12%
- Robot location is shelf 8, compartment A, refers to the coordinates
- Package 00056789 is a pallet of glass bottles, refers to the barcode scan
- Motor overheating is imminent, refers to 42°C

The mapping from data to information is done with the help of further database entries. These store text entries such as "motor overheating is imminent" together with the conditions under which they apply. The information level is not stored as numerical data but consists of short sentences in natural language. Higher levels of the DIKW pyramid contain more abstract formulations that encode expert knowledge and are important for the robot's task.
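A minimal sketch of such a rule table is shown next. The key names and the thresholds (below 20% for a low battery, 40°C or more for overheating) are assumptions chosen so that the example values from the data level trigger the matching sentences; a real system would take them from the robot's specification.

```python
# Each rule: (data key, condition on the value, information sentence).
# Keys and thresholds are hypothetical.
RULES = [
    ("battery_percent", lambda v: v < 20, "battery charge is low"),
    ("motor_temp_c", lambda v: v >= 40, "motor overheating is imminent"),
    ("speed_m_s", lambda v: v == 0, "robot is standing still"),
]

def to_information(data):
    """Lift numerical data to short natural language sentences."""
    info = []
    for key, condition, sentence in RULES:
        if key in data and condition(data[key]):
            info.append(f"{sentence} (refers to {key}={data[key]})")
    return info

data = {"battery_percent": 12, "motor_temp_c": 42, "speed_m_s": 0.2}
information = to_information(data)
for line in information:
    print(line)
```

Adding knowledge to the robot then means adding rows to the rule table rather than changing the lookup function.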

Technically, a DIKW pyramid is a database management system in which the data and information are distributed over different tables and joined by rules. The content of the database is updated in real time. On the highest level (wisdom), controlling the robot is then very simple. One sends a natural language command such as "Drive to shelf C, fetch the glass bottles and bring them to shelf B". This high-level command is then translated into concrete commands for the robot.

February 07, 2026

Robot control with a DIKW pyramid

Symbol grounding is about moving down and up along a DIKW pyramid. This makes it possible to hide or expand the details of a subject. For the example of a warehouse robot, the DIKW pyramid can be implemented as a Python dictionary, shown here with only the top layer and the bottom layer:

dikw_pyramid={
    # top layer: voice commands as English sentences
    "wisdom": [
        "Go to the loading bay and clear the blockage.",
    ],
    # bottom layer: raw numerical sensor values
    "data": {
        "lidar_dist": 0.5, "weight_kg": 25.0, "coords": (12.4, 45.8)
    },
}

The raw sensor data are fed into the data layer and are formatted as numerical values. In contrast, the wisdom layer of the pyramid stores the voice commands formulated as English sentences. The task of the symbol grounding engine is to translate between these layers. This is realized by instruction following (from top to bottom) and activity recognition (from bottom to top).
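Both directions can be sketched as two small functions. This is an illustration under assumptions: the 1.0 m obstacle threshold, the generated sentences and the coordinates of the loading bay are invented for the example and are not part of the dictionary above.

```python
def recognize_activity(data):
    """Bottom to top: describe numerical sensor data in English."""
    obstacle = data["lidar_dist"] < 1.0   # hypothetical threshold in meters
    loaded = data["weight_kg"] > 0
    if obstacle and loaded:
        return "Carrying a load, obstacle ahead."
    if obstacle:
        return "Obstacle ahead."
    return "Path is clear."

def follow_instruction(sentence):
    """Top to bottom: expand an English command into a numerical target."""
    places = {"loading bay": (30.0, 5.0)}  # hypothetical place coordinates
    for name, coords in places.items():
        if name in sentence.lower():
            return {"goal_coords": coords}
    return {}

data = {"lidar_dist": 0.5, "weight_kg": 25.0, "coords": (12.4, 45.8)}
print(recognize_activity(data))
print(follow_instruction("Go to the loading bay and clear the blockage."))
```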

February 06, 2026

Robot swarm builds a house

The picture shows multiple robots on a construction site that are controlled by a large language model and work over a longer time span on the same goal. The AI technology is based on natural language, which reduces the state space drastically. The large language model describes the project in English nouns and verbs, and the individual robots convert the commands into physical actions.

Database for npc quest generator

NPC quests aren't generated with algorithms but with a database. The database contains textual elements; for a warehouse robot, a mini database looks like this:

warehouse_db = {
    "items": {"SKU-01": "Lithium Batteries", "SKU-05": "Hydraulic Fluid"},
    "zones": {"Zone_A": "Cold Storage", "Zone_B": "Hazardous Materials"},
    "quest_types": ["Fetch", "Escort", "Cleanup", "Security Patrol"]
}

Such a database modulates the communication. An NPC quest like "fetch Lithium Batteries from Zone_B" is valid because the words are available in the database. Another possible quest taken from this mini database is "security patrol Cold Storage". Increasing the intelligence of the robot doesn't mean inventing advanced algorithms but populating the database with more entries. If the robot knows the names of important items and relevant locations in a warehouse, it is an expert for the domain.
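The validity check can be sketched as a vocabulary lookup. The helper names vocabulary and is_valid and the filler words "from" and "to" are assumptions for this example; the check only tests whether every word is known and deliberately ignores grammar.

```python
warehouse_db = {
    "items": {"SKU-01": "Lithium Batteries", "SKU-05": "Hydraulic Fluid"},
    "zones": {"Zone_A": "Cold Storage", "Zone_B": "Hazardous Materials"},
    "quest_types": ["Fetch", "Escort", "Cleanup", "Security Patrol"],
}

def vocabulary(db):
    """Collect every lowercase word that appears in the database."""
    names = (list(db["items"].values()) + list(db["zones"].values())
             + list(db["zones"]) + db["quest_types"])
    words = set()
    for name in names:
        words.update(name.lower().split())
    words.update({"from", "to"})  # assumed filler words
    return words

def is_valid(quest, db):
    """A quest is valid if all of its words are in the vocabulary."""
    known = vocabulary(db)
    return all(word in known for word in quest.lower().split())

print(is_valid("fetch Lithium Batteries from Zone_B", warehouse_db))
print(is_valid("security patrol Cold Storage", warehouse_db))
print(is_valid("fetch pizza from Zone_B", warehouse_db))
```

The third quest is rejected only because "pizza" is missing from the database; adding an item entry would make it valid without touching the code.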

February 03, 2026

Annotating video games

The screenshot on the left shows a simple random walk on a path with two robots. Even if the picture is provided in maximum resolution, it remains unclear what all these pixels mean. Humans can guess that the connected nodes are the allowed path, but computers have no idea how to interpret the image.

The situation becomes much clearer after activating the pause mode shown on the right. There is an additional text window which explains that the red circle is robot1, which is moving from node #4 to #5 and has a full battery. This information can't be parsed from the original picture, so the text box provides additional meaning. Another feature of a text box is that a computer understands the information much more easily, because all the data are formatted in a key-value syntax, which is the preferred layout for machine understanding.

Such a text box is the core element of artificial intelligence because it addresses the symbol grounding problem. The text box communicates the current game state to an external instance, which is a human observer. Instead of analyzing how the simulation was programmed internally, the new question is how to talk about the domain in natural language. Such a task is realized with a user interface in general and with a text box in particular.

Simple example for a head up display

An entry-level example for demonstrating the power of head up displays and grounded language is a route navigation problem, which is perhaps the easiest example of instruction following. The robot is controlled by a random generator, and after pausing the game, a text box with additional information appears on the screen. This text box contains the grounded language which is needed to provide meaning.

Every head up display is based on a two-tier architecture: there is a graphical screen in the background and a textual screen in the foreground. Such text boxes are a common design element in videogames, and they are also useful for artificial intelligence. The compact representation in the text box helps a computer to understand a videogame. Grounding means that the AI is able to generate and format the content of the text box.

The text box is updated whenever the videogame state changes. Both layers are synchronized automatically. Programming such an up-to-date grounded language is the core problem. In the case of the graph traversal robot, the information shown in the text box is easy to format. In the case of a kitchen robot or a self-driving car, the text box contains more complex information which is harder to maintain automatically.
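For the graph traversal robot, formatting the text box could look like the following sketch. The key names and the "key: value" line layout are assumptions; the values mirror the description of robot1 moving from node #4 to #5 with a full battery.

```python
def format_text_box(state):
    """Render the game state as one 'key: value' line per entry."""
    return "\n".join(f"{key}: {value}" for key, value in state.items())

def parse_text_box(text):
    """Read the text box back into a dictionary (all values as strings)."""
    return dict(line.split(": ", 1) for line in text.splitlines())

state = {"robot": "robot1", "color": "red", "from_node": 4,
         "to_node": 5, "battery": "full"}
box = format_text_box(state)
print(box)

# Keeping both layers synchronized: regenerate the box after each change.
state["from_node"], state["to_node"] = 5, 6
box_after_step = format_text_box(state)
```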

January 29, 2026

Pong AI with internal teacher

The game has two modes: a) normal videogame, in which instructions are executed by the AI, and b) internal teacher, in which the game is paused and a text overlay is shown. Pressing space toggles between the modes. This two-mode system emulates a speaker-to-hearer interaction. So there isn't a single game AI which controls the paddle; instead there are two layers with different obligations.

Artificial intelligence through open systems

In the history of computer science, open versus closed systems were never a topic. This categorization was never used to philosophize about robots; instead the discourse was conducted in terms of the concept of an algorithm. Computer science thus distinguishes between sampling algorithms, heuristic algorithms and backtracking algorithms.

The description of open systems is, however, considerably more powerful than the rather mathematically oriented concept of an algorithm. An open system is a form of oracle Turing machine in which, besides the computer program, there is a second, higher instance that is consulted when problems arise. As a consequence, open systems involve an interaction between the hearer, who has to solve a task, and the speaker, who gives instructions. This interaction can be modeled within a computer system, similar to a network protocol stack, and serves to solve problems in the field of artificial intelligence.

The concrete implementation can be explained with the example of a warehouse robot. It navigates in a grid map of 100x100 cells and has to carry out pick&place tasks there. The warehouse robot is the hearer, while the speaker formulates several NPC quests such as "drive to room B and pick up pallet #2 there" or "bring the pallet to room A and then drive to the charging station".

What is special about open systems is that the instructions are separated from the execution. The speaker only formulates the NPC quests, that is, it comes up with tasks that make sense. The speaker cannot solve these tasks itself but sends them to the hearer. The hearer robot translates a task into servo motor activity and drives to a certain room in the warehouse or recharges its battery at the charging station.

From a programming point of view this raises several challenges: a) which kinds of NPC quests make sense in the context of a warehouse robot, and b) how can a concrete NPC quest be executed? These problems are fairly demanding and have to be solved with classical computer science algorithms. A task like "drive to room B" could, for example, be solved with a path planning algorithm such as A*.

Communication in open systems is not an end in itself; only through communication can a task be made machine readable. When the speaker sends a text message to the hearer, an interaction arises which can be stored in log files. From the log files one can tell whether a task was solved and, if not, why not. This in turn makes it possible to improve the artificial intelligence. An example: suppose a robot stops in front of a shelf for no apparent reason. If the system were a closed system without communication with a higher instance, the cause would remain unclear. Perhaps it is a hardware problem, perhaps the software has crashed, or perhaps the AI is defective.

If it is an open system, on the other hand, one only has to evaluate the last messages sent from the speaker to the hearer, and the error can be narrowed down. If, for example, the last message was "Speaker: stop", then the robot standing still is not an error, because it carried out the speaker's command. The only remaining question is why the speaker sent this command.
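Evaluating the last speaker message can be sketched as follows. The log format (a list of sender and text pairs), the function name explain_standstill and the wording of the diagnosis are assumptions for this illustration.

```python
def explain_standstill(log):
    """Look at the most recent speaker message to explain a stopped robot."""
    for sender, text in reversed(log):
        if sender == "speaker":
            if text == "stop":
                return "no fault: the hearer obeyed the stop command"
            return f"possible fault: last command was '{text}'"
    return "no speaker message in the log"

# Hypothetical log of a speaker-to-hearer interaction.
log = [
    ("speaker", "drive to room B"),
    ("hearer", "driving"),
    ("speaker", "stop"),
    ("hearer", "stopped"),
]
print(explain_standstill(log))
print(explain_standstill(log[:2]))
```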

January 26, 2026

Improved chatbot for a kitchen robot

In addition to the previous post, the Python script was improved a bit. There are more entries in the database, the amount of information per entry is higher and, very importantly, a telemetry mapping function is available. This makes it possible to monitor a teleoperated robot. The number of code lines increased to 80, but the software remains easy to understand.

The core element is a database of words. Every word is described with additional key-value information, for example a picture or a position. The AI takes the current sensory data and searches for a match in the database, and it also searches for the text input from a user. If the AI finds an entry in the database, this is equivalent to understanding a situation. In short, the AI is a database lookup algorithm. Here is an example interaction and, of course, the source code written in Python 3.

----
gathering telemetry ...
attention near apple
robotpos near table
user: lookat table
search database ...
lookat action inspect object
table {'pos': (0, 0), 'desc': 'place for storing objects', 'word': 'noun'}

gathering telemetry ...
attention near apple
robotpos near table
user: grasp apple
search database ...
grasp action take an object
apple {'pos': (10, 3), 'word': 'noun', 'category': 'fruit', 'desc': 'is food to eat', 'filename': 'apple.jpg'}
----

"""
chatbot kitchen robot
a wordlist is stored as python dictionary, user enters command which is searched in the wordlist
application: Teleoperation monitoring
"""
class Chatbot:
    def __init__(self):
        # word database: verbs and nouns with optional key-value details
        self.data={
            # verb
            "open": "action open something",
            "grasp": "action take an object",
            "ungrasp": "action place object from hand to world",
            "eat": "action eat food",
            "lookat": "action inspect object",
            "walkto": {
                "word": "verb",
                "category": "action",
                "desc": "move towards location",
                "motor": "legs",
            },
            # noun
            "apple": {
                "pos": (10,3),
                "word": "noun",
                "category": "fruit",
                "desc": "is food to eat",
                "filename": "apple.jpg",
            },
            "banana": {
                "desc": "noun food",
            },
            "table": {
                "pos": (0,0),
                "desc": "place for storing objects",
                "word": "noun",
            },
            "fridge": {
                "pos": (1,0),
                "word": "noun",
                "status": "closed",
                "category": "furniture",
            },
            "plate": "noun food is served there",
            "door": "noun entrance to room",
        }
        self.telemetry()
        self.parser()
    def getdist(self,p1,p2): # return: manhattan_distance
        result=abs(p1[0]-p2[0])+abs(p1[1]-p2[1])
        return result
    def telemetry(self):
        self.sensor={
            "robotpos": (0,1),
            "camera": "cam02.jpg",
            "attention": (10,3),
        }
        # search robotpos and attention
        print("gathering telemetry ...")
        for i in self.data:
            entry=self.data[i]
            # only dict entries can carry a position, plain strings are skipped
            if isinstance(entry,dict) and "pos" in entry:
                dist=self.getdist(self.sensor["robotpos"],entry["pos"])
                if dist<=1:
                    print("robotpos near",i)
                dist=self.getdist(self.sensor["attention"],entry["pos"])
                if dist<=1:
                    print("attention near",i)
    def parser(self):
        line=input("user: ") # manual input
        line=line.split()
        print("search database ...")
        # look up every word of the command in the database
        for i in line:
            if i in self.data:
                print(i,self.data[i])
            else:
                print(i,"not found")

if __name__ == '__main__':
    c=Chatbot()

January 25, 2026

Chatbot for a kitchen robot

Artificial intelligence is not the result of a sophisticated computer algorithm; it is produced by human-to-machine interaction. An easy-to-implement example is a chatbot, demonstrated here for the domain of a kitchen robot. The software was written in 35 lines of Python code and consists of a mini database plus a parser. A command sent to the chatbot is searched in the database, and the matches found are shown on the screen.

From a technical perspective, the chatbot is trivial. It doesn't contain complex routines, and even a beginner in the Python language will understand how the program works. The more advanced subject is to explain why such a program is required for a robot control system. The idea is that the AI is stored in the database, which is a vocabulary list. For each word, e.g. the noun "apple", additional information is provided. In the chatbot the words are linked to a simple definition, but the database can be enhanced with a longer description and even the filename of a picture which shows a photograph of the noun. Such a database allows a computer to understand the meaning of a sentence. A sentence like "eat apple" is converted by the parser into a full-blown description of the workflow. The software knows that the sentence refers to an action word and an object, and it knows that it has to do with eating food.

-----
user: open door
parsing ...
open action open something
door noun entrance to room

user: eat apple
parsing ...
eat action eat food
apple noun food
-----

"""
chatbot kitchen robot
a wordlist is stored as python dictionary, user enters command which is searched in the wordlist
"""
class Chatbot:
    def __init__(self):
        # word database: every known word is linked to a short definition
        self.data={
            # verb
            "open": "action open something",
            "grasp": "action take an object",
            "ungrasp": "action place object from hand to world",
            "eat": "action eat food",
            "walkto": "action move towards location",
            # noun
            "apple": "noun food",
            "banana": "noun food",
            "table": "noun place for storing objects",
            "plate": "noun food is served there",
            "door": "noun entrance to room",
        }
        self.parser()
    def parser(self):
        line=input("user: ") # manual input
        line=line.split()
        print("parsing ...")
        # look up every word of the command in the database
        for i in line:
            if i in self.data:
                print(i,self.data[i])
            else:
                print(i,"not found")

if __name__ == '__main__':
    c=Chatbot()

January 24, 2026

Programming a symbol grounding engine

The core element is a database which consists of flashcards. Entries in the database are natural language words: nouns like [apple, banana, plate, table, ...] and verbs like [grasp, walkto, use, ungrasp, ...]. The symbol grounding engine works like the parser of a text adventure: an input on the terminal like "grasp apple" is matched against the database. The entries found in the database are extracted and converted into action signals for the robot hand.

In other words, no AI algorithm is needed, only a word database. The database ensures that the computer understands natural language commands like "walkto table, locate apple, grasp banana".
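A minimal sketch of such an engine is shown below. The signal names (CLOSE_GRIPPER and so on) and the object positions are hypothetical; the original post does not specify how action signals are encoded.

```python
# Flashcard database: verbs map to action signals, nouns to positions.
# All signal names and coordinates are assumptions for illustration.
VERBS = {
    "grasp": "CLOSE_GRIPPER",
    "ungrasp": "OPEN_GRIPPER",
    "walkto": "MOVE_BASE",
    "locate": "POINT_CAMERA",
}
NOUNS = {"apple": (10, 3), "banana": (12, 3), "table": (0, 0)}

def ground(command):
    """Match a 'verb noun' command against the database."""
    words = command.split()
    if len(words) != 2:
        return None
    verb, noun = words
    if verb not in VERBS or noun not in NOUNS:
        return None
    return {"signal": VERBS[verb], "target": NOUNS[noun]}

def ground_sequence(text):
    """Ground a comma separated list of commands one by one."""
    return [ground(part.strip()) for part in text.split(",")]

for step in ground_sequence("walkto table, locate apple, grasp banana"):
    print(step)
```

Unknown words yield None instead of an action signal, which corresponds to the error message of a text adventure parser.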

January 23, 2026

Creating an internal teacher with natural language

Natural language is a powerful tool for humans to describe reality. The existing vocabulary can be reused for robotics applications. The only bottleneck is that computers can't process natural language directly but need a parser and an ontology to do so. The following blog post explains the idea for a kitchen robot.

The starting point is an ontology which is realized as a Python dictionary. The ontology stores the items in a kitchen and the possible actions for these items.

--------------
items:
apple, food
banana, food
plate, dishes
pot, dishes
actions:
prepare_meal, search(food)+eat(food)
cleanup_kitchen, search(dishes)+search(trash)
--------------

The system is realized as a command line prompt which asks the human operator to enter commands. A possible session is shown next:
$ apple
> apple is food, location is (10,1)
$ milk
> not_found
$ banana
> banana is a food, location is (20,4)
$ prepare_meal
> search(food) -> found: apple,banana
> eat(food) -> eat(apple), eat(banana)

The parser takes the human input and searches the ontology for a definition, a location and actions. It is a kind of text adventure game, which also works with a parser and a database. If the user enters a command or object name which is not available in the database, the parser answers the request with an error message. In other words, the intelligence of the AI doesn't depend on the parser itself but is the result of a well-populated database. Somebody has to enter all the kitchen items into the database to make the system highly responsive.
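The ontology and the parser can be sketched in Python as follows. The dictionary layout and the function names search and answer are assumptions; the locations of apple and banana are taken from the session above, while plate and pot have no location in the post and are therefore stored without one.

```python
ontology = {
    "items": {
        "apple": {"category": "food", "location": (10, 1)},
        "banana": {"category": "food", "location": (20, 4)},
        "plate": {"category": "dishes"},
        "pot": {"category": "dishes"},
    },
    "actions": {
        "prepare_meal": [("search", "food"), ("eat", "food")],
        "cleanup_kitchen": [("search", "dishes"), ("search", "trash")],
    },
}

def search(category):
    """Return all item names that belong to the given category."""
    return [name for name, entry in ontology["items"].items()
            if entry["category"] == category]

def answer(word):
    """Answer one user input with a list of output lines."""
    if word in ontology["items"]:
        entry = ontology["items"][word]
        location = entry.get("location", "unknown")
        return [f"{word} is {entry['category']}, location is {location}"]
    if word in ontology["actions"]:
        lines = []
        for step, category in ontology["actions"][word]:
            found = search(category)
            if step == "search":
                lines.append(f"search({category}) -> found: {','.join(found)}")
            else:
                calls = ", ".join(f"{step}({name})" for name in found)
                lines.append(f"{step}({category}) -> {calls}")
        return lines
    return ["not_found"]

for word in ["apple", "milk", "prepare_meal"]:
    print("$", word)
    for line in answer(word):
        print(">", line)
```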

January 22, 2026

Head up display with grounded language

In science fiction movies of the 1980s, a head up display (HUD) is often shown. It allows the audience to see what the robot might see. It was assumed that such a head up display has no purpose and is only a special effect. An example of such a head up display is shown at the top of this blog post. There are multiple food items on a table and some text boxes with descriptions.

The surprising fact is that a head up display, and especially its textual labels, have a use case from a scientific perspective. They demonstrate the symbol grounding problem. The robot is able to think and fulfill tasks by using the information from the HUD. A command like "grasp the banana" is converted into an action like "grasp the object on top left which is a fruit and has a weight of 120g". This detailed information is extracted from the HUD, because the banana item was recognized in the picture.

A head up display is a photo from the ego perspective of a robot with annotated objects. Sometimes there is also a status box which shows additional information. These text boxes ensure that the robot understands a situation. This makes the teleoperation of the robot smoother.
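The conversion can be sketched with a list of annotated objects. The field names and the apple entry are hypothetical; the banana entry uses the values from the example sentence above.

```python
# Annotations as they might be attached to objects in the HUD.
hud = [
    {"label": "banana", "category": "fruit", "region": "top left", "weight_g": 120},
    {"label": "apple", "category": "fruit", "region": "center", "weight_g": 150},
]

def expand(command):
    """Replace the object name in a command with its HUD annotation."""
    words = command.lower().split()
    verb = words[0]
    for obj in hud:
        if obj["label"] in words:
            return (f"{verb} the object on {obj['region']} which is a "
                    f"{obj['category']} and has a weight of {obj['weight_g']}g")
    return None  # object is not annotated in the current picture

print(expand("grasp the banana"))
```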

January 19, 2026

Language games vs. mathematical puzzles

For decades, mathematical puzzles and strategy games such as chess were considered the test environment for artificial intelligence. Chess can be used to explain basic algorithms such as minimax, and geometric route finding problems are ideal for understanding path planning algorithms.

However, these classical AI problems have a major drawback: they leave out natural language. In a game like the 15 puzzle or the famous tic-tac-toe there are no words, only a grid on which the players place pieces. Consequently, an array is used as the internal data structure for a computer implementation that simulates the game. This reduces the possible approaches for developing artificial intelligence to a mathematical-logical point of view.

Besides the logic puzzles mentioned, there is a second category of parlor games which has only been investigated more closely in computer science since 2010: language games. These have their roots less in mathematics and are traditionally used for learning a foreign language. A language game poses a puzzle that can only be solved by someone who has mastered the vocabulary of the foreign language. A possible task is to find the German translation "Apfel" for the English word "apple", or to annotate a displayed picture card with the correct word.

Although these games can be simulated on a computer, they have so far rarely been used in AI research. This has to do with the traditional focus on mathematical problems: computer science historically descends from mathematics, so it deals primarily with numbers and formulas and less with natural language, which is studied within the humanities.

There are indications that natural language, of all things, is the missing building block for creating artificial intelligence. Language is a culturally shaped system of signs and meaning that refers to reality. Early hieroglyphic scripts use pictures and symbols instead of nouns, while modern languages rely on the Latin alphabet. The advantage of natural language, and especially of its written form, is that it is an established sign system which is codified in dictionaries. There is no need to invent a new language for controlling robots; the familiar English vocabulary can be used for this purpose. The only modification is that robotics often falls back on a subset, a mini language, so that the robot effectively understands only 100 words, similar to the text parser of an adventure game.

What is special about a language interface is that it makes it possible to program a remote control which, unlike a joystick remote control, has a replay mode. The commands a human gives to a robot are stored in a log file, and the log file can later be sent to the robot again, which will react in the same way.

Although such robots have so far rarely been mentioned in the research literature and the subject of symbol grounding is highly complex, practical robots can be realized with surprisingly little effort. Basically it is enough to program a simple parser that understands a handful of words such as "left, right, up, down, stop, status", and longer command sequences can already be sent to this robot. Admittedly, the principle is not really innovative: it is identical to the development of a domain specific language and has been used in robotics under the name "Karel the robot" since the 1980s. What is new is rather the stronger focus on natural language as a tool for formalizing a multitude of artificial intelligence tasks.

Such a robot is not controlled by algorithms; its basis is an English dictionary with verbs, adjectives and nouns. Knowing these words allows the robot to send status messages to the user and to receive commands from the user. Symbol grounding is consequently the automation of the human-machine interface. It is less about programming the robot as a machine; the focus lies on the interface to the outside world, that is, which data are sent to the robot and which data the robot sends back to its environment.
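The replay mode can be sketched with a tiny grid robot. The command words, the grid coordinates and the function name run are assumptions for this example; the log file is represented as a plain list of strings.

```python
# Each movement command changes the (x, y) grid position by one cell.
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}

def run(commands, start=(0, 0)):
    """Execute a command sequence and return the final position."""
    x, y = start
    for command in commands:
        if command == "stop":
            break
        dx, dy = MOVES.get(command, (0, 0))  # unknown words are ignored
        x, y = x + dx, y + dy
    return (x, y)

# Commands typed by the human operator, stored as a log.
logfile = ["right", "right", "down", "stop"]

first_run = run(logfile)
replay = run(logfile)  # sending the same log again repeats the behavior
print(first_run, replay)
```

The replay is only reproducible because the commands are symbolic; a raw joystick signal would depend on timing and could not be resent this easily.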

Künstliche Intelligenz früher

Vor dem Jahr 2010 war die KI Forschung durch Sackgassen, ungelöste Probleme und Pessimismus geprägt. Man könnte sagen, dass die Zeit von 1950 bis 2010 ein einziger langer KI Winter war, bei dem es nicht gelang wichtige Probleme wie Robotik zu lösen. Diese distanzierte Bewertung erhält man wenn man sich die gängige Literatur anschaut, also die Bücher und Konferenz-Proceedings die über Jahrzehnte von Wissenschaftlern publiziert wurden. Während klassische Informatik also software engineering, Datenbanken, Computer netze und Algorithmen einen kontiniurlichen Aufschwung erlebte und praktische Probleme lösen konnte, war die Künstliche Intelligenz niemals erfolgreich.

The failure is easiest to demonstrate when trying to solve concrete problems, e.g. programming software that plays chess, developing an AI that controls a robot, or designing an algorithm that steers a self-driving car. There were many attempts to implement such things, but as a rule the result was sobering. Before a chess program was able to play against a human, it had been improved over years, and countless man-hours went into such projects. For robot control algorithms the situation is even worse. There, even very extensive software libraries such as the ROS system are not able to steer the simplest robots through a maze.

There certainly were successes up to 2010, e.g. the Deep Blue computer won at chess against a human, and the self-driving car Stanley crossed the finish line of the DARPA Grand Challenge in 2005. However, the programming effort for these prototypes was extremely high, and the results cannot be generalized.

The good news is that the inability to program intelligent software can be narrowed down well. It is enough to examine the following task more closely: "Program a piece of software that controls a robot. The robot should drive through a maze from the start point to the goal point while avoiding obstacles. Any existing programming language is allowed, and likewise any known algorithm and any known computer science book may be used. Good luck."

The sad news is that such a programming task is far too extensive. Computer scientists before 2010 always failed at it. The problem is that existing programming languages such as C/C++, existing algorithms such as RRT and existing literature such as "Russell/Norvig: AIMA" are not sufficient to solve the maze robot problem above. Even if someone managed to program a robot in C/C++ that drives around obstacles, the finished robot would not be really convincing. It would become a project with a high number of lines of code, while its practical usefulness would be close to zero.

Unfortunately, the tools mentioned, that is programming languages, algorithms and computer science books, are already the best of the best. C/C++ is the most efficient programming language, and in combination with the likewise very powerful RRT algorithm it should actually be able to control a robot. But it isn't.

The reason why Artificial Intelligence was an insurmountable obstacle until 2010 has to do with the focus on computers. The unspoken assumption in computer science is that one creates software which is executed on a CPU, and this software is then supposed to solve a problem. Computer scientists have access to a computer on which a program can be executed, and they then need software that solves the problem. This software is usually written from scratch, and this is where the problem lies. Nobody knows what AI software should look like. The standard paradigm of computer science doesn't work.

Only for non-AI topics such as databases, word processing or mathematical problems does a computer-centric approach make sense. If, for example, the contents of a table in a database are to be summed up, it is enough to write a piece of software in C/C++, or even better, to look for an existing library and merely apply it to the concrete case. Classical computer science always works in a computer-centric way, i.e. there is existing hardware plus software which is then applied to a concrete problem. Only for AI topics does this well-practiced procedure fail. There are no standard AI libraries, and even writing new software doesn't lead to success.

One can summarize the period up to 2010 by saying that the AI community tried to develop software that makes computers intelligent, but that all attempts of this kind failed. Even unconventional approaches pursued by a new generation of computer scientists were not successful. For example, an alternative approach could be to abandon the old-fashioned C/C++ language in favor of the newer Python language and to replace the RRT path planning algorithm with a neural network. This would yield an innovative robotics project, but unfortunately it too would fail.

So that this blog post does not end on a purely pessimistic note, here is a brief outlook on how robotics projects have been carried out since 2010. Modern robotics is based on mechanizing the language games that are needed to solve problems. A speaker transmits natural language instructions to a hearer who traverses a maze. The communication exchanged in the process is then formalized into a computer model. The computer can then act both as speaker and as hearer and thereby solve the problem.
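The speaker and hearer roles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 3x3 maze, the four word vocabulary and the function names speaker and hearer are invented here, and the breadth-first search inside the speaker is merely a stand-in for whatever knowledge a real speaker has about the route.

```python
from collections import deque

# Hypothetical maze: '#' is a wall, 'S' the start, 'G' the goal.
MAZE = [
    "S.#",
    ".##",
    "..G",
]

# Shared vocabulary of the language game: word -> (row step, column step).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find(maze, symbol):
    """Return the (row, column) of the first cell containing symbol."""
    for r, row in enumerate(maze):
        for c, cell in enumerate(row):
            if cell == symbol:
                return (r, c)
    raise ValueError(f"{symbol} not in maze")


def speaker(maze):
    """Speaker role: find a route and phrase it as a list of words."""
    start, goal = find(maze, "S"), find(maze, "G")
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (r, c), words = queue.popleft()
        if (r, c) == goal:
            return words
        for word, (dr, dc) in MOVES.items():
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < len(maze) and 0 <= nc < len(maze[0])
            if inside and maze[nr][nc] != "#" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), words + [word]))
    return None  # no route exists


def hearer(maze, words):
    """Hearer role: ground each word in a movement, return the final cell."""
    r, c = find(maze, "S")
    for word in words:
        dr, dc = MOVES[word]
        r, c = r + dr, c + dc
    return (r, c)


instructions = speaker(MAZE)
print(instructions)
print(hearer(MAZE, instructions) == find(MAZE, "G"))
```

The only thing exchanged between the two functions is a list of words. Either role could be replaced by a human without changing the other side, which is the point of formalizing the communication instead of the robot.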

The focus shifts from the earlier programming of a computer towards the design of a language game. Only once the language game exists is a computer program written for it. This gives a clear definition of what Artificial Intelligence is, namely the machine execution of language games.

In classical computer science this is described as distributed AI or agent-based AI. Language games go far beyond that, however, because language games can also take place between humans. At its core, language is used as a tool to abstract from reality.

A language game is comparable to classical board games such as Nine Men's Morris, TicTacToe or chess, with the difference that it is not about logical thinking but about mastering a language. Language games are mostly used for learning foreign languages; examples are the "Simon Says" game, "I spy with my little eye" and translation games in which a picture has to be translated into the correct word.

Language games of this kind were ignored by computer science for decades. They can be simulated in software much like chess or a card game, and it is even possible to develop an AI that solves the language game automatically, but computer science mostly did not regard this as a worthwhile task. Wrongly so, because language games are the key technology in the development of Artificial Intelligence.

Modern AI since 2010 can be summarized surprisingly simply. The idea is to invent new language games, simulate them on a computer and scale them up with datasets and neural networks. Doing so automatically yields an advanced Artificial Intelligence that is able to communicate as a chatbot, control robots and recognize patterns. What matters is that the language game, or the training dataset for the neural networks, contains a language component at some point. It must therefore contain verbs and nouns, because otherwise it is not a language game.

It makes sense to divide the history of AI roughly into two phases: before 2010, mathematical games such as the 15 puzzle, chess and maze games were studied. Since 2010, language games have been studied in addition, e.g. "visual question answering", "instruction following" and "image annotation". This shift of focus explains the successes of Artificial Intelligence since 2010.

January 16, 2026

Tagging based evaluation functions

Despite its philosophical implications, Artificial Intelligence is also a technical subject within computer science. There are even some algorithms available to realize game playing agents in software. The most famous one is perhaps the evaluation function, which has been used in computer chess since the 1980s. The inner workings of an evaluation function, including a semantic improvement, are explained next.

In a classical heuristic evaluation function the current game state is mapped to a score from 0.0 to 1.0, and this score helps the gradient descent solver to find the optimal action. In the case of a path planning problem on a 2d grid map, the score is the Manhattan distance to the goal, and in the case of computer chess the score is the strength of a player, determined by the number of chess pieces on the board.
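As a rough sketch of such a classical evaluation function, the following Python code scores positions on a 2d grid by their Manhattan distance to the goal. The normalization to the range 0.0 to 1.0, the grid size parameters and the greedy one-step solver best_action are assumptions chosen for this example, and the grid is assumed to be larger than a single cell.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def evaluate(state, goal, width, height):
    """Map a grid position to a score in [0.0, 1.0]; 1.0 means goal reached."""
    max_distance = (width - 1) + (height - 1)  # largest distance on the grid
    return 1.0 - manhattan(state, goal) / max_distance


def best_action(state, goal, width, height):
    """Greedy step: pick the neighbouring cell with the highest score."""
    actions = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
    candidates = {}
    for name, (dx, dy) in actions.items():
        nxt = (state[0] + dx, state[1] + dy)
        # Skip moves that would leave the grid.
        if 0 <= nxt[0] < width and 0 <= nxt[1] < height:
            candidates[name] = evaluate(nxt, goal, width, height)
    return max(candidates, key=candidates.get)


print(evaluate((0, 0), (4, 4), 5, 5))
print(best_action((0, 0), (4, 0), 5, 5))
```

The score is computed directly from coordinates, which works because the distance to the goal is easy to express as a number. The next paragraphs deal with states where no such formula is at hand.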

For complex problems from robotics, the score can't be determined directly because it's unclear whether a certain game state is an improvement or not. Especially for biped walking based on multiple joints it's impossible to determine a numerical score for a certain pose. What can be utilized instead is a tagging mechanism. Instead of calculating the numerical score, the computer only determines which tags are detected. A tag is a semantic anchor like [finger_open] or [battery_low]. Such a tag doesn't describe the game state with numbers but with words. The numerical scores have to be calculated in a second step:

Game state -> semantic tags -> numerical score

The main advantage of a tagging mechanism is that it's much easier to encode domain specific knowledge. There is no need to describe a game state from a mathematical perspective; it's enough to provide a list of words to annotate a game state. This works great especially for complex motion capture annotation. An example list of detected tags for a biped robot might be: [left_foot_front] [balance_stable] [right_foot_back] [servo_motor_off].
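The two step pipeline from game state to tags to score might look like the following sketch. All sensor names (left_foot_x, tilt_deg, battery), the thresholds and the tag weights are hypothetical values invented for this example; only the tag names are taken from the text.

```python
# Hypothetical tag detectors: each rule maps raw sensor values to yes or no.
# Foot positions are assumed to be measured relative to the hip (x = 0).
TAG_RULES = {
    "left_foot_front": lambda s: s["left_foot_x"] > 0.0,
    "right_foot_back": lambda s: s["right_foot_x"] < 0.0,
    "balance_stable": lambda s: abs(s["tilt_deg"]) < 5.0,
    "battery_low": lambda s: s["battery"] < 0.2,
}

# Hypothetical weights that turn words into numbers in the second step.
TAG_WEIGHTS = {
    "left_foot_front": 0.2,
    "right_foot_back": 0.2,
    "balance_stable": 0.6,
    "battery_low": -0.5,
}


def detect_tags(state):
    """Step 1: game state -> list of semantic tags."""
    return [tag for tag, rule in TAG_RULES.items() if rule(state)]


def score(tags):
    """Step 2: semantic tags -> numerical score, clamped to [0.0, 1.0]."""
    raw = sum(TAG_WEIGHTS.get(tag, 0.0) for tag in tags)
    return max(0.0, min(1.0, raw))


pose = {"left_foot_x": 0.3, "right_foot_x": -0.1, "tilt_deg": 2.0, "battery": 0.8}
tags = detect_tags(pose)
print(" ".join(f"[{tag}]" for tag in tags))
print(score(tags))
```

The domain knowledge sits in two small tables of words. Changing how a pose is judged means editing a weight or adding a tag, not deriving a new formula over all joint angles.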

The objective for the computer program is to determine the correct tags in realtime. While the robot is moving on the screen, all detected tags are displayed alongside it.