February 06, 2026

Robot swarm builds a house


The picture shows multiple robots on a construction site that are controlled by a large language model over a longer time span toward the same goal. The AI technology is based on natural language, which reduces the state space drastically. The large language model describes the project in English nouns and verbs, and the individual robots convert the commands into physical action.

Database for npc quest generator

NPC quests aren't generated with algorithms but with a database. The database contains textual elements; in the case of a warehouse robot, a mini database looks like this:

warehouse_db = {
    "items": {"SKU-01": "Lithium Batteries", "SKU-05": "Hydraulic Fluid"},
    "zones": {"Zone_A": "Cold Storage", "Zone_B": "Hazardous Materials"},
    "quest_types": ["Fetch", "Escort", "Cleanup", "Security Patrol"]
}

Such a database constrains the communication. An NPC quest like "fetch Lithium Batteries from Zone_B" is valid because all of its words are available in the database. Another possible quest taken from the mini database above is "security patrol Cold Storage". Increasing the intelligence of the robot doesn't mean inventing advanced algorithms but populating the database with more entries. If the robot knows the names of the important items and the relevant locations in a warehouse, it is an expert for the domain.
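To illustrate, a quest generator over this database can be sketched in a few lines. The function names and the validity check are illustrative additions, not part of the original post; the point is that quests are assembled from and checked against the vocabulary, not computed by an advanced algorithm.

```python
import random

# mini database from the post
warehouse_db = {
    "items": {"SKU-01": "Lithium Batteries", "SKU-05": "Hydraulic Fluid"},
    "zones": {"Zone_A": "Cold Storage", "Zone_B": "Hazardous Materials"},
    "quest_types": ["Fetch", "Escort", "Cleanup", "Security Patrol"],
}

def generate_quest(db, rng=random):
    """Combine a random quest type with random items/zones from the database."""
    qtype = rng.choice(db["quest_types"])
    item = rng.choice(list(db["items"].values()))
    zone = rng.choice(list(db["zones"].values()))
    if qtype == "Fetch":
        return f"Fetch {item} from {zone}"
    return f"{qtype} {zone}"

def is_valid_quest(db, quest):
    """A quest is valid if every content phrase comes from the database."""
    vocab = (list(db["items"].values()) + list(db["zones"].values())
             + db["quest_types"] + ["from"])
    text = quest
    for phrase in sorted(vocab, key=len, reverse=True):  # longest phrases first
        text = text.replace(phrase, "")
    return text.strip() == ""
```

Growing the robot's competence then amounts to adding rows to `warehouse_db`, exactly as the paragraph above argues.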

February 03, 2026

Annotating video games

The screenshot on the left shows a simple random walk along a path with two robots. Even if the picture is provided at maximum resolution, it remains unclear what all these pixels mean. Humans can guess that the connected nodes form the allowed path, but computers have no idea how to interpret the image.

The situation becomes much clearer after activating the pause mode shown on the right. An additional textual window explains that the red circle is robot1, which is moving from node #4 to #5 and has a full battery. This information can't be parsed from the original picture, so the text box provides additional meaning. Another advantage of a text box is that a computer will understand the information much more easily, because all the data is formatted in a key-value syntax, which is the preferred layout for machine understanding.

Such a text box is the core element of artificial intelligence because it addresses the symbol grounding problem. The text box communicates the current game state to an external instance, in this case a human observer. Instead of analyzing how the simulation was programmed internally, the new question is how to talk about the domain in natural language. Such a task is realized with a user interface in general and with a text box in particular.
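A minimal sketch of such a text box generator, assuming the game state is kept in a Python dictionary (all object and field names are illustrative):

```python
def render_textbox(state):
    """Format the current game state as key-value lines,
    the preferred layout for machine understanding."""
    lines = []
    for obj, props in state.items():
        for key, value in props.items():
            lines.append(f"{obj}.{key}: {value}")
    return "\n".join(lines)

# illustrative game state for the two-robot random walk
state = {
    "robot1": {"shape": "red circle", "from_node": 4, "to_node": 5,
               "battery": "full"},
    "robot2": {"shape": "blue circle", "from_node": 2, "to_node": 2,
               "battery": "low"},
}
print(render_textbox(state))
```

The same string can be shown to a human observer in the pause overlay or fed to a program, since both read the identical key-value representation.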

Simple example for a head up display


An entry-level example for demonstrating the power of head-up displays and grounded language is a route navigation problem, which is perhaps the easiest example of instruction following. The robot is controlled by a random generator, and after pausing the game, a text box with additional information appears on the screen. This text box contains the grounded language which is needed to provide meaning.
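The random-generator control described above can be sketched in a few lines, assuming the allowed path is stored as an adjacency list (the graph and the node numbers are illustrative):

```python
import random

# hypothetical path graph: node -> list of connected nodes
graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

def random_walk(graph, start, steps, rng=random):
    """The robot is controlled by a random generator: at each step
    it moves to a randomly chosen neighbor node."""
    pos = start
    trace = [pos]
    for _ in range(steps):
        pos = rng.choice(graph[pos])
        trace.append(pos)
    return trace
```

The resulting trace is exactly the information the head-up display has to verbalize, e.g. "robot is moving from node #4 to #5".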

Every head-up display is based on a two-tier architecture: there is a graphical screen in the background and a textual screen in the foreground. Such text boxes are a common design element in video games, and they are also useful for artificial intelligence. The compact representation in the text box helps a computer to understand a video game.
Grounding means that the AI is able to generate and format the content of the text box.

The text box is updated whenever the video game state changes. Both layers are synchronized automatically. Programming such an up-to-date grounded language is the core problem. In the case of the graph traversal robot, the information shown in the text box is easy to format. In the case of a kitchen robot or a self-driving car, the text box contains more complex information which is harder to maintain automatically.
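The synchronization between the two tiers can be sketched as a small observer: the graphical layer calls update() on every state change, and the textual layer is regenerated from that state. Class and field names are illustrative, not taken from the original software:

```python
class GroundedHUD:
    """Two-tier architecture: the graphical layer in the background
    reports every state change; the textual layer in the foreground
    is regenerated automatically so both stay synchronized."""
    def __init__(self):
        self.textbox = ""
    def update(self, state):
        # rebuild the grounded-language overlay from the game state
        self.textbox = "\n".join(f"{k}: {v}" for k, v in state.items())

hud = GroundedHUD()
hud.update({"robot1": "node #4 -> node #5", "battery": "full"})
print(hud.textbox)
```

For the graph traversal robot this regeneration is a one-liner; for a kitchen robot or a self-driving car the update() method is where the hard grounding work would sit.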

January 29, 2026

Pong AI with internal teacher


The game has two modes: a) normal video game, instructions are executed by the AI; b) internal teacher, the game is paused and a text overlay is shown. Pressing space toggles between the modes. This two-mode system emulates a speaker-to-hearer interaction. So there isn't a single game AI which controls the paddle; instead there are two layers with different obligations.
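A minimal sketch of the mode toggle, assuming the mode is tracked in a small state machine (class and method names are illustrative; the actual keyboard handling would live in the game loop):

```python
class TwoModeGame:
    """Two-mode system: pressing space toggles between normal play
    and the paused internal-teacher overlay."""
    def __init__(self):
        self.mode = "play"  # a) AI executes instructions
    def press_space(self):
        # toggle to b) internal teacher (paused, overlay shown) and back
        self.mode = "teacher" if self.mode == "play" else "play"
    def overlay_visible(self):
        return self.mode == "teacher"
```

The paddle AI runs only in "play" mode, while the teacher layer is only active in "teacher" mode, which keeps the two obligations cleanly separated.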

Artificial intelligence through open systems

In the history of computer science, open vs. closed systems were never a topic. This categorization was never used to philosophize about robots; instead, the discourse was framed around the concept of the algorithm. Thus computer science distinguishes between sampling algorithms, heuristic algorithms, and backtracking algorithms.

The description as an open system, however, is considerably more powerful than the rather mathematically oriented concept of the algorithm. An open system is a form of oracle Turing machine: besides the computer program, a second, higher instance exists which is consulted when problems arise. As a consequence, an open system produces an interaction between the hearer, who is supposed to solve a task, and the speaker, who gives instructions. This interaction can be modeled inside a computer system, similar to a network protocol stack, and serves to solve problems in the field of artificial intelligence.

The concrete implementation can be explained with the example of a warehouse robot. The robot navigates in a grid map of 100x100 cells and has to execute pick&place tasks there. The warehouse robot is the hearer, while the speaker formulates several NPC quests such as "drive to room B and pick up pallet #2 there" or "bring the pallet to room A and then drive to the charging station".

The special feature of open systems is that the instructions are separated from the execution. The speaker only formulates the NPC quests, i.e. it invents tasks that make sense. The speaker cannot solve these tasks itself but sends them to the hearer. The hearer robot translates a task into servo motor activity and drives to a certain room in the warehouse or charges the battery at the charging station.

From a programming perspective, this creates several challenges: a) which kinds of NPC quests make sense in the context of a warehouse robot, and b) how can a concrete NPC quest be executed? These problems are fairly demanding and have to be solved with classical algorithms of computer science. A task like "drive to room B" could, for example, be solved with a path planning algorithm such as A*.
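For challenge b), the "drive to room B" quest maps onto path planning on the grid map. A compact A* sketch with the Manhattan distance as heuristic (grid size, blocked cells, and coordinates are illustrative):

```python
import heapq

def astar(grid_size, blocked, start, goal):
    """A* path planning on a 4-connected grid map; returns the list
    of cells from start to goal, or None if no path exists."""
    def h(p):  # Manhattan distance heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start, [start])]  # (f, cost, pos, path)
    seen = set()
    while frontier:
        _, cost, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        if pos in seen:
            continue
        seen.add(pos)
        x, y = pos
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < grid_size and 0 <= ny < grid_size
                    and (nx, ny) not in blocked and (nx, ny) not in seen):
                heapq.heappush(frontier, (cost + 1 + h((nx, ny)), cost + 1,
                                          (nx, ny), path + [(nx, ny)]))
    return None
```

The hearer would run such a planner internally, while the speaker never has to know how the path is computed.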

Communication in open systems is not an end in itself; only through communication can a task be made machine-readable. When the speaker sends a text message to the hearer, an interaction emerges which can be stored in log files. From the log files one can read off whether a task was solved and, if not, why not. This in turn allows the artificial intelligence to be improved. An example: suppose a robot stops in front of a shelf for no apparent reason. If the system were a closed system without communication with a higher instance, it would remain unclear what the cause is. Maybe it is a hardware problem, maybe the software has crashed, or maybe the AI is defective.

If, on the other hand, it is an open system, one only needs to evaluate the last messages that were sent from the speaker to the hearer, and the error can be narrowed down. If, for example, the last message was "Speaker: stop", then the robot standing still is not an error, because it executed the speaker's command. The only remaining question is why the speaker sent this command.
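This diagnosis step can be sketched as a lookup over the last log entry; the function name and the message strings are illustrative:

```python
def diagnose(logfile, robot_state):
    """If the robot stands still, check the last speaker message:
    an explicit stop command means the standstill is not an error."""
    last = logfile[-1] if logfile else None
    if robot_state == "standing" and last == "Speaker: stop":
        return "no error: robot executed the speaker's command"
    if robot_state == "standing":
        return "possible fault: no stop command was sent"
    return "robot is moving"

log = ["Speaker: drive to room B", "Speaker: stop"]
print(diagnose(log, "standing"))
```

In a closed system this function could not be written at all, because there would be no message log to inspect.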

January 26, 2026

Improved chatbot for a kitchen robot

In addition to the previous post, the Python script was improved a bit. There are more entries in the database, the amount of information is higher, and, very importantly, a telemetry mapping function is available. This allows monitoring a teleoperated robot. The number of code lines increased to 80, but the software remains easy to understand.

The core element is a database of words. Every word is described with additional key-value information, for example a picture or a position. The AI takes the current sensor data and searches for a match in the database, and it does the same for the text input from the user. If the AI has found an entry in the database, this is equivalent to understanding the situation. In short, the AI is a database lookup algorithm. Here is an example interaction and, of course, the source code written in Python 3.

----
gathering telemetry ...
attention near apple
robotpos near table
user: lookat table
search database ...
lookat action inspect object
table {'pos': (0, 0), 'desc': 'place for storing objects', 'word': 'noun'}

gathering telemetry ...
attention near apple
robotpos near table
user: grasp apple
search database ...
grasp action take an object
apple {'pos': (10, 3), 'word': 'noun', 'category': 'fruit', 'desc': 'is food to eat', 'filename': 'apple.jpg'}
----

"""
chatbot kitchen robot
a wordlist is stored as python dictionary, user enters command which is searched in the wordlist
application: Teleoperation monitoring
"""
class Chatbot:
  def __init__(self):
    self.data={
      # verb
      "open": "action open something",
      "grasp": "action take an object",
      "ungrasp": "action place object from hand to world",
      "eat": "action eat food",
      "lookat": "action inspect object",
      "walkto": { 
        "word": "verb",
        "category": "action",
        "desc": "move towards location",
        "motor": "legs",
      },
      # noun
      "apple": {
        "pos": (10,3),
        "word": "noun",
        "category": "fruit",
        "desc": "is food to eat",
        "filename": "apple.jpg",
      },
      "banana": {
        "desc": "noun food",
      },
      "table": { 
        "pos": (0,0),
        "desc": "place for storing objects",
        "word": "noun",
      },
      "fridge": {
        "pos": (1,0),
        "word": "noun",
        "status": "closed",
        "category": "furniture",
      },
      "plate": "noun food is served there",
      "door": "noun entrance to room",
    }
    self.telemetry()
    self.parser()
  def getdist(self,p1,p2): # return: manhattan_distance
    result=abs(p1[0]-p2[0])+abs(p1[1]-p2[1])
    return result
  def telemetry(self):
    self.sensor={
      "robotpos": (0,1),
      "camera": "cam02.jpg",
      "attention": (10,3),
    }
    # search robotpos and attention
    print("gathering telemetry ...")
    for i in self.data:
      if isinstance(self.data[i],dict) and "pos" in self.data[i]: # skip plain-string entries
        dist=self.getdist(self.sensor["robotpos"],self.data[i]["pos"])
        if dist<=1:
          print("robotpos near",i)
        dist=self.getdist(self.sensor["attention"],self.data[i]["pos"])
        if dist<=1:
          print("attention near",i)
  def parser(self):
    line=input("user: ") # manual input
    line=line.split()
    print("search database ...")
    for i in line:
      if i in self.data:
        print(i,self.data[i])
      else:
        print(i,"not found")
    

if __name__ == '__main__':
  c=Chatbot()

January 25, 2026

Chatbot for a kitchen robot

Artificial intelligence is not the result of a sophisticated computer algorithm; it is produced by human-to-machine interaction. An easy-to-implement example is a chatbot, which is demonstrated here for the domain of a kitchen robot. The software was written in 35 lines of Python code and consists of a mini database plus a parser. A command sent to the chatbot is searched in the database, and found matches are shown on the screen.

From a technical perspective, the chatbot is trivial. It doesn't contain complex routines, and even a beginner in the Python language will understand how the program works. The more advanced subject is explaining why such a program is required for a robot control system. The idea is that the AI is stored in the database, which is a vocabulary list. For each word, e.g. the noun "apple", additional information is provided. In this chatbot the words are linked to a simple definition, but the database can be enhanced with a longer description and even the filename of a picture showing a photograph of the noun. Such a database allows a computer to understand the meaning of a sentence. A sentence like "eat apple" is converted by the parser into a full description of the workflow. The software knows that the sentence references an action word and an object, and it knows that it has to do with eating food.

-----
user: open door
parsing ...
open action open something
door noun entrance to room

user: eat apple
parsing ...
eat action eat food
apple noun food
-----

"""
chatbot kitchen robot
a wordlist is stored as python dictionary, user enters command which is searched in the wordlist
"""
class Chatbot:
  def __init__(self):
    self.data={
      # verb
      "open": "action open something",
      "grasp": "action take an object",
      "ungrasp": "action place object from hand to world",
      "eat": "action eat food",
      "walkto": "action move towards location",
      # noun
      "apple": "noun food",
      "banana": "noun food",
      "table": "noun place for storing objects",
      "plate": "noun food is served there",
      "door": "noun entrance to room",
    }
    self.parser()
  def parser(self):
    line=input("user: ") # manual input
    line=line.split()
    print("parsing ...")
    for i in line:
      if i in self.data:
        print(i,self.data[i])
      else:
        print(i,"not found")
    

if __name__ == '__main__':
  c=Chatbot()