February 03, 2026

Simple example for a head up display


An entry-level example for demonstrating the power of head up displays and grounded language is a route navigation problem, which is perhaps the easiest example of instruction following. The robot is controlled by a random generator, and after pausing the game, a text box with additional information appears on the screen. This text box contains the grounded language which is important for providing meaning.

Every head up display is based on a two-tier architecture: there is a graphical screen in the background and a textual screen in the foreground. Such text boxes are a common design element in videogames, and they are also useful for artificial intelligence. The compact representation in the text box helps a computer to understand a videogame.
Grounding means that the AI is able to generate and format the content of the text box.

The text box is updated whenever the video game status changes. Both layers are synchronized automatically. Programming such an up-to-date grounded language is the core problem. In the case of the graph traversal robot, the information shown in the text box is easy to format. In the case of a kitchen robot or a self-driving car, the text box contains more complex information which is harder to maintain automatically.
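The synchronization between the two layers can be sketched in a few lines. This is a minimal illustration with hypothetical names (render_textbox, the state dictionary, the node labels are all assumptions): the text box is not stored separately but regenerated from the game state, so both layers can never drift apart.

```python
def render_textbox(state):
    # convert the raw game state into grounded language
    lines = []
    lines.append("robot at node %s" % state["robot"])
    lines.append("goal is node %s" % state["goal"])
    if state["robot"] == state["goal"]:
        lines.append("status: goal reached")
    else:
        lines.append("status: traveling")
    return "\n".join(lines)

state = {"robot": "A", "goal": "C"}
print(render_textbox(state))

state["robot"] = "C"          # game status changes ...
print(render_textbox(state))  # ... and the text box is regenerated
```

Because the text box is a pure function of the state, "maintaining" it reduces to formatting; the hard part for a kitchen robot or a self-driving car is that their state is much richer than a node label.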

January 29, 2026

Pong AI with internal teacher

 

The game has two modes: a) normal videogame, instructions are executed by the AI; b) internal teacher, the game is paused and a text overlay is shown. Pressing space toggles between the modes. This two-mode system emulates a speaker-to-hearer interaction. So there isn't a single game AI which controls the paddle; instead there are two layers with different obligations.

Artificial intelligence through open systems

In the history of computer science, open vs. closed systems were never a topic. This categorization was never used to philosophize about robots; instead, the discourse was framed around the notion of an algorithm. Computer science distinguishes between sampling algorithms, heuristic algorithms, and backtracking algorithms.

The description of open systems, however, is considerably more powerful than the rather mathematically oriented notion of an algorithm. An open system is a form of oracle Turing machine: besides the computer program there exists a second, higher instance which is consulted in case of problems. As a consequence, an interaction emerges in open systems between the hearer, who is supposed to solve a task, and the speaker, who gives instructions. This interaction can be modeled inside a computer system, similar to a network protocol stack, and serves to solve problems in the field of artificial intelligence.

The concrete implementation can be explained with the example of a warehouse robot. It navigates in a grid map of 100x100 cells and has to execute pick&place tasks there. The warehouse robot is the hearer, while the speaker formulates several NPC quests such as "drive to room B and pick up pallet #2 there" or "bring the pallet to room A and then drive to the charging station".

The special property of open systems is that the instructions are separated from the execution. The speaker only formulates the NPC quests; he invents tasks that make sense. The speaker cannot solve these tasks himself, however, but sends them to the hearer. This hearer robot translates a task into servo motor activities and drives to a certain room in the warehouse or charges the battery at the charging station.

From a programming perspective, several challenges arise: a) which kind of NPC quests make sense in the context of a warehouse robot? b) how can a concrete NPC quest be executed? These problems are fairly demanding and have to be solved with classical algorithms of computer science. A task like "drive to room B" could, for example, be solved with the help of a path planning algorithm such as A star.
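A sketch of how a quest is reduced to path planning on the grid map. The quest text, the target cell, and the empty 100x100 grid are all assumptions; breadth-first search is used here as a simple stand-in for A star, since both find shortest paths on an unweighted grid:

```python
from collections import deque

def plan_path(start, goal, size=100):
    # breadth-first search on a size x size grid without obstacles
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        cur = frontier.popleft()
        if cur == goal:
            break
        x, y = cur
        for nxt in [(x+1, y), (x-1, y), (x, y+1), (x, y-1)]:
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in came_from:
                came_from[nxt] = cur
                frontier.append(nxt)
    # reconstruct the path from goal back to start
    path, cur = [], goal
    while cur is not None:
        path.append(cur)
        cur = came_from[cur]
    return list(reversed(path))

quests = {"drive to room B": (5, 2)}   # quest text -> target cell (assumed)
path = plan_path((0, 0), quests["drive to room B"])
print("steps needed:", len(path) - 1)
```

On a grid with obstacles or movement costs, A star with a manhattan-distance heuristic would replace the plain queue with a priority queue; the quest-to-goal mapping stays the same.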

Communication in open systems is not an end in itself; only through communication can a task be made machine-readable. When the speaker sends a text message to the hearer, an interaction emerges which can be stored in logfiles. From the logfiles one can read whether a task was solved and, if not, why not. This allows the artificial intelligence to be improved. An example: suppose a robot stops in front of a shelf for no apparent reason. If the system were a closed system without communication with a higher instance, it would remain unclear what the cause is. Perhaps it is a hardware problem, perhaps the software crashed, or perhaps the AI is defective.

If, on the other hand, it is an open system, one only needs to evaluate the last messages sent from the speaker to the hearer, and the error can already be narrowed down. If, for example, the last message was "Speaker: stop", then the robot standing still is not an error, because it executed the speaker's command. The only remaining question is why the speaker sent this command.
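The diagnosis idea can be sketched in a few lines. The message format and the log structure are assumptions; the point is only that speaker-to-hearer messages are recorded and the last entry explains the robot's behavior:

```python
log = []

def send(sender, text):
    # every speaker/hearer message is appended to the logfile
    log.append((sender, text))

send("speaker", "drive to room B")
send("hearer", "executing: path planning started")
send("speaker", "stop")

# diagnosis: inspect the last message before the robot stood still
sender, text = log[-1]
if sender == "speaker" and text == "stop":
    print("standing still is not an error: the robot obeyed the speaker")
```

In a closed system no such trace exists, and the same standstill would be indistinguishable from a hardware fault or a crashed program.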

January 26, 2026

Improved chatbot for a kitchen robot

In addition to the previous post, the Python script was improved a bit. There are more entries in the database, the amount of information is higher, and, very important, a telemetry mapping function is available. This allows monitoring a teleoperated robot. The number of code lines has increased to 80, but the software remains easy to understand.

The core element is a database of words. Every word is described with additional key-value information, for example a picture or a position. The AI takes the current sensory data and searches for a match in the database, and it also searches for matches in the text input from the user. If the AI has found an entry in the database, this is equivalent to understanding the situation. In short, the AI is a database lookup algorithm. Here is an example interaction and, of course, the source code written in Python3.

----
gathering telemetry ...
attention near apple
robotpos near table
user: lookat table
search database ...
lookat action inspect object
table {'pos': (0, 0), 'desc': 'place for storing objects', 'word': 'noun'}

gathering telemetry ...
attention near apple
robotpos near table
user: grasp apple
search database ...
grasp action take an object
apple {'pos': (10, 3), 'word': 'noun', 'category': 'fruit', 'desc': 'is food to eat', 'filename': 'apple.jpg'}
----

"""
chatbot kitchen robot
a wordlist is stored as python dictionary, user enters command which is searched in the wordlist
application: Teleoperation monitoring
"""
class Chatbot:
  def __init__(self):
    self.data={
      # verb
      "open": "action open something",
      "grasp": "action take an object",
      "ungrasp": "action place object from hand to world",
      "eat": "action eat food",
      "lookat": "action inspect object",
      "walkto": { 
        "word": "verb",
        "category": "action",
        "desc": "move towards location",
        "motor": "legs",
      },
      # noun
      "apple": {
        "pos": (10,3),
        "word": "noun",
        "category": "fruit",
        "desc": "is food to eat",
        "filename": "apple.jpg",
      },
      "banana": {
        "desc": "noun food",
      },
      "table": { 
        "pos": (0,0),
        "desc": "place for storing objects",
        "word": "noun",
      },
      "fridge": {
        "pos": (1,0),
        "word": "noun",
        "status": "closed",
        "category": "furniture",
      },
      "plate": "noun food is served there",
      "door": "noun entrance to room",
    }
    self.telemetry()
    self.parser()
  def getdist(self,p1,p2): # return: manhattan_distance
    result=abs(p1[0]-p2[0])+abs(p1[1]-p2[1])
    return result
  def telemetry(self):
    self.sensor={
      "robotpos": (0,1),
      "camera": "cam02.jpg",
      "attention": (10,3),
    }
    # search robotpos and attention
    print("gathering telemetry ...")
    for i in self.data:
      if isinstance(self.data[i],dict) and "pos" in self.data[i]: # skip plain-string entries
        dist=self.getdist(self.sensor["robotpos"],self.data[i]["pos"])
        if dist<=1:
          print("robotpos near",i)
        dist=self.getdist(self.sensor["attention"],self.data[i]["pos"])
        if dist<=1:
          print("attention near",i)
  def parser(self):
    line=input("user: ") # manual input
    line=line.split()
    print("search database ...")
    for i in line:
      if i in self.data:
        print(i,self.data[i])
      else:
        print(i,"not found")
    

if __name__ == '__main__':
  c=Chatbot()

January 25, 2026

Chatbot for a kitchen robot

Artificial Intelligence is not the result of a sophisticated computer algorithm; it is produced by human-to-machine interaction. An easy-to-implement example is a chatbot, which is demonstrated here for the domain of a kitchen robot. The software was written in 35 lines of Python code and consists of a mini database plus a parser. A command sent to the chatbot is searched in the database, and found matches are shown on the screen.

From a technical perspective, the chatbot is trivial. It doesn't contain complex routines, and even a beginner in the Python language will understand how the program works. The more advanced subject is to explain why such a program is required for a robot control system. The idea is that the AI is stored in the database, which is a vocabulary list. For each word, e.g. the noun "apple", additional information is provided. In the chatbot, the words are linked to a simple definition, but the database can be enhanced with a longer description and even the filename of a picture which shows a photograph of the noun. Such a database allows a computer to understand the meaning of a sentence. A sentence like "eat apple" is converted by the parser into a full-blown description of the workflow. The software knows that the sentence references an action word and an object, and it knows that it has to do with eating food.

-----
user: open door
parsing ...
open action open something
door noun entrance to room

user: eat apple
parsing ...
eat action eat food
apple noun food
-----

"""
chatbot kitchen robot
a wordlist is stored as python dictionary, user enters command which is searched in the wordlist
"""
class Chatbot:
  def __init__(self):
    self.data={
      # verb
      "open": "action open something",
      "grasp": "action take an object",
      "ungrasp": "action place object from hand to world",
      "eat": "action eat food",
      "walkto": "action move towards location",
      # noun
      "apple": "noun food",
      "banana": "noun food",
      "table": "noun place for storing objects",
      "plate": "noun food is served there",
      "door": "noun entrance to room",
    }
    self.parser()
  def parser(self):
    line=input("user: ") # manual input
    line=line.split()
    print("parsing ...")
    for i in line:
      if i in self.data:
        print(i,self.data[i])
      else:
        print(i,"not found")
    

if __name__ == '__main__':
  c=Chatbot()


January 24, 2026

Programming a symbol grounding engine


The core element is a database which consists of flashcards. Entries in the database are natural language nouns like [apple, banana, plate, table, ...] and also verbs like [grasp, walkto, use, ungrasp, ...]. The symbol grounding engine works like the parser of a text adventure: a certain input on the terminal like "grasp apple" is matched against the database. The found entries are extracted and converted into action signals for the robot hand.

In other words, there is no AI algorithm needed, but there is a word database. The database ensures that the computer understands natural language commands like "walkto table, locate apple, grasp banana".
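A minimal sketch of this word database, with assumed entries and signal names (close_gripper, drive_base and the positions are illustrative, not from the actual engine):

```python
database = {
    "grasp":  {"type": "verb", "signal": "close_gripper"},
    "walkto": {"type": "verb", "signal": "drive_base"},
    "apple":  {"type": "noun", "pos": (10, 3)},
    "table":  {"type": "noun", "pos": (0, 0)},
}

def ground(command):
    # match every word of the command against the flashcards
    actions = []
    for word in command.split():
        entry = database.get(word)
        if entry is None:
            return ["error: %s not found" % word]
        if entry["type"] == "verb":
            actions.append(entry["signal"])
        else:
            actions.append("target %s" % (entry["pos"],))
    return actions

print(ground("grasp apple"))   # ['close_gripper', 'target (10, 3)']
```

The parser itself stays a dictionary lookup; the "understanding" lives entirely in the quality of the flashcard entries.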

January 23, 2026

Creating an internal teacher with natural language

Natural language is a powerful tool for humans to describe reality. The existing vocabulary can be reused for robotics applications. The only bottleneck is that computers can't parse natural language directly but need a parser and an ontology to do so. The following blog post explains the idea for a kitchen robot.

The starting point is an ontology which is realized as a python dictionary. The ontology stores items in a kitchen and possible actions for these items.

--------------
items:
  apple, food
  banana, food
  plate, dishes
  pot, dishes
action
  prepare_meal, search(food)+eat(food)
  cleanup_kitchen, search(dishes)+search(trash)
--------------

The system is realized as a command line prompt which asks the human operator to enter commands. A possible session is shown next:
$ apple
> apple is food, location is (10,1)
$ milk
> not_found
$ banana
> banana is a food, location is (20,4)
$ prepare_meal
> search(food) -> found: apple,banana
> eat(food) -> eat(apple), eat(banana)

The parser takes the human input and searches the ontology for a definition, a location, and actions. It is some sort of text adventure game, which also works with a parser and a database. If the user enters a command or object name which is not available in the database, the parser answers the request with an error message. In other words, the intelligence of the AI doesn't come from the parser itself but is the result of a well populated database. Somebody has to enter all the kitchen items into the database to make the system highly responsive.
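The described session can be sketched as follows. The ontology entries and locations are taken from the example above; the query function and the exact output strings are assumptions, so details like the expansion of eat(food) differ slightly from the session:

```python
ontology = {
    "items": {
        "apple":  {"category": "food",   "location": (10, 1)},
        "banana": {"category": "food",   "location": (20, 4)},
        "plate":  {"category": "dishes", "location": (0, 0)},
    },
    "actions": {
        "prepare_meal": ["search(food)", "eat(food)"],
    },
}

def query(command):
    items, actions = ontology["items"], ontology["actions"]
    if command in items:
        e = items[command]
        return "%s is %s, location is %s" % (command, e["category"], e["location"])
    if command in actions:
        out = []
        for step in actions[command]:
            # extract the category from a step like "search(food)"
            category = step[step.index("(") + 1:-1]
            found = [i for i in items if items[i]["category"] == category]
            out.append("%s -> %s" % (step, ",".join(found)))
        return "\n".join(out)
    return "not_found"

print(query("apple"))
print(query("milk"))
print(query("prepare_meal"))
```

As in the text adventure analogy, every response is either a database hit or "not_found"; there is no generative component.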
  
  

January 22, 2026

Head up display with grounded language

In science fiction movies of the 1980s, a head up display (HUD) is shown. This allows the audience to see what the robot might see. It was assumed that such a head up display doesn't have a purpose but is only a special effect. An example of such a head up display is shown on top of this blog post. There are multiple food items on a table and some text boxes with a description.

The surprising situation is that a head up display, and especially the textual labels, have a use case from a scientific perspective. They demonstrate the symbol grounding problem. The robot is able to think and fulfill tasks by using the information from the HUD. A command like "grasp the banana" is converted into an action like "grasp the object on the top left which is a fruit and has a weight of 120g". These details are extracted from the HUD, because the banana item was recognized in the picture.

A head up display is a photo from the ego perspective of a robot with annotated objects. Sometimes a status box is available which shows additional information. These text boxes ensure that the robot understands a situation. This makes teleoperation of the robot smoother.
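The conversion of a short command into a detailed action via the HUD annotations can be sketched in a few lines. All object data here (positions, categories, the 120g weight) is assumed for illustration:

```python
objects = {
    "banana": {"position": "top left", "category": "fruit", "weight_g": 120},
    "apple":  {"position": "center",   "category": "fruit", "weight_g": 90},
}

def hud_label(name):
    # text box content shown next to a recognized object
    o = objects[name]
    return "%s: %s, %s, %dg" % (name, o["position"], o["category"], o["weight_g"])

def enrich(command):
    # "grasp the banana" -> detailed action using the HUD annotation
    name = command.split()[-1]
    o = objects[name]
    return "grasp the object on the %s which is a %s and weighs %dg" % (
        o["position"], o["category"], o["weight_g"])

print(hud_label("banana"))
print(enrich("grasp the banana"))
```

The labels and the enriched command are generated from the same annotation record, which is what keeps the HUD and the robot's actions consistent.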