February 11, 2026

Playing pong videogame with a perception buffer

 

Following up on the previous post, here is another example of a working AI based on a buffer. The game has two modes: a normal mode, which runs the simulation, and a pause mode, which shows AI information including the perception buffer, the action buffer and the predicted trajectory. The text box shows the information known from the game engine, namely the ball position, its velocity and other values. The action buffer stores what to do next, and this information is submitted to the paddle.

Because the pong videogame is so simple, the AI masters the challenge with ease and moves the paddle towards the correct position.

From a technical perspective, the buffer was realized as a Python dict that stores the information in a key/value syntax. Creating such a dictionary and showing its content on the screen is very simple. The innovation lies in the assumption that such a buffer modulates the communication process. The AI brain isn't imagined as a sophisticated algorithm; it is a database which holds information as natural language. This generates multiple subtasks: a) how to convert the game state into a perception buffer, b) how to translate the perception buffer into the next action, and c) how to submit the content of the action buffer back to the game engine.
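The three subtasks can be sketched roughly as follows. This is only a minimal illustration, not the code of the game itself: the function names (`predict_y`, `perceive`, `decide`, `act`), the keys of the dictionaries, the screen layout (paddle moving vertically, y growing downwards) and the tolerance and speed values are all assumptions.

```python
def predict_y(ball_pos, ball_vel, paddle_x, height):
    """Predict the ball's y position when it reaches the paddle column."""
    x, y = ball_pos
    vx, vy = ball_vel
    if vx == 0:
        return None  # ball never reaches the paddle column
    t = (paddle_x - x) / vx
    if t < 0:
        return None  # ball is moving away from the paddle
    raw_y = y + vy * t
    # Fold the unbounded position back into [0, height] to model wall bounces.
    period = 2 * height
    folded = raw_y % period
    return folded if folded <= height else period - folded


def perceive(game_state):
    """a) Convert the game state into a perception buffer (assumed keys)."""
    return {
        "ball_position": game_state["ball_pos"],
        "ball_velocity": game_state["ball_vel"],
        "paddle_y": game_state["paddle_y"],
        "predicted_y": predict_y(game_state["ball_pos"], game_state["ball_vel"],
                                 game_state["paddle_x"], game_state["height"]),
    }


def decide(perception, tolerance=5):
    """b) Translate the perception buffer into an action buffer."""
    target = perception["predicted_y"]
    if target is None:
        return {"command": "stay", "target_y": None}
    offset = target - perception["paddle_y"]
    if abs(offset) <= tolerance:
        command = "stay"
    elif offset > 0:
        command = "move down"  # assumed screen coordinates: y grows downwards
    else:
        command = "move up"
    return {"command": command, "target_y": target}


def act(action, paddle_y, speed=5):
    """c) Submit the action buffer to the paddle and return its new y."""
    step = {"move up": -speed, "move down": speed, "stay": 0}[action["command"]]
    return paddle_y + step


# Hypothetical game state, as a game engine might report it.
state = {"ball_pos": (100, 50), "ball_vel": (10, 5),
         "paddle_x": 200, "paddle_y": 20, "height": 200}
perception_buffer = perceive(state)
action_buffer = decide(perception_buffer)
print(perception_buffer)
print(action_buffer)
```

Both buffers are plain dicts, so showing them in the pause mode only requires printing their key/value pairs.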

Computer programming vs. AI programming

Computer programming is the art of software creation. It is about converting a real world problem into executable program code in a language like Java or C/C++. A typical example is to program a pong videogame or to improve a database management system.

Modern computer programming since the 2010s doesn't reinvent the wheel but uses existing operating systems, programming languages and libraries. For example, videogames are written with the help of a 2D game library, and database systems are created on top of existing SQL databases.

Programming always has the goal of creating new software or modifying existing software which runs on a computer. All modern technology like the Internet, word processing software and database software is the result of well engineered software applications.

Despite the importance of programming in computer science, the discipline has a blind spot: it's not possible to program AI software or write software for a robot. Many attempts at writing robot software in C/C++ and Java were presented in the past, but most of them have to be called failures. It seems that artificial intelligence works differently from classical software engineering principles. It's not possible to reuse existing software libraries or take advantage of existing programming languages. Even the most powerful programming language available, which is Python in combination with the latest mathematical libraries, is useless for realizing a robot project. The reason is that software programming describes the world as computer centric. The attention is always directed towards a computer and its ability to execute software. For example, the Python interpreter provides a list of commands. Programming means arranging these commands into a fixed structure, a computer program, namely into classes and subroutines. Then the program can be executed. The problem is that such a program won't realize artificial intelligence.

There is a single programming exercise which demonstrates the transition from classical software programming towards artificial intelligence: activity recognition in motion capture. This specialized problem has its roots in computer animation and was first mentioned in the 1970s. The task is to annotate the movements of the mocap markers with textual names like sitting, jumping, walking and so forth.

Computer programming is focused on the CPU of a computer. The computer has to solve a problem, e.g. adding two numbers or searching a database with a search algorithm. In contrast, the activity recognition task works with a communication paradigm similar to an internet protocol. The idea is to convert low level data into high level data. Such a communication system is an open system, which is seldom described in the programming literature. The reason is that communication refers to external parties located outside of the computer.
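To make the conversion from low level data to high level data more concrete, here is a small sketch of an annotation step for mocap frames. The frame fields (`hip_height_m`, `speed_m_s`, `feet_on_ground`), the thresholds and the extra label "standing" are assumptions chosen for illustration; a real mocap pipeline would derive such values from the marker positions.

```python
def recognize_activity(frame):
    """Map one frame of low level values to a textual label.

    The thresholds are hypothetical and only serve the illustration.
    """
    if not frame["feet_on_ground"]:
        return "jumping"
    if frame["hip_height_m"] < 0.6:
        return "sitting"
    if frame["speed_m_s"] > 0.5:
        return "walking"
    return "standing"


def annotate(frames):
    """Collapse consecutive identical labels into (label, first, last) segments."""
    segments = []
    for index, frame in enumerate(frames):
        label = recognize_activity(frame)
        if segments and segments[-1][0] == label:
            # Extend the running segment to the current frame.
            segments[-1] = (label, segments[-1][1], index)
        else:
            segments.append((label, index, index))
    return segments


# Hypothetical recording with four frames.
frames = [
    {"hip_height_m": 0.5, "speed_m_s": 0.0, "feet_on_ground": True},
    {"hip_height_m": 0.5, "speed_m_s": 0.0, "feet_on_ground": True},
    {"hip_height_m": 0.9, "speed_m_s": 1.2, "feet_on_ground": True},
    {"hip_height_m": 1.1, "speed_m_s": 0.3, "feet_on_ground": False},
]
for segment in annotate(frames):
    print(segment)
```

The output is a short list of named segments instead of a long list of numbers, which is the kind of message an external party can read.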

Classical programming uses the algorithm paradigm as its theoretical foundation. The algorithm is executed on the machine and solves a problem. In contrast, communication oriented programming works with the sender to receiver paradigm. No algorithm is needed; instead there is a message which is delivered over the network. Programming a robot is similar to implementing a communication protocol: there is also a sender, a receiver, a message and a protocol. The robot never runs an algorithm, it receives a message.
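The four ingredients named above (sender, receiver, message, protocol) could be sketched like this. The `Message` class, the `PROTOCOL` table and the command names in it are hypothetical and only show the shape of the idea: the protocol is a lookup table of agreed messages, and the robot looks up what it received.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    body: str


# Hypothetical protocol: the agreed vocabulary between sender and receiver,
# mapping each message body to a low level motor command.
PROTOCOL = {
    "move forward": ("drive", 1.0),
    "turn left": ("rotate", 90),
    "stop": ("drive", 0.0),
}


def receive(message, robot_name):
    """Return the motor command for a message, or None if it is not understood."""
    if message.receiver != robot_name:
        return None  # message is addressed to someone else
    return PROTOCOL.get(message.body.strip().lower())


command = receive(Message("operator", "robot1", "Turn left"), "robot1")
print(command)
```

Extending the robot's abilities in this sketch means adding entries to the protocol table.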

February 09, 2026

Robot control with the DIKW pyramid

Although the DIKW pyramid is frequently discussed in the literature, its application within robotics is rarely documented. As a motivating introduction, here is an example for a warehouse robot. On the lowest level (data) the following measurements occur:

- Geo coordinates: "X: 194.5 / Y: 10.2"
- Percentage: "12%"
- Barcode scan: "ID: 00056789"
- Temperature of a servo motor: "42°C"
- Speed: "0.2 m/s"
- Sensor switching state: "Bit 1 = On"
- Timestamp: "12.05.2025 / 10:02:01"

These are raw data as determined by sensors, i.e. by GPS triangulation, a barcode reader or a temperature sensor. The data have no deeper meaning; they are only logged and stored in a database as numerical values.

More interesting for controlling the robot is the next higher level of the DIKW pyramid: information.

- Battery charge is low, refers to 12%
- Location of the robot is shelf 8, compartment A, refers to the geo coordinates
- Package 00056789 is a pallet of glass bottles, refers to the barcode scan
- Motor overheating is imminent, refers to 42°C

The mapping from data to information is done with the help of further database entries. These store text entries like "Motor overheating is imminent" together with the conditions under which they apply. The information level is not stored as numerical data but consists of short sentences in natural language. Higher levels of the DIKW pyramid contain more abstract formulations which carry expert knowledge and are important for the robot's task.
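A minimal sketch of such a mapping, using the warehouse values from the lists above. The rule table, the field names (`battery_percent`, `servo_temp_c`, `barcode`) and the thresholds (below 20% and from 40°C upwards) are assumptions; the post only says that text entries are stored together with their conditions.

```python
# Hypothetical rule table: (data field, condition, sentence on the information level).
RULES = [
    ("battery_percent", lambda value: value < 20, "Battery charge is low"),
    ("servo_temp_c", lambda value: value >= 40, "Motor overheating is imminent"),
]

# Hypothetical lookup table for barcode IDs.
PACKAGES = {"00056789": "a pallet of glass bottles"}


def derive_information(data):
    """Return (sentence, source field) pairs for every rule that applies."""
    information = []
    for field, condition, sentence in RULES:
        if field in data and condition(data[field]):
            information.append((sentence, field))
    barcode = data.get("barcode")
    if barcode in PACKAGES:
        information.append((f"Package {barcode} is {PACKAGES[barcode]}", "barcode"))
    return information


data = {"battery_percent": 12, "servo_temp_c": 42, "barcode": "00056789"}
for sentence, source in derive_information(data):
    print(f"{sentence} (refers to {source})")
```

Each sentence keeps a reference to the data field it was derived from, which mirrors the "refers to" notes in the list above.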

Technically, a DIKW pyramid is a database management system in which the data and information are distributed over different tables and joined by rules. The content of the database is updated in real time. On the highest level (wisdom), controlling the robot is then very simple. One sends a natural language command like "Drive to shelf C, fetch the glass bottles and bring them to shelf B". This high level command is then translated into concrete commands for the robot.

February 07, 2026

Robot control with a DIKW pyramid

Symbol grounding is about moving down and up along a DIKW pyramid. This makes it possible to hide or expand the details of a subject. For the example of a warehouse robot, the DIKW pyramid can be implemented as a Python dictionary which shows only the top layer and the bottom layer:

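dikw_pyramid = {
    # top layer: voice commands as English sentences, kept in a list
    "wisdom": [
        "Go to the loading bay and clear the blockage.",
    ],
    # bottom layer: raw numerical sensor values
    "data": {
        "lidar_dist": 0.5, "weight_kg": 25.0, "coords": (12.4, 45.8)
    },
}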


The raw sensor data are fed into the data layer and are formatted as numerical values. In contrast, the wisdom layer of the pyramid stores the voice commands formulated as English sentences. The task of the symbol grounding engine is to translate between these layers. This is realized by instruction following (from top to bottom) and activity recognition (from bottom to top).
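Both directions can be sketched in a few lines. This is a toy version under several assumptions: the wisdom layer is stored as a list, the phrase table `PHRASES` with its command names is invented for the example, and the clearance threshold of 1.0 is arbitrary.

```python
dikw_pyramid = {
    "wisdom": ["Go to the loading bay and clear the blockage."],
    "data": {"lidar_dist": 0.5, "weight_kg": 25.0, "coords": (12.4, 45.8)},
}

# Hypothetical phrase table: known sub-phrases and their low level commands.
PHRASES = {
    "go to the loading bay": ("navigate", "loading_bay"),
    "clear the blockage": ("push_obstacle", None),
}


def follow_instruction(sentence):
    """Top to bottom: split a sentence at 'and' and look up each part."""
    text = sentence.lower().rstrip(".")
    parts = [part.strip() for part in text.split(" and ")]
    return [PHRASES[part] for part in parts if part in PHRASES]


def describe_data(data, min_clearance=1.0):
    """Bottom to top: turn numerical values into short English sentences."""
    sentences = []
    if data["lidar_dist"] < min_clearance:
        sentences.append("An obstacle is blocking the path.")
    if data["weight_kg"] > 0:
        sentences.append("The robot is carrying a load.")
    return sentences


print(follow_instruction(dikw_pyramid["wisdom"][0]))
print(describe_data(dikw_pyramid["data"]))
```

In this sketch the grounding engine is nothing more than two lookup steps; unknown phrases are silently skipped, which a real system would have to report.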

February 06, 2026

Robot swarm builds a house

 

The picture shows multiple robots on a construction site which are controlled by a large language model and work on the same goal over a longer time span. The AI technology is based on natural language, which reduces the state space drastically. The large language model describes the project in English nouns and verbs, and the individual robots convert the commands into physical action.

Database for npc quest generator

NPC quests aren't generated with algorithms but with a database. The database contains textual elements. In the case of a warehouse robot, a mini database looks like this:

warehouse_db = {
    "items": {"SKU-01": "Lithium Batteries", "SKU-05": "Hydraulic Fluid"},
    "zones": {"Zone_A": "Cold Storage", "Zone_B": "Hazardous Materials"},
    "quest_types": ["Fetch", "Escort", "Cleanup", "Security Patrol"]
}

Such a database modulates the communication. An NPC quest like "fetch Lithium Batteries from Zone_B" is valid because the words are available in the database. Another possible quest taken from the mini database above is "security patrol Cold Storage". Increasing the intelligence of the robot doesn't mean inventing advanced algorithms but populating the database with more entries. If the robot knows the names of important items and relevant locations in a warehouse, it is an expert for the domain.
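How the database decides which quests are valid can be sketched as a simple vocabulary check. The helper names `known_names`, `is_valid_quest` and `random_fetch_quest` are made up for this example, and representing a quest as a quest type plus a list of arguments is an assumption.

```python
import random

warehouse_db = {
    "items": {"SKU-01": "Lithium Batteries", "SKU-05": "Hydraulic Fluid"},
    "zones": {"Zone_A": "Cold Storage", "Zone_B": "Hazardous Materials"},
    "quest_types": ["Fetch", "Escort", "Cleanup", "Security Patrol"],
}


def known_names(db):
    """Collect every item and zone name of the database in lower case."""
    names = set(db["items"].values())
    names.update(db["zones"].keys())
    names.update(db["zones"].values())
    return {name.lower() for name in names}


def is_valid_quest(db, quest_type, *arguments):
    """A quest is valid if its type and all its arguments occur in the database."""
    types = {quest.lower() for quest in db["quest_types"]}
    names = known_names(db)
    return quest_type.lower() in types and all(arg.lower() in names for arg in arguments)


def random_fetch_quest(db, rng):
    """Assemble a fetch quest from randomly chosen database entries."""
    item = rng.choice(list(db["items"].values()))
    zone = rng.choice(list(db["zones"]))
    return ("Fetch", item, zone)


print(is_valid_quest(warehouse_db, "fetch", "Lithium Batteries", "Zone_B"))
print(is_valid_quest(warehouse_db, "fetch", "Uranium", "Zone_B"))
print(random_fetch_quest(warehouse_db, random.Random()))
```

Adding a new item or zone to `warehouse_db` immediately enlarges the set of valid quests, without touching any of the functions.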

February 03, 2026

Annotating video games

The left screenshot shows a simple random walk of two robots along a path. Even if the picture is provided in maximum resolution, it remains unclear what all these pixels mean. Humans can guess that the connected nodes are the allowed path, but computers have no idea how to interpret the image.

The situation becomes much clearer after activating the pause mode shown on the right. There is an additional text window which explains that the red circle is robot1, which is moving from node #4 to #5 and has a full battery. This information can't be parsed from the original picture, so the text box provides additional meaning. Another feature of a text box is that a computer will understand the information much more easily, because all the data are formatted in a key value syntax, which is the preferred layout for machine understanding.
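The content of such a text box might be built like this. The internal robot record and its field names are assumptions; only the displayed facts (red circle, robot1, node #4 to #5, full battery) come from the screenshot described above.

```python
def build_text_box(robot):
    """Translate the internal robot record into key/value pairs for display."""
    battery = robot["battery"]
    return {
        "name": robot["name"],
        "color": robot["color"],
        "moving": f"node #{robot['from_node']} -> node #{robot['to_node']}",
        "battery": "full" if battery >= 0.99 else f"{battery:.0%}",
    }


def render(text_box):
    """Format the key/value pairs as the lines of the text box."""
    return "\n".join(f"{key}: {value}" for key, value in text_box.items())


# Hypothetical internal state of the simulation for one robot.
robot1 = {"name": "robot1", "color": "red", "from_node": 4, "to_node": 5, "battery": 1.0}
print(render(build_text_box(robot1)))
```

The same dict serves both audiences: rendered as text it is readable for a human observer, and as key/value pairs it can be parsed by another program.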

Such a text box is the core element of artificial intelligence because it addresses the symbol grounding problem. The text box communicates the current game state to an external instance, which is a human observer. Instead of analyzing how the simulation was programmed internally, the new question is how to talk about the domain in natural language. Such a task is realized with a user interface in general and with a text box in particular.

Simple example for a head up display


An entry level example for demonstrating the power of head up displays and grounded language is a route navigation problem, which is perhaps the easiest example of instruction following. The robot is controlled by a random generator, and after pausing the game, a text box with additional information appears on the screen. This text box contains the grounded language which is important to provide meaning.

Every head up display is based on a two tier architecture: there is a graphical screen in the background and a textual screen in the foreground. Such text boxes are a common design element in videogames, and they are also useful for artificial intelligence. The compact representation in the text box helps a computer to understand a videogame.
Grounding means that the AI is able to generate and format the content of the text box.

The text box is updated whenever the video game state changes, so both layers are synchronized automatically. Programming such an up to date grounded language is the core problem. In the case of the graph traversal robot, the information shown in the text box is easy to format. In the case of a kitchen robot or a self driving car, the text box contains more complex information which is harder to maintain automatically.
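For the graph traversal robot, the synchronization can be sketched by refreshing the text box inside every simulation step. The graph, the class name `Simulation` and the keys of the text box are hypothetical; the sketch only shows that the textual layer is regenerated whenever the state changes.

```python
import random

# Hypothetical path graph: node -> list of neighbouring nodes.
GRAPH = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}


class Simulation:
    def __init__(self, start_node, seed=None):
        self.rng = random.Random(seed)
        self.node = start_node
        self.next_node = self.rng.choice(GRAPH[start_node])
        self.text_box = {}
        self._refresh_text_box()

    def step(self):
        """Advance the random walk by one node and keep the text box in sync."""
        self.node = self.next_node
        self.next_node = self.rng.choice(GRAPH[self.node])
        self._refresh_text_box()

    def _refresh_text_box(self):
        # The textual layer is rebuilt from the current state on every change.
        self.text_box = {
            "robot": "robot1",
            "position": f"node #{self.node}",
            "heading_to": f"node #{self.next_node}",
        }


sim = Simulation(start_node=1)
for _ in range(3):
    sim.step()
    print(sim.text_box)
```

Because the refresh is called from the only method that changes the state, the text box cannot become stale. For a kitchen robot or a self driving car the difficult part would be the body of `_refresh_text_box`, not the synchronization itself.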