Robotics and Artificial Intelligence: February 2026

February 28, 2026

Erstellen einer wissenschaftlichen Hausarbeit mit Hilfe von Large Language Modelle zum Thema Halle 54 und die Automatisierung in den 1980er Jahren

__Einleitung__

Von Large Language modellen wie chatgpt und Google Gemini ist bekannt dass sie kleinere Recherchen unterstützen können und technisch in der Lage sind, die Rechtschreibkorrektur einer wissenschaftlichen Ausarbeitung zu übernehmen. Unklar war hingegen, ob Large Language modelle auch eine komplette Hausarbeit verfassen können. Eine solche Aufgabe erfordert üblicherweise einen menschlichen Aufwand von 1 Monat und länger und liegt damit außerhalb der Leistungsfähigkeit heutiger KI Systeme. Dies behauptet zumindest der https://metr.org/ benchmark. Danach können die derzeit leistungsfähigen neuronale Netze Programmier Aufgaben ausführen für die Menschen rund 10 Stunden benötigen, z.B. das Implementierungen eines Netzwerkprotokolls.

Will man längere komplexe Tasks mit Hilfe von LLMs bearbeiten benötigt man eine spezielle Reward funktion, ein Multiagentensystem oder ähnliche Hilfsmittel weil sonst die Gefahr besteht, dass die KI sich in einer endlos Schleife verfängt, Also bereits erstellten Quellcode oder vorhandene Texte erneut editiert ohne dass ein erkennbarer Fortschritt sichtbar wird.

Im folgenden Fall wurde ein anderes Konzept verwendet, was als Luhmann Zettelkastenmethode bekannt ist. Diese Methode wird in den Geisteswissenschaften verwendet um eine Hausarbeit zu ordnen und hilft ebenfalls dabei die Interaktion mit einem Large language modell zu strukturieren.

Als Thema der Hausarbeit wurde gewählt "Halle 54 Automatisierung in den 1980er Jahren" weil es gut eingrenzbar ist und mit etwas Literaturrecherche leicht in einen wissenschaftlichen Text überführt werden kann. Zuerst benötigt man einen Prompt um das Problem für ein LLM zu schildern:

__Prompt__

titel: Die Halle 54 bei VW als gescheitertes Automatisierungsprojekt in den 1980er Jahren

Aufgabe: Erstelle 8 Luhmann Karteikarten zum Titel. Jede Karteikarte enthält eine Luhmann ID, einen Titel, und Stichpunktartige Notizen welche ruhig chaotisch sein können. Stelle sicher dass weitere künftige Karteikarten angefügt werden können. Ausgabesprache ist Deutsch.

Inhalt: Ungefähr im Jahr 1983 gab es beim Autohersteller VW ein Robotik Automatisierungsprojekt in der Halle 54. Damals wurden computergesteuerte Roboter eingesetzt um das Ziel der Vollautoamtisierung der Fahrzeugproduktion umzusetzen. Später stellte sich heraus, dass der angestrebte hohe Automatisierungsgrad technisch nicht machbar ist. Die damalige Hard- und Software blieb hinter den hohen Erwartungen zurück.
-----
Beides, erzeuge ingesamt 8 weitere Karteikarten.
Ja, und erzeuge weitere Karten zur verwendeten Software beim Halle 54 Projekt (wenn es dazu Informationen gibt)
Erstelle für die bisherigen Karteikarten einen Strukturzettel als Gliederung für eine wissenschaftliche Hausarbeit.
Nein, beginne stattdessen mit dem Schreiben des Volltextes für das Kapitel "1. Einleitung: Der Traum von der menschenleeren Fabrik" auf basis der vorhandenen Karteikarten. Der Volltext sollte rund 800 Worte enthalten.
-----

Wie im prompt gefordert erzeugte die KI zuerst einmal Karteikarten und zwar 24 stück. Anschließend wurde ein Strukturzettel erstellt, also eine Karteikarte die auf andere Karteikarten verweist. Diese Karteikarten wurden dann in einen Fließtext überführt der hier vollständig abgedruckt ist.

Im Fließtext verstreut finden sich Referenzen zu den Luhmann Karteikarten, z.B. "(ID 3.5)". Der Text ist also nur die Ausformulierung der vorhandenen Notizen. Über den Zwischenschritt "Karteikarten" ist es möglich, auch sehr umfangreiche Themen abzubilden.

__Kritik__

Für das vorliegende Experiment wurden lediglich 24 Karteikarten plus 1 Strukturzettel von einem LLM erstellt. Für eine echte wissenschaftliche Hausarbeit benötigt man mehr Karteikarten und zwar ungefähr 100+.

__Volltext __

halle54.pdf

Scene annotation with tags

The pictures shows a very simple landscape with a house, a lake and a sun. The problem for a robot is, that the information shown in the picture can't be parsed. The reason is that the picture is technically a 800x600 bitmap file and doesn't provide semantic meaning.

What a robot needs to understand a picture is a sequence of tags. For the concrete picture the tags would be: [sun], [house], [tree], [lake]. In a more elaborated setup the tags would be enhanced with additional informaiton about the position and the color, e.g.
- sun: top right, yellow
- house: center, red
- tree: left, green
- lake: bottom left, blue

These information can be stored in a database which is a json file. Such information can be parsed by a robot. So the missing part is a scene to tag converter which is the core element for a symbol grounding system.

February 25, 2026

The technology of Stanley self driving car in 2005

In addition to the previous blog post which describes the Darpa challenge in 2007, the technology for the competition held in 2005 should be described next. All the information are taken from the document [1].

At first it should mentioned, that the winning car Stanley was the most advanced robot car at this time. It was designed by a team of experienced engineers with a university background and has proven in an official competition its superior over alternative concepts. What makes the situation interesting from today's perspective (the year 2026) is, that the entire technology stack can be called outdated. Stanley including all the mentioned hardware and software has only a value for a museum and is very different from the technology used today.

Even for the timeline of computer science, which is known for its rapid development, such kind of fast aging process is a surprise. The usual assumption is, that at least some of the technology is valid in a modified version for later robotics projects. But this was not the case. It seems, that Artificial intelligence since 2005 has drastically reinvented itself. But let us take a closer look into the year 2005.

The physical hardware of the robot was a Volkswagen Touareg R5 TDI, with a diesel engine.[1] page 2. The engine was powered by gas which comes from a gas station. The track was provided in a waypoint text file, in the RDDF format [1] page 3. The vehicle was equipped with multiple sensors like SICK laser, GPS camera, compass [1] 4. The CPU was located in the trunk of the car and was an Intel Pentium M CPU which was used also for laptops. It was running the Linux operating system on six different machines. [1] 5.

The software stack was divided into 30 modules for sensory perception, path planning, logging and steering. The perhaps most advanced module was the self localisation module which was a particle filter, based on a kalman filter [1] page 8. Multiple incoming sensor streams were fusioned with a probabilistic estimation. The vision module was responsible to detect the drivable area in the map [1] page 12. Handcrafted computer vision algorithm were utilized. The steering controlled was realized as a PID control mathematical equation, [1] page 24.

In summary, the technology used in the Stanley self driving car was a classical combination of a diesel vehicle, a computer cluster in the trunk and a large amount of software which implemented algorithms for vision and steering. In other terms, existing and well known software engineering principle were adapted to robotics development. The idea was that a self driving car is some sort of Open source software project with additional mathematical algorithms for road navigation. Typical problems during the project were:

- how to connect the computer in the trunk with the CAN bus of the car
- how to write all the software modules
- how to make the sensory loop fast enough with C/C++ code

It was the same principle used 2 years later during the 2007 darpa urban challenge and it was valid by all of the teams during this time.

Software engineering has a certain name for such projects: rapid prototyping better known as Throwaway-Software. Its a software system that was written in a short amount of time and has a limited lifespan. None of the hardware and software developed for stanley was reused in later projects. In other words, despite that Stanley has won the challenge the technology was obsolete a few weeks after the race was over.

sources:

[1] Thrun, Sebastian, et al. "Stanley: The robot that won the DARPA Grand Challenge." Journal of field Robotics 23.9 (2006): 661-692.

February 24, 2026

Darpa urban challenge 2007 -- the last great robot project

Before the year 2010, artificial intelligence was mostly a niche discipline within computer science without any impact to society. The reason was, that most of the projects were in an early stage and lots of technical obstacle were visible. Because of this limitation its interesting from a science history perspective to take a closer look what the self understanding was of AI in the past.

The goal of the Darpa urban challenge 2097 was to program self driving cars for an urban environment. These normal size cars were able to stop at a junction and do some parking maneuvers. According to the large amount of documentation and some of the talks from the teams its possible to extract some general principle how the cars were realized. Building safe driving cars in the past was recognized as a hardware and software challenge. One problem was to squeeze high performance server racks into the car's trunk. A second and more serious problem was to write all the software.

One team has written software with 100k lines of code, the next one has even created a software with 500k lines of code. The idea in 2007 was to treat self driving car software similar to a large scale software project similar to the linux kernel. Therfor the existing toolchain was used, namely a C/C++ compiler and modern version control systems including bug trackers. The logic of the car was encoded in endless amount of path planning algorithms, C++ classes and dedicated particle filter for self-localization.

It should be mentioned that the outcome of these large scale projects was poor. Despite the fact that a team of experienced programmers have written all the code, the resulting autonomous car was unable to navigate on the street. Simple task like waypoint following was successful demonstrated, but more complex problems like a road block and unexpected situation have overwhelmed the car's AI software.

On the one hand, the shown self driving cars were more powerful than every attempt in the past to build such vehicles. At the same time it was obvious that these cars were not ready for real world traffic. One disappointed detail was that all the written C/C++ software was only working for the original car but can*t be adapted to other cars or another sort of robotics vehicle. In technical terms, the software wasn't scaling up to slightly different problems which is a sign of bad software design.

In 2007 it was unclear how to write better software which fits to the needs of self driving cars. The reason was a certain bias about the project which was: a) the car is designed as an autonomous vehicle b) the decision making process is implemented in software and planning algorithms. So it was a autonomous computational vehicle which is from a modern AI perspective a dead end. In 2007 nobody was able to see the limitation of these constraints but it was imagined, that AI has to be realized this way.

Let us go a step backward and describe the motivation for the darpa urban challenge. The self understanding in 2007 about robotics was, that robotics is a hardware and software problem and located within computer science. The goal was to make sure that the hardware of a self driving car is working, which means that the lidar is rotating fast enough and that the powerful server build into the car gets enough electricity. The second goal was to program the software which means to utilize the C/C++ and implement powerful algorithms in a robot control system. The hope was that the combination of hardware and software would enable a robot car to take its own decision.

February 23, 2026

Symbol grounding with a DIKW pyramid

A possible model to explain the symbol grounding problem is the DIKW pyramid. Grounding means to translate a higher layer in the pyramid into a lower layer. The layers are representating the same reality in different formats. The perhaps most important transition is from the numerical data layer into the labeled data. For a warehouse robot a GPS sensor reading like (40,10) gets translated into [roomB]. So the low level sensor data gets annotated with tag.

The next layer is the knowledge graph which encodes the tagging information into a semantic network. The realations between the tags are explained, synonyms are introduced and the information are stored in a json file. If all the layers in the DIKW pyramid are established and if an automatic parser can translate upward and downward in the pyramid, its possible to communicate with a robot in natural language. A voice command like "go to roomB and bring me the yellow box" is understood by the robot and executed in the reality.

February 22, 2026

Minsky frames as communiation tool

A minsky frame is a list of key/value pairs as text overlay in a GUI window. Its not intended as an internal data structure within a robot but its GUI gadget which displays information about the game on the screen.

For the example of an intersection simulator the minsky frame was realized with the pygame command:

screen.blit(txt, (35, y))

Which draws a text string to the screen, for example the information "exit_target: WEST". A minsky frame is some sort of form which determines which aspects of the reality are important. The computer determines the value for each item and shows the result on the screen. This allows to solve the symbol grounding problem because the shown text overlay translates the data layer of the DIKW pyramid into the information of the DIKW pyramid.

Making robots more chatty with minsky frames

Autonomous robots in the past were mostly silent systems which aren't talking but processing information. This makes it hard to debug the AI software.

The screenshot shows an alternative which is a very chatty maze robot. His task is to move around and recharge its battery when its empty. The AI brain consists of mulitiple Minsky frames with key/value information. Technically it was realized as python dictionary shown in text overlay windows.

The surprising situation is, that even such a minimal robot game consists of huge amount of information. There are raw sensor data itself like the position but there are also semantic information like the location in the map and the planned actions.

Creating a Minsky frame itself is not very complicated because its a normal python dictionary. What makes the datastructure powerful is, that frames are written information. They are stored in a database and this allows to add information and translate the existing Minsky frames into new information. For example the planned actions of the robot can only be determined if the existing frames are available which are analyzed.

In other word, the AI isn't a list of algorithms, but the AI is a database distributed over hierarchical key/value data.

February 20, 2026

Minimal Grounded language

The symbol grounding problem and especially recent vision language models are very complicated to explain. What is missing is a simplified introduction which should given in the following blogpost. The core element of grounded language is a multimodal dictionary, see the picture before. There is one column with pictures and another column with textual annotation. Both columns are connected to each other which allows to translate back and forth between the modalities.

The availability of such a dictionary allows the computer to understand instructions and also the computer can describe the content of a picture. Let me give an example. Suppose the human operator types in "circle left". The computer starts a look up request in the database and retrieves the correct pictograms showing a circle and the left-arrow. The symbols are shown on the screen and the human operator can enter the next command.

Such a pipeline doesn't look very impressive but it can be scaled up dramatically. suppose the pictograms are replaced with motion capture trajectories, there is one trajectory for "stand up" and another one for "walking to the left". This allows a human operator to control a humanoid robot with words. He types in a word, and the lookup in the database with retrieve the correct trajectory which gets executed on the robot. Such a system is known as vision language action model and is the core element of advanced robotics.

Let us go back to the initial example with a pictogram. The task can be summarized as a translation problem between words and symbols. A word is common string like "[l][e][f][t]". Such a string can be generated by keystrokes on a computer keyboard. In contrast, the matching symbol is a picture which is a bit harder to generate. Pictures are usually drawn in a vector graphics program. The trick is to see both information as connected to each other. The matching is called grounding and allows to simplify the communication.

It should be mentioned that from a computer science perspective, the grounding problem can be called boring or trivial. Storing the pictogram including the textual annotation into a computer is a solved problem. There are many ways available for doing so. Its a simple database problem which can be implemented in python in under 100 lines of code.

The reason why grounded language is at the same task and advanced task is because it is strongly connected with linguistics. So its an interdisciplinary approach between AI, computer science, cognitive science and linguistics. This makes grounded language to a very complex subject.

February 16, 2026

Gründe gegen Linux auf dem Desktop

Innerhalb der Linux Community gibt es ein breites Unverständnis gegenüber Windows User. Linux wird als das technisch bessere System definiert und folglich wird das Winbdows Ecosystem verspottet und es wird unterstellt dass Windows ein Auslaufmodell wäre.

Anstatt an diesen Gedankengang anzudocken und gebetsmühlenartig zu wiederholen was die Vorteil von Linux auf dem Desktop sind, sollen an dieser Stelle einmal die nachteile von Linux genannt werden mit dem ziel dass Windows Benutzer gestärkt werden.

Der wichtigste Grund gegen Linux ist, dass dort die beliebte Datenkbank-Anwendung MS Access nicht lauffähig ist, es gibt ferner kein Opensource pendant was dem Funktionsumfang nahe kommt. Lediglich für MS Word und MS Excel gibt es etablierte Linux Alternativen, für die Desktop_datenbank fehlte bisher die nötige Manpower um ein Open Source Projekt durchzuführen.

Eine Desktop Datenbank wie MS Access ist ferne nicht irgendeine Büroanwendung ähnlich wie ein texteditor oder ein Malprogramm, sondern mit besagter Software kann man ohne Programmieren zu müssen leistungsfähige Frontend und Backend EDV Anwendungen erstellen. Datenbanken mögen für Physiker und Mathematiker entbehrlich sein die eher Programmiersprachen und Mathematik-Software einsetzen, jedoch sind Datenbanken im Geschäftsumfeld die Brot und Butter Anwendung. Der Computereinsatz von Unternehmen ist ausnahmslos datenbank orientiert. Ohne Übertreibung kann man sagen, dass MS Access die Killerapplikation für Compüuter im Büroumfeld darstellt, fehlt diese Applikation sind alle weiteren Vorteile unwichtig.

Linux hat jedoch noch weitere Nachteile. Zunächst einmal ist es auf Consumer-PC nicht vorinstalliert. Wer einen neuen PC im Handel kauft wird dort lediglich Windows 11 vorinstalliert vorfinden. Technisch kann man das Problem natürlich mit einem selbst erstellten USB Bootstick lösen, jedoch wird der Umsteiger schnell erkennen, dass es im PC Fachgeschäft für Linux generell keine Software verfügbar ist. Sämtliche kommerzielle Software von Softwarefirmen die an Privat und LBusinesskunden verkauft wird, wurde dezidiert für das MS Windows Betriebssystem entwickelt.

es gab früher einmal auch kommerzielle Linux software die im Fachladen als Package verkauft wurde, jedoch hat sich das konzept nie durchsetzen können. Die PC Welt ist deshalb zweigeteilt: es gibt einmal kommerzielle Software entwickelt von Firmen für das Windows system und dann gibt es noch Open Source software die es nur online im Internet gibt, die aber nicht kommerziell vermarktet wird. Dadurch stehen normale 'Anwender die auf Linux setzen, plötzlich allein da. Sie müssen sich das nötige Fachwissen mühsam anlesen, sie müssen die Software online herunterladen und wenn es dabei Probleme gibt erhalten sie in Foren den Rat doch einfach auf eine andere Linux Distribution zu wechseln, weil die angeblich viel besser wäre.

Für den Durchschnittsanwender, der einen PC mit vorinstallierten Windows verwendet gibt es keinen Grund auf Linux zu wechseln. Er wird dort nur Nachteile erleben, und wegen der steilen Lernkurve an einfachsten Aufgaben scheitern wie der Installation eines Spieles oder dem Öffnen eines Word Dokumentes. MS Windows ist in sämtlichen Punkten überlegen und bildet unverändert den Industriestandard für Desktop PC sowohl bei Privatanwendern als auch in der Geschäftswelt.

The information layer in the DIKW pyramid

The lowest layer in the DIKW pyramid is the data layer which can be desribed easily. There are raw sensor data like distance, temperature, gps coordinates which are stored in a numerical format. The next layer in the pyramid, the information layer, is harder to describe. A working thesis is, that the information layer consists of [tags].

For the example of a warehouse robot, the tag cloud would be: [roomA, roomB, roomC, shelfNorth, shelfsouth, shelf1, shelf2, obstacle, battery, chargingstation, barcode, path, left, right, speed, direction, batteryempty, order]

Of course the tag list is not complete, there are additional tags available but for reason of simplication this might be a starting point. These tags are providing context because after selecting one tag, possible alternative tags are not activated. For example, the goal for the robot might be [roomB] but not [roomA, roomC]. The robot might rotate to [left] but not to [right]. So the context of a tag are always the tag which might be possible but are not activated at the moment.

All the tags are creating a semantic network. In contrast to a full blown ontology or AI frames, a tag based information is more minimalist. Every tag can be activated or not similiar to the tags in a blogging post for annotating a document.

The interesting situation is, that there is an intersection available between low level sensor data and mid level tagging cloud. For example:

- gps sensor -> [roomA]
- gps sensor -> [direction]
- distance sensor -> [obstacle]

For desribing the robot's behavior both layers (data and information) are important. The robot needs to log the numerical raw sensor data and also the robot needs to annotate the current sensory perception with semantic information.

What we can say for sure is, that tagging information doesn't belong to the lowest data layer. A sensor like a gps sensor has no builtin tagging mechanism. The sensor doesn't know the position of a certain shelf, or doesn't know if the robot is in roomA or in roomB. What the gps sensor knows instead are precise x/y coordinates. The reason is, that the sensor hardware is able to generate such data. Its up to a higher instance in the DIKW pyramid to process these data.

February 15, 2026

Quiz: The DIKW Pyramid (Data to Information)

The symbol grounding problem can be explained with a DIKW pyramid. The following quiz has the domain of a warehouse robot.

Instructions: Match the raw sensor output (Data) to the correct operational meaning (Information).

1. A Time-of-Flight (ToF) sensor returns the integer 20. The system context defines the unit as centimeters. What is the information?

A) The robot's battery is at 20%.
B) An object is detected 20 cm in front of the sensor.
C) The robot has picked up 20 items.
D) The robot is moving at 20 km/h.

2. The internal IMU (Inertial Measurement Unit) registers a sudden spike of 9.8 m/s² on the X-axis while the robot was supposed to be stationary. What is the information?

A) The robot is successfully charging.
B) The robot has reached its top speed.
C) The robot has likely been struck or tilted unexpectedly.
D) The floor is perfectly level.

3. A barcode scanner on the gripper returns the string SKU-9921-RED. What is the information?

A) The gripper is currently empty.
B) The robot needs to be rebooted.
C) The item currently held is identified as a "Red Small Widget."
D) The ambient light in the warehouse is too red.

4. The wheel encoders report 500 rotations, while the visual odometry (camera) reports 0 meters of forward progress. What is the information?

A) The robot is moving faster than expected.
B) The wheels are slipping on a slick surface (e.g., oil or plastic).
C) The robot has arrived at the loading dock.
D) The camera lens is dirty.

5. A thermistor near the motor controller reads 95°C. The operating limit is 80°C. What is the information?

A) The motor is warming up to optimal temperature.
B) The warehouse heating system is turned on.
C) The motor controller is overheating and at risk of failure.
D) The robot is in a refrigerated zone.

6. A pressure-sensitive safety bumper sends a High/1 logic signal to the CPU. What is the information?

A) The robot is clear of all obstacles.
B) The robot has made physical contact with an object or person.
C) The battery voltage is stable.
D) A new software update is available.

7. The battery management system (BMS) reports a voltage of 19.2V on a 24V rated system. What is the information?

A) The battery is fully charged.
B) the battery is at a "Low" state and requires docking soon.
C) The robot is currently plugged into a wall outlet.
D) The sensors are calibrated correctly.

8. An acoustic sensor detects a frequency of 110 dB during a lifting sequence. What is the information?

A) The lifting mechanism is operating silently.
B) The warehouse music is too loud.
C) There is an abnormally loud grinding noise in the lift gears.
D) The item being lifted is very light.

Correct answers

1. B (The number 20 + unit cm = distance information)
2. C (Sudden acceleration on a stationary axis = impact/tilt information)
3. C (String data + database lookup = object identity information)
4. B (Rotation data vs. no movement data = traction loss information)
5. C (Raw temperature > threshold = critical status information)
6. B (Binary signal from a bumper = collision detection information)
7. B (Voltage level compared to system rating = energy status information)
8. C (Decibel level + operational context = mechanical fault information)

February 14, 2026

Language as ghost in the machine

The term "Ghost in the machine" is usually referencing to artificial intelligence which allows a robot to do useful things. The body is the robot's hardware and therefor the software is soul realized in software. With such an understanding, Artificial Intelligence is an advanced computer program based on AI related algorithms.

But what is if this working thesis is wrong? The assumption is, that AI can't be realized in hardware and also not in software. But Artificial intelligence is similar to a ghost very hard to identify. Its basically natural language. In other words, the ghost is an English dictionary. Its not stored in the software itself, because language is a communication pattern used as in between technology.

Natural language will become only visible if a human speaks with a robot. The human formulates a request like "move forward" and the robot responds to this request. Therefor, the AI isn't stored in the robot itself, but its located in air between human and robot. This would explain why it has much in common with a ghost which also has no measurable location in the reality. But a ghost is located beyond, which is the environment of the reality, or some sort of hyperspace.

Let us try to describe language from a more scientific perspective. A statement like "grasp the apple, please" has no physical size. Words are not part of the visible reality, but they are abstract symbols located in the oral space. Despite this missing physical borders, language is part of the reality because language is used for many purposes. In a mathematical sense, language is a communication technology used in a sender to receiver interaction. Such a protocol has much in common with a ghost like behavior. If language gets submitted over radio waves it has some similarity to a supernatural phenomena.

What makes language interesting for artificial intelligence is, that without language a robot is not able to think. Without language, robots are reduced to a pocket calculator which can execute an algorithms but isn't understanding the meaning of objects in the reality like a table, an apple or a plate. The ability to parse natural language is equal to implement artificial intelligence.

February 11, 2026

Playing pong videogame with a perception buffer

In addition to the previous post here is another example with a working AI based on a buffer. The game has two modes: normal mode which runs the simulation and a pause mode which shows AI information including the perception buffer, the action buffer and the predicted trajectory. The text box shows the known information from the game engine which are the ball position, its velocity and other information. the action buffer stores information what to do next. This information is submitted to the paddle.

Because of the simplicity of the pong videogame, the AI master the challenge with ease, it will move the paddle towards the correct position.

From a technical perspective the buffer was realized with a python dict for storing the information in a key/value syntax. Creating such a dictionary and showing the content on the screen is very simple. The innovation has to do with the assumption that such a buffer modulates the communication process. The AI brain isn't imagined as a sophisticated algorithm, but its a database which holds information as natural language. This will generate multiple subtasks like a) how to convert the game state into a perception buffer b) how to translate the perception buffer into the next action and c) how to submit the content of the action buffer back to the game engine.

Computer programming vs. AI programming

Computer programming is the art of software creation. It has to do converting a real world problem into executable program code like Java or C/C++- A typical example is to program a pong videogame, or improve a database management system.

Modern computer programming since the 2010s does't reinvents the wheel but its using existing operating systems, programming languages and libraries. For example videogames are written with the help of a 2d game library, and database systems are created on top of existing SQL databases.

Programming has always the goal of creating software and modify existing software which is running on a computer. All the modern technology like the Internet, word processing software, and database software is the result of well engineered software applications.

Despite the importance of programming in computer science the discipline has a blind spot because its not possible to program an AI software or write a software for a robot. Many attempts in writting robot software in C/C++ and Java were presented in the past, but most of them have to be called a failure. It seems, that artificial intelligence is working different from classical software engineering principle. Its not possible to reuse existing software libraries or take advantage of existing programming languages. Even the most powerful programming language avaialble which is Python in combination with the latest mathematical libraries is useless for realizing a robot project. The reason is, that software programming describes the world as computer centric. The attention is always directed toward a computer and towards its ability to execute a software. For example the Python interpreter provides a list of commands. Programming means to arrange these commands to a fixed structure which is a computer program, namely in classes in subroutines. Then the program canb e exucuted. The problem is that such a program won't realize artificial intelligence.

There is a single programming excersise available which demonstrates the transition from classical software programming towards artificial intelligence which is activitity recognition in motion capture. This specialized problem has its roots in computer animation and was first mentioned in the 1970s. The task is to annoate the movements of the mocap markers with textual names like sitting, jumping, walking and so forth.

Computer programming is focussed on the CPU of a computer. The computer has to solve a problem, e.g. adding two numbers or search in a database with a search algorithms. In contrast, the activity recogntion task works with a communiation paradigm similar to an internet protocol. The idea is to convert low level data into high level data. Such a communication system is an open system, which is seldom described in the programming literature. The reason is that communication is referenced to external parties located outside of a computer.

Classical programming works with the algorithm paradigm as a theoretical understanding. The algorithm is executed on the machine and solves a problem. In contrast, communciation oriented programming works with the sender to receiver paradigm. There is no algorithms needed but there is a message which is delivered over the network. Programming a robot is similar to implementing a communication protocol, there is also a sender, a receiver, a message and a protocol. And the robot never runs an algorithm, but the robot receives a message.

February 09, 2026

Roboter steuerung mit der DIKW Pyramide

Obwohl die DIKW Pyramide in der Literatur häufig diskutiert wird, ist ihre Anwendungsmöglichkeit innerhalb der Robotik nur selten dokumentiert. Als motivierenden Einstieg hier ein Beispiel für einen Warehouse roboter. Auf der untersten Ebene (Daten) fallen folgende Messwerte an:

- Geokoordinaten: "X: 194.5 / Y: 10.2"
- Prozentwert: "12%"
- Barcode scan: "ID: 00056789"
- Temperatur eines Servomotors: "42°C"
- Geschwindigkeit: "0.2 m/s"
- Sensor Schaltzustand: "Bit 1 = On"
- Zeitstempel: "12.05.2025 / 10:02:01"

Diese Daten sind Rohdaten wie sie von Sensoren ermittelt werden, also über gps triangulation, barcode reader oder von einem Temperatursensor. Eine tiefergehende Bedeutung haben diese Daten nicht, sondern sie werden nur mitgeloggt und in einer Datenbank als numerische Werte gespeichert.

Zur Steuerung des Roboters interessanter ist die nächste höhere Ebene der DIKW Python: Information.

- Batterieladezustand ist gering, bezug zu 12%
- Standort des Roboters ist Regel 8, Fach A. Bezug zu Geokoordinaten
- Paket mit 00056789 ist eine Palette mit Glasflaschen, Bezug zu Barcode scan
- Motorüberhitzung droht, Bezug zu 42°C

Die Zuordnung von Daten zu Information erfolgt mit Hilfe von weiteren Datenbankeinträgen. Darin sind Textinträge gespeichert wie "Motorüberhitzung droht" und Bedingungen wann diese zutreffen. Die Informationsebene ist nicht als numerische Daten gespeichert sondern besteht aus kurzen Sätzen in natürlicher Sprache. Höhere Ebene in der DIKW Pyramide beinhalten abstraktere Formulierungen die Expertenwissen beinhalten und für die Aufgabe des Roboters wichtig sind.

Technisch gesehen ist eine DIKW Pyramide ein DAtenbank-MAnagement system, worin die Daten/Informationen auf unterschiedlichen Tabellen verteilt sind und über Regeln zusammengefügt werden. Der Inhalt der Datenbank wird in Echtzeit aktualisiert. Auf der höchsten Ebene (Wisdom) ist die Steuerung des Roboter dann sehr simpel. Man sendet einen natürlich-sprachlichen Befehl wie "Fahre zum Regel C und hole die Glasflaschen und bringe sie zu Regel B". Dieses High level Kommando wird dann übersetzt in konkreten Befehle an den Roboter.

February 07, 2026

Robot control with a DIKW pyramid

Symbol grounding is about moving down and moving up along a dikw pyramid. This allows to hide the details and expand the details of a subject. For the example of a warehouse robot the dikw pyramid can be implemented as a python dictionary which shows only the upper layer and the bottom layer:

dikw_pyramid={
"wisdom": {
"Go to the loading bay and clear the blockage.",
},
"data": {
"lidar_dist": 0.5, "weight_kg": 25.0, "coords": (12.4, 45.8)
},
}

The raw sensor data are feed into the data layer and are formmated as numerical values. In contrast the wisdom layer of the pyramid stores the voice commands formulated in English sentences. The task for the symbol grounding engine is to translate between these layers. This is realized by instruction following (from top to bottom) and activitity recognition (from bottom to top).

February 06, 2026

Robot swarm builds a house

The picture shows multiple robots on a construction site who are controlled by a large language model over a longer time span on the same goal. The AI technology is based on natural language for reducing the state space drastically. The Large language model describes the project in English nouns and verbs and the single robots are converting the commands into physical action.

Database for npc quest generator

NPC quests aren't generated with algorithms but with a database. The database contains textual elements, in case of a warehouse robot a mini database looks like:

warehouse_db = {
"items": {"SKU-01": "Lithium Batteries", "SKU-05": "Hydraulic Fluid"},
"zones": {"Zone_A": "Cold Storage", "Zone_B": "Hazardous Materials"},
"quest_types": ["Fetch", "Escort", "Cleanup", "Security Patrol"]
}

Such a database modulates the communication. An NPC quest like "fetch Lithium Batteries from Zone_B" is valid because the words are available in the database. Another possible quest taken from the former minidatabase is "security patrol Cold storage". Increating the intelligence of the robot doesn't mean to invent advanced algorithm but to populate the database with more entries. If the robot knows the names for important items and relevant locations in a warehouse he is an expert for the domain.

February 03, 2026

Annotating video games

The screenshot left shows a simple random walk in a path with two robots. Even if the picture is provided in maximum resolution it remains unclear what the meaning is of all these pixels. Human can guess that the connected nodes are the allowed path, but computers have no idea how to interpret the image.

The situation becomes much clearer by activated the pause mode shown on the right. There is an additional textual window which explains, that the red circle is robot1 who is moving from node #4 to #5 and has a full battery. These information can't be parsed from the original picture so the text box provides additional meaning. Another feature of a text box is, that computer will understand the information much easier because all the data are formatted in a key value syntax which is the prefered layout for machine understanding.

Such a text box is the core element of Artificial intelligence because it adresses the symbol grounding problem. The text box communicates the current game state to an external instance which is a human observer. Instead of analyzing how the simulation was programmed internally the new question is how to talk about the domain in natural language. Such a task is realized with a user interface in general and with a text box in detail.

Simple example for a head up display

An entry level example for demonstrating the power of head up displays and grounded language is a route navigation problem which is perhaps the most easiest example for instruction following. The robot gets controlled with a random generator and after pausing the game, a text box with additional information on the screen. This text box contains of the grounded language which is important to provide meaning.

Every head up display is based on a two tier architecture: there is a graphical screen in the background and a textual screen in the foreground. Such kind of text boxes are common design element in videogames, and they are also useful for artificial intelligence. The compact representation in the text box helps a computer to understand a videogame.
Grounding means, that the AI is able to generate and format the content in the text box.

The text box is updated if the video game status is changing. Both layers are synchronized automatically. Programming such an upto date grounded language is the core problem. In case of the graph traversal robot, the information shown in the text box are easy to format. In case of a kitchen robot or a self driving car the text box contains more complex information which are harder to maintain automatically.