June 02, 2026

Grounding mechanism 101

A DIKW pyramid consists of abstraction layers: data, information, knowledge and wisdom. A grounding mechanism maps items from one layer to another. In an example warehouse robot, the data layer consists of sensor readings like GPS coordinates, lidar distance and battery capacity, while the information layer consists of tags like "battery_full", "north" and "obstacle_ahead".

The grounding mechanism generates the links between the entries. For example, a lidar distance of 10 cm is mapped to "obstacle_ahead", while a battery level of 10% is mapped to "battery_empty".
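The mapping from sensor readings to tags can be sketched in a few lines of Python. This is only a minimal illustration: the function name `ground`, the dictionary keys and the thresholds (30 cm, 10%, 90%) are assumptions chosen so that the two examples above come out as described.

```python
def ground(sensor_readings):
    """Map raw data-layer readings to information-layer tags."""
    tags = []
    # Assumed threshold: anything closer than 30 cm counts as an obstacle.
    if sensor_readings["lidar_cm"] < 30:
        tags.append("obstacle_ahead")
    # Assumed thresholds for the two battery tags.
    battery = sensor_readings["battery_percent"]
    if battery <= 10:
        tags.append("battery_empty")
    elif battery >= 90:
        tags.append("battery_full")
    return tags


print(ground({"lidar_cm": 10, "battery_percent": 10}))
```

With a lidar distance of 10 cm and a battery level of 10%, the function returns both "obstacle_ahead" and "battery_empty".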

In general, a grounding mechanism is a kind of matching game. It answers the question of which situation is mapped to which description. Such a mapping is the core element of an advanced artificial intelligence.

To demonstrate why a matching game enables artificial intelligence, let us look at an example. Suppose the human operator submits the following command to the warehouse robot: "move to the green area, grasp the small box on the left side, bring the box to the blue area, drop it into the shelf, then recharge your battery".

If the grounding mechanism is missing or was deactivated, the command is interpreted as a string with 144 characters. Because it wasn't formulated in a programming language like C/C++, the robot cannot execute it; it can only store the string in main memory.

Suppose the robot has a built-in grounding mechanism; then it's possible to parse the sentence word by word. The word "green" is matched to a certain RGB value, the word "box" is mapped to a certain shape in the camera image, the word "shelf" is mapped to a picture of the shelf, and so on. The parsing algorithm fetches a word from the sentence and performs a lookup in the database to identify the corresponding item in the data layer of the DIKW pyramid. From a robot's perspective, understanding a sentence means matching items from the information layer to the data layer.
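The word-by-word lookup can be sketched as follows. The database contents (RGB values, template file names) and the names `GROUNDING_DB` and `parse` are hypothetical; a real robot would match against camera data instead of a small dictionary.

```python
COMMAND = ("move to the green area, grasp the small box on the left side, "
           "bring the box to the blue area, drop it into the shelf, "
           "then recharge your battery")

# Hypothetical grounding database: word -> entry in the data layer.
GROUNDING_DB = {
    "green": {"kind": "color", "rgb": (0, 255, 0)},
    "blue": {"kind": "color", "rgb": (0, 0, 255)},
    "box": {"kind": "shape", "template": "box_template.png"},
    "shelf": {"kind": "image", "template": "shelf_photo.png"},
}


def parse(command):
    """Fetch each word and look it up; words without an entry are skipped."""
    grounded = []
    for word in command.replace(",", " ").split():
        entry = GROUNDING_DB.get(word)
        if entry is not None:
            grounded.append((word, entry["kind"]))
    return grounded


print(len(COMMAND))    # without grounding: only a string of 144 characters
print(parse(COMMAND))  # with grounding: words linked to data-layer entries
```

Words like "move" or "grasp" stay ungrounded in this sketch because the toy database has no entries for actions.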

June 01, 2026

Symbol grounding problem as an answer to NP-hard problems

Before it's possible to describe grounded language, there is a need to explain how artificial intelligence was imagined until the year 1990. It was treated similarly to computer programming, in the sense that there is a CPU which executes a program, and it's up to the programmer to make the algorithm as intelligent as possible. Artificial intelligence was thought of as a very advanced program which is executed by a computer.

In other terms, the computer was seen as a problem solving machine, and the only open question was which sort of algorithm is needed to solve a certain problem. For example, motion planning in robotics was solved with motion planning algorithms, while computer chess was solved with alpha-beta pruning algorithms. Most of these AI related algorithms were designed as search algorithms. The computer was used to traverse the state space of the domain, and this allowed the computer to find the optimal action.
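To make the search paradigm concrete, here is a minimal sketch of alpha-beta pruning on a toy game tree. The tree and its leaf scores are made up for illustration; a chess program would generate the children from legal moves and score the leaves with an evaluation function.

```python
def alphabeta(node, alpha, beta, maximizing):
    """Return the minimax value of a game tree given as nested lists."""
    if not isinstance(node, list):  # leaf: a static evaluation score
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:  # the opponent would never allow this branch
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:  # remaining children cannot change the result
            break
    return value


tree = [[3, 5], [2, 9]]  # hypothetical two-ply tree with leaf scores
print(alphabeta(tree, float("-inf"), float("inf"), True))  # prints 3
```

In the second subtree the leaf 9 is never visited, because after seeing the 2 the maximizing player already knows the first subtree is better.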

The symbol grounding problem formulated by Stevan Harnad questions this algorithm-oriented paradigm. This might explain why grounded language is a niche topic within computer science even today. Because computer science and algorithms were often treated as the same thing, the question of how to program a computer without an algorithm was out of scope.

Let us listen closely to how Harnad, Brooks and Steels argue about grounded language. The core element is the sensory perception of a robot. The assumption is that the perception is transmitted to the computer. There is no need to calculate something; the focus is on the data transfer. A light sensor detects light, and the information from the sensor is sent over a cable to the computer. The symbol grounding problem doesn't focus on the computer itself but on the cable between a sensor and a computer, very similar to a computer network. A computer network is different from a Turing machine: its job is not to run an algorithm but to communicate data, often organized in protocol layers.

The paradigm shift from algorithm-centric computers towards protocol-oriented data transmission is the core element of the symbol grounding problem. Artificial intelligence isn't explained as processing or program execution; instead it is imagined as the connection between two hosts.

Let us compare the hardware. In classical algorithm-oriented AI the basic building block is a central processing unit, which can be a 32-bit CPU. The CPU is built from transistors on a chip and is programmed in assembly language. In contrast, the symbol grounding problem assumes that there is a Cat5 copper cable which delivers packets. It's up to the network engineer to define the protocol of the packets.
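What "defining the protocol of the packets" means can be sketched with Python's standard `struct` module. The packet layout (sensor id, sequence number, reading) is a made-up example and not taken from any real robot.

```python
import struct

# Hypothetical packet layout in network byte order:
# sensor id (1 byte), sequence number (2 bytes), reading (4-byte float).
PACKET_FORMAT = "!BHf"


def encode(sensor_id, sequence, reading):
    """Pack one sensor reading into the bytes sent over the cable."""
    return struct.pack(PACKET_FORMAT, sensor_id, sequence, reading)


def decode(packet):
    """Unpack the bytes on the receiving host."""
    return struct.unpack(PACKET_FORMAT, packet)


packet = encode(sensor_id=1, sequence=42, reading=0.5)
print(len(packet))     # 7 bytes on the wire
print(decode(packet))  # (1, 42, 0.5)
```

Neither side calculates anything in this sketch; sender and receiver only agree on how the bytes are to be read.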

The paradigm shift can be explained with NP-hard problems. NP-hard is a category of problems, many of them related to artificial intelligence, for which no efficient (polynomial-time) algorithm is known. Many robotics motion planning problems, like the piano movers problem or some forms of model predictive control, are NP-hard or harder. The term refers to how the runtime of an algorithm executed on a CPU grows with the problem size. In other words, even a modern 64-bit CPU can't solve larger instances exactly, because the required runtime grows much faster than the hardware improves.

The holy grail in computer science is how to solve NP-hard problems. The argument here is that an answer was given by Stevan Harnad in his famous 1990 paper. He didn't mention NP-hard problems, but grounded language makes it possible to sidestep them. Instead of using a CPU to calculate a mathematical problem, a copper cable is used to solve a data transmission problem. This new perspective is powerful enough to address motion planning problems in robotics.

May 30, 2026

The transition from closed to open robotics systems

The last AI winter lasted until the late 1990s. In this period, some robots were built by engineers and some AI algorithms were designed, but all of them failed. The only things working reliably were simple CNC machines, which were used in static factory setups to cut a piece of metal. Even a simple pick-and-place robot for an assembly line was beyond the capabilities of 1990s technology.

Today's robotics in the 2020s is much more powerful, and this improvement can be explained with a paradigm shift. Robotics until the 1990s was organized with a closed system assumption. The idea was to treat a robot as a microcontroller which runs software in batch mode. It was a mathematical and computer science artifact, controlled by deterministic algorithms implemented in a programming language like C/C++. The assumption in the 1990s was that such a paradigm is powerful enough to create artificial intelligence, and that existing tools like a 16-bit microcontroller, a PID controller, a Kalman filter or a C compiler are sufficient to build robots.

What the engineers didn't know was that the mentioned tools are a dead end. Even with today's knowledge it's not possible to build a robot with such equipment. What is needed are different tools, located outside of computer science, which make it possible to build open systems. These advanced tools are:

- motion capture: a human actor demonstrates a movement for a camera
- grounded language: a vocabulary to communicate with a robot
- a multimodal dataset which stores mocap data and semantic annotation in a database

These tools were missing in the late 1990s. Not because of technical constraints, but because the difference between open and closed systems wasn't understood. A robot can be built according to only one of the two principles: either the robot understands natural language or it doesn't. Either the robot can play back motion capture data or it can't.

The dominant reason why these advanced tools were missing in the 1990s is that they are located outside of mathematics and computer science. Motion capture has its roots in biomechanics and in animated movies. It goes back to rotoscoping, a technique for drawing cartoons from filmed movement. Grounded language, in turn, has its roots in linguistics, which belongs to the humanities, the opposite of mathematics.

In the 2020s computer science has redefined its own boundaries, because the former restriction to mathematics and algorithm theory was not able to solve robotics problems. No matter which mathematical theory was applied to robot control, all of them failed. The dominant problem in robot control is the state space explosion. A robot has many degrees of freedom, and planning in the configuration space of such a kinematic chain needs too many CPU cycles. The problem is not that a faster search algorithm is missing; the mathematical perspective itself is the obstacle.
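The state space explosion can be made tangible with a small calculation. The sketch assumes that each joint angle is discretized into 100 bins; both the bin count and the example numbers of joints are arbitrary choices for illustration.

```python
def state_count(degrees_of_freedom, bins_per_joint):
    """Number of discrete joint configurations of a kinematic chain."""
    return bins_per_joint ** degrees_of_freedom


for dof in (2, 6, 30):
    # The count multiplies by bins_per_joint with every additional joint.
    print(dof, state_count(dof, bins_per_joint=100))
```

A two-joint arm has 10,000 configurations, a six-joint arm already 10^12, and a humanoid with 30 joints 10^60, which no exhaustive search can enumerate.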

The inner workings of a state-of-the-art robot from the 2020s can be explained as a machine which understands English commands and has access to a motion capture database. Combined, these tools allow the robot to solve complex problems like biped walking and grasping objects. From an AI perspective, the intelligence of the robot isn't encoded in a computer program; it has its origin outside of the robot, namely in motion capture data and verbal commands. The robot is reduced to a minimal device which converts a command into action by executing an existing trajectory with its servo motors. For example, the human operator may say "move with trajectory #12"; after fetching the trajectory from the database, the robot activates its servo motors. Strictly speaking, the intelligence has its origin not in the robot but in the environment, namely the human operator.
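The "move with trajectory #12" example can be sketched as a lookup followed by playback. The database contents, the waypoint format (pairs of joint angles) and the function name `execute` are hypothetical; collecting the waypoints in a list stands in for driving real servo motors.

```python
import re

# Hypothetical motion capture database: trajectory id -> list of waypoints,
# each waypoint being a tuple of joint angles in degrees.
TRAJECTORY_DB = {
    12: [(0, 0), (10, 5), (20, 15)],
}


def execute(command):
    """Parse 'move with trajectory #N' and replay the stored waypoints."""
    match = re.fullmatch(r"move with trajectory #(\d+)", command.strip())
    if match is None:
        raise ValueError(f"unknown command: {command!r}")
    trajectory = TRAJECTORY_DB[int(match.group(1))]
    sent = []
    for waypoint in trajectory:
        sent.append(waypoint)  # stand-in for commanding the servo motors
    return sent


print(execute("move with trajectory #12"))
```

The robot side contains no planning at all in this sketch: the movement was recorded elsewhere, and the command only selects it.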

Robots constructed as open systems can be seen as communication devices instead of computing devices. They are not running a program like a Turing machine; they are parsing a message like a telefax machine.

May 26, 2026

The failure of AI related programming languages, part 2

As a follow-up to a previous blog post [1], the problem with the 5th generation programming languages of the past should be explained in detail.

The initial situation in the mid 1980s was the existence of powerful high-level programming languages like C, Pascal and C++, which had simplified source code development. In contrast to the former assembly language, these languages offered powerful libraries and could be compiled for different computer hardware. It's pretty easy to write video games like Pong and jump'n'run games in C and C++.

Unfortunately, these languages were not able to master robot control and AI problems. Some attempts were made to program game AI in the C language, but in most cases the source code is hard to read because it's a finite state machine, or the algorithm needs a large number of CPU cycles because it's a brute-force search algorithm, as in computer chess.

The consequence was to rediscover dedicated AI programming languages like Lisp and to develop new 5th generation languages like Prolog and KL-ONE, which allow agent-oriented programming. The promise was that the programmer defines only facts, and the reasoner module is able to plan the robot's actions by itself.

It should be mentioned that AI related programming languages were a failure from day one. The problem is that it's hard to utilize Prolog for a concrete task, e.g. to control a robot.

The main problem with so-called 5th generation languages is that they were designed with a classical programming language paradigm in mind. There is an interpreter which executes the code on a computer, and the user is asked to formulate the problem in the syntax of the programming language. This constraint makes sense from a computer science perspective, because every programming language is organized according to this principle. Without an interpreter or a compiler the computer can't execute the code. But from a domain specific perspective, it's not possible to specify a problem this way.

The working thesis is that existing programming languages like C/C++ are more than powerful enough; they don't need improvement and can be utilized for AI related programming tasks. What is needed to bridge the gap to reality is a user interface, written in C/C++.

A GUI manages the communication between a computer and the human user. It consists of buttons, text fields and command lines. Such a user interface is the key element for designing advanced AI software. In the optimal case, the user interface looks like a head-up display for a robot. A video camera picture is shown on the screen, and on top of it there are multiple text boxes as an overlay with additional information formulated in grounded language. Such a futuristic GUI can be programmed in a classical programming language like C.

[1] https://ai-and-robotics.blogspot.com/2026/04/the-failure-of-ai-related-programming.html 

May 24, 2026

Textual interaction for warehouse robots

The following game log shows the interaction between 3 agents: a game engine, an AI player and an internal teacher. The interaction takes place exclusively through a text terminal. The goal is to win the text adventure by entering commands and consulting the internal teacher.

=== WAREHOUSE ROBOT SIMULATION ===
AI_Player: "start at position A1"
Game_engine: "Robot ready. Goal: transport 10 crates to shelf B5."
AI_Player: "scan shelf B5"
Game_engine: "Shelf B5: 3/10 crates missing."
Internal_Teacher: "Use gripper arm mode 3 for crates >20kg."
AI_Player: "grasp crate at A1"
Game_engine: "Crate grasped. Weight: 25kg."
AI_Player: "move to B5"
Game_engine: "Path clear. Transport starting."
AI_Player: "place crate"
Game_engine: "Crate placed in shelf B5. 1/10 done."
AI_Player: "scan shelf B5"
Game_engine: "Shelf B5: 2/10 crates missing."
Internal_Teacher: "Check battery level. Warning: 15%."
AI_Player: "drive to charging station"
Game_engine: "Charging station reached. Charging started."
AI_Player: "charge to 80%"
Game_engine: "Battery: 80%. Resume transport."
AI_Player: "move to A1"
Game_engine: "Position A1 reached."
AI_Player: "grasp next crate"
Game_engine: "Crate grasped. Weight: 18kg."
AI_Player: "move to B5"
Game_engine: "Path blocked. Obstacle: pallet at A3."
AI_Player: "ask for solution"
Internal_Teacher: "Drive around or remove the obstacle. Risk: 2 min delay."
AI_Player: "drive around obstacle"
Game_engine: "Alternative route calculated. Transport starting."
AI_Player: "place crate"
Game_engine: "Mission 50% completed."
=== END ===
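How a game engine could answer such commands is sketched below. This is a toy under stated assumptions: the class `WarehouseGame`, its rules and the English command strings are invented for illustration, and the internal teacher, the battery and the obstacles are left out.

```python
class WarehouseGame:
    """Toy text-adventure engine; all rules here are illustrative assumptions."""

    def __init__(self, goal=10):
        self.position = "A1"
        self.carrying = False
        self.delivered = 0
        self.goal = goal

    def step(self, command):
        """Apply one text command and return the engine's text reply."""
        if command == "grasp crate" and not self.carrying:
            self.carrying = True
            return "Crate grasped."
        if command.startswith("move to "):
            self.position = command[len("move to "):]
            return f"Position {self.position} reached."
        if command == "place crate" and self.carrying:
            self.carrying = False
            self.delivered += 1
            return f"Crate placed at {self.position}. {self.delivered}/{self.goal} done."
        return "Command not understood."


game = WarehouseGame()
for command in ["grasp crate", "move to B5", "place crate"]:
    print("AI_Player:", command)
    print("Game_engine:", game.step(command))
```

The engine is a plain state machine; everything that looks like intelligence in the log comes from the player who chooses the next command.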

May 23, 2026

Grounded language in a nutshell

Grounded language can be described as sensor data tagging. It connects the internal raw sensory data of a robot with an external semantic tagging system. The linking is realized in a DIKW pyramid and improves human to machine communication. Such a communication system allows the robot to offload the intelligence to a human.

Here is an example. Suppose a warehouse robot stands in front of an obstacle. Because the robot's software isn't able to resolve the situation, the robot asks a human operator what to do next. With the help of grounded language the output of the robot is: "obstacle: near, battery: 85%, question: What to do?". The human operator reads the textual message and makes a decision, which is sent back to the robot.
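The status message from the example can be produced and read back with a few lines of code. The function names and the field order are assumptions; the only thing taken from the example is the message format itself.

```python
def format_status(obstacle, battery_percent, question):
    """Turn tagged sensor data into the textual message for the operator."""
    return f"obstacle: {obstacle}, battery: {battery_percent}%, question: {question}"


def parse_status(message):
    """Split the message back into fields.

    Simplification: a field value must not itself contain ', '.
    """
    fields = {}
    for part in message.split(", "):
        key, value = part.split(": ", 1)
        fields[key] = value
    return fields


message = format_status("near", 85, "What to do?")
print(message)
print(parse_status(message))
```

The operator's answer could travel back in the same key and value style, but that direction is not shown here.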

There are multiple techniques available to implement such a system in software, for example a hand-coded language parser or a neural network. What they have in common is that all these approaches are based on natural language and put a strong emphasis on human to machine communication.

The term grounding refers to multiple situations:
a) it's a link between sensor data and textual annotation
b) it's a link between the internal robot structure and the external environment
c) it's a link between low level and high level problem description

In more colloquial terms, grounded language means using English for the teleoperation of a robot. This principle doesn't seem very impressive, because it has been shown in science fiction movies many times. The new insight is that there is no alternative available to realize artificial intelligence. That means all advanced robots are built as teleoperated machines which understand English.


Lessons learned from Douglas Lenat's Cyc

During the late 1980s the Cyc project was a large scale AI project. The promise was to create a database of handcrafted Lisp rules which is able to reason about the world. The attempt failed, but that is no problem, because it's possible to analyze the reason why.

From today's perspective Cyc was an early attempt to create a dataset. A dataset is, for example, a .csv file which contains no computer code. Datasets store numbers and text. During the 1980s it was unknown how to create large scale datasets, and Cyc had some built-in mistakes:

a) there was no word2vec algorithm to convert the textual information into a numerical representation
b) Cyc was encoded with rules but not with question-answer pairs

A modern dataset which is superior to Cyc would fix these mistakes. A common dataset used for training neural networks consists of a simple Q&A structure like "What is the capital of France? -- Paris", and it would use a word embedding algorithm to project the information into a numerical space which can be processed by neural networks.
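A data-only Q&A dataset and its conversion into numbers can be sketched as follows. Note the swap: instead of word2vec the sketch uses a plain bag-of-words count, which is a much simpler stand-in and not a learned embedding. The column names and the tiny vocabulary are assumptions.

```python
import csv
import io

# A data-only dataset: question-answer pairs in CSV form, no program logic.
raw = "question,answer\nWhat is the capital of France?,Paris\n"
rows = list(csv.DictReader(io.StringIO(raw)))


def vectorize(text, vocabulary):
    """Bag-of-words vector: count how often each vocabulary term occurs."""
    words = text.lower().replace("?", "").split()
    return [words.count(term) for term in vocabulary]


vocabulary = ["capital", "france", "paris"]  # assumed toy vocabulary
for row in rows:
    print(vectorize(row["question"], vocabulary), vectorize(row["answer"], vocabulary))
```

The code that reads and vectorizes the data lives outside the dataset, which is exactly the separation that the Cyc knowledge base lacked.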

The Cyc knowledge base was a combination of Lisp software and textual information. It was a hybrid of computer code and a dataset. This kind of knowledge base was replaced by data-only datasets, which have become popular since the deep learning boom. A data-only dataset contains no computer code, only the data itself, which can be text or images. The computer code which searches the data is externalized into a deep learning library.

May 21, 2026

A review of bottom up robotics

In the late 1980s there was a fundamental paradigm shift in the domain of artificial intelligence, called bottom-up robotics or subsumption architecture. It wasn't a new algorithm; at first it was a criticism of the AI of the past. Bottom-up robotics is mostly the observation that program-controlled top-down robotics until the year 1990 had failed. Instead, Brooks recommended building simple sensor-driven robots in the style of William Grey Walter's tortoise robots from the late 1940s.

In a single sentence, Brooks argued that it's unclear how to program robots, and instead of trying harder, the answer is to give up and build analog BEAM robots with a single sensor and a single motor. Of course, such a robot doesn't make sense, because the goal is to build highly complex machines which can do practical tasks and not a light-following bug which can't do anything.

Despite this step backward, bottom-up robotics became a great success. Many other researchers agreed with Brooks, and similar architectures like Tilden's BEAM robots were popular.

Let us describe bottom-up robotics from a bird's eye perspective. These robots or artificial bugs are mostly controlled by their environment and by a random generator, but not by an internal program. This paradigm shift was the real novelty of Brooks's work. It introduced a concept in which the former program-oriented approach in robotics was dismissed in favor of external control.
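A robot bug that is driven by its environment and a random generator can be sketched in a few lines. The function name, the value ranges (0.0 to 1.0) and the jitter size are assumptions; the original machines were analog circuits, not Python programs.

```python
import random


def motor_speed(light_level, rng):
    """Single sensor, single motor: speed follows the light plus random jitter."""
    jitter = rng.uniform(-0.1, 0.1)
    # Clamp to the assumed valid motor range of 0.0 to 1.0.
    return max(0.0, min(1.0, light_level + jitter))


rng = random.Random(0)  # fixed seed so that runs are repeatable
for light_level in (0.0, 0.5, 1.0):
    print(light_level, round(motor_speed(light_level, rng), 3))
```

There is no plan and no goal in this sketch: whatever the bug does is decided by the light level at that moment.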

Brooks correctly identified what sort of technology can't be realized. It's not possible to program a robot like an ordinary computer program. It doesn't make sense to write a C program and compile it for a microcontroller which does something with a robot, because such a C program will suffer from a reality gap to the environment. A highly complex task requires a highly complex computer program, and nobody knows how to write down the source code.

Let me give an example. Before the advent of bottom-up robotics, the shared assumption in artificial intelligence was that a robot which should grasp an object needs to be programmed first. There are 5000 lines of code which plan the grasp, solve the mathematical equations to determine the trajectory of the gripper and monitor whether the robot is successful. It's impossible to write and improve such a C program.