June 13, 2026

The last AI winter, 2000-2010

The period from 2000 to 2010 is an excellent case for describing how earlier attempts to research Artificial Intelligence failed. As in the decades before, there were several AI projects in both industry and academia which, like the projects of the 1980s and 1990s, ended without success:

- Semantic Web by Tim Berners-Lee
- Cyc by Douglas Lenat
- WordNet for machine translation
- DARPA Grand Challenge
- cognitive architectures such as SOAR
- Honda's ASIMO robot

These projects did produce working software and databases. There are many practical examples for Cyc and the Semantic Web. The self-driving car Stanley by Sebastian Thrun really did drive autonomously under software control. But the technologies used were not scalable and had no practical significance. Ordinary web technology based on HTML works reliably enough without an extension called the Semantic Web, and the WordNet database was of too low quality to translate texts with.

One can say that the years from 2000 to 2010 were a lost decade for AI research. Much was tried, and renowned researchers worked on the subject, but there were no breakthroughs or application-ready demonstrations in the proper sense.

At first glance, the AI projects listed above sounded promising. Building a self-driving car that competes against other cars in a race sounds like an exciting challenge for robotics. And the idea of bundling common sense knowledge in a Cyc database looks like a well thought out attempt to create a thinking database. Nevertheless, the concepts turned out to be flawed. They were dead ends that were explored once and then not pursued further.

More than any other science, the history of Artificial Intelligence is a sequence of failed efforts. It is as if several mountaineers tried to climb a mountain in very different ways, but none of them managed to complete even the first stage.

The failure can be explained by several parameters. First, the CPU performance needed to run an algorithm may be too high. This is the case for most path planning algorithms in robotics, including model predictive control. In theory, an algorithm could search the game tree exhaustively, the way computer chess is played, but in reality the physical computers of 2010 were too slow for that. So the approach is not practical. A second objective sign of failure is the high manual effort involved in building databases such as WordNet or Cyc. It is simply too expensive to build a database by hand over many years if it then yields no benefit. A third sign of failed AI projects is the high programming effort measured in lines of code. For the DARPA Grand Challenge, several hundred thousand lines of code were written, and by each participating team separately. Writing and maintaining all this code in C/C++ is a large effort, especially when the code cannot be reused because it is tailored to one specific car and one specific team.

These problems, namely the high computing demand, the manual effort of building databases and the manual writing of source code, were presumably known between 2000 and 2010, but it was unclear how the effort could be reduced.

As a brief outlook on the period after 2010, here are the approaches that were not yet available:

- remote-controlled robots
- datasets for training neural networks

Both topics were defined as unimportant. Remote-controlled robotics was not seen as worth pursuing, because the self-chosen goal was to develop autonomous, algorithm-controlled robots, not to build RC cars. Datasets and preprocessing received no attention either. It was assumed that artificial intelligence resides in the neural network and that the quality of the data used to train the network does not matter.

The literature before 2010 does contain examples where remote-controlled robots were discussed and where the creation of larger datasets was addressed, but these approaches were regarded as fringe topics without relevance for further AI research. That changed fundamentally from 2010 onward, because the previous attempts had reached a dead end and research priorities were readjusted as a result.

June 12, 2026

VLA models -- the upcoming revolution in AI

Since 2023, large language models (LLMs) have been available, which are a kind of advanced chatbot. An LLM can answer questions, write computer code and paint an image. Even though these systems look powerful, a much more advanced technology exists that has not been released yet: the VLA model.

VLA stands for vision-language-action model. It handles text in combination with robotic actions, which is what is needed to control both biped robots and drones. The user interface looks similar to that of an LLM: there is a text box and the user enters a prompt. The difference is that the AI software converts the prompt into action. Example prompts might be "walk in a circle" or "bring me the red ball".

Like an LLM, a VLA model works with natural language. The AI won't do anything on its own; it is a text-based interaction between human and machine. The innovation is that the output of the AI isn't restricted to a text window on the monitor: the AI has access to servo motors in the real world or can control in-game characters in a video game. This kind of AI is available in research prototypes and has been described in academic papers, but it is not available as a commercial product for everyone.

Current LLMs can already simulate parts of this behavior. It is possible to upload a JPEG image and have the AI describe the picture in words. Such picture-to-text annotation seems a bit useless, because it is obvious to a human what the picture shows, so the feature is seldom used in practice. Annotating pictures only makes sense in combination with the actuator control of a robot, because the robot needs to transform the camera signal into text and then make decisions in response to that information.




AI the big picture

AI isn't new; it has been researched for decades by many researchers. They have investigated an endless number of theories and algorithms for different subjects. To get a better picture of what the AI community has researched in the past, the working thesis is that there was a transition from closed systems in the past to open systems in the present. This working thesis is explained briefly in the following paragraphs.

A closed system is the natural understanding in computing. It assumes that software runs on a computer and that the programmer has to write down the source code, including the algorithm. A typical example is a model predictive control algorithm which uses a physics engine to predict future states, or a path planning algorithm like RRT which searches for a collision-free path. These approaches imitate the classical computer science paradigm, which works with the same technique.

The idea of a closed AI system is to grasp reality in mathematical terms and write a computer program which solves a mathematical optimization problem. This kind of approach was common in AI history until the 1990s. The only debate was about which algorithm was preferred, for example a neural network or an alpha-beta pruning algorithm.

It should be mentioned that closed systems are not powerful enough to tackle advanced problems. Especially in the domain of robot control, the paradigm fails every time because of the state space explosion. There is no algorithm available which can handle the millions of joint configurations of a biped robot. That is why some pessimistic AI researchers in the past assumed that it is not possible to solve NP-hard problems in AI.
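The state space explosion can be made concrete with a few lines of arithmetic. The sketch below is only an illustration: the number of joints, the angle resolution and the number of commands per joint are assumed values, not figures from a specific robot.

```python
# Illustrative arithmetic only: joint count and resolutions are assumed values.

def joint_configurations(num_joints: int, angles_per_joint: int) -> int:
    """Number of distinct poses when every joint is discretized independently."""
    return angles_per_joint ** num_joints


def search_tree_nodes(branching: int, depth: int) -> int:
    """Number of nodes in a full search tree, root included."""
    return sum(branching ** d for d in range(depth + 1))


if __name__ == "__main__":
    poses = joint_configurations(num_joints=12, angles_per_joint=10)
    print(f"poses of a 12-joint biped at 10 angles per joint: {poses:,}")

    # A planner that chooses one of 5 commands for each joint in every time step.
    branching = 5 ** 12
    print(f"possible actions per time step: {branching:,}")
    print(f"nodes for a 3-step lookahead: {search_tree_nodes(branching, 3):,}")
```

Even with this coarse discretization, a full lookahead of a few steps is far beyond what a brute-force search can enumerate.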

A more powerful paradigm is an open system. Early examples are motion capture systems from the 1980s, which record the position of markers in real time. Such a system is open because it tries to capture data from the environment, here mocap data. Another early open system is the text adventure, for example Zork I, which also puts great priority on human-to-machine interaction. Modern open systems developed after the year 2000 use advanced interfaces based on text and sensory data. These systems are open because the input sent to the computer is the most important information. A human operator might say "Move north and grasp the blue box", or another human operator might demonstrate a walking pattern in a motion capture suit and the robot has to repeat the trajectory. In open systems, human-to-machine interaction is the center of attention. Possible technologies such as certain algorithms, a certain neural network or a database are grouped around this principle. For example, a neural network might be used to detect the mocap markers, while a SQL database stores the real-time data, and a rendering algorithm then queries the database and paints the human pose on the screen.
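The mocap pipeline from the last example can be sketched in a few lines. This is a minimal sketch under assumptions: `detect_markers` is a hypothetical stub standing in for the neural network, the table layout is invented for illustration, and the "rendering" is a plain text grid rather than real graphics.

```python
import sqlite3


def open_database() -> sqlite3.Connection:
    """In-memory SQL database with an assumed table layout for marker data."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE mocap (frame INTEGER, marker TEXT, x REAL, y REAL)")
    return db


def detect_markers(frame_id: int) -> list:
    """Hypothetical stub for the neural network: (marker, x, y) in the range 0..1."""
    return [("head", 0.50, 0.10 + 0.01 * frame_id),
            ("left_hand", 0.30, 0.45),
            ("right_hand", 0.70, 0.45)]


def store_frame(db: sqlite3.Connection, frame_id: int, markers: list) -> None:
    """Write the detected markers of one frame into the database."""
    db.executemany(
        "INSERT INTO mocap (frame, marker, x, y) VALUES (?, ?, ?, ?)",
        [(frame_id, name, x, y) for name, x, y in markers])
    db.commit()


def latest_pose(db: sqlite3.Connection) -> dict:
    """Fetch the marker positions of the most recent frame."""
    rows = db.execute(
        "SELECT marker, x, y FROM mocap "
        "WHERE frame = (SELECT MAX(frame) FROM mocap)").fetchall()
    return {marker: (x, y) for marker, x, y in rows}


def render_text(pose: dict, width: int = 20, height: int = 10) -> str:
    """Paint the pose as a character grid; each marker shows its first letter."""
    grid = [["." for _ in range(width)] for _ in range(height)]
    for marker, (x, y) in pose.items():
        col = min(width - 1, int(x * width))
        row = min(height - 1, int(y * height))
        grid[row][col] = marker[0].upper()
    return "\n".join("".join(line) for line in grid)


if __name__ == "__main__":
    db = open_database()
    for frame_id in range(3):
        store_frame(db, frame_id, detect_markers(frame_id))
    print(render_text(latest_pose(db)))
```

None of the three stages is sophisticated on its own; the point is that all of them are arranged around the data coming in from the human demonstrator.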

From a technical perspective, these algorithms are trivial and most of them were available before the 1990s. The innovation is the context in which they are used, which is human-to-machine interaction. The existing software libraries are not used to build closed systems, e.g. a genetic algorithm which tries to improve itself; they are used to parse textual input or annotate sensor data with textual [tags].

Newspaper with AI advertisement

 

June 10, 2026

Matching game in python

The font name needs to be adjusted to the operating system; otherwise only a question mark is shown in the window.

import pygame
import sys
import time

# Initialize pygame
pygame.init()

# Window size
WIDTH, HEIGHT = 640, 480
screen = pygame.display.set_mode((WIDTH, HEIGHT))
pygame.display.set_caption("Emoji-Text-Matching")

# Colors
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
BLUE = (0, 0, 255)

# Fonts (with Unicode support)
# font_large = pygame.font.SysFont("Segoe UI Emoji", 120)  # emoji font on Windows
font_large = pygame.font.SysFont("Noto Color Emoji", 150)  # emoji font on Linux
font_small = pygame.font.SysFont("Arial", 30)              # font for the text labels
# Emoji-text pairs
pairs = [
    ("🐶", "Hund"),
    ("🐱", "Katze"),
    ("🐭", "Maus"),
    ("🐹", "Hamster"),
    ("🐰", "Hase"),
    ("🦊", "Fuchs"),
    ("🐻", "Bär"),
    ("🐼", "Panda"),
    ("🐨", "Koala"),
    ("🐯", "Tiger"),
    ("🦁", "Löwe"),
    ("🐮", "Kuh"),
    ("🐷", "Schwein"),
    ("🐸", "Frosch"),
    ("🐵", "Affe"),
    ("🐒", "Affe2"),
    ("🐺", "Wolf"),
    ("🐗", "Wildschwein"),
    ("🐝", "Biene"),
    ("🐛", "Raupe"),
    ("🔪", "Messer"),
    ("🔦", "Taschenlampe"),
]

# Positions for emoji and text (centered)
emoji_x, emoji_y = WIDTH // 2, HEIGHT // 3
text_x, text_y = WIDTH // 2, emoji_y + 150

# Main game loop
def main():
    clock = pygame.time.Clock()
    font_done = pygame.font.SysFont("Arial", 40)  # created once, not every frame
    running = True
    current_pair_index = 0
    last_switch = time.monotonic()

    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False

        # Switch to the next pair after 1 second. A timer keeps the window
        # responsive, unlike a blocking time.sleep(1) inside the loop.
        if current_pair_index < len(pairs) and time.monotonic() - last_switch >= 1.0:
            current_pair_index += 1
            last_switch = time.monotonic()

        # Background
        screen.fill(WHITE)

        # Show the current pair
        if current_pair_index < len(pairs):
            emoji, text = pairs[current_pair_index]

            # Large emoji
            emoji_surface = font_large.render(emoji, True, BLACK)
            emoji_rect = emoji_surface.get_rect(center=(emoji_x, emoji_y))
            screen.blit(emoji_surface, emoji_rect)

            # Text label below the emoji
            text_surface = font_small.render(text, True, BLUE)
            text_rect = text_surface.get_rect(center=(text_x, text_y))
            screen.blit(text_surface, text_rect)
        else:
            # All pairs shown: display the closing message
            done_text = font_done.render("Alle Paare gezeigt!", True, BLACK)
            done_rect = done_text.get_rect(center=(WIDTH // 2, HEIGHT // 2))
            screen.blit(done_text, done_rect)

        # Update the display
        pygame.display.flip()
        clock.tick(30)

    pygame.quit()
    sys.exit()

if __name__ == "__main__":
    main()

June 07, 2026

What is Artificial Intelligence?

Contrary to a popular myth, there is an answer to this question, because researchers have investigated the subject for decades. The most famous and easiest to understand introduction to the subject is a computer chess player. The computer is able to decide on the next move on the board, and a modern chess program can beat even a grandmaster.

Computer chess shows at the same time what current Artificial Intelligence can't provide yet. There is a difference between a program like gnuchess and a robot. Gnuchess is only able to play chess, while a robot has to do more complex tasks. AI research since the 1980s has been devoted to the goal of improving the skills of a computer.

A promising approach is a reward function based on grounded language. In contrast to the fixed reward function used in computer chess, a parametric reward function based on natural language can be modified on the fly. This allows a computer to understand instructions like "move to the blue box and grasp it". The command is translated into a reward signal, and the computer can plan a trajectory that maximizes the reward.

Let us compare computer chess with instruction following in robotics. Computer chess is based on a single fixed evaluation function which converts the current board into a reward signal, e.g. 0.4. This numerical information is used by the alpha-beta pruning algorithm to find the optimal action. The planner traverses the game tree up to 10 steps into the future and decides on the action which maximizes the reward. This is equivalent to winning the game.
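The chess side of this comparison can be sketched on a toy game tree. The tree and its leaf values below are assumptions made up for illustration; in a real chess program the leaves would be board positions scored by the fixed evaluation function.

```python
import math

# Toy game tree: inner nodes are lists of children, leaves are the numbers
# a fixed evaluation function would return (assumed values, not real chess).
TREE = [[3, 5], [2, 9], [0, 7]]


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Minimax value of `node`, skipping branches that cannot change the result."""
    if not isinstance(node, list):
        return node  # leaf: value of the evaluation function
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the opponent will never allow this branch
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # we already have a better alternative elsewhere
    return best


def best_move(tree):
    """Index and value of the root move that maximizes the reward."""
    best_index, best_score = None, -math.inf
    for index, child in enumerate(tree):
        score = alphabeta(child, False, best_score, math.inf)
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score


if __name__ == "__main__":
    print(best_move(TREE))
```

The evaluation function is baked into the leaf values, so the goal of the search cannot be changed while the program runs.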

In contrast, instruction following in robotics offloads the reward signal to a speaker located outside of the robot. With each command, the speaker determines what the current subgoal in the game is. Possible commands might be:

1. "if the battery is empty search for the charging station"
2. "grasp the red box"
3. "bring the red box into room C".

In contrast to the game of chess, which has a single goal that remains the same, a warehouse robot can have multiple goals which are activated in sequence. The AI makes sure that the robot understands a goal in a mathematical sense. Understanding means that the robot determines the numerical reward for a textual command. For example, if the goal is "grasp the red box", the robot receives a reward if the gripper moves towards the box and another reward for closing the gripper around the box.
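A command-based reward function of this kind could look roughly as follows. This is a minimal sketch under assumptions: the state layout (2D positions and a gripper flag), the reward formulas and the exact-match command table are invented for illustration and stand in for a real natural language parser.

```python
import math


def distance(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def reward_grasp(state, obj):
    """Reward for moving the gripper to the object, plus a bonus for closing it."""
    gap = distance(state["gripper"], state["objects"][obj])
    value = 1.0 / (1.0 + gap)  # a closer gripper gives a higher reward
    if state["gripper_closed"] and gap < 0.05:
        value += 1.0  # bonus for closing the gripper around the object
    return value


def reward_bring(state, obj, room):
    """Reward for moving the object towards the center of the room."""
    gap = distance(state["objects"][obj], state["rooms"][room])
    return 1.0 / (1.0 + gap)


# Hypothetical command table; a real parser would handle free-form text.
COMMANDS = {
    "grasp the red box": lambda s: reward_grasp(s, "red box"),
    "bring the red box into room c": lambda s: reward_bring(s, "red box", "room c"),
}


def reward(command, state):
    """Translate a textual command and the current state into a number."""
    handler = COMMANDS.get(command.strip().lower())
    if handler is None:
        raise ValueError(f"unknown command: {command!r}")
    return handler(state)


if __name__ == "__main__":
    state = {"gripper": (0.0, 0.0), "gripper_closed": False,
             "objects": {"red box": (3.0, 4.0)}, "rooms": {"room c": (9.0, 4.0)}}
    print(reward("grasp the red box", state))
    print(reward("bring the red box into room C", state))
```

Switching the command at runtime switches the reward function, while the planner that maximizes the number can stay the same.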

The problem for programmers and AI engineers is to encode the reward function, including the natural language parser, in software. A robot that understands a dozen commands comes close to the goal of building an intelligent machine.

The purpose of a command-based reward function is to transform a closed system into an open system. Open means that the robot communicates with its environment. This is needed because the robot itself has insufficient knowledge about the task, whereas the human operator has much more. It therefore makes sense to offload the planning task to the human operator.

In the chess-playing AI systems of the past, which had a fixed evaluation function, it was not possible to interact with the system at runtime. The only way to modify the reward was to stop the program, modify the source code and restart the software.

June 02, 2026

Grounding mechanism 101

A DIKW pyramid consists of abstraction layers such as data and information. A grounding mechanism maps the items of one layer to those of another. In an example warehouse robot, the data layer consists of sensor readings like GPS coordinates, lidar distance and battery capacity, while the information layer consists of [tags] like "battery_full", "north" and "obstacle_ahead".

The grounding mechanism generates the links between the entries. For example, a lidar distance of 10 cm is mapped to "obstacle_ahead", while a battery level of 10% is mapped to "battery_empty".
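Such a mapping from the data layer to the information layer can be sketched as a small function. The 10 cm and 10% thresholds come from the example above; the 90% threshold for a full battery, the `heading_deg` compass reading and the dictionary keys are assumptions added for illustration.

```python
def ground(sensors: dict) -> set:
    """Map raw sensor readings (data layer) to tags (information layer)."""
    tags = set()

    # Lidar distance of 10 cm or less means something is in the way.
    if sensors["lidar_cm"] <= 10:
        tags.add("obstacle_ahead")

    # Battery level: 10% or less is empty, 90% or more is full (90 is assumed).
    if sensors["battery_percent"] <= 10:
        tags.add("battery_empty")
    elif sensors["battery_percent"] >= 90:
        tags.add("battery_full")

    # Assumed compass reading: within 45 degrees of 0 counts as north.
    heading = sensors["heading_deg"] % 360
    if heading >= 315 or heading < 45:
        tags.add("north")

    return tags


if __name__ == "__main__":
    reading = {"lidar_cm": 10, "battery_percent": 10, "heading_deg": 350}
    print(sorted(ground(reading)))
```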

In general, a grounding mechanism is a kind of matching game. It answers the question of which situation is mapped to which description. Such a mapping is the core element of an advanced artificial intelligence.

To demonstrate why a matching game enables artificial intelligence, consider an example. Suppose the human operator submits the following command to the warehouse robot: "move to the green area, grasp the small box on the left side, bring the box to the blue area, drop it into the shelf, then recharge your battery".

If the grounding mechanism is missing or has been deactivated, the command is interpreted as a string of 144 characters. It is not formulated in a programming language such as C/C++, so the robot can only store it in main memory without acting on it.

Suppose instead that the robot has a built-in grounding mechanism. Then it is possible to parse the sentence word by word. The word "green" matches a certain RGB value, the word "box" is mapped to a certain shape in the camera image, the word "shelf" is mapped to a picture of the shelf, and so on. The parsing algorithm fetches a word from the sentence and looks it up in the database to identify the item from the data layer of the DIKW pyramid. From a robot's perspective, understanding a sentence means matching items from the information layer to the data layer.
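The word-by-word lookup can be sketched with a dictionary in place of the database. The entries of `LEXICON`, including the RGB values and the template file names, are hypothetical placeholders; a real robot would link each word to actual camera and sensor data.

```python
COMMAND = ("move to the green area, grasp the small box on the left side, "
           "bring the box to the blue area, drop it into the shelf, "
           "then recharge your battery")

# Hypothetical lookup table: word (information layer) -> entry of the data layer.
LEXICON = {
    "green": {"kind": "color", "rgb": (0, 255, 0)},
    "blue": {"kind": "color", "rgb": (0, 0, 255)},
    "box": {"kind": "shape", "template": "box_template.png"},
    "shelf": {"kind": "image", "template": "shelf.png"},
    "battery": {"kind": "sensor", "channel": "battery_percent"},
}


def parse(sentence: str) -> list:
    """Fetch each word and look it up; words without an entry stay ungrounded."""
    grounded = []
    for raw in sentence.split():
        word = raw.strip(",.").lower()
        if word in LEXICON:
            grounded.append((word, LEXICON[word]))
    return grounded


if __name__ == "__main__":
    print(len(COMMAND), "characters without grounding")
    for word, entry in parse(COMMAND):
        print(word, "->", entry)
```

Without the lookup table the robot only has the raw string; with it, six of the words point to something the robot can perceive or measure.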

June 01, 2026

Symbol grounding problem as an answer to NP-hard problems

Before it is possible to describe grounded language, there is a need to explain how artificial intelligence was imagined until the year 1990. It was treated like computer programming, in the sense that there is a CPU which executes a program and it is up to the programmer to make the algorithm as intelligent as possible. Artificial intelligence was thought of as a very advanced computer program executed by a computer.

In other words, the computer was seen as a problem-solving machine, and the only open detail was which sort of algorithm is needed to solve a certain problem. For example, motion planning in robotics was solved with motion planning algorithms, while computer chess was solved with alpha-beta pruning. Most of these AI-related algorithms were designed as search algorithms. The computer was used to traverse the state space of the domain, and this allowed it to find the optimal action.

The symbol grounding problem formulated by Stevan Harnad questions this algorithm-oriented paradigm. This might explain why grounded language is a niche topic within computer science even today. Because computer science and algorithms were often treated as the same thing, the question of how to program a computer without an algorithm was out of scope.

Let us listen closely to how Harnad, Brooks and Steels argue about grounded language. The core element is the sensory perception of a robot. The assumption is that the perception is transmitted to the computer. There is no need to calculate anything; the focus is on the data transfer. A light sensor detects light, and the information from the sensor is sent over a cable to the computer. The symbol grounding problem doesn't focus on the computer itself but on the cable between a sensor and a computer, very similar to a computer network. Computer networks are different from a Turing machine: they do not run algorithms, but communicate data, often organized in protocol layers.

The paradigm shift from algorithm-centric computers towards protocol-oriented data transmission is the core element of the symbol grounding problem. Artificial Intelligence isn't explained as processing or program execution; instead, Artificial Intelligence is imagined as the air gap between two hosts.

Let us compare the hardware. In classical algorithm-oriented AI, the basic building block is a central processing unit, for example a 32-bit CPU. The CPU is built from transistors on a chip and is controlled by assembly language. In contrast, the symbol grounding problem assumes that there is a Cat5 copper cable which delivers packets. It is up to the network engineer to define the protocol of the packets.
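What "defining the protocol of the packets" means can be shown with a tiny example. The packet layout below (sensor id, timestamp, reading) is an assumption invented for illustration, not a format from Harnad's paper or from any existing standard.

```python
import struct

# Assumed wire format in network byte order:
#   sensor id (1 byte), timestamp in ms (4 bytes), reading (4-byte float)
PACKET = struct.Struct("!BIf")


def encode(sensor_id: int, timestamp_ms: int, value: float) -> bytes:
    """Sensor side: pack one reading into bytes for the cable."""
    return PACKET.pack(sensor_id, timestamp_ms, value)


def decode(payload: bytes) -> tuple:
    """Computer side: recover (sensor id, timestamp in ms, reading)."""
    return PACKET.unpack(payload)


if __name__ == "__main__":
    payload = encode(sensor_id=3, timestamp_ms=1000, value=0.5)
    print(len(payload), "bytes on the wire:", payload.hex())
    print(decode(payload))
```

Nothing is computed here in the algorithmic sense; both sides only agree on what the bytes on the cable stand for.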

The paradigm shift can be explained using NP-hard problems. NP-hard is a category of problems, many of them related to artificial intelligence, for which no efficient (polynomial-time) algorithm is known. Many robot motion planning problems, such as the piano mover's problem or model predictive control, are at least NP-hard in their general form. The term NP-hard refers to how the runtime of an algorithm executed on a CPU grows with the problem size. In other words, even a modern 64-bit CPU can't solve large instances of these problems in reasonable time.

The holy grail in computer science is how to deal with NP-hard problems. An answer was given by Stevan Harnad in his famous 1990 paper. He didn't mention NP-hard problems, but grounded language makes it possible to work around them: instead of using a CPU to calculate a mathematical problem, a copper cable is used to solve a data transmission problem. This new perspective is powerful enough to tackle motion planning problems in robotics.