Grounded language can be described as sensor data tagging. It connects the robot's internal raw sensor data with an external semantic tagging system. The link is realized within a DIKW (data, information, knowledge, wisdom) pyramid and improves human-machine communication. Such a communication system allows the robot to offload intelligence to a human.
Here is an example. Suppose a warehouse robot stands in front of an obstacle. Because the robot's software cannot resolve the situation, the robot asks a human operator what to do next. With the help of grounded language the robot's output is: "obstacle: near, battery: 85%, question: What to do?". The human operator reads the text message and makes a decision, which is sent back to the robot.
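The message in this example can be produced by a small tagging function. The following is only a minimal sketch: the function names and the 0.5 m threshold are assumptions made for illustration and are not taken from a real robot software stack.

```python
def tag_distance(distance_m, near_threshold_m=0.5):
    """Map a raw distance reading to a symbolic tag (the threshold is an assumption)."""
    return "near" if distance_m < near_threshold_m else "far"


def build_status_message(distance_m, battery_fraction):
    """Turn raw sensor values into a human-readable, grounded status line."""
    obstacle = tag_distance(distance_m)
    battery = round(battery_fraction * 100)
    return f"obstacle: {obstacle}, battery: {battery}%, question: What to do?"


print(build_status_message(0.3, 0.85))
```

The raw numbers (0.3 meters, 0.85 charge) never reach the operator; only the tags do, which is the point of the grounding step.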
There are multiple techniques for implementing such a system in software, for example a hand-coded language parser or a neural network. What they have in common is that they are all based on natural language and put a strong emphasis on human-machine communication.
The term grounding refers to several situations:
a) it's a link between sensor data and textual annotation
b) it's a link between the internal robot structure and the external environment
c) it's a link between low-level and high-level problem descriptions
In more colloquial terms, grounded language means using English to teleoperate a robot. This principle may not seem very impressive because it has been demonstrated in science fiction movies many times. The new insight is the claim that there is no alternative way to realize artificial intelligence. That means all advanced robots are built as teleoperated machines that understand English.
Robotics and Artificial Intelligence
May 23, 2026
Grounded language in a nutshell
Lessons learned from Douglas Lenat's Cyc
In the late 1980s, Cyc was a large-scale AI project. The promise was to create a database of handcrafted Lisp rules that could reason about the world. The attempt failed, but that is not a problem because it is possible to analyze why.
From today's perspective Cyc was an early attempt to create a dataset. A dataset is a .csv file that contains no computer code. Datasets store numbers and text. In the 1980s it was unknown how to create large-scale datasets, and Cyc had some built-in mistakes:
a) there was no word2vec algorithm to convert textual information into a numerical representation
b) Cyc was encoded as rules rather than as question-answer pairs
A modern dataset that is superior to Cyc would fix these mistakes. A common dataset used for training neural networks has a simple Q&A structure like "What is the capital of France? -- Paris", and it would use a word embedding algorithm to project the information into a numerical space that neural networks can process.
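To make the two ingredients concrete, here is a minimal sketch of a Q&A dataset in .csv form together with a numerical projection. Note the assumptions: the two data rows are illustrative, and a plain bag-of-words count vector stands in for a learned embedding such as word2vec, which is a deliberate simplification and not the real algorithm.

```python
import csv
import io

# Hypothetical miniature Q&A dataset in CSV form (contents are illustrative).
CSV_TEXT = """question,answer
What is the capital of France?,Paris
What is the capital of Italy?,Rome
"""


def load_pairs(csv_text):
    """Read (question, answer) pairs from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["question"], row["answer"]) for row in reader]


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip("?.,!").lower() for w in text.split())
    return [w for w in words if w]


def build_vocab(pairs):
    """Assign one vector dimension to every word that occurs in a question."""
    words = sorted({w for question, _ in pairs for w in tokenize(question)})
    return {w: i for i, w in enumerate(words)}


def to_vector(text, vocab):
    """Bag-of-words count vector: a simple stand-in for a learned embedding."""
    vec = [0] * len(vocab)
    for w in tokenize(text):
        if w in vocab:
            vec[vocab[w]] += 1
    return vec


def answer(query, pairs, vocab):
    """Return the answer whose question vector has the largest dot product with the query."""
    qv = to_vector(query, vocab)

    def score(pair):
        return sum(a * b for a, b in zip(qv, to_vector(pair[0], vocab)))

    return max(pairs, key=score)[1]


pairs = load_pairs(CSV_TEXT)
vocab = build_vocab(pairs)
print(answer("capital of Italy", pairs, vocab))
```

The file holds only data; the code that searches it lives outside the dataset, which is the separation Cyc did not have.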
The Cyc knowledge base was a combination of Lisp software and textual information. It was a hybrid of computer code and a dataset. This kind of knowledge base has been replaced by data-only datasets, which have become popular since the deep learning boom. A data-only dataset contains no computer code, only the data itself, which can be text or images. The code that searches the data is moved out into a deep learning library.
May 21, 2026
A review of bottom up robotics
In the late 1980s there was a fundamental paradigm shift in the field of Artificial Intelligence, called bottom up robotics or subsumption architecture. It wasn't a new algorithm; at first it was a criticism of earlier AI. Bottom up robotics is mostly the observation that program-controlled top down robotics, as practiced until about 1990, had failed. Instead, Rodney Brooks recommended building simple sensor-driven robots in the style of William Grey Walter's turtle robots from the 1940s.
In a single sentence, Brooks argued that it's unclear how to program robots, and that instead of trying harder, the answer is to give up and build analog BEAM robots with a single sensor and a single motor. Of course, such a robot doesn't make sense on its own, because the goal is to build highly complex machines that can do practical tasks, not a light-following bug that can't do anything.
Despite this step backward, bottom up robotics became a great success. Many other researchers agreed with Brooks, and similar architectures like Mark Tilden's BEAM robots became popular.
Let us describe bottom up robotics from a bird's eye perspective. These robots or artificial bugs are mostly controlled by their environment and by a random generator, not by an internal program. This paradigm shift was the real novelty of Brooks. It introduced a concept in which the former program-oriented approach in robotics was dismissed in favor of external control.
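A controller that is driven by the environment plus a random generator can be sketched in a few lines. The example below is an assumption-laden illustration in the spirit of a Braitenberg vehicle, with two light sensors wired crosswise to two motors; it is not Brooks's subsumption architecture and not Tilden's analog circuitry, and the function names and noise level are made up for the sketch.

```python
import random


def clamp(value, low=0.0, high=1.0):
    """Keep a motor command inside its valid range."""
    return max(low, min(high, value))


def reactive_step(light_left, light_right, rng, noise=0.1):
    """Return (left_motor, right_motor) speeds from two light readings in [0, 1].

    There is no plan and no internal state: the sensor values pass straight
    through to the motors, with a small random disturbance added.
    """
    # Crossed wiring: bright light on the left speeds up the right wheel,
    # so the robot turns toward the light source.
    left_motor = clamp(light_right + rng.uniform(-noise, noise))
    right_motor = clamp(light_left + rng.uniform(-noise, noise))
    return left_motor, right_motor


rng = random.Random(0)
left, right = reactive_step(light_left=0.9, light_right=0.1, rng=rng)
print("turning left toward the light" if right > left else "turning right")
```

The behavior changes only when the light readings change, which is what "controlled by its environment" means here.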
Brooks correctly identified what sort of technology can't be realized. It's not possible to program a robot the way an ordinary computer program is written. It doesn't make sense to write a C program and compile it for a microcontroller that drives a robot, because such a C program will leave a reality gap between itself and the environment. A highly complex task requires a highly complex computer program, and nobody knows how to write down the source code.
Let me give an example. Before the advent of bottom up robotics, the shared assumption in artificial intelligence was that a robot that should grasp an object needs to be programmed first. There are 5000 lines of code that plan the grasp, solve the mathematical equations to determine the trajectory of the gripper, and monitor whether the robot is successful. It's impossible to write and improve such a C program.
May 18, 2026
The power of head up displays
Head up displays are common special effects in sci-fi movies. Since the 1980s lots of films have shown these visual effects. Most of the audience thinks that the head up display isn't artificial intelligence but only an artist's impression of possible future robotics.
It's a bit surprising, then, that a head up display is the fundamental building block of artificial intelligence, because it shows grounded language. The typical head up display is formatted in a key/value syntax, similar to a JSON file. Example for a warehouse robot:
location: cell B, north
movement: east
speed: 4 km/h
gripper: empty
obstacle: no
target: cell A
battery: 81%
All the important information can be shown in this syntax. The key/value format converts the camera picture into a text adventure game. A parser can analyze the textual information and decide what the robot should do next. For example, if the battery is below 20% the robot needs to find the charging station, and if there is an obstacle ahead, the robot needs to stop.
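The parser and the two rules described here can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and giving the obstacle rule priority over the battery rule is a choice made for the sketch, not something the display format prescribes.

```python
HUD_TEXT = """location: cell B, north
movement: east
speed: 4 km/h
gripper: empty
obstacle: no
target: cell A
battery: 81%"""


def parse_hud(text):
    """Parse 'key: value' lines into a dictionary of strings."""
    state = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            state[key.strip()] = value.strip()
    return state


def decide(state):
    """Pick the next action from the parsed display (rule order is an assumption)."""
    if state.get("obstacle", "no") != "no":
        return "stop"
    battery = int(state.get("battery", "100%").rstrip("%"))
    if battery < 20:
        return "go to charging station"
    return f"continue to {state.get('target', 'target')}"


state = parse_hud(HUD_TEXT)
print(decide(state))
```

Changing a single value in the text, for example `battery: 15%`, changes the decision without touching any code.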
So we can say that advanced robots aren't controlled by an AI algorithm but by the head up display. Its information is the input for the decision making system; the head up display represents the state space of the robot. If the robot chooses the wrong action, something is wrong with the information in the head up display.
May 14, 2026
The upcoming Claude mythos LLM
There are rumors about a new large language model called "Claude mythos" which hasn't been released yet. It's not very hard to describe its potential features, because existing large language models have a lot of disadvantages.
ChatGPT and similar models can generate source code, for example in Python and C, but they cannot execute it in a virtual environment. The human user notices this restriction because LLM-generated code sometimes contains small errors. For example, the Python interpreter might report that something is wrong in line 30. The current situation in May 2026 is that the user has to submit the error message from Python to the ChatGPT LLM, and then the chatbot creates an improved version which might contain another error. It takes a lot of time to produce runnable software with such a feedback loop.
Suppose a large language model has an internal Python interpreter which can execute source code and improve it. This would reduce the number of feedback loops with a human and allow the LLM to generate error-free programs on the first attempt.
In general, it's about an environment for testing software or testing the actions of a robot. The assumption is that Claude mythos will have such a built-in environment, which would improve AI-based software engineering drastically.
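The feedback loop described here, in which generated code is executed and the error message is fed back automatically, can be sketched in a few lines. Everything in this example is an assumption made for illustration: `fake_model` is a hypothetical stand-in for a language model, and nothing is known about how an unreleased model would actually work. Note also that `exec` is not a sandbox; a real system would need an isolated environment.

```python
import traceback


def run_candidate(source):
    """Execute source in a fresh namespace and return (ok, error_text)."""
    namespace = {}
    try:
        exec(source, namespace)  # not a sandbox, for illustration only
        return True, ""
    except Exception:
        return False, traceback.format_exc(limit=1)


def repair_loop(generate, max_attempts=3):
    """Ask the generator for code, run it, and feed errors back until it runs."""
    error = ""
    for attempt in range(1, max_attempts + 1):
        source = generate(error)
        ok, error = run_candidate(source)
        if ok:
            return source, attempt
    return None, max_attempts


def fake_model(error):
    """Hypothetical stand-in for an LLM: the first draft has a typo, the retry fixes it."""
    if "NameError" in error:
        return "total = sum(range(5))\nassert total == 10"
    return "total = sum(range(5))\nassert totl == 10"


source, attempts = repair_loop(fake_model)
print(f"runnable after {attempts} attempt(s)")
```

The human only sees the final result; the error message from the first attempt is consumed inside the loop.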
The rumored ability of Claude mythos to find bugs in existing software projects perhaps works with the same method. Before it's possible to find a bug and fix it, the software needs to be run in a simulator. Such a simulator has been used by human programmers for years; it's mostly a GNU compiler which converts C code into binary code, plus a virtual machine such as QEMU to run the software. Every possible bugfix is compiled first to verify that there is no error in the code, and then the binary file is run in the simulator to verify whether the change fixes the problem. Chances are high that Claude mythos works on a similar principle.
This would allow a computer not only to generate source code but also to determine the outcome of the generated code. Such an LLM would be more useful than existing LLMs which do not have such features.
The symbol grounding problem in a practical example
Grounded language is an interdisciplinary problem that requires extensive expertise in very different disciplines such as computer science, linguistics and robot control. It is therefore necessary to simplify the topic with a practical example. A starting point is a map on which a mouse cursor is moved. The user can move the mouse to any point, e.g. onto a red circle or a yellow square. The computer program displays the [tags] for the mouse position, e.g. "[circle] [green]" or "[rectangle] [small]".
At least for the mini example of the map showing geometric objects, this solves the symbol grounding problem.
Similar to perspective drawing in painting, the task is to map reality onto a coordinate system. In the grounding problem the coordinates consist of a [tag] list. The user points to a location, e.g. (100,30), and the computer determines the tags for the object at that point.
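The lookup from a point to a tag list can be sketched as follows. This is a minimal illustration: the shape list, its coordinates and the function names are assumptions made for the example, with the point (100,30) placed inside the circle.

```python
# Hypothetical map contents: each shape carries its own tag list.
SHAPES = [
    {"kind": "circle", "center": (100, 30), "radius": 15, "tags": ["circle", "red"]},
    {"kind": "rect", "x": 200, "y": 50, "w": 40, "h": 40, "tags": ["square", "yellow"]},
]


def contains(shape, x, y):
    """Return True if the point (x, y) lies inside the shape."""
    if shape["kind"] == "circle":
        cx, cy = shape["center"]
        return (x - cx) ** 2 + (y - cy) ** 2 <= shape["radius"] ** 2
    return (shape["x"] <= x < shape["x"] + shape["w"]
            and shape["y"] <= y < shape["y"] + shape["h"])


def tags_at(x, y, shapes=SHAPES):
    """Return the tag list of the first shape under the point, or an empty list."""
    for shape in shapes:
        if contains(shape, x, y):
            return shape["tags"]
    return []


print(tags_at(100, 30))
print(tags_at(0, 0))
```

In a graphical program the same function would be called with the current mouse position on every frame.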
Although the technical implementation is easy, grounded language makes it possible to build a surprisingly capable AI system. From the moment the computer can output and parse tags, communication becomes possible through them. An example:
Suppose the semantic camera described above has been implemented for a jump'n'run video game, meaning the software can tell from the tile map whether the mouse cursor is on an abyss, a coin, a power-up, an enemy or a platform. This information can then be referenced in a rule such as "walk to the abyss and stop, then jump over it and run to the coin". This complex command sequence refers to recognized tags in the game; the parser can evaluate it and understands what the user wants. Not because a highly complex algorithm is working in the background, but because a human-machine interface exists that one can refer to.
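How a command sequence can refer to recognized tags is sketched below for a one-dimensional tile map. The tile layout, the three verbs and the function name are assumptions made for this illustration; a real game would work on a two-dimensional tile map with physics.

```python
# Hypothetical one-dimensional level: every tile already carries a tag.
TILEMAP = ["platform", "platform", "abyss", "platform", "coin", "platform"]


def run_commands(commands, tiles, pos=0):
    """Execute (verb, tag) commands against the tagged tiles; return (pos, log)."""
    log = []
    last = len(tiles) - 1
    for verb, tag in commands:
        if verb == "walk_until":
            # Advance until the next tile carries the tag, then stop in front of it.
            while pos < last and tiles[pos + 1] != tag:
                pos += 1
            log.append(f"stopped before {tag} at tile {pos}")
        elif verb == "jump_over":
            # Skip every consecutive tile with the tag and land right behind it.
            while pos < last and tiles[pos + 1] == tag:
                pos += 1
            pos = min(pos + 1, last)
            log.append(f"jumped over {tag}, landed on tile {pos}")
        elif verb == "walk_to":
            # Advance until the current tile carries the tag.
            while pos < last and tiles[pos] != tag:
                pos += 1
            log.append(f"reached {tag} at tile {pos}")
    return pos, log


commands = [("walk_until", "abyss"), ("jump_over", "abyss"), ("walk_to", "coin")]
final_pos, log = run_commands(commands, TILEMAP)
for entry in log:
    print(entry)
```

The commands never mention coordinates; they only name tags, and the tile map resolves each tag to a position.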
The source code below and the screenshot do not show the geometry problem but a robot-in-a-maze game in which a robot has to collect items. The status area performs semantic event recognition. The robot moves across the map and, in parallel, the text box describes the current situation, a bit like in an early text adventure. This status area defines the robot's game state in linguistic space rather than in geometric, mathematical space.
import pygame
import sys
import random

# Initialize Pygame
pygame.init()
pygame.font.init()

# --- Configuration Constants ---
GRID_SIZE = 40  # Pixels per cell
GRID_COLS = 20
GRID_ROWS = 12
# Textbox dimensions (40 chars wide, 4 lines high roughly translates to this)
TEXTBOX_HEIGHT = 100
SCREEN_WIDTH = GRID_COLS * GRID_SIZE
SCREEN_HEIGHT = (GRID_ROWS * GRID_SIZE) + TEXTBOX_HEIGHT

# Colors (RGB)
COLOR_STREET = (240, 240, 240)
COLOR_HOUSE = (70, 130, 180)
COLOR_ROBOT = (220, 50, 50)
COLOR_TRASH = (40, 180, 99)
COLOR_TEXTBOX_BG = (30, 30, 30)
COLOR_TEXT = (255, 255, 255)
COLOR_GRID = (210, 210, 210)

# --- Event Log System ---
# The 12 grounded language events:
# 1. "System initialized. Roomba ready."
# 2. "Moved North."
# 3. "Moved South."
# 4. "Moved East."
# 5. "Moved West."
# 6. "Obstacle detected at North."
# 7. "Obstacle detected at South."
# 8. "Obstacle detected at East."
# 9. "Obstacle detected at West."
# 10. "Grid boundary reached."
# 11. "Trash item successfully collected!"
# 12. "Area clear. No trash nearby."
event_logs = ["System initialized. Roomba ready.", "", "", ""]


def log_event(message):
    """Adds a new event to the log, keeping only the last 4 events."""
    if event_logs[-1] != message:  # Avoid spamming identical consecutive logs
        event_logs.append(message)
        if len(event_logs) > 4:
            event_logs.pop(0)


# --- Map & Environment Setup ---
# 0 = Street (Pathway), 1 = House (Obstacle)
maze = [[0 for _ in range(GRID_COLS)] for _ in range(GRID_ROWS)]

# Generate mock "blocks" of houses to look like a street map
random.seed(42)  # Seed for consistent map generation
for r in range(1, GRID_ROWS - 1, 3):
    for c in range(1, GRID_COLS - 1, 4):
        # Create a 2x2 or 2x3 house block
        block_w = random.randint(2, 3)
        block_h = 2
        for bh in range(block_h):
            for bw in range(block_w):
                if r + bh < GRID_ROWS - 1 and c + bw < GRID_COLS - 1:
                    maze[r + bh][c + bw] = 1

# Spawn Trash Items
trash_positions = set()
while len(trash_positions) < 10:
    tr = random.randint(0, GRID_ROWS - 1)
    tc = random.randint(0, GRID_COLS - 1)
    if maze[tr][tc] == 0:  # Must be on a street
        trash_positions.add((tc, tr))

# Spawn Robot (starts at the top-left cell, re-rolled only if that cell is a house)
robot_x, robot_y = 0, 0
while maze[robot_y][robot_x] != 0:
    robot_x = random.randint(0, GRID_COLS - 1)
    robot_y = random.randint(0, GRID_ROWS - 1)

# --- Simulation Setup ---
screen = pygame.display.set_mode((SCREEN_WIDTH, SCREEN_HEIGHT))
pygame.display.set_caption("Robot Street Simulator")
clock = pygame.time.Clock()
font = pygame.font.SysFont("Courier", 18)  # Monospace font for predictable char width


def check_surroundings(rx, ry):
    """Scans adjacent cells to log nearby obstacles."""
    neighbors = [("North", 0, -1), ("South", 0, 1), ("West", -1, 0), ("East", 1, 0)]
    for name, dx, dy in neighbors:
        nx, ny = rx + dx, ry + dy
        # Cells outside the grid are ignored; only houses count as obstacles
        if 0 <= nx < GRID_COLS and 0 <= ny < GRID_ROWS and maze[ny][nx] == 1:
            log_event(f"Obstacle detected at {name}.")


# Initial scan
check_surroundings(robot_x, robot_y)

# --- Main Loop ---
running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == pygame.KEYDOWN:
            dx, dy = 0, 0
            move_dir = ""
            if event.key == pygame.K_UP:
                dy = -1
                move_dir = "North"
            elif event.key == pygame.K_DOWN:
                dy = 1
                move_dir = "South"
            elif event.key == pygame.K_LEFT:
                dx = -1
                move_dir = "West"
            elif event.key == pygame.K_RIGHT:
                dx = 1
                move_dir = "East"

            if dx != 0 or dy != 0:
                new_x = robot_x + dx
                new_y = robot_y + dy
                # Check Grid Boundary
                if not (0 <= new_x < GRID_COLS and 0 <= new_y < GRID_ROWS):
                    log_event("Grid boundary reached.")
                # Check House Obstacle Collision
                elif maze[new_y][new_x] == 1:
                    log_event(f"Obstacle detected at {move_dir}.")
                # Move Valid
                else:
                    robot_x = new_x
                    robot_y = new_y
                    log_event(f"Moved {move_dir}.")
                    # Check Trash Collection
                    if (robot_x, robot_y) in trash_positions:
                        trash_positions.remove((robot_x, robot_y))
                        log_event("Trash item successfully collected!")
                    # Scan environment post-movement
                    check_surroundings(robot_x, robot_y)
                    # Check if all clear
                    if not trash_positions:
                        log_event("Area clear. No trash nearby.")

    # --- Drawing Environment ---
    screen.fill(COLOR_STREET)

    # Draw Grid and Houses
    for r in range(GRID_ROWS):
        for c in range(GRID_COLS):
            rect = pygame.Rect(c * GRID_SIZE, r * GRID_SIZE, GRID_SIZE, GRID_SIZE)
            if maze[r][c] == 1:
                pygame.draw.rect(screen, COLOR_HOUSE, rect)
            pygame.draw.rect(screen, COLOR_GRID, rect, 1)

    # Draw Trash Items
    for (tx, ty) in trash_positions:
        trash_rect = pygame.Rect(tx * GRID_SIZE + 10, ty * GRID_SIZE + 10, GRID_SIZE - 20, GRID_SIZE - 20)
        pygame.draw.rect(screen, COLOR_TRASH, trash_rect, border_radius=3)

    # Draw Robot
    robot_rect = pygame.Rect(robot_x * GRID_SIZE + 6, robot_y * GRID_SIZE + 6, GRID_SIZE - 12, GRID_SIZE - 12)
    pygame.draw.ellipse(screen, COLOR_ROBOT, robot_rect)

    # --- Drawing Grounded Language Textbox ---
    # Draw Textbox background container
    textbox_rect = pygame.Rect(0, GRID_ROWS * GRID_SIZE, SCREEN_WIDTH, TEXTBOX_HEIGHT)
    pygame.draw.rect(screen, COLOR_TEXTBOX_BG, textbox_rect)
    pygame.draw.rect(screen, COLOR_TEXT, textbox_rect, 2)  # Border

    # Render the 4 lines of text
    for idx, log in enumerate(event_logs):
        # Clip string to 40 characters maximum to respect specification constraints
        truncated_log = log[:40]
        text_surface = font.render(truncated_log, True, COLOR_TEXT)
        screen.blit(text_surface, (15, (GRID_ROWS * GRID_SIZE) + 10 + (idx * 20)))

    pygame.display.flip()
    clock.tick(30)

pygame.quit()
sys.exit()
May 13, 2026
How computers learn to think
In the history of Artificial Intelligence there have been numerous attempts to teach a machine to think. At first, thinking was simulated with algorithms. The idea was that a thinking machine works through a program and the result is a decision, e.g. to steer the robot north.
In theory the concept may sound reasonable, but it fails as soon as one tries to program such an algorithm. It is not clear how exactly to begin or how existing algorithms can be improved. Overall, the attempt to define thinking as the execution of algorithms has failed.
A newer and more promising method of teaching the computer to think is natural language. The assumption is that thinking is identical to language processing. To have a computer process language, an interactive approach is possible. One sends the robot a sentence such as "drive north" and the computer translates the sentence into an action. Or one sends the computer a word such as "apple" and the computer then shows the matching picture of the fruit.
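The translation of a sentence into an action can be sketched with a small lookup table. This is only an illustration under assumptions: the accepted verbs, the direction table and the function name are made up for the example, and the grid convention (y grows downward) simply follows common screen coordinates.

```python
# Direction words mapped to (dx, dy) grid steps; y grows downward as on a screen.
DIRECTIONS = {"north": (0, -1), "south": (0, 1), "east": (1, 0), "west": (-1, 0)}
MOVE_VERBS = ("drive", "go", "move")


def sentence_to_action(sentence):
    """Translate a short movement sentence into a (dx, dy) step, or None if not understood."""
    words = sentence.lower().strip(".!").split()
    if words and words[0] in MOVE_VERBS:
        for word in words[1:]:
            if word in DIRECTIONS:
                return DIRECTIONS[word]
    return None


print(sentence_to_action("drive north"))
print(sentence_to_action("hello robot"))
```

Storing each sentence together with the resulting action would yield exactly the kind of interaction records that can later serve as training data.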
If the interaction with the computer is stored permanently in datasets and neural networks are trained on them, the result is modern LLM chatbots as they have been developed since 2023. These come very close to human thinking.
What is special about language-based Artificial Intelligence is that it is no longer defined by algorithms. Large language models do contain a software component, but the far more important part is the .csv file in which question/answer pairs are stored.

