February 22, 2026

Minsky frames as communication tool

 

A Minsky frame is a list of key/value pairs shown as a text overlay in a GUI window. It's not intended as an internal data structure within a robot but as a GUI gadget that displays information about the game on the screen.

In the example of an intersection simulator, the Minsky frame was realized with the pygame command:

screen.blit(txt, (35, y))

This draws a text string to the screen, for example the information "exit_target: WEST". A Minsky frame is a kind of form which determines which aspects of reality are important. The computer determines the value for each item and shows the result on the screen. This helps to solve the symbol grounding problem, because the text overlay translates the data layer of the DIKW pyramid into its information layer.
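The text overlay can be sketched as a small rendering loop. This is a minimal sketch, not the original simulator code: only `exit_target: WEST` and the `screen.blit(txt, (35, y))` call come from the post, while the other keys, the line spacing and the white text color are assumptions. `screen` and `font` are expected to be a pygame `Surface` and a `pygame.font.Font`; they are passed in so the sketch itself does not need to import pygame.

```python
# A Minsky frame for the intersection example: plain key/value pairs.
# Only "exit_target" comes from the post; the other keys are hypothetical.
frame = {
    "exit_target": "WEST",
    "light_state": "RED",
    "queue_length": 3,
}

def frame_to_lines(frame):
    """Format each key/value pair as one 'key: value' text line."""
    return [f"{key}: {value}" for key, value in frame.items()]

def draw_frame(screen, font, frame, x=35, y0=20, line_height=22):
    """Blit the frame as a text overlay, one line per key.

    screen is expected to be a pygame Surface and font a pygame.font.Font.
    """
    y = y0
    for line in frame_to_lines(frame):
        txt = font.render(line, True, (255, 255, 255))  # antialiased white text
        screen.blit(txt, (x, y))
        y += line_height
    return y  # next free y position below the overlay

print(frame_to_lines(frame))  # ['exit_target: WEST', 'light_state: RED', 'queue_length: 3']
```

Inside a pygame main loop, `draw_frame(screen, font, frame)` would be called once per frame after drawing the scene, so the overlay always shows the current values.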

Making robots more chatty with Minsky frames

In the past, autonomous robots were mostly silent systems: they processed information without talking about it. This makes the AI software hard to debug.

The screenshot shows an alternative: a very chatty maze robot. Its task is to move around and recharge its battery when it's empty. The AI brain consists of multiple Minsky frames with key/value information. Technically, each frame was realized as a Python dictionary shown in a text overlay window.

Surprisingly, even such a minimal robot game involves a huge amount of information. There are raw sensor data such as the position, but there is also semantic information such as the location in the map and the planned actions.

Creating a Minsky frame is not complicated in itself, because it's a normal Python dictionary. What makes the data structure powerful is that frames are written information. They are stored in a database, which makes it possible to add information and to translate existing Minsky frames into new information. For example, the planned actions of the robot can only be determined by analyzing the frames that already exist.

In other words, the AI isn't a list of algorithms but a database of hierarchical key/value data.
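The idea of frames that are derived from other frames can be sketched with plain Python dictionaries. This is a minimal sketch under assumptions: the frame names (`sensor`, `status`, `plan`), the charging station at grid cell (0, 0) and the 20 % battery threshold are hypothetical, not taken from the original maze game. Each function reads existing frames and writes a new one, so the plan frame can only be computed once the status frame exists.

```python
# Hypothetical frame database for the maze robot: one dict per frame.
frames = {
    "sensor": {"x": 3, "y": 7, "battery": 12},  # raw data layer
}

CHARGING_STATION = (0, 0)  # assumed grid cell of the charger
LOW_BATTERY = 20           # assumed threshold in percent

def derive_status(frames):
    """Translate the raw sensor frame into a semantic status frame."""
    s = frames["sensor"]
    frames["status"] = {
        "battery_state": "low" if s["battery"] < LOW_BATTERY else "ok",
        "at_charger": (s["x"], s["y"]) == CHARGING_STATION,
    }

def derive_plan(frames):
    """Plan the next action; this needs the status frame to exist."""
    status = frames["status"]
    if status["battery_state"] == "low" and status["at_charger"]:
        action = "recharge"
    elif status["battery_state"] == "low":
        action = "go_to_charger"
    else:
        action = "explore"
    frames["plan"] = {"next_action": action}

derive_status(frames)
derive_plan(frames)
for name, frame in frames.items():
    print(name, frame)  # each frame could be shown in its own text overlay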

February 20, 2026

Minimal grounded language

The symbol grounding problem, and especially recent vision language models, are very complicated to explain. What is missing is a simplified introduction, which this blog post tries to give. The core element of grounded language is a multimodal dictionary (see the picture above). There is one column with pictures and another column with textual annotations. Both columns are connected to each other, which makes it possible to translate back and forth between the modalities.

Such a dictionary allows the computer to understand instructions and also to describe the content of a picture. For example, suppose the human operator types "circle left". The computer sends a lookup request to the database and retrieves the matching pictograms: a circle and a left arrow. The symbols are shown on the screen, and the human operator can enter the next command.
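A two-column dictionary of this kind can be sketched in a few lines of Python. The pictogram file names are hypothetical stand-ins for the pictures; the reverse column is derived from the forward one, so both translation directions stay consistent.

```python
# Hypothetical multimodal dictionary: word -> pictogram file name.
word_to_picture = {
    "circle": "circle.svg",
    "left": "arrow_left.svg",
    "right": "arrow_right.svg",
}
# Reverse column: picture -> word, used to describe a picture in text.
picture_to_word = {pic: word for word, pic in word_to_picture.items()}

def text_to_pictograms(command):
    """Ground each known word of a typed command in its pictogram."""
    return [word_to_picture[w] for w in command.lower().split() if w in word_to_picture]

def pictograms_to_text(pictures):
    """Translate back from pictograms to words."""
    return " ".join(picture_to_word[p] for p in pictures)

pics = text_to_pictograms("circle left")
print(pics)                      # ['circle.svg', 'arrow_left.svg']
print(pictograms_to_text(pics))  # circle left
```

Unknown words are silently skipped in this sketch; a real system might report them as ungrounded instead.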

Such a pipeline doesn't look very impressive, but it can be scaled up dramatically. Suppose the pictograms are replaced with motion capture trajectories: there is one trajectory for "stand up" and another one for "walking to the left". This allows a human operator to control a humanoid robot with words. The operator types in a phrase, and the database lookup retrieves the matching trajectory, which gets executed on the robot. This is the basic idea behind vision language action (VLA) models, which are a core element of advanced robotics.
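The same lookup, scaled up from pictograms to motion, can be sketched as below. This is a toy illustration under stated assumptions: the keyframe format (time in seconds plus a list of joint angles), the numbers and the `send_pose` callback are hypothetical, and actual VLA models learn such a mapping from data rather than reading it from a fixed table.

```python
# Hypothetical motion library: phrase -> trajectory of (time_s, joint_angles).
# The angle values are made up for illustration.
trajectories = {
    "stand up": [(0.0, [0, 90, 90]), (1.0, [0, 45, 45]), (2.0, [0, 0, 0])],
    "walk left": [(0.0, [0, 0, 0]), (0.5, [-10, 5, 0]), (1.0, [-20, 0, 5])],
}

def execute(phrase, send_pose):
    """Look up the trajectory for a phrase and stream its keyframes.

    send_pose is a placeholder for whatever function commands the robot.
    Returns False if the phrase is not in the library.
    """
    trajectory = trajectories.get(phrase.strip().lower())
    if trajectory is None:
        return False
    for t, angles in trajectory:
        send_pose(t, angles)
    return True

execute("stand up", lambda t, a: print(f"t={t:.1f}s joints={a}"))
```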

Let us go back to the initial example with a pictogram. The task can be summarized as a translation problem between words and symbols. A word is an ordinary string like "[l][e][f][t]". Such a string can be generated by keystrokes on a computer keyboard. In contrast, the matching symbol is a picture, which is a bit harder to generate. Pictures are usually drawn in a vector graphics program. The trick is to treat both pieces of information as connected to each other. This matching is called grounding, and it simplifies communication.


It should be mentioned that from a computer science perspective, the grounding problem can be called boring or trivial. Storing a pictogram together with its textual annotation in a computer is a solved problem, and there are many ways of doing so. It's a simple database problem which can be implemented in Python in under 100 lines of code.
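As an illustration of that claim, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table name `grounding`, the choice to store each pictogram as raw SVG bytes and the placeholder SVG snippets are assumptions; storing a file path instead would work just as well.

```python
import sqlite3

# In-memory database with one table: each row links a word to a pictogram.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE grounding (word TEXT PRIMARY KEY, pictogram BLOB)")

# Placeholder SVG snippets stand in for real pictogram files.
rows = [
    ("circle", b"<svg><circle cx='5' cy='5' r='4'/></svg>"),
    ("left", b"<svg><path d='M9 5 H1 L4 2 M1 5 L4 8'/></svg>"),
]
con.executemany("INSERT INTO grounding VALUES (?, ?)", rows)

def lookup_picture(word):
    """Word -> pictogram bytes, or None if the word is not grounded."""
    row = con.execute("SELECT pictogram FROM grounding WHERE word = ?", (word,)).fetchone()
    return row[0] if row else None

def lookup_word(pictogram):
    """Pictogram bytes -> word, the reverse direction."""
    row = con.execute("SELECT word FROM grounding WHERE pictogram = ?", (pictogram,)).fetchone()
    return row[0] if row else None

print(lookup_word(lookup_picture("left")))  # left
```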

Grounded language is at the same time a trivial task and an advanced one, because it is strongly connected with linguistics. It is an interdisciplinary field spanning AI, computer science, cognitive science and linguistics, which makes grounded language a very complex subject.

February 16, 2026

Reasons against Linux on the desktop

Within the Linux community there is a widespread lack of understanding toward Windows users. Linux is defined as the technically superior system; consequently the Windows ecosystem is ridiculed, and it is assumed that Windows is being phased out.

Instead of building on this line of thought and repeating like a mantra the advantages of Linux on the desktop, this post lists the disadvantages of Linux, with the goal of strengthening Windows users.

The most important reason against Linux is that the popular database application MS Access does not run on it, and there is no open source counterpart that comes close to its feature set. Only MS Word and MS Excel have established Linux alternatives; for the desktop database, the manpower needed to carry out an open source project has been lacking so far.

Furthermore, a desktop database like MS Access is not just any office application like a text editor or a paint program; with this software, one can build powerful frontend and backend IT applications without having to program. Databases may be dispensable for physicists and mathematicians, who tend to use programming languages and math software, but in the business world databases are the bread-and-butter application. Computer use in companies is database oriented without exception. It is no exaggeration to say that MS Access is the killer application for computers in the office; if this application is missing, all other advantages are irrelevant.

Linux has further disadvantages, however. First of all, it is not preinstalled on consumer PCs. Anyone who buys a new PC in a store will find only Windows 11 preinstalled. Technically, this problem can of course be solved with a self-made bootable USB stick, but the switcher will quickly realize that PC retail stores offer no software for Linux at all. All commercial software that software companies sell to private and business customers was developed specifically for the MS Windows operating system.

In the past there was also commercial Linux software sold as boxed packages in stores, but the concept never caught on. The PC world is therefore split in two: on one side there is commercial software developed by companies for the Windows system, and on the other there is open source software that is only available online and is not commercially marketed. As a result, ordinary users who choose Linux suddenly find themselves on their own. They have to laboriously read up on the necessary expertise, they have to download the software online, and when problems arise, they are advised in forums to simply switch to another Linux distribution because it is supposedly much better.

For the average user with a PC running preinstalled Windows, there is no reason to switch to Linux. They will experience only disadvantages there and, because of the steep learning curve, will fail at the simplest tasks, such as installing a game or opening a Word document. MS Windows is superior in every respect and remains the industry standard for desktop PCs, both for private users and in the business world.

The information layer in the DIKW pyramid

The lowest layer in the DIKW pyramid is the data layer, which can be described easily. It consists of raw sensor data such as distance, temperature and GPS coordinates, which are stored in a numerical format. The next layer in the pyramid, the information layer, is harder to describe. A working thesis is that the information layer consists of [tags].

For a warehouse robot, the tag cloud could be: [roomA, roomB, roomC, shelfNorth, shelfsouth, shelf1, shelf2, obstacle, battery, chargingstation, barcode, path, left, right, speed, direction, batteryempty, order]

Of course the tag list is not complete; there are additional tags, but for the sake of simplicity this might be a starting point. The tags provide context because selecting one tag leaves the possible alternative tags deactivated. For example, the goal for the robot might be [roomB] but not [roomA, roomC]. The robot might rotate to [left] but not to [right]. So the context of a tag is always the set of tags that would be possible but are not activated at the moment.
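The idea that a tag's context is the set of alternatives left inactive can be sketched in a few lines. The grouping of tags into `goal` and `rotation` is a hypothetical choice for illustration; the tags themselves come from the warehouse tag list.

```python
# Hypothetical grouping of some warehouse tags into sets of alternatives.
ALTERNATIVES = {
    "goal": ["roomA", "roomB", "roomC"],
    "rotation": ["left", "right"],
}

def activate(group, tag):
    """Activate one tag; its context is the alternatives left inactive."""
    options = ALTERNATIVES[group]
    if tag not in options:
        raise ValueError(f"{tag!r} is not a tag in group {group!r}")
    context = [t for t in options if t != tag]
    return {"active": tag, "context": context}

print(activate("goal", "roomB"))     # {'active': 'roomB', 'context': ['roomA', 'roomC']}
print(activate("rotation", "left"))  # {'active': 'left', 'context': ['right']}
```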

Together, the tags form a semantic network. Compared with a full-blown ontology or AI frames, tag-based information is more minimalist: every tag is either activated or not, similar to the tags used to annotate a blog post.

Interestingly, there is an interface between the low-level sensor data and the mid-level tag cloud. For example:

- gps sensor -> [roomA]
- gps sensor -> [direction]
- distance sensor -> [obstacle]

To describe the robot's behavior, both layers (data and information) are important. The robot needs to log the numerical raw sensor data and also annotate its current perception with semantic information.
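The mapping from sensor data to tags, together with logging both layers, can be sketched as follows. The room rectangles and the 0.5 m obstacle threshold are hypothetical values; the point is that the tagging happens in software above the sensor, which itself only delivers numbers.

```python
# Hypothetical room boundaries in metres: name -> (x_min, x_max, y_min, y_max).
ROOMS = {
    "roomA": (0.0, 10.0, 0.0, 10.0),
    "roomB": (10.0, 20.0, 0.0, 10.0),
    "roomC": (20.0, 30.0, 0.0, 10.0),
}
OBSTACLE_DISTANCE_M = 0.5  # assumed threshold for the [obstacle] tag

def tag_position(x, y):
    """gps sensor -> [roomA] etc.: find the room containing (x, y)."""
    for room, (x0, x1, y0, y1) in ROOMS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return [room]
    return []

def annotate(sample):
    """Return the raw sample together with its semantic tags, so both layers get logged."""
    tags = tag_position(sample["x"], sample["y"])
    if sample["distance"] < OBSTACLE_DISTANCE_M:  # distance sensor -> [obstacle]
        tags.append("obstacle")
    return {"data": sample, "tags": tags}

print(annotate({"x": 12.3, "y": 4.0, "distance": 0.3}))
# {'data': {'x': 12.3, 'y': 4.0, 'distance': 0.3}, 'tags': ['roomB', 'obstacle']}
```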

What we can say for sure is that tagging information doesn't belong to the lowest data layer. A sensor such as a GPS sensor has no built-in tagging mechanism. It doesn't know the position of a particular shelf, or whether the robot is in roomA or in roomB. What the GPS sensor provides instead are precise x/y coordinates, because that is the kind of data its hardware is able to generate. It's up to a higher level of the DIKW pyramid to process these data.

February 15, 2026

Quiz: The DIKW Pyramid (Data to Information)

The symbol grounding problem can be explained with a DIKW pyramid. The following quiz uses the domain of a warehouse robot.

Instructions: Match the raw sensor output (Data) to the correct operational meaning (Information).

1. A Time-of-Flight (ToF) sensor returns the integer 20. The system context defines the unit as centimeters. What is the information?

    A) The robot's battery is at 20%.
    B) An object is detected 20 cm in front of the sensor.
    C) The robot has picked up 20 items.
    D) The robot is moving at 20 km/h.

2. The internal IMU (Inertial Measurement Unit) registers a sudden spike of 9.8 m/s² on the X-axis while the robot was supposed to be stationary. What is the information?

    A) The robot is successfully charging.
    B) The robot has reached its top speed.
    C) The robot has likely been struck or tilted unexpectedly.
    D) The floor is perfectly level.

3. A barcode scanner on the gripper returns the string SKU-9921-RED. What is the information?

    A) The gripper is currently empty.
    B) The robot needs to be rebooted.
    C) The item currently held is identified as a "Red Small Widget."
    D) The ambient light in the warehouse is too red.

4. The wheel encoders report 500 rotations, while the visual odometry (camera) reports 0 meters of forward progress. What is the information?

    A) The robot is moving faster than expected.
    B) The wheels are slipping on a slick surface (e.g., oil or plastic).
    C) The robot has arrived at the loading dock.
    D) The camera lens is dirty.

5. A thermistor near the motor controller reads 95°C. The operating limit is 80°C. What is the information?

    A) The motor is warming up to optimal temperature.
    B) The warehouse heating system is turned on.
    C) The motor controller is overheating and at risk of failure.
    D) The robot is in a refrigerated zone.

6. A pressure-sensitive safety bumper sends a High/1 logic signal to the CPU. What is the information?

    A) The robot is clear of all obstacles.
    B) The robot has made physical contact with an object or person.
    C) The battery voltage is stable.
    D) A new software update is available.

7. The battery management system (BMS) reports a voltage of 19.2V on a 24V rated system. What is the information?

    A) The battery is fully charged.
    B) The battery is at a "Low" state and requires docking soon.
    C) The robot is currently plugged into a wall outlet.
    D) The sensors are calibrated correctly.

8. An acoustic sensor detects a sound level of 110 dB during a lifting sequence. What is the information?

    A) The lifting mechanism is operating silently.
    B) The warehouse music is too loud.
    C) There is an abnormally loud grinding noise in the lift gears.
    D) The item being lifted is very light.

Correct answers

    1. B (The number 20 + unit cm = distance information)
    2. C (Sudden acceleration while the robot should be stationary = impact/tilt information)
    3. C (String data + database lookup = object identity information)
    4. B (Rotation data vs. no movement data = traction loss information)
    5. C (Raw temperature > threshold = critical status information)
    6. B (Binary signal from a bumper = collision detection information)
    7. B (Voltage level compared to system rating = energy status information)
    8. C (Decibel level + operational context = mechanical fault information)
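Several of the answers are simple rules of the form "data + context = information". A minimal sketch of such rules in Python, covering items 1, 4, 5, 6 and 7: the 80 °C limit and the 24 V rating come from the quiz, while the 85 % low-voltage cut-off and the wheel-slip rule (rotations but zero distance) are assumptions.

```python
def interpret(sensor, value, context=None):
    """Turn a raw sensor value (data) into an operational statement (information)."""
    context = context or {}
    if sensor == "tof":  # item 1: integer + unit cm
        return f"object detected {value} cm in front of the sensor"
    if sensor == "odometry":  # item 4: (wheel rotations, visual distance in m)
        rotations, distance_m = value
        return "wheels slipping" if rotations > 0 and distance_m == 0 else "traction ok"
    if sensor == "thermistor":  # item 5: temperature vs. operating limit
        limit = context.get("limit_c", 80)
        return "motor controller overheating" if value > limit else "temperature normal"
    if sensor == "bumper":  # item 6: binary logic signal
        return "physical contact" if value == 1 else "no contact"
    if sensor == "bms":  # item 7: voltage vs. rated voltage
        rated = context.get("rated_v", 24.0)
        # Assumed cut-off: below 85 % of the rated voltage counts as "Low".
        return "battery low, dock soon" if value < 0.85 * rated else "battery ok"
    raise ValueError(f"no rule for sensor {sensor!r}")

print(interpret("bms", 19.2))  # battery low, dock soon
```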

February 14, 2026

Language as ghost in the machine

The term "ghost in the machine" usually refers to the artificial intelligence that allows a robot to do useful things. The body is the robot's hardware, and the soul is realized in software. With such an understanding, artificial intelligence is an advanced computer program based on AI-related algorithms.

But what if this working thesis is wrong? The alternative assumption is that AI can be realized neither in hardware nor in software. Instead, artificial intelligence, like a ghost, is very hard to pin down. It is basically natural language. In other words, the ghost is an English dictionary. It is not stored in the software itself, because language is a communication pattern that operates in between technical systems.

Natural language only becomes visible when a human speaks with a robot. The human formulates a request like "move forward" and the robot responds to this request. Therefore, the AI isn't stored in the robot itself but is located in the air between human and robot. This would explain why it has much in common with a ghost, which also has no measurable location in reality. A ghost, however, is located in the beyond, outside of reality, in some sort of hyperspace.

Let us try to describe language from a more scientific perspective. A statement like "grasp the apple, please" has no physical size. Words are not part of visible reality; they are abstract symbols located in the oral space. Despite these missing physical borders, language is part of reality because it is used for many purposes. In a mathematical sense, language is a communication technology used in a sender-to-receiver interaction. Such a protocol has much in common with ghost-like behavior. If language is transmitted over radio waves, it has some similarity to a supernatural phenomenon.

What makes language interesting for artificial intelligence is that without language a robot is not able to think. Without language, robots are reduced to pocket calculators that can execute algorithms but don't understand the meaning of objects in reality such as a table, an apple or a plate. The ability to parse natural language is equivalent to implementing artificial intelligence.

February 11, 2026

Playing the Pong video game with a perception buffer

 

In addition to the previous post, here is another example of a working AI based on a buffer. The game has two modes: a normal mode, which runs the simulation, and a pause mode, which shows AI information including the perception buffer, the action buffer and the predicted trajectory. The text box shows the information known from the game engine, such as the ball position and its velocity. The action buffer stores what to do next, and this information is submitted to the paddle.

Because the Pong video game is so simple, the AI masters the challenge with ease: it moves the paddle toward the correct position.

From a technical perspective, the buffer was realized with a Python dict that stores the information in key/value syntax. Creating such a dictionary and showing its content on the screen is very simple. The innovation lies in the assumption that such a buffer mediates the communication process. The AI brain isn't imagined as a sophisticated algorithm but as a database which holds information as natural language. This creates multiple subtasks: a) how to convert the game state into a perception buffer, b) how to translate the perception buffer into the next action, and c) how to submit the content of the action buffer back to the game engine.
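The three subtasks a), b) and c) can be sketched as three small functions around dictionary buffers. This is not the original game code: the playfield height, the paddle x position, the paddle speed and the buffer keys are assumptions, and the trajectory prediction simply extrapolates the ball linearly and folds the result back at the top and bottom walls.

```python
HEIGHT = 400      # assumed playfield height in pixels
PADDLE_X = 780    # assumed x position of the AI paddle
PADDLE_SPEED = 6  # assumed max paddle movement per frame

def perceive(ball_x, ball_y, vel_x, vel_y, paddle_y):
    """a) Convert the game state into a perception buffer."""
    return {"ball_x": ball_x, "ball_y": ball_y, "vel_x": vel_x,
            "vel_y": vel_y, "paddle_y": paddle_y}

def predict_y(p):
    """Predict where the ball reaches PADDLE_X, reflecting off top and bottom walls."""
    if p["vel_x"] <= 0:
        return HEIGHT / 2  # ball moving away: return to the centre
    t = (PADDLE_X - p["ball_x"]) / p["vel_x"]
    y = (p["ball_y"] + p["vel_y"] * t) % (2 * HEIGHT)
    return 2 * HEIGHT - y if y > HEIGHT else y  # fold bounces back into [0, HEIGHT]

def decide(p):
    """b) Translate the perception buffer into an action buffer."""
    target = predict_y(p)
    move = max(-PADDLE_SPEED, min(PADDLE_SPEED, target - p["paddle_y"]))
    return {"target_y": target, "move": move}

def submit(action, paddle_y):
    """c) Submit the action buffer to the game engine (here: return the new paddle y)."""
    return paddle_y + action["move"]

p = perceive(400, 200, 5, 3, 200)
a = decide(p)
print(a)             # {'target_y': 372.0, 'move': 6}
print(submit(a, 200))  # 206
```

Keeping the buffers as plain dictionaries means they can be printed directly in the pause-mode overlay.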