Robotics and Artificial Intelligence: December 2025

December 09, 2025

Computation vs communication in robotics

In the past, robots were designed with algorithms as the core element. The algorithm was encapsulated inside the robot's control software and was responsible for image recognition and action planning. The problem was that programming such robot software is very difficult, and the resulting autonomous robot can't do complex tasks.

A seldom discussed but powerful alternative is based on a communication paradigm. The principle was introduced by Stevan Harnad in an influential 1990 paper on the symbol grounding problem. The core element of the symbol grounding problem is a language game, a concept described much earlier by Ludwig Wittgenstein. Language games match natural language with actions in reality. For example, a human operator might say "move until the wall and stop". If the robot executes the command, it has won the game and demonstrated a good understanding of English.

The main reason why communication in robotics is seldom discussed is that it differs from the classical algorithm-centric understanding. Algorithms are the main subject of computer science; they are described in mathematical terms and implemented in computer software. There is an endless number of algorithms available, including planning algorithms and model predictive control algorithms. The problem is that all these efforts are meaningless under the new paradigm of communication-based robotics. Communication is defined here as language-based communication with a sender-receiver model. The inner workings of a human-to-robot interaction can no longer be described with computer science; they have to do with semiotics and linguistics. Both originators, Wittgenstein and Harnad, have their roots in linguistics and philosophy, not in mathematics.

A communication-based robotics paradigm has many advantages over the former algorithm-centric robotics. It reduces the complexity of robot programming with the help of natural language. Instead of writing endless lines of code in a computer language, the core element of communication-based robotics is a natural language like English, which provides a vocabulary for describing the world. There are nouns for referring to objects, like "wall" and "junction", and there are verbs for describing activities, like "move", "stop" and "rotate". These words are combined into sentences, which makes it possible to formulate more complex tasks.

The only technical challenge is to program a parser that understands the mini language. The robot receives a command from a human and translates the command into action. Of course, a parser is itself an algorithm inside a computer program, so there is still a need to write software.
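A minimal sketch of such a parser is shown here, under stated assumptions: the vocabulary reuses the example words from this post, while the filler words, the "until" clause handling and the (verb, target) output format are hypothetical choices made only for illustration.

```python
# Hypothetical mini language; word lists and output format are assumptions.
VERBS = {"move", "stop", "rotate"}
NOUNS = {"wall", "junction"}
FILLER = {"the", "and", "then"}


def parse_command(sentence):
    """Translate a sentence into a list of (verb, target) actions."""
    tokens = sentence.lower().replace(",", " ").split()
    actions = []
    i = 0
    while i < len(tokens):
        word = tokens[i]
        if word in FILLER:
            i += 1
        elif word in VERBS:
            target = None
            j = i + 1
            # An optional "until <noun>" clause attaches a stop condition.
            if j < len(tokens) and tokens[j] == "until":
                j += 1
                while j < len(tokens) and tokens[j] in FILLER:
                    j += 1
                if j >= len(tokens) or tokens[j] not in NOUNS:
                    raise ValueError("expected a known noun after 'until'")
                target = tokens[j]
                i = j + 1
            else:
                i += 1
            actions.append((word, target))
        else:
            raise ValueError(f"unknown word: {word!r}")
    return actions


if __name__ == "__main__":
    print(parse_command("move until the wall and stop"))
```

The returned action list would then be handed to whatever motor layer the robot has. Words outside the vocabulary are rejected instead of guessed, which keeps the game easy to judge.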

December 07, 2025

AI pose estimation

In comparison to large language models, pose estimation on a computer doesn't sound very interesting. The algorithm detects that the person on the screen is standing on one leg or lifting an arm. Such an algorithm can be realized with a small neural network or even with a hand-coded software routine. On the other hand, there are some arguments why AI pose estimation is an underestimated technology and should be described in detail.

It's correct that AI pose estimation itself looks a bit boring, but pose estimation is a required step towards humanoid robotics. The same algorithm that converts a scene into text can also convert text into a scene. This is called an AI animation system: the user enters a description like "show the hand with 5 fingers" and the character on the screen does so. Such a text-to-animation system is only one step away from the realization of humanoid robotics. A humanoid robot can do the same task in physical reality. The human operator may say "stay on left leg" and the 2-meter-tall robot does so.

In all these cases the principle is a translation between a pose and textual prompts. The ability to map a pose to text allows a machine to understand the meaning. Text can be stored in a small amount of RAM and makes it possible to use other AI algorithms on the system. A longer robot sequence is generated by providing a longer text stream. Robot control usually means specifying in English sentences what the robot does next. These text commands are converted into poses and then into animation.
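The chain from text stream to poses to animation can be sketched in a few lines. This is only an illustration under assumptions: the command phrases, the joint names and the angles in degrees are hypothetical, and linear interpolation stands in for whatever animation method a real system would use.

```python
# Hypothetical pose table; joint names and angles (degrees) are assumptions.
NEUTRAL = {"left_knee": 0, "right_knee": 0, "left_arm": 0, "right_arm": 0}

POSE_COMMANDS = {
    "stand": {},
    "stand on left leg": {"right_knee": 90},
    "stand on right leg": {"left_knee": 90},
    "raise left arm": {"left_arm": 170},
    "raise right arm": {"right_arm": 170},
}


def text_to_poses(script):
    """Convert a text stream (one command per sentence) into keyframe poses."""
    poses = []
    for sentence in script.split("."):
        command = sentence.strip().lower()
        if not command:
            continue
        if command not in POSE_COMMANDS:
            raise ValueError(f"unknown command: {command!r}")
        pose = dict(NEUTRAL)
        pose.update(POSE_COMMANDS[command])
        poses.append(pose)
    return poses


def interpolate(start, end, steps):
    """Create in-between frames by linear interpolation of joint angles."""
    return [
        {joint: start[joint] + (end[joint] - start[joint]) * k / steps
         for joint in start}
        for k in range(1, steps + 1)
    ]


if __name__ == "__main__":
    keyframes = text_to_poses("Stand. Stand on left leg. Raise right arm.")
    for frame in interpolate(keyframes[0], keyframes[1], 3):
        print(frame)
```

A longer text stream simply produces more keyframes, which mirrors the idea that a longer robot sequence is specified by a longer text.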

The single most important technique is the ability to use natural language as an abstraction mechanism. Every possible pose is described with words like "leg, knee, hand, left, right, up, down" and so on. These English words reduce complexity, which is the major problem in Artificial Intelligence. Instead of planning low-level trajectories or searching the state space of 3D poses, the more elaborate alternative is to focus on the textual description and to program a text-to-image converter as a second layer.
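The opposite direction, from a pose to words, might look like the following sketch. It assumes that some pose estimator has already produced 2D keypoints in image coordinates (y grows downward); the keypoint names and the pixel threshold are hypothetical.

```python
LIFT_THRESHOLD = 20  # pixels; hypothetical value for "foot is off the ground"


def describe_pose(keypoints):
    """Map 2D keypoints (x, y) to a short textual description."""
    words = []
    for side in ("left", "right"):
        wrist_y = keypoints[f"{side}_wrist"][1]
        shoulder_y = keypoints[f"{side}_shoulder"][1]
        # In image coordinates a smaller y value means higher up.
        direction = "up" if wrist_y < shoulder_y else "down"
        words.append(f"{side} hand {direction}")

    left_ankle_y = keypoints["left_ankle"][1]
    right_ankle_y = keypoints["right_ankle"][1]
    if left_ankle_y < right_ankle_y - LIFT_THRESHOLD:
        words.append("standing on right leg")
    elif right_ankle_y < left_ankle_y - LIFT_THRESHOLD:
        words.append("standing on left leg")
    else:
        words.append("standing on both legs")
    return ", ".join(words)


if __name__ == "__main__":
    example = {
        "left_wrist": (100, 50), "left_shoulder": (110, 100),
        "right_wrist": (200, 180), "right_shoulder": (190, 100),
        "left_ankle": (120, 400), "right_ankle": (180, 340),
    }
    print(describe_pose(example))
```

The output uses only a handful of words, which is the point of the abstraction: many different coordinate sets collapse into the same short sentence.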

Technically, pose estimation software including a text-to-animation system can be realized on outdated hardware from the early 1980s. Even an 8-bit home computer like the C64 is more than capable of doing so. The challenge is not the programming task but recognizing that pose estimation leads to robotics.

December 05, 2025

Artificial Intelligence as language games

In the past there were multiple definitions of what intelligence is about. From a philosophical standpoint it is often defined as problem-solving skill, and from a computer science perspective AI is mostly introduced as an expert system or a neural network. Even if these definitions are correct, they do not explain how to implement AI on a computer.

A more convenient definition is that Artificial Intelligence is about language games. Language games are not about technology itself, such as a certain computer hardware or a certain programming language; instead, a language game defines a problem. Once the computer solves the problem, it counts as intelligent.

An example language game is the question answering game. There is a database with entries, and for each request the computer has to find the most similar existing entry in the database. The user might ask "What is the capital of France?" and the computer searches the database for an answer.
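A minimal sketch of this game, assuming a tiny hand-written database and word overlap (Jaccard similarity) as the similarity measure. Both are illustrative choices, since the post does not prescribe a particular measure.

```python
import re

# Hypothetical toy database of (question, answer) entries.
DATABASE = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Italy?", "Rome"),
    ("What is the capital of Germany?", "Berlin"),
]


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    words_a, words_b = tokenize(a), tokenize(b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def answer(question):
    """Return the answer of the most similar stored question."""
    best_entry = max(DATABASE, key=lambda entry: similarity(question, entry[0]))
    return best_entry[1]


if __name__ == "__main__":
    print(answer("capital of France"))
```

The game is lost whenever the most similar entry carries the wrong answer, for example for a question about a country that is not in the database.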

Another language game, rooted in robotics, is the instruction following task. Here the user formulates a longer command sequence like "move to room B and pick up the green box" and the robot has to execute it.

What makes language games a promising candidate for a definition of AI is that they are independent of any particular computing paradigm, and they are more scientific than a purely philosophical definition of intelligence. A language game can be played with a computer, and it is possible to win or lose such a game. For example, in the Q&A retrieval game the algorithm can return the wrong answer, and in the instruction following game the robot might move to the wrong room.
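Winning or losing can be made concrete with a toy version of the instruction following game. The sketch below assumes a hypothetical world of three rooms, two sentence patterns and a goal state supplied by the referee; none of these details come from a real robot system.

```python
import re

ROOMS = {"a", "b", "c"}  # hypothetical room names


def play_instruction_game(instruction, world):
    """Execute an instruction in a toy world and return the final state."""
    state = {"robot_room": world["robot_room"], "holding": None}
    for step in instruction.lower().split(" and "):
        step = step.strip()
        move = re.fullmatch(r"move to room (\w+)", step)
        pick = re.fullmatch(r"pick up the (.+)", step)
        if move and move.group(1) in ROOMS:
            state["robot_room"] = move.group(1)
        elif pick:
            item = pick.group(1)
            # Picking up only succeeds if the object is in the robot's room.
            if world["objects"].get(item) == state["robot_room"]:
                state["holding"] = item
        else:
            raise ValueError(f"cannot parse step: {step!r}")
    return state


def has_won(state, goal):
    """The game is won if the final state equals the goal state."""
    return state == goal


if __name__ == "__main__":
    world = {"robot_room": "a", "objects": {"green box": "b"}}
    goal = {"robot_room": "b", "holding": "green box"}
    final = play_instruction_game("move to room B and pick up the green box", world)
    print(final, has_won(final, goal))
```

A robot that moves to the wrong room ends in a state that differs from the goal, so the same check reports a lost game.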

December 04, 2025

High entry barriers for local LLMs

Although there are many tutorials on the internet about how to run a large language model on a workstation, objectively speaking the endeavor is doomed to fail. A current PC workstation for 1000 EUR is too small and too weak by a factor of 500 to run a reasonably up-to-date chatbot. And this concerns only text chatbots, not generative image or audio generation.

Now to the details. The basis of every chatbot driven by neural networks is a word embedding model. There are several open source projects, such as fastText or gensim, that offer pretrained word embeddings. However, the file that has to be downloaded from the internet is a hefty 5 GB. And this file should be understood as a minimal word embedding. When the file is unpacked into RAM, the memory requirement rises to 16 GB. With that you only have the word embedding, that is, a mapping from the words of the lexicon to semantic categories in matrix notation. If you want to apply this word embedding model to a question answering problem or use it to index local text files, the memory requirement increases further.
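The raw size of a dense embedding matrix follows from vocabulary size times vector dimensions times bytes per value. The sketch below shows only that arithmetic. The 2 million words, 300 dimensions and 32-bit floats are assumed example values, not figures from this post, and the actual file and in-memory footprint can be larger than the raw matrix because of subword tables, the vocabulary itself and library overhead.

```python
def embedding_size_gb(vocabulary_size, dimensions, bytes_per_value=4):
    """Raw size of a dense embedding matrix in gigabytes (10**9 bytes)."""
    return vocabulary_size * dimensions * bytes_per_value / 1e9


if __name__ == "__main__":
    # Assumed example values: 2 million words, 300 dimensions, float32.
    print(f"{embedding_size_gb(2_000_000, 300):.1f} GB")
    # The same matrix stored as 16-bit floats needs half the space.
    print(f"{embedding_size_gb(2_000_000, 300, bytes_per_value=2):.1f} GB")
```

The function makes it easy to test how the requirement changes with a smaller vocabulary or shorter vectors.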

Reasonably solid hardware for running local large language models starts at a purchase price of 500k EUR. This includes 16 terabytes of RAM. Such hardware is no longer a workstation but a superminicomputer, which is unaffordable for private individuals and can at best be operated by universities or companies. With such a system it is indeed possible to set up a chatbot consisting of word embeddings, the fastText library and a few full-text databases. The estimated 16 TB of RAM would also be sufficient for future projects, meaning one could run experiments in machine translation or automatic programming.

As a brief look at the costs shows, local LLM systems are beyond the means of private users. Their only option is to rely on cloud providers, where the hardware is operated on the internet and the user merely gets access to the chatbot, either through the web browser or through an API.

Of course one can ask critically whether a local LLM could be run with less effort, for example by using more compact word embeddings that need perhaps only 10 MB. Unfortunately the answer is no; this is technically not possible. The first large language models such as GPT-2 were developed from 2019 onward. Anyone who wants to carry out a project without word embeddings and without very large datasets would have to use technology from before that date. Even before then, there was software for natural language processing and chatbots. One example is the AIML file format, in which knowledge bases for chatbots are stored. These systems are very frugal in terms of hardware and run on ordinary desktop PCs. Unfortunately, the drawback is that AIML chatbots and older document retrieval systems perform very poorly. An AIML chatbot is a kind of game program with which one can hold a simulated dialog, but which has no real use. That is why these older chatbots never caught on; there is no demand for such systems. Something similar applies to the very old Eliza system, which is technically a chatbot but offers no benefit to the user. It is certainly interesting to hold a dialog with Eliza, but after doing so for 10 minutes one recognizes the limitations of the concept.

Modern large language models that emerged from 2022 onward can be understood as a further development of earlier chatbots. Their performance is higher, but so are their hardware requirements.

December 03, 2025

The market share of Linux

A misconception in the open source community is that it is possible to increase the currently low market share of Linux on the desktop: from 4% today to 10%, then to 20% and so on, until Linux has displaced Windows. This assumption has been promoted since the 1990s, but for more than 20 years it has proven to be unrealizable. The explanation has less to do with the Linux software itself than with the expectations of users.

The main reason why software developers in particular value Linux as a powerful alternative to Windows on the desktop lies in the strengths of Linux: it is open source, it includes a command line, it is configurable, it comes with preinstalled compilers and interpreters for C++ and Python, it comes with preinstalled SQL databases and, very importantly, it offers a virtualization environment with KVM for starting additional Linux and Windows instances.

Unfortunately, these points are irrelevant to mainstream users. What matters to the end consumer is rather: a choice of very many games, preinstallation on a PC, technical support from the manufacturer and a choice of 200k existing programs. Linux can meet none of these expectations. In all these points Linux is very weak, and there is no prospect of improvement.

It follows that the future market share of Linux on the desktop will remain at today's low value of 4%. In 2030, in 2040 and in the years after, Linux will remain a niche operating system, developed for professional computer scientists who need databases, programming languages and virtual machines and for whom Windows brings too many restrictions.

The myth of autonomous robotics

Until around the year 2010, a certain paradigm about the inner workings of a robot was widespread in the philosophy of computer science. The idea was derived from science fiction novels written by Isaac Asimov and was based on an independent robot that is not under the control of a human operator but makes its own decisions. In most or even all science fiction stories about humanoid robots, the robots have their own brain, which allows them to analyze a situation, make decisions and take actions. These fictional robots have much in common with animals in nature, which are also independent beings with their own will.

Engineers in the past tried to realize this idea in technology, namely in hardware and software. The goal was to program a closed system that makes decisions on its own. The concrete realization can be seen in early self-driving cars and early maze robots that work in autonomous mode.

Despite the large amount of effort put into realizing these robots, the concept of autonomous robotics has failed. The typical autonomous car programmed before the year 2010 was powered by millions of lines of code but wasn't able to solve simple navigation tasks. The bottleneck is not located in a certain software architecture; it has to do with the idea of autonomy itself. This idea prevents the development of a more advanced kind of artificial intelligence, one that does not work independently of a human operator but assumes teleoperation, especially text-based teleoperation.

Solving the so-called "instruction following" task in robotics is much easier than implementing autonomous robots. Instruction following basically means that the robot gets its instructions from a human. For example, the robot grasps the ball because the human operator presses the button for "grasp the ball".

Such a remote-controlled robot can't be called intelligent anymore; it is a tool similar to a crane, which is also operated by levers moved by a human. The goal of building autonomous robots only makes sense for science fiction novels, but it is bad advice for implementing robots in reality. Real robotics is based on teleoperation and voice commands.

The beginning of modern teleoperated robotics can be traced back to a single talk, held by Edwin Olson in 2010.[1] He explained to the perplexed audience that his robots do not work with software or algorithms but are teleoperated with a joystick. Olson claims that such a control paradigm is harder to realize than classical algorithm-based robot control.

To understand why the audience at this 2010 talk was upset, we have to listen to what Olson said exactly. In the introduction he made a joke about former attempts at realizing robotics, especially the idea of writing large amounts of software to implement algorithms. These large-scale software-based robots were seen by most computer scientists as the paradigm that was here to stay, and in the year 2010 it was blasphemy to question it. In simpler words, Olson basically said that all the sophisticated motion planning algorithms, developed in thousands of lines of code with endless man-hours, are useless, and that his robots are controlled by a joystick, which is more efficient. Some people in the audience assumed that Edwin Olson is not a computer scientist but a comedian, and perhaps they are right.

Edwin Olson didn't mention natural language as a source of robot control in his talk; he focuses only on joystick control. His talk is about the difference between autonomous robots and teleoperated robots.

[1] Winning the MAGIC 2010 Autonomous Robotics Competition https://www.youtube.com/watch?v=OuOQ--CyBwc

December 02, 2025

Disappointed with Linux Mint

The promise of the Linux Mint system is that it is a beginner-friendly, lightweight system in the tradition of Ubuntu. None of these claims was fulfilled, and there are many reasons to dismiss the ISO file and prefer the old and widely used Debian system instead.

The first problem is that a normal GNOME on Wayland system with Linux will occupy around 5 GB of RAM after the major applications are started. This makes the system a poor choice for older hardware. To be fair, the Debian system needs the same amount of RAM or even more, but that means Mint has no advantage here. It seems that a modern operating system always needs too much RAM.

The second problem is the quality of the documentation. The distribution's main online forum can't be displayed in a text browser like lynx, which is confusing for a Linux operating system that is based on the idea of openness and a preference for the command line.

The third problem with Linux Mint is that the built-in GUI, Cinnamon, looks a bit outdated. It has much in common with the XFCE desktop. Of course it is possible to switch the desktop to KDE or GNOME, but this turns Linux Mint into just another distribution like Debian. Another reason to reject the Cinnamon desktop is that the Linux philosophy works differently from Windows. In Microsoft Windows there is indeed only one desktop available for everyone, but in Linux there are around 8 different desktops and the user has to decide on one of them. Linux Mint hasn't solved this issue but has added another desktop to the existing ecosystem.

In general, Linux Mint is not a bad Linux distribution. It can be installed easily and comes with the LibreOffice suite and a web browser preinstalled, so it is a good starting point for former Windows users. The real problem is that Linux itself was not designed for a mainstream audience. Linux is a programmer-friendly open source operating system with a maximum market share of 5%. It is not possible to motivate more users to install the system on their computers, and Linux Mint won't increase the market share in the future.

Unfortunately, Linux Mint occupies a middle position that doesn't fulfill the needs of either group of users. It is too complicated for normal users because it ships with powerful software like the GCC C compiler and a command line, which would make it a good choice for a programmer's workstation. On the other hand, it doesn't suit expert users because Linux Mint has too much GUI-oriented additional software and poor documentation. Because Linux Mint is based on the Debian and Ubuntu packages, it won't replace these established projects and is a kind of add-on that most people don't need.