Robotics and Artificial Intelligence

December 05, 2025

Artificial Intelligence as language games

In the past, multiple definitions of intelligence were available. From a philosophical standpoint it is often defined as problem-solving skill, and from a computer science perspective AI is mostly introduced as an expert system or a neural network. Even if these definitions are correct, they do not explain how to implement AI on a computer.

A more convenient definition is that Artificial Intelligence is about language games. Language games are not about technology itself, for example a certain computer hardware or a certain programming language. Instead, a language game defines a problem. Once the computer solves the problem, it counts as intelligent.

An example language game is the question answering game. There is a database with entries, and for each request the computer has to find the most similar existing entry. The user might ask "What is the capital of France?" and the computer searches the database for an answer.
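
The retrieval step of such a question answering game can be sketched in a few lines of Python. This is only a minimal illustration: the database entries and the word overlap (Jaccard) similarity are assumptions chosen for brevity, not a description of how a production system works.

```python
import re

# Hypothetical toy database; the entries are illustrative only.
DATABASE = {
    "What is the capital of France?": "Paris",
    "What is the capital of Italy?": "Rome",
    "What color is the sky?": "Blue",
}

def tokens(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a, b):
    """Jaccard similarity: shared words divided by all words."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def answer(request):
    """Return the answer stored for the most similar question."""
    best = max(DATABASE, key=lambda question: similarity(request, question))
    return DATABASE[best]

print(answer("capital of France"))
```

The game is won when the returned entry matches what the user asked for, and lost when the most similar entry turns out to be the wrong one.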

Another language game rooted in robotics is the instruction following task. Here the user formulates a longer command sequence like "move to room B and pick up the green box" and the robot has to execute it.

What makes language games a promising candidate for an AI definition is that they are independent of a certain computer paradigm, and they are more scientific than a purely philosophical definition of intelligence. A language game can be played with a computer, and it is possible to win or lose such a game. For example, in the Q&A retrieval game the algorithm can return the wrong answer, and in the instruction following game the robot might move to the wrong room.
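
That an instruction following game can be won or lost becomes concrete once the command is turned into a list of expected steps. The following sketch assumes a tiny hypothetical command grammar ("move to room X", "pick up the Y") and a hypothetical log of steps the robot actually executed. Both are made up for illustration.

```python
import re

def parse_command(command):
    """Split a command into (action, target) steps.

    Only two hypothetical phrase patterns are understood here.
    """
    steps = []
    for part in command.lower().split(" and "):
        part = part.strip()
        match = re.match(r"move to room (\w+)", part)
        if match:
            steps.append(("move", match.group(1).upper()))
            continue
        match = re.match(r"pick up the (.+)", part)
        if match:
            steps.append(("pick", match.group(1)))
    return steps

def game_won(command, executed_steps):
    """The game is won only if the robot did exactly what was asked."""
    return parse_command(command) == executed_steps

command = "move to room B and pick up the green box"
print(parse_command(command))
print(game_won(command, [("move", "A"), ("pick", "green box")]))  # wrong room
```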

December 04, 2025

High entry barriers for local LLMs

Although there are many tutorials on the Internet about how to run a large language model on a workstation, objectively speaking the undertaking is doomed to fail. A current PC workstation for 1000 EUR is too small and too weak by a factor of 500 to run a reasonably up-to-date chatbot. And this concerns only text chatbots, not generative image or audio generation.

Now to the details. The basis for every chatbot that is driven by neural networks is a word embedding model. There are several open source projects such as fastText or gensim that offer pretrained word embeddings. However, the file that has to be downloaded from the Internet is a hefty 5 GB in size, and this file should be understood as a minimal word embedding. When the file is unpacked into RAM, the memory requirement rises to 16 GB. And with that one has only the word embedding, that is, a mapping from words in the lexicon to semantic categories in matrix notation. If this word embedding model is to be applied to a question answering problem or used to index local text files, the memory requirement increases further.
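
The size of the raw embedding matrix can be estimated with simple arithmetic: number of words times number of dimensions times bytes per value. The sketch below uses assumed figures (2 million words, 300 dimensions, 32-bit floats) purely for illustration. They are not measurements of a particular fastText or gensim file, and a loaded model needs additional memory on top of the raw table.

```python
def embedding_bytes(vocabulary_size, dimensions, bytes_per_value=4):
    """Memory for a dense embedding matrix with one row per word."""
    return vocabulary_size * dimensions * bytes_per_value

def to_gb(num_bytes):
    """Convert bytes to gigabytes (10**9 bytes)."""
    return num_bytes / 1e9

# Assumed model size: 2 million words, 300 dimensions, 32-bit floats.
table = embedding_bytes(2_000_000, 300)
print(f"{to_gb(table):.1f} GB")  # prints "2.4 GB"
```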

Reasonably solid hardware for running local large language models starts at a purchase price of 500k EUR. That includes 16 terabytes of RAM. Such hardware is no longer a workstation but a superminicomputer, which is unaffordable for private individuals and can at best be operated by universities or companies. With such a system it is indeed possible to set up a chatbot consisting of word embeddings, the fastText library and a few full text databases. The estimated 16 TB of RAM would also be sufficient for future projects, which means one could run experiments in machine translation or automatic programming.

As a brief look at the costs shows, local LLM systems are beyond the means of private users. Their only remaining option is to rely on cloud providers, where the hardware is operated on the Internet and the user merely gets access to the chatbot, either through the web browser or through an API.

Of course one can ask critically whether a local LLM could not be run with less effort, for example by using word embeddings that are more compact and need perhaps only 10 MB. Unfortunately the answer is no, this is technically not possible. The first large language models such as GPT-2 appeared in 2019, and chatbots built on this technology became widely available from 2022 onward. Anyone who wants to run a project without word embeddings and without very large datasets would have to use technology from before that time. Software for natural language processing and chatbots already existed back then. One example is the AIML file format, in which knowledge bases for chatbots are stored. These systems are very frugal in terms of hardware and run on normal desktop PCs. The drawback is that AIML chatbots and older document retrieval systems perform very poorly. An AIML chatbot is a kind of game program with which one can hold a simulated dialogue, but which has no real use. That is why these older chatbots never caught on. There is no demand for such systems. Something similar holds for the very old Eliza system, which is technically a chatbot but has no use for the user. It is quite interesting to have a dialogue with Eliza, but after doing so for 10 minutes one recognizes the limitations of the concept.

Modern large language models, which became widely available from 2022 onward, can be understood as a further development of earlier chatbots. Their performance is higher, but at the same time their hardware requirements are higher too.

December 03, 2025

The market share of Linux

A misconception in the open source community is that it is possible to raise the currently low market share of Linux on the desktop, from 4% today to 10%, then to 20% and so on until Linux has displaced Windows. This assumption has been promoted since the 1990s, but for more than 20 years it has proven unrealizable. The explanation has less to do with the Linux software itself than with the expectations of users.

The main reason why software developers in particular value Linux as a powerful alternative to Windows on the desktop lies in the strengths of Linux: it is open source, it includes a command line, it is configurable, it ships with preinstalled compilers and interpreters for languages such as C++ and Python, it ships with preinstalled SQL databases and, very importantly, it provides with kvm a virtualization environment for starting further Linux and Windows instances.

Unfortunately these points play no role for the mainstream user. What matters to the end consumer is more along the lines of: a choice of very many games, preinstallation on a PC, technical support from the manufacturer and a choice of 200k available programs. Linux can meet none of these expectations. In all these points Linux is very weak and there is no prospect of improvement.

It follows that the future market share of Linux on the desktop will remain at today's low value of 4%. In 2030, in 2040 and in the years after, Linux will stay a niche operating system, developed for professional computer scientists who need databases, programming languages and virtual machines and for whom Windows brings too many restrictions.

The myth of autonomous robotics

Until around the year 2010, a certain paradigm about the inner workings of a robot was widespread in the philosophy of computer science. The idea was derived from science fiction novels such as those written by Isaac Asimov and was based on an independent robot that is not controlled by a human operator but makes its own decisions. In most or even all science fiction stories about humanoid robots, the robots have their own brain, which allows them to analyze a situation, make decisions and take actions. These fictional robots have much in common with animals in nature, which are also independent beings with their own will.

Engineers in the past tried to realize this idea in technology, namely in hardware and software. The goal was to program a closed system that makes decisions on its own. The concrete realization can be seen in early self-driving cars and early maze robots, which work in autonomous mode.

Despite the large amount of effort spent on these robots, the concept of autonomous robotics has failed. The typical autonomous car programmed before the year 2010 was powered by millions of lines of code but wasn't able to solve simple navigation tasks. The bottleneck is not located in a certain software architecture but has to do with the idea of autonomy itself. This idea prevents the development of advanced artificial intelligence, which does not work independently of a human operator but assumes teleoperation, and especially text based teleoperation.

Solving the so called "instruction following" task in robotics is much easier than implementing autonomous robots. Instruction following basically means that the robot gets its instructions from a human. For example, the robot grasps the ball because the human operator presses the button for "grasp the ball".
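
The button example can be sketched as a plain lookup table from button labels to robot actions. The `Robot` class, its methods and the button labels below are hypothetical and exist only to show that no decision making happens inside the robot.

```python
class Robot:
    """Hypothetical robot that only reacts to operator commands."""

    def __init__(self):
        self.holding = None

    def grasp(self, item):
        self.holding = item

    def release(self):
        self.holding = None

# Each button label maps to exactly one action; the human picks the button.
BUTTONS = {
    "grasp the ball": lambda robot: robot.grasp("ball"),
    "release": lambda robot: robot.release(),
}

def press(robot, label):
    """Run the action behind a button; return False for unknown buttons."""
    action = BUTTONS.get(label)
    if action is None:
        return False
    action(robot)
    return True

robot = Robot()
press(robot, "grasp the ball")
print(robot.holding)
```

All of the choices are made by the operator. The program only translates a button press into a motion.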

Such a remote controlled robot can't be called intelligent anymore. It is a tool similar to a crane, which is also operated by levers moved by a human. The goal of building autonomous robots makes sense only for science fiction novels, but it is bad advice for implementing robots in reality. Real robotics is based on teleoperation and voice commands.

The beginning of modern teleoperated robotics can be traced back to a single talk, held by Edwin Olson in 2010.[1] He explained to the perplexed audience that his robots do not work with software or algorithms but are teleoperated with a joystick. Olson claims that such a control paradigm is harder to realize than classical algorithm based robot control.

To understand why the audience of this 2010 talk was upset, we have to listen to what Olson said exactly. In the introduction he made a joke about former attempts at realizing robotics, especially the idea of writing large amounts of software for implementing algorithms. These large scale software based robots were seen by most computer scientists as the paradigm that was here to stay, and in the year 2010 it was blasphemy to question it. In simpler words, Olson basically said that all the sophisticated motion planning algorithms, developed in thousands of lines of code with endless man hours, are useless, and that his robots are controlled by a joystick, which is more efficient. Some people in the audience assumed that Edwin Olson is not a computer scientist but a comedian, and perhaps they are right.

In his talk, Edwin Olson didn't mention natural language as a source of robot control but focused only on joystick control. The talk is about the difference between autonomous and teleoperated robots.

[1] Winning the MAGIC 2010 Autonomous Robotics Competition https://www.youtube.com/watch?v=OuOQ--CyBwc

December 02, 2025

Disappointed with Linux Mint

The promise of the Linux Mint system is that it is a beginner friendly, lightweight system in the tradition of Ubuntu. None of these claims was fulfilled, and there are many reasons to dismiss the ISO file and prefer the old and widely used Debian system instead.

The first problem is that a normal gnome on wayland system with Linux will occupy around 5 GB of RAM after the major applications are started. This makes the system a poor choice for older hardware. To be fair, the Debian system needs the same amount of RAM or even more, but at least Mint has no advantage here. It seems that a modern operating system always needs too much RAM.

The second problem is the quality of the documentation. The main Linux online forum can't be displayed in a text browser like lynx, which is a bit confusing for a Linux operating system that is based on the idea of openness and a preference for the command line.

The third problem with Linux Mint is that the built-in GUI, Cinnamon, looks a bit outdated. It has much in common with the XFCE system. Of course it is possible to switch the desktop in favor of KDE or gnome, but this transforms Linux Mint into another distribution like Debian. Another reason to reject the Cinnamon desktop is that the Linux philosophy works differently from Windows. In Microsoft Windows there is indeed only one desktop available for everyone, but in Linux there are around 8 different desktops out there and the user has to decide on one of them. Linux Mint hasn't solved this issue but has added another desktop to the existing ecosystem.

In general, Linux Mint is not a bad Linux distribution. It can be installed easily and comes with the LibreOffice suite plus a web browser preinstalled, so it is a good starting point for former Windows users. The real problem is that Linux itself was not designed for a mainstream audience. Linux is a programmer friendly open source operating system with a maximum market share of 5%. It is not possible to motivate more users to install the system on their computer, and Linux Mint won't increase the market share in the future.

Unfortunately, Linux Mint occupies a middle position that doesn't fulfill the needs of either group of users. It is too complicated for normal users because it is delivered with powerful software like the GCC C compiler and a command line, which makes it a good choice for a programmer's workstation. On the other hand, it doesn't suit expert users because Linux Mint has too much GUI oriented additional software and poor documentation. Because Linux Mint is based on the Debian and Ubuntu packages, it won't replace these established projects and is a kind of add-on that most people don't need.

November 30, 2025

Word2vec and the invention of large language models

Deep learning with neural networks has been available since the 2000s, and chatbots like Eliza are much older, dating back to the mid-1960s. What was missing until the 2020s was a modern large language model, and there is a reason why this AI technology was invented so late in the history of computing: the absence of word embedding algorithms.

Simple word embedding algorithms like bag of words were introduced in parallel with document clustering in search engines. The idea was to convert a text document into a numerical dataset. It took until the year 2013 for more advanced word embedding algorithms like word2vec to arrive. Word2vec is an improved successor to bag of words that captures more of the semantics and was designed for neural network learning.
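
The bag of words idea, converting a text document into numbers, fits into a short sketch. The two example documents are made up, and the vector simply counts how often each vocabulary word occurs. Word order is lost, which is one of the limitations that later algorithms address.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents):
    """Assign every distinct word an index, in alphabetical order."""
    words = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}

def bag_of_words(document, vocabulary):
    """Count occurrences of each vocabulary word in the document."""
    vector = [0] * len(vocabulary)
    for word in tokenize(document):
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector

docs = ["the robot moves", "the robot grasps the ball"]
vocab = build_vocabulary(docs)  # ball, grasps, moves, robot, the
print(bag_of_words(docs[1], vocab))  # prints [1, 1, 0, 1, 2]
```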

A word embedding algorithm itself is not a chatbot, it is only the preliminary step that converts a text dataset into a numerical dataset. But without such an algorithm, modern deep learning systems can't be applied to chatbot design. So we can say that word2vec and more recent algorithms were the missing part before it was possible to realize advanced large language models.

The main task of a word embedding algorithm is to convert a natural language processing task like document indexing or dataset parsing into a machine learning task. Machine learning is usually realized with neural networks, which are trained with learning algorithms. It should be mentioned that neural networks can only be fed with numerical data, e.g. a value from 0.0 to 1.0. They can't be fed with text information like "The quick brown fox jumps over the lazy dog". The reason is that artificial neural networks have their roots in mathematics and statistics, which is by definition the science of number crunching.

This kind of detail is important because neural networks in their original version can't be applied to language problems, only to statistical problems. That is the reason why neural networks were not useful for many decades apart from niche applications in machine learning. Before the advent of NLP, neural networks were mostly used to analyze time series of numerical information, e.g. for weather prediction and for trajectory smoothing. These are classical numerical problems within mathematics.

Word embeddings as the bottleneck in large language models

Before a computer can process written information, the dataset corpus needs to be transformed into a numerical representation. Otherwise the neural network can't be trained on the data. The problem is that nearly all input datasets are formulated in English. For example, there may be a list of question answer pairs stored in a .csv file. Typical entries might be:
"What is the north?", "It's a direction similar to east and west"
"What is red?", "It's a color similar to blue or green"

These pairs make perfect sense to humans, but a computer won't understand the words. Every existing neural network architecture requires numerical data in a floating point range. Unfortunately, the example dataset contains no floating point numbers, only words.
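
As a minimal sketch of the first conversion step, the question answer pairs can be read with Python's csv module and every word replaced by an integer id. The id scheme (numbering words in order of first appearance) is an assumption made for illustration. A real word embedding model such as word2vec would go further and map each word to a vector of floating point numbers.

```python
import csv
import io
import re

# The two example rows, as they might appear in a .csv file.
CSV_TEXT = '''"What is the north?", "It's a direction similar to east and west"
"What is red?", "It's a color similar to blue or green"
'''

def load_pairs(text):
    """Parse the rows into (question, answer) tuples."""
    # skipinitialspace lets the reader accept the blank after each comma.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [(row[0], row[1]) for row in reader if len(row) >= 2]

def tokenize(sentence):
    """Lowercase the sentence and split it into words."""
    return re.findall(r"[a-z']+", sentence.lower())

def build_vocabulary(pairs):
    """Number the words in order of first appearance."""
    vocab = {}
    for question, answer in pairs:
        for word in tokenize(question) + tokenize(answer):
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(sentence, vocab):
    """Replace each known word by its id (unknown words raise KeyError)."""
    return [vocab[word] for word in tokenize(sentence)]

pairs = load_pairs(CSV_TEXT)
vocab = build_vocabulary(pairs)
print(encode("What is red?", vocab))  # prints [0, 1, 12]
```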

Even though the problem is obvious, it was discovered very late in computer science. The first attempts at document retrieval didn't require a word embedding model, because classical text retrieval was realized with full text search engines. The algorithm compares the input sentence with a database and returns the matching document. Only if text retrieval is to be realized with a neural network is there a need to convert the documents into a vector space, which is the task of a word embedding model like word2vec or fastText.

Modern large language models are built with word embedding models in the background. These embeddings make sure that the neural network understands the sentences in the corpus. The word embedding model influences how fast and accurate the resulting chatbot is. For example, a minimalist bag of words model with a vocabulary of 100 words won't understand or generate an academic paper, because the translation from a full text document into a vector space doesn't work well enough.

A domain specific bag of words model is perhaps the most minimal example of a word embedding, and it can be explained with a point & click adventure game. There are only 4 verbs (walkto, pick, drop, use) and 4 nouns (key, stairs, door, ball). Each word is assigned a number, starting from 0 in the order listed, and possible sentences look like:

"walkto door" = (0,2)
"pick key" = (1,0)
"use ball" = (3,3)

The first number in the vector is submitted to neuron #1 while the second number is submitted to neuron #2. Most existing point & click adventures don't implement a dedicated word embedding model to store the internal communication vocabulary, but for exploring new tools and NLP techniques it makes sense to introduce word embeddings into video games.
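
The adventure game vocabulary can be written down directly as two lists, where the position of a word in its list is its number. The function names below are made up for this sketch. It reproduces the three example vectors from above.

```python
# Position in the list is the number assigned to the word.
VERBS = ["walkto", "pick", "drop", "use"]
NOUNS = ["key", "stairs", "door", "ball"]

def encode(sentence):
    """Turn a two word command into a (verb id, noun id) vector."""
    verb, noun = sentence.split()
    return (VERBS.index(verb), NOUNS.index(noun))

def decode(vector):
    """Turn a (verb id, noun id) vector back into a command."""
    verb_id, noun_id = vector
    return f"{VERBS[verb_id]} {NOUNS[noun_id]}"

print(encode("walkto door"))  # prints (0, 2)
print(encode("pick key"))     # prints (1, 0)
print(encode("use ball"))     # prints (3, 3)
```

With 4 verbs and 4 nouns there are only 16 possible commands, which is why such a small vocabulary is enough for the game but not for open text.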

November 23, 2025

Language games in recent robotics

Every machine consists of an internal mechanism that can be explained from a scientific perspective. A steam engine is driven by steam pressure, a computer works with electricity, and a robot also has an internal driving force. In the science fiction world, the core mechanism of a robot is sometimes an AI chip which enables the robot to think. A rough guess is that robots in reality also have a chip or a graphics processing unit which is very expensive and makes the robot move and think. Unfortunately, this hardware oriented explanation is wrong. Modern robotics has no AI chip.

The next possible explanation is that a robot is driven by a software architecture, for example an algorithm or a robot control software. Such a computer program would be the core element that enables the robot to make decisions. Unfortunately, the explanation based on a software architecture is also wrong. Modern robotics requires neither a dedicated firmware nor an operating system.

If neither hardware nor software explains the artificial intelligence inside a robot, not many alternative explanations remain. At the same time, recent robotics has demonstrated remarkable skills like biped walking and dexterous object grasping, so there must be a mechanism that explains the internal workings. The mechanism is a bit hidden. It is not located inside the robot torso but outside of the robot. To be more specific, the driving force behind modern robotics is language games.

A game is an activity played with a set of rules. Typical games are chess or four in a row. A language game is a certain category of game that works interactively and uses words. Language games are the driving force behind recent robotics. A certain robot, for example a biped robot, implements a language game. The language game also defines the limitations of a robot. For example, if the game is about navigating in a warehouse scenario, the robot can do only this single task.

The problem with games, and especially with language games, is that they can't be located in reality very well. A game needs neither a CPU nor a certain software program. It is an abstract idea described in a document. For example, the game of chess can be implemented on different physical chess boards, which might be 10 cm or 14 cm wide, or it can be implemented as a video game. The same is true for language games. A certain speaker to hearer dialogue game can be implemented on different computer hardware with different algorithms. The only fixed element is the game itself.

Even if games can't be located physically inside a robot, they are part of reality. Abstract ideas are usually described in books, and books are located in a library. In a library there are many books about board games, card games, word puzzles and language games for robots. These books are the single explanation for why modern robotics works.