November 30, 2025

Word2vec and the invention of large language models

Deep learning with neural networks has been available since the 2000s, and chatbots like Eliza are much older, dating back to the late 1960s. What was missing until the 2020s was a modern large language model, and there is a reason why this AI technology appeared so late in the history of computing: the absence of word embedding algorithms.

Simple word embedding algorithms like bag of words were introduced alongside document clustering in search engines. The idea was to convert a text document into a numerical dataset. It took until the year 2013 for more advanced word embedding algorithms like word2vec to appear. Word2vec is an improved version of bag of words that works with higher semantic understanding and was designed for neural network training.

A word embedding algorithm is not itself a chatbot; it is only the preprocessing step that converts a text dataset into a numerical one. But without such an algorithm, modern deep learning systems can't be applied to chatbot design. So we can say that word2vec and more recent algorithms were the missing piece before it was possible to realize advanced large language models.

The main task of a word embedding algorithm is to convert a natural language processing task like document indexing or dataset parsing into a machine learning task. Machine learning is usually realized with neural networks trained by learning algorithms. It should be mentioned that neural networks can only be fed with numerical data, e.g. values in a bounded range such as 0.0 to 1.0. Neural networks can't be fed with text information like "The quick brown fox jumps over the lazy dog". The reason is that artificial neural networks have their roots in mathematics and statistics, which is by definition the science of number crunching.

This detail matters because neural networks in their original form can't be applied to language problems, only to statistical ones. That is the reason why neural networks were not very useful for many decades, apart from niche applications in machine learning. Before the advent of NLP, neural networks were mostly used to analyze time series of numerical information, e.g. for weather prediction and trajectory smoothing. These are classical numerical problems within mathematics.

Word embeddings as the bottleneck in large language models

Before a computer can process written information, the dataset corpus needs to be transformed into a numerical representation; otherwise the neural network can't be trained on the data. The problem is that nearly all input datasets are formulated in English. Suppose a list of question-answer pairs is stored in a .csv file. Typical entries might be:
"What is the north?", "It's a direction similar to east and west"
"What is red?", "It's a color similar to blue or green"

These pairs make perfect sense to humans, but a computer won't understand the words. Every existing neural network architecture requires numerical data in a floating point range. Unfortunately, the example dataset contains no floating point numbers, only words.

Even though the problem is obvious, it was discovered very late in computer science. Early attempts at document retrieval didn't require a word embedding model, because classical text retrieval was realized with full text search engines: the algorithm compares the input sentence with a database and returns the matching document. Only if the text retrieval is to be realized with a neural network is there a need to convert the documents into a vector space, which is the task of a word embedding model like word2vec or fastText.

Modern large language models are built with word embedding models in the background. These embeddings make sure that the neural network understands the sentences in the corpus. The word embedding model influences how fast and how accurate the resulting chatbot is. For example, a minimalist bag of words model with a vocabulary of 100 words won't understand or generate an academic paper, because the translation from a full text document into a vector space doesn't work well enough.

A domain-specific bag of words model is perhaps the most minimal example of word embedding and can be explained for a point & click adventure game. There are only 4 verbs (walkto, pick, drop, use) and 4 nouns (key, stairs, door, ball). Each word is assigned a number, and possible sentences look like:

"walkto door" = (0,2)
"pick key" = (1,0)
"use ball" = (3,3)

The first number in the vector is submitted to neuron #1 while the second number is submitted to neuron #2. Most existing point & click adventures don't implement a dedicated word embedding model to store the internal communication vocabulary, but for exploring new tools and NLP techniques it makes sense to introduce word embeddings into video games.
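The adventure-game vocabulary above can be sketched in a few lines of Python; the lists and the encoding function are only an illustration of the idea, not a real game engine:

```python
# Domain-specific vocabulary: index in the list is the word's number.
VERBS = ["walkto", "pick", "drop", "use"]   # indices 0..3
NOUNS = ["key", "stairs", "door", "ball"]   # indices 0..3

def encode(sentence):
    """Map a two-word command to an integer vector (verb_id, noun_id)."""
    verb, noun = sentence.split()
    return (VERBS.index(verb), NOUNS.index(noun))

print(encode("walkto door"))  # (0, 2)
print(encode("pick key"))     # (1, 0)
print(encode("use ball"))     # (3, 3)
```

The resulting integer pairs are what a neural network could actually be fed with, one number per input neuron.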

November 23, 2025

Language games in recent robotics

Every machine contains an internal mechanism which can be explained from a scientific perspective. A steam engine is driven by combustion, a computer works with electricity, and a robot also has an internal driving force. In science fiction, the core mechanism of a robot is sometimes an AI chip which enables the robot to think. A rough guess would be that real robots also have such a chip or a very expensive graphics processing unit that makes the robot move and think. Unfortunately, this hardware-oriented explanation is wrong. Modern robotics has no AI chip.

The next possible explanation is that a robot is driven by a software architecture, for example an algorithm or a robot control program. Such a computer program would be the core element and would enable the robot to make decisions. Unfortunately, the hint at a software architecture is also wrong. Modern robotics requires neither a dedicated firmware nor an operating system.

If neither hardware nor software explains the artificial intelligence inside a robot, there are not many alternative explanations left. At the same time, recent robotics has demonstrated remarkable skills like biped walking and dexterous object grasping, so there must be a mechanism which explains the internal working. The mechanism is a bit hidden: it is not located inside the robot torso but outside of the robot. To be more specific, the driving force behind modern robotics is language games.

A language game is an activity played with a set of rules. Typical games are chess or four in a row. A language game is a certain category of game which works interactively and through words. Language games are the driving force behind recent robotics. A certain robot, for example a biped robot, implements a language game. The language game also defines the limitations of the robot. For example, if the game is about navigating a warehouse scenario, the robot can do only this single task.

The problem with games, and especially with language games, is that they can't be located in physical reality very well. A game needs no CPU and no particular software program; a game is an abstract idea described in a document. For example, the game of chess can be implemented on different physical chess boards, which might be 10 cm or 14 cm wide, or it can be implemented as a video game. The same holds for language games. A certain speaker-hearer dialogue game can be implemented on different computer hardware with different algorithms. The only fixed element is the game itself.

Even if games can't be located physically inside a robot, they are part of reality. Abstract ideas are usually described in books, and books are located in a library. In a library there are many books available about board games, card games, word puzzles and language games for robots. These books are the real explanation of why modern robotics works.

November 21, 2025

Instruction following with reward function for box pushing


The human operator selects an instruction from the menu and the game engine determines the numerical reward for this action. This makes it possible to execute sequences of actions in a physics engine. It is a "text to reward" system for grounded language. In the prototype, the reward function was hard-coded in the source code, so the system only understands the 5 predefined instructions.
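A hard-coded "text to reward" table could look like the following sketch. The instruction names, target coordinates and the negative-Manhattan-distance reward are assumptions for a box-pushing scene, not the prototype's actual code:

```python
# Hypothetical mapping from instruction text to a target position for the box.
INSTRUCTIONS = {
    "push box left":   (2, 5),
    "push box right":  (8, 5),
    "push box up":     (5, 2),
    "push box down":   (5, 8),
    "push box center": (5, 5),
}

def reward(instruction, box_pos):
    """Negative Manhattan distance between the box and the target position."""
    tx, ty = INSTRUCTIONS[instruction]
    x, y = box_pos
    return -(abs(x - tx) + abs(y - ty))

print(reward("push box left", (5, 5)))  # -3
```

Because the table is fixed in the source code, any instruction outside these 5 entries simply has no reward defined.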

November 06, 2025

Introduction to the symbol grounding problem

Since the existing literature on this topic is sparse, here is a short blog post presenting the theory of Stevan Harnad. Symbol grounding is about a special kind of computer game that uses natural language. For example, differently colored objects are shown on the screen and the instruction is: click with the mouse on the green circle. If the user selects the correct object, he receives a point and the next task is formulated. Another example is an NPC quest in an RPG video game where the task is: "bring me the box from the forest", and in return the player is credited +10 points. Here again the instruction is formulated as text and must be converted into actions by the player.

The SHRDLU project from 1972 can also be cited as an example of symbol grounding. Here the human operator enters a command which the software converts into actions.

The symbol grounding problem is not tied to a particular programming language or software; it is an abstract game similar to TicTacToe or Scrabble. It can be implemented on different computer systems and with different configurations. The only requirement is that natural language is used to formulate instructions, either from an NPC character to the player or, conversely, with the human player formulating the instruction for the computer, which is also known as "instruction following".

Even without the symbol grounding problem there were attempts within computer science to process natural language. Natural language processing (NLP), however, focuses on language itself: it tries to translate texts from German to English or to identify parts of speech in a sentence. The symbol grounding problem is not about understanding human language including its complicated grammar; instead, a simplified language of 100 words or fewer is used to refer to the environment of a robot, i.e. to connect language and sensor perception. In symbol grounding, language is used instrumentally as an abstraction mechanism, a compression technique to shrink the state space. According to proponents of the approach, this solves the P=NP problem and allows formerly very difficult robotics problems to be solved elegantly.

November 04, 2025

Modern robotics works with externalized intelligence

Until the year 2020 it was hard or even impossible to build and program robots. There was an endless number of challenges but no algorithm or theory to address them. Since around 2020 most of the problems are solved, and the underlying technology should be explained briefly.

In contrast to past assumptions, the AI technology isn't based on software libraries, algorithms or neural networks. It has nothing to do with expert systems, reinforcement learning, deep learning or the Tensorflow library. What is used to control robots instead is a less popular approach called externalized intelligence.

The concept has much in common with a teleoperated robot, but it works with natural language. The idea is to externalize the complex AI knowledge base from the robot to a human operator. If the important routines no longer belong to the robot itself, there is no need to program them in software. For this, a communication interface is needed between the robot and the human operator.

Let me give an example. Suppose the idea is to program a warehouse robot. A human operator gives voice commands which are high-level goals like "move to room A" or "pick up the green box". These commands are submitted to the robot. That means the robot itself doesn't know why it should move to room A, but it executes the command from the human. This reduces the complexity of the robot to a minimum. The robot isn't able to navigate the warehouse by itself, but it receives a stream of voice commands from the human and executes them.
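The command interface in this example can be sketched as a simple dispatch table; the command strings and handler functions are hypothetical stand-ins for the robot's low-level routines:

```python
# Low-level routines the robot actually implements (stubs for illustration).
def move_to(room):
    print(f"moving to {room}")

def pick_up(obj):
    print(f"picking up {obj}")

# The human's high-level goals map directly to routine calls; the robot
# holds no planner of its own, only this lookup table.
COMMANDS = {
    "move to room A": lambda: move_to("room A"),
    "pick up the green box": lambda: pick_up("green box"),
}

def execute(command):
    action = COMMANDS.get(command)
    if action is None:
        print("unknown command:", command)
    else:
        action()

execute("move to room A")
```

Everything outside the table, i.e. the decision of which command to issue and when, stays with the human operator.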

Of course, such a voice-based teleoperation needs to be programmed in a programming language like C/C++. The details of the implementation are minor problems which can be solved by programmers. It is a normal software engineering project, like programming a spreadsheet application. The programmer gets the specification for the natural language interface and writes the code.

The principle of externalized intelligence was known in computer science before the year 2020. Projects like SHRDLU, teleoperated robots and NPC quests in RPG games were available and documented. What was missing was the understanding that these techniques are very powerful and can be utilized to program robots.

November 01, 2025

Improved command based reward function

In addition to the previous demonstration of language-guided robotics, here is another prototype for a maze robot. The NPC in the game generates a random command like "move to topright" and the human player has to fulfill the task. This time a numerical reward is given in the form of the Manhattan distance, and a visible marker shows the target zone.
The core element of the game is the data structure with possible instructions, which maps language to action:
self.instructions = {  # key: [name, targetpos, link]
    0: ["walkto topright", (9, 0), 4],
    1: ["walkto leftcoin", (4, 5), 2],
    2: ["walkto rightcoin", (9, 5), 1],
    3: ["walkto enemy", (3, 1), None],
    4: ["walkto topleft", (3, 1), 0],
    5: ["walkto bottomleft", (0, 6), 4],
}
 
In contrast to classical reinforcement learning, the robot in the game isn't controlled by a computer program; the human operator controls the robot. On the other hand, the non-player character (NPC), which acts as the referee, is automated in software. It decides on its own what the next goal is and what the reward is.
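The referee's reward computation can be sketched as follows; the function and the reduced instruction table are an illustration of the described Manhattan-distance scheme, not the prototype's actual source:

```python
# Reduced copy of the instruction table: key -> [name, targetpos, link].
instructions = {
    0: ["walkto topright", (9, 0), 4],
    1: ["walkto leftcoin", (4, 5), 2],
}

def reward(instruction_id, robot_pos):
    """Negative Manhattan distance from the robot to the instruction's target."""
    _, (tx, ty), _ = instructions[instruction_id]
    x, y = robot_pos
    return -(abs(x - tx) + abs(y - ty))

print(reward(0, (5, 3)))  # -(4 + 3) = -7
```

A reward of 0 means the human player has steered the robot exactly onto the target zone of the current instruction.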
 

Programming effort for a Pong video game across different decades

 

The task is to write a Pong clone including two paddles, a ball and a visible score. The constraint is that only tools are allowed which were available in the given decade of computer history.

Year  Description                                                       Effort
1985  BASIC on 8-bit home computer without permanent access to device   40 hours
1985  BASIC on 8-bit home computer with permanent access to device      20 hours
1995  C in MS-DOS                                                       15 hours
2010  Python with pygame                                                 5 hours
2025  Javascript generated by a large language model                     1 hour