November 30, 2025

Word2vec and the invention of large language models

Deep learning with neural networks has been available since the 2000s, and chatbots like Eliza are much older, dating back to the mid-1960s. What was missing until the 2020s was a modern large language model, and there is a reason why this technology arrived so late in the history of computing: the absence of word embedding algorithms.

Simple word embedding algorithms like bag of words were introduced in parallel with document clustering in search engines. The idea was to convert a text document into a numerical dataset. It took until 2013 for more advanced word embedding algorithms like word2vec to appear. Word2vec is an improved version of bag of words: it works with a higher level of semantic understanding and was designed for neural network learning.
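To illustrate the basic idea, here is a minimal bag-of-words sketch in Python. The toy corpus, the whitespace tokenization, and the count_vector helper are invented for this example; real systems use more careful tokenization and vocabulary handling.

    # Minimal bag-of-words sketch: turn documents into count vectors.
    # The toy corpus and the helper name are illustrative, not from any library.
    corpus = [
        "the quick brown fox",
        "the lazy dog",
        "the quick dog",
    ]

    # Build the vocabulary: one vector dimension per distinct word.
    vocabulary = sorted({word for doc in corpus for word in doc.split()})

    def count_vector(doc):
        """Map a document to a vector of word counts over the vocabulary."""
        words = doc.split()
        return [words.count(term) for term in vocabulary]

    for doc in corpus:
        print(doc, "->", count_vector(doc))

Each document becomes a fixed-length numeric vector, which is exactly the format a clustering or learning algorithm can consume.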

A word embedding algorithm by itself is not a chatbot; it is only the preliminary step that converts a text dataset into a numerical one. But without such an algorithm, modern deep learning systems cannot be applied to chatbot design. So we can say that word2vec and more recent algorithms were the missing piece before advanced large language models could be realized.
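As a sketch of how this preprocessing step looks in practice, the snippet below trains a tiny word2vec model with the gensim library. This assumes gensim 4.x; the toy sentences and the parameter values are illustrative only, not a recommended setup.

    # Sketch: training a toy word2vec model (assuming gensim 4.x is installed).
    from gensim.models import Word2Vec

    # A tiny, made-up corpus of pre-tokenized sentences.
    sentences = [
        ["the", "quick", "brown", "fox"],
        ["the", "lazy", "dog"],
        ["the", "quick", "dog"],
    ]

    # vector_size, window and min_count are illustrative parameter choices.
    model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, seed=1)

    # Every word is now a dense numerical vector, usable as network input.
    print(model.wv["dog"])               # 16-dimensional embedding vector
    print(model.wv.most_similar("dog"))  # nearest words in embedding space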

The main task of a word embedding algorithm is to convert a natural language processing task, like document indexing or dataset parsing, into a machine learning task. Machine learning is usually realized with neural networks, which are trained with dedicated learning algorithms. It should be mentioned that neural networks can only be fed with numerical data, e.g. values from 0.0 to 1.0. They cannot be fed with text information like "The quick brown fox jumps over the lazy dog". The reason is that artificial neural networks have their roots in mathematics and statistics, which are by definition the sciences of number crunching.
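A tiny sketch of that constraint: a single artificial neuron can compute a weighted sum over a numeric vector, but it can do nothing with a raw string, so the text has to be encoded first. The encoding scheme and the weights below are made up purely for illustration.

    # A single neuron only processes numbers, never raw text.
    sentence = "the quick brown fox jumps over the lazy dog"

    # Made-up encoding step: map each word to a value between 0.0 and 1.0.
    vocabulary = sorted(set(sentence.split()))
    encoded = [vocabulary.index(w) / (len(vocabulary) - 1)
               for w in sentence.split()]

    # Illustrative weights; in practice they would come from training.
    weights = [0.1] * len(encoded)

    # The neuron's weighted sum works on the numeric encoding ...
    activation = sum(w * x for w, x in zip(weights, encoded))
    print(activation)

    # ... but would fail on the raw string, since floats and characters
    # cannot be multiplied together:
    # sum(w * c for w, c in zip(weights, sentence))  # raises TypeError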

This kind of detail is important because neural networks in their original form cannot be applied to language problems, only to statistical problems. That is why neural networks were of little use for many decades, apart from niche applications in machine learning. Before the advent of NLP, neural networks were mostly used to analyze time series of numerical information, e.g. for weather prediction and for trajectory smoothing. These are classical numerical problems within mathematics.
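As a reminder of what such classical numeric tasks look like, here is a small moving-average smoother over a time series. The sample readings and the window size are invented for illustration.

    # Classical numeric task: smoothing a noisy time series with a
    # moving average. The readings and the window size are made up.
    readings = [12.1, 12.4, 13.9, 12.2, 12.8, 14.0, 13.1, 12.9]
    window = 3

    smoothed = [
        sum(readings[i:i + window]) / window
        for i in range(len(readings) - window + 1)
    ]
    print(smoothed)

No embedding step is needed here: the input is numerical from the start, which is exactly why such problems were a natural fit for neural networks long before text was.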
