July 29, 2025

Estimating the hardware requirements for large language models

Since the advent of ChatGPT in late 2022, most people are familiar with how to use these AI systems to execute prompts. Even non-programmers are able to generate stories and create summaries of existing content on the internet. Countless tutorials are available that explain what an LLM is and how to use it to answer questions.

A seldom explored but equally interesting subject is how to run a large language model on one's own computer. The first misconception of beginners is that a large language model can be installed much like a new Linux distribution: all that is needed is an older PC and a fast internet connection. Unfortunately, this underestimates the complexity of the situation.
A more realistic assumption is that a dedicated supercomputer costing around 1 million US$ is needed to run a large language model. To verify this claim, let us take a step back and describe a minimalist precursor of a real LLM.
So-called vector databases are advanced full-text databases for semantic search. They are less capable than large language models, but more powerful than simple SQL databases. A typical example is to convert the content of Wikipedia into a vector database and use that information to answer simple question-and-answer tasks. For example, the user might ask "What is Paris?" or "Tell me about machine learning", and the program has to retrieve the relevant information from the vector database and give a short and precise answer.
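The following is a minimal sketch of the retrieval idea behind such a system. It uses a toy bag-of-words embedding instead of a real neural sentence encoder, and the documents, dimension and function names are illustrative assumptions; only the ranking logic mirrors what a real vector database does.

# Minimal sketch of semantic retrieval over a tiny document collection.
# A real system would use a neural sentence encoder instead of this toy
# bag-of-words embedding; the indexing and ranking logic is the same.
import re
import numpy as np

documents = [
    "Paris is the capital of France and a centre of art and culture.",
    "Machine learning studies algorithms that learn from data.",
    "The Amazon is the largest river by discharge volume of water.",
]

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding: hash each word into a fixed-size, normalized count vector."""
    vec = np.zeros(dim)
    for word in re.findall(r"[a-z]+", text.lower()):
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# "Index" the collection: one vector per document.
index = np.stack([embed(d) for d in documents])

def answer(query: str) -> str:
    """Return the document whose vector is closest to the query (cosine similarity)."""
    scores = index @ embed(query)
    return documents[int(np.argmax(scores))]

print(answer("What is Paris?"))
print(answer("Tell me about machine learning"))

A production system replaces embed() with a transformer-based encoder and the brute-force dot product with an approximate nearest-neighbour index, but the structure stays the same: embed once, store the vectors, compare the query against them.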
To realize such a vector database on a single machine, roughly a dozen CPU cores and around 100 GB of RAM are needed. So we can say that a vector database hosting a simple Wikipedia dataset requires a high-end root server costing around 10,000 US$.
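The RAM figure can be checked with a back-of-envelope calculation over the embedding index; the passage count and embedding dimension below are assumptions chosen for illustration, not measured values.

# Back-of-envelope RAM estimate for the embedding index of a Wikipedia-scale
# vector database. Passage count and embedding dimension are assumed values.
num_passages = 30_000_000   # assumed: Wikipedia split into ~30 million text chunks
embedding_dim = 768         # assumed: common sentence-embedding dimension
bytes_per_value = 4         # float32

index_bytes = num_passages * embedding_dim * bytes_per_value
print(f"raw embedding index: {index_bytes / 1e9:.0f} GB")  # ~92 GB

# On top of this come the original texts and the search data structures, which
# is why a machine with on the order of 100 GB of RAM is a reasonable target.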
In contrast, a dedicated large language model is more advanced than a simple semantic search over Wikipedia articles. The underlying model is larger, and the pipeline that has to run before an answer can be generated is more complex. It is safe to say that a large language model requires more, not less, powerful hardware. Very small large language models, which run very slowly, can be executed on hardware costing around 100k US$. Such hardware goes beyond simple consumer machines and consists of multiple CPUs, a larger amount of RAM and, most importantly, dedicated GPUs. If the goal is to run a state-of-the-art LLM at average performance, the supercomputer for 1 million US$ mentioned at the beginning is required. The situation can be compared with running Unix in the mid-1980s: the mainframes and minicomputers of that era that could run Unix were far more expensive than simple 8-bit home computers.
task                                      price (US$)
desktop PC                                1,000
vector database with Wikipedia            10,000
vector database for multiple documents    50,000
minimalist large language model           100,000
state-of-the-art large language model     1,000,000
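A rough way to see why a state-of-the-art model ends up in the most expensive tier is to estimate the GPU memory needed just to hold the model weights. The parameter counts below are illustrative assumptions; real deployments also need memory for activations and the key/value cache.

# Back-of-envelope GPU memory estimate for holding LLM weights in 16-bit precision.
# Parameter counts are illustrative assumptions, not a specific product.
def weight_memory_gb(num_parameters: float, bytes_per_parameter: int = 2) -> float:
    """Memory in GB needed to store the model weights alone."""
    return num_parameters * bytes_per_parameter / 1e9

for name, params in [("7 B model", 7e9), ("70 B model", 70e9), ("400 B model", 400e9)]:
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB for the weights")

# 7 B   -> ~14 GB  (fits on a single high-end consumer GPU)
# 70 B  -> ~140 GB (already needs several data-center GPUs)
# 400 B -> ~800 GB (a multi-node GPU cluster, roughly the price class discussed above)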

To explore the capabilities of a 1 million US$ supercomputer in detail, we have to go back to the mid-1980s. During that period a DEC VAX 8600 was equipped with a 32-bit CPU running at about 12 MHz, 16 MB of RAM, multiple hard drives totalling around 2 gigabytes, and DECnet networking. A typical use case for such a 1 million US$ machine was database processing or serving as a Telnet host.
From today's perspective, the goal of running a database server with only 16 MB of RAM and a 12 MHz CPU sounds rather optimistic, because such a configuration only allows small databases with a low workload. But the described configuration was state of the art in the mid-1980s; there was no computer available that was much faster.

The assumption is that the same dilemma exists today, in the year 2025: if the goal is to run a state-of-the-art large language model, supercomputer-grade hardware costing around 1 million US$ is required.
