July 30, 2025

Interview: Failed LLM research - a look back at the year 2024

Host: Welcome to our program "Digitalblick". Today we are talking with Professor Dr. Simon Mertens from the Institut für Angewandte Informatik about a research project that made some headlines last year, although not quite in the way originally planned. Professor Mertens, it is a pleasure to have you here.

Prof. Mertens: Good evening, Mr. Meier. Thank you for the invitation.

Host: Professor Mertens, your project, which started in early 2024, was supposed to deal with the local implementation and optimization of large language models. Can you briefly explain the original approach?

Prof. Mertens: Gladly. Our idea was to run the increasingly powerful large language models (LLMs) that were emerging at the time not only in the cloud, but locally on high-performance workstations. We wanted to reduce the dependency on external servers and open up new applications at the edge, that is, directly where the data is generated. It was a very ambitious goal, but we were optimistic.

Host: And how did the first steps go? One had the impression that you ran into problems quickly.

Prof. Mertens: That is unfortunately correct. Our first major stumbling block was the hardware. We were using one of the fastest workstations of 2024, equipped with the latest GPUs and plenty of RAM. Yet even this machine was simply not up to the demands of the LLMs we wanted to test. The models, even in their smaller variants, required exorbitant amounts of VRAM and compute. We are talking about models with tens or even hundreds of billions of parameters. Even loading the models often led to crashes or to extremely long waiting times that made any research impossible. It was a sobering awakening from our optimism.

Host: That sounds like a technical dead end. But there were also personal tragedies in the project team, if I am informed correctly?

Prof. Mertens: Yes, unfortunately. And it was a heavy blow that affected all of us deeply. In the middle of the year our esteemed colleague and head of hardware optimization, Dr. Anton Gruber, died completely unexpectedly of heart failure. He was over 70 and a brilliant mind whose experience and calm manner were invaluable to our team. His death not only left a professional gap, it also hit us hard on a human level. He was a driving force, and his loss was extremely demoralizing.

Host: My sincere condolences. And on top of everything, the funding was then cut as well, correct?

Prof. Mertens: That is right. At the end of 2024 our research funding was cut. Officially, the reason given was the "lack of tangible results" and the "missing demonstration of feasibility" of the local LLM implementation. I understand the decision from a purely economic perspective; we had no working prototypes to show. But it was frustrating, because we knew we were at the limit of what was technically possible at the time and had simply run into those extreme requirements. The combination of technical hurdles, the loss of Dr. Gruber and the shortage of money essentially brought the project to a standstill.

Host: A true object lesson in the pitfalls of research. Looking back today, in 2025, what do you take away from this failed project?

Prof. Mertens: Well, one often learns more from failures than from successes. We learned that scaling LLMs down to consumer hardware is an even bigger challenge than we thought. It requires massive technological leaps in hardware efficiency or completely new architectures. And it reminded us how important the human factor is in research. The loss of a team member can cause an entire project to fail, regardless of the technology. It was an expensive but instructive failure that showed the limits of what was feasible at the time.

Host: Professor Mertens, thank you very much for these candid insights into your research project.

Prof. Mertens: You are welcome, Mr. Meier.

July 29, 2025

Estimating the hardware requirements for large language models

Since the advent of ChatGPT in late 2022, most people are familiar with how to use these AI systems for executing prompts. Even non-programmers are able to generate stories and create summaries of existing content on the internet. Endless tutorials are available explaining what an LLM is and how to use it to answer questions.

A seldom explored but interesting subject is how to run a large language model on one's own computer. The first misconception is that beginners assume a large language model can be installed like a new Linux distribution: all that is needed is an older PC and a fast internet connection. Unfortunately, this underestimates the complexity of the situation.
A more realistic assumption is that a dedicated supercomputer costing around 1 million US$ is needed to run a large language model. To justify this claim, let us take a step back and describe a minimalist version of a real LLM.
So-called vector databases are advanced full-text databases for semantic search. They are less advanced than large language models, but more powerful than simple SQL databases. A typical example is to convert the content of Wikipedia into a vector database and use the information to answer simple question-and-answer problems. For example, the user might ask "What is Paris?" or "Tell me about machine learning", and the program has to retrieve the information from the vector database and give a short and precise answer.
To realize such a vector database on a computer, around 12 CPU cores and 100 GB of RAM are needed. So we can say that a vector database hosting a simple Wikipedia dataset requires a high-end root server costing around 10,000 US$.
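The retrieval step of such a vector database can be sketched in a few lines of Python. The embed() function below is a deliberately naive stand-in for a trained embedding model (a hashed bag-of-words vector) so that the example stays self-contained; with real embeddings the cosine-similarity lookup works the same way.

import numpy as np
import re

def embed(text, dim=256):
    # Naive stand-in for a trained embedding model: a hashed bag-of-words vector.
    vec = np.zeros(dim)
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Tiny toy corpus instead of the full Wikipedia dump.
documents = [
    "Paris is the capital of France.",
    "Machine learning is a subfield of artificial intelligence.",
    "The Pacific Ocean is the largest ocean on Earth.",
]
doc_vectors = np.array([embed(d) for d in documents])

def answer(question):
    # Cosine similarity between the question and every stored document.
    scores = doc_vectors @ embed(question)
    return documents[int(np.argmax(scores))]

print(answer("What is Paris?"))
print(answer("Tell me about machine learning"))
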
In contrast, a dedicated large language model is more advanced than a simple semantic search over Wikipedia articles. The underlying database is larger, and the pipeline needed before an answer can be generated is more complex. It is safe to say that a large language model requires more, not less, powerful hardware. Very small large language models that run very slowly can be executed on hardware costing around 100,000 US$. Such hardware goes beyond simple consumer hardware and consists of multiple CPUs, a larger amount of RAM and, most importantly, dedicated GPUs. If the attempt is to run a state-of-the-art LLM at average performance, the supercomputer mentioned at the beginning, costing around 1 million US$, is required. The situation can be compared with the advent of Unix in the mid 1980s: the mainframe computers that ran Unix during the 1980s were far more expensive than simple 8-bit home computers.
task                                       price US$
desktop PC                                     1,000
vector database with wikipedia                10,000
vector database for multiple documents        50,000
minimalist large language model              100,000
large language model                       1,000,000
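
A rough back-of-the-envelope estimate supports these figures. The sketch below only multiplies the parameter count by the bytes per parameter; the overhead factor for activations, caches and the runtime is an assumption, not a measured value.

def estimate_memory_gb(parameters_billion, bytes_per_parameter=2, overhead=1.2):
    # One billion parameters at 2 bytes each (16-bit precision) occupy 2 GB,
    # so the weights need roughly parameters_billion * bytes_per_parameter GB.
    weights_gb = parameters_billion * bytes_per_parameter
    # Assumed safety margin for activations, key-value caches and the runtime.
    return weights_gb * overhead

for size in [7, 70, 400]:
    print(f"{size}B parameters -> approx. {estimate_memory_gb(size):.0f} GB of memory")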

To explore the capabilities of a 1 million US$ supercomputer in detail, we have to go back to the mid 1980s. During that period a DEC VAX 8600 was equipped with a 32-bit CPU running at around 12 MHz, 16 MB of RAM, multiple hard drives with 2 gigabytes in total, and a DECnet network. A typical use case of such a 1 million US$ machine would be database processing or a Telnet server.
From today's perspective, running a database server with only 16 MB of RAM and a 12 MHz CPU sounds a bit optimistic, because such a configuration only allows smaller databases with a low workload. But the described configuration was state of the art in the mid 1980s; there was no computer available that was much faster.

The assumption is that the same dilemma exists today in the year 2025: if the goal is to run a state-of-the-art large language model, supercomputer-grade hardware costing around 1 million US$ is needed.

July 20, 2025

Simple chatbot in python

 The most basic implementation of a chatbot works with predefined question-answer pairs stored in a Python dictionary. The user's input has to contain one of the predefined questions to get an answer, so the chatbot is essentially a database lookup tool. Even if the program is less advanced than current large language models and less mature than the Eliza software, it is a good starting point for becoming familiar with chatbot development from scratch. The source code consists of fewer than 50 lines of Python, including the dataset.

def run_chatbot():
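    # Predefined question-answer pairs; a reply is chosen when one of the
    # keys appears as a substring of the lowercased user input.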
    knowledge_base = {
        "hello": "Hi there! How can I help you today?",
        "how are you": "I'm a computer program, so I don't have feelings, but thanks for asking!",
        "what is your name": "I am a simple chatbot.",
        "who created you": "I was created by a programmer.",
        "what can you do": "I can answer questions based on my internal knowledge base.",
        "tell me a joke": "Why don't scientists trust atoms? Because they make up everything!",
        "what is the capital of france": "The capital of France is Paris.",
        "what is the largest ocean": "The Pacific Ocean is the largest ocean.",
        "what is the highest mountain": "Mount Everest is the highest mountain in the world.",
        "what is the square root of 9": "The square root of 9 is 3.",
        "what is the weather like today": "I'm sorry, I cannot provide real-time weather information.",
        "how old are you": "I don't have an age in the human sense.",
        "what is python": "Python is a high-level, interpreted programming language.",
        "what is AI": "AI stands for Artificial Intelligence, which is the simulation of human intelligence processes by machines.",
        "where are you from": "I exist in the digital realm!",
        "can you learn": "I don't learn in the same way humans do. My responses are pre-programmed.",
        "what is gravity": "Gravity is a fundamental force of nature that attracts any two objects with mass.",
        "what is photosynthesis": "Photosynthesis is the process used by plants, algae, and cyanobacteria to convert light energy into chemical energy.",
        "what is the speed of light": "The speed of light in a vacuum is approximately 299,792,458 meters per second.",
        "thank you": "You're welcome! Is there anything else I can assist you with?"
    }
    print("Welcome to the simple Q&A Chatbot!")
    print("Type 'quit' or 'exit' to end the conversation.")
    print("-" * 40)

    while True:
        user_input = input("You: ").strip().lower()

        if user_input in ["quit", "exit"]:
            print("Chatbot: Goodbye! Have a great day.")
            break

        found_answer = False
        for question, answer in knowledge_base.items():
            if question in user_input:
                print(f"Chatbot: {answer}")
                found_answer = True
                break
        if not found_answer:
            print("Chatbot: I'm sorry, I don't understand that question. Can you please rephrase it?")

if __name__ == "__main__":
    run_chatbot()

July 17, 2025

Can AI replace human programmers?

 In the year 2025, there is no clear answer to this question. Maybe it is possible to replace human programmers with an AI, maybe not. What we can say for sure is that for some simpler tasks an AI is more capable than a human.

These simpler tasks include the game of chess, the Tetris video game and the ability to answer programming-related questions. The first chess AI that was superior to a human grandmaster was Deep Blue in 1997. Playing Tetris at grandmaster level has also been demonstrated. What is missing is proof for more advanced tasks. Despite the existence of large language models, most existing software was written by human programmers. There are tools available, like git and programmer-friendly IDEs, that claim to improve efficiency, but coding remains a human task. What current AI systems are able to do is solve minor tasks within a programming project, for example writing a hello-world app in Python or answering a detailed programming question.

The task of creating an entire application consisting of thousands of lines of code is a demanding problem. Some progress has been made in this direction, but the outcome remains unclear. What we can say for sure is that in the future the importance of large language models for programming tasks will grow.

A possible benchmark for judging an AI is its ability to contribute to existing software projects. The AI needs to create a commit that gets accepted within a project as a meaningful contribution. If an AI is able to do so multiple times for different projects, this would be strong evidence that the AI can replace human programmers.

From a technical perspective, a commit is a changeset in an existing project. It can be a bugfix or an additional feature. At least for simpler projects like a prime number generator or a tic-tac-toe video game, current LLMs are able to do this out of the box. The open question is whether they are able to do so for more advanced projects like a larger video game or an entire operating system.

Real-world software projects consist of 10,000 or more lines of code. In addition, there is extensive documentation and forum discussion that needs to be understood before a commit can be created. Even for an AI from the year 2025, this would be a complex task.

July 13, 2025

AI generated window desktop

 

A minimalist GUI prototype written in Python and pygame was generated with an AI. It is possible to click on the menu bar and the start menu, but launching additional programs is not possible. The source code consists of about 180 lines and was created entirely by a large language model:

import pygame
import sys

# --- Pygame Initialization ---
pygame.init()

# --- Screen Dimensions ---
SCREEN_WIDTH = 1000
SCREEN_HEIGHT = 700
screen = pygame.display.set_mode((SCREEN_WIDTH, SCREEN_HEIGHT))
pygame.display.set_caption("Pygame: Desktop Simulation")

# --- Colors ---
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
LIGHT_BLUE = (173, 216, 230)
LIGHT_GREEN = (144, 238, 144)
DARK_GRAY = (50, 50, 50)
TOOLBAR_GRAY = (70, 70, 70)
BUTTON_HOVER = (90, 90, 90)
BUTTON_ACTIVE = (120, 120, 120)

# --- Font for text ---
font_small = pygame.font.Font(None, 24) # For menu items, etc.
font_medium = pygame.font.Font(None, 30) # For window titles
font_large = pygame.font.Font(None, 36) # For main elements

# --- Desktop Background ---
desktop_bg_color = (60, 60, 100) # A dark blue/purple for a desktop feel

# --- Taskbar/Top Bar Properties ---
taskbar_height = 40
taskbar_rect = pygame.Rect(0, 0, SCREEN_WIDTH, taskbar_height)
start_button_rect = pygame.Rect(5, 5, 80, 30) # x, y, width, height
start_button_text = "Start"
start_menu_active = False
start_menu_rect = pygame.Rect(5, taskbar_height, 150, 150) # Example menu size
start_menu_items = ["Terminal", "Browser", "Editor", "Settings"]
start_menu_item_rects = [] # To store rects for click detection

# --- Window Properties (as classes for easier management) ---
class Window:
    def __init__(self, x, y, width, height, color, title, content_text=""):
        self.rect = pygame.Rect(x, y, width, height)
        self.title_bar_height = 25
        self.title_bar_rect = pygame.Rect(x, y, width, self.title_bar_height)
        self.content_rect = pygame.Rect(x, y + self.title_bar_height, width, height - self.title_bar_height)
        self.color = color
        self.title = title
        self.content_text = content_text
        self.active_menu_message = "" # To show what menu item was clicked

        # Menu button rects (File and Edit)
        self.file_menu_rect = pygame.Rect(self.title_bar_rect.x + 5, self.title_bar_rect.y + 2, 40, self.title_bar_height - 4)
        self.edit_menu_rect = pygame.Rect(self.title_bar_rect.x + 50, self.title_bar_rect.y + 2, 40, self.title_bar_height - 4)

    def draw(self, surface):
        # Draw window content area
        pygame.draw.rect(surface, self.color, self.content_rect)
        pygame.draw.rect(surface, BLACK, self.content_rect, 2) # Border

        # Draw title bar
        pygame.draw.rect(surface, TOOLBAR_GRAY, self.title_bar_rect)
        pygame.draw.rect(surface, BLACK, self.title_bar_rect, 2) # Border

        # Draw title text
        title_surface = font_medium.render(self.title, True, WHITE)
        title_rect = title_surface.get_rect(centerx=self.title_bar_rect.centerx, centery=self.title_bar_rect.centery)
        surface.blit(title_surface, title_rect)

        # Draw menu buttons (File, Edit)
        pygame.draw.rect(surface, DARK_GRAY, self.file_menu_rect)
        file_text = font_small.render("File", True, WHITE)
        file_text_rect = file_text.get_rect(center=self.file_menu_rect.center)
        surface.blit(file_text, file_text_rect)

        pygame.draw.rect(surface, DARK_GRAY, self.edit_menu_rect)
        edit_text = font_small.render("Edit", True, WHITE)
        edit_text_rect = edit_text.get_rect(center=self.edit_menu_rect.center)
        surface.blit(edit_text, edit_text_rect)

        # Draw content text
        content_surface = font_small.render(self.content_text, True, BLACK)
        content_rect = content_surface.get_rect(topleft=(self.content_rect.x + 10, self.content_rect.y + 10))
        surface.blit(content_surface, content_rect)

        # Draw active menu message
        if self.active_menu_message:
            message_surface = font_small.render(self.active_menu_message, True, BLACK)
            message_rect = message_surface.get_rect(topleft=(self.content_rect.x + 10, self.content_rect.y + 40))
            surface.blit(message_surface, message_rect)

    def handle_click(self, pos):
        if self.file_menu_rect.collidepoint(pos):
            self.active_menu_message = "File menu clicked!"
            return True
        elif self.edit_menu_rect.collidepoint(pos):
            self.active_menu_message = "Edit menu clicked!"
            return True
        return False

# Create our two custom windows
window1 = Window(100, 100, 350, 250, LIGHT_BLUE, "My Documents", "Welcome to Window One!")
window2 = Window(500, 350, 400, 280, LIGHT_GREEN, "Application", "This is Window Two.")

# --- Game Loop ---
running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == pygame.MOUSEBUTTONDOWN:
            mouse_pos = event.pos

            # Handle Start button click
            if start_button_rect.collidepoint(mouse_pos):
                start_menu_active = not start_menu_active # Toggle menu visibility
            elif start_menu_active and start_menu_rect.collidepoint(mouse_pos):
                # Check if a start menu item was clicked
                for i, item_rect in enumerate(start_menu_item_rects):
                    if item_rect.collidepoint(mouse_pos):
                        # In a real app, you'd launch something here
                        print(f"Launched: {start_menu_items[i]}")
                        window1.content_text = f"Launched: {start_menu_items[i]}"
                        start_menu_active = False # Close menu after selection
            else: # If click outside start menu, close it
                start_menu_active = False

            # Handle clicks on window menus
            window1.active_menu_message = "" # Clear previous messages
            window2.active_menu_message = ""
            if window1.handle_click(mouse_pos):
                pass # Handled by window object
            elif window2.handle_click(mouse_pos):
                pass # Handled by window object

    # --- Drawing ---
    screen.fill(desktop_bg_color) # Desktop background

    # Draw Taskbar/Top Bar
    pygame.draw.rect(screen, TOOLBAR_GRAY, taskbar_rect)
    pygame.draw.rect(screen, BLACK, taskbar_rect, 1) # Border

    # Draw Start button
    pygame.draw.rect(screen, DARK_GRAY, start_button_rect)
    pygame.draw.rect(screen, BLACK, start_button_rect, 1)
    start_text_surface = font_medium.render(start_button_text, True, WHITE)
    start_text_rect = start_text_surface.get_rect(center=start_button_rect.center)
    screen.blit(start_text_surface, start_text_rect)

    # Draw Start Menu if active
    if start_menu_active:
        pygame.draw.rect(screen, TOOLBAR_GRAY, start_menu_rect)
        pygame.draw.rect(screen, BLACK, start_menu_rect, 2)
        start_menu_item_rects = [] # Clear and re-populate for current frame
        for i, item in enumerate(start_menu_items):
            item_y = start_menu_rect.y + 10 + i * 30
            item_rect = pygame.Rect(start_menu_rect.x + 5, item_y, start_menu_rect.width - 10, 25)
            start_menu_item_rects.append(item_rect)

            # Check for hover effect (optional but nice for menus)
            if item_rect.collidepoint(pygame.mouse.get_pos()):
                pygame.draw.rect(screen, BUTTON_HOVER, item_rect)

            item_text_surface = font_small.render(item, True, WHITE)
            item_text_rect = item_text_surface.get_rect(topleft=(item_rect.x + 5, item_rect.y + 2))
            screen.blit(item_text_surface, item_text_rect)


    # Draw Windows
    window1.draw(screen)
    window2.draw(screen)

    # --- Update the Display ---
    pygame.display.flip()

# --- Quit Pygame ---
pygame.quit()
sys.exit()

July 02, 2025

VLA models for reproducing motion capture trajectories

 For decades, robotics has faced an important but unsolved problem: how to reproduce a motion capture demonstration? The initial situation was that a teleoperated robot was able to pick and place objects in a kitchen and all the data were recorded with a computer, but replaying these data did not work. The reason is that if the same motor movements are sent to the robot during the replay step, they result in chaotic behavior, because the objects are now in different positions and new obstacles may be present that were not there during the demonstration.

The inability to replay recorded movements prevented the development of more advanced robots and was a major criticism of motion capture and teleoperation in general. Some techniques such as kinesthetic teaching and the preprogramming of keyframes were used in robotics to overcome the bottleneck, but these minor improvements did not solve the underlying problem.

A possible answer to the replay problem in motion capture are vision-language-action models, which should be explained briefly. The idea is to create an additional layer that is formulated in natural language. A neural network converts the mocap recording into natural language, and actions are then generated for the perceived symbols. The natural language layer increases robustness and allows possible errors in the motion planner to be fixed. The AI engineer can see in the textual logfile why the robot failed at a certain task: for example, a certain object was labeled incorrectly, or the motion planner generated a noisy trajectory. These detail problems can be fixed within the existing pipeline.

Vision-language-action models (VLA models for short) address the symbol grounding problem. They translate low-level sensory perception into high-level natural language. The resulting symbolic state space has the same syntax as a text adventure and can be handled by existing PDDL-like planners. Let me give an example for a longer planning horizon.

Suppose a robot should clean up a kitchen. First, the required steps are generated on a high-level layer, e.g. remove the objects from the table, transport the objects into the drawer, clean the table, clean the floor. These abstract steps are formulated in words, similar to a plan in a text adventure. In the next step, the high-level actions are translated into low-level servo commands. The servo commands are sent to the robot, which cleans up the kitchen. A minimal sketch of this two-layer idea is shown below.
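
The following sketch illustrates only the structure of such a pipeline: a symbolic high-level plan is expanded into placeholder servo commands. The action names and the primitive table are invented for illustration; a real VLA model would generate both layers with learned networks instead of a hand-written dictionary.

# Hypothetical illustration of the two-layer idea: symbolic plan -> servo commands.
high_level_plan = [
    ("remove", "cup", "table"),
    ("transport", "cup", "drawer"),
    ("clean", "table", None),
    ("clean", "floor", None),
]

# Hand-written mapping from abstract actions to low-level primitives.
# In a real VLA system this translation is learned, not hard-coded.
primitives = {
    "remove":    ["move_arm_above({obj})", "close_gripper()", "lift_arm()"],
    "transport": ["move_arm_to({target})", "open_gripper()"],
    "clean":     ["fetch_sponge()", "wipe({obj})"],
}

def expand(plan):
    """Translate the symbolic plan into a flat list of servo command strings."""
    commands = []
    for action, obj, target in plan:
        for template in primitives[action]:
            commands.append(template.format(obj=obj, target=target))
    return commands

for cmd in expand(high_level_plan):
    print(cmd)  # in a real robot these strings would drive the motors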

The critical point of failure is the translation between the high-level and the low-level layer. The robot needs to convert sensory perception into language, and language into motor actions. A VLA model implements exactly this translation.