A paradigm shift has occurred in how AI researchers discuss the shared goal of building intelligent robots. Until around 2010 the unspoken assumption was to program a closed system. The robot was seen as a machine consisting of software, hardware, and algorithms, and the goal was to optimize this machine, for example by creating more advanced grippers or improving a path-planning algorithm. This was assumed to be the only way to think about robotics, because the goal was to build autonomous, self-sufficient systems, which was seen as equivalent to artificial intelligence.
After 2010 a different approach became available, which started with the bottom-up robotics pioneered by Rodney Brooks and has evolved into modern Vision-Language-Action models, see the right figure. The idea is to use teleoperation between a robot and an external instance, which can be a computer program, a human, or a large language model. This kind of distributed AI creates a new problem: instead of discussing how a robot works internally, for example which algorithm it runs, the new question is how to design the communication between the robot and the external instance.
This simple modification has created a very different bias in artificial intelligence. Former autonomous, closed systems are rejected in favor of natural language communication. An early example of open systems in robotics was the SHRDLU project; later, more complex attempts were the Poeticon++ dataset and the ROCCO RoboCup commentator. These early attempts did not use advanced LLMs, but they anticipated a speaker-to-hearer communication pipeline.
Classical AI until around 2010 was limited by the NP-hard challenge. A typical motion-planning algorithm needs a large amount of CPU resources. Planning the steps for a complex robot task, e.g. biped walking or grasping objects, was beyond the capabilities of the computer hardware. Even with highly optimized programming languages and advanced model predictive control algorithms, this NP-hard bottleneck cannot be solved.
In recent AI after 2010, the NP-hard problem can be ignored because there is no need for motion-planning algorithms anymore. The robot gets its instructions from an external instance, and this external instance can generate a trajectory much more easily than the robot itself. What remains instead is the problem of how to program a text parser. If the external instance gives the command "move to left corner in the maze", this command needs to be translated into action by the robot. For doing so, a dedicated parser is needed, which can be implemented as a context-free grammar, as a large language model, or as a hand-coded computer program. This parser is the new limitation in robotics.
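To make the hand-coded variant of such a parser concrete, here is a minimal sketch in Python. The vocabulary, the `parse_command` function, and the `("navigate", target)` action tuple are all hypothetical choices for illustration; a real system would use a richer grammar or an LLM as described above.

```python
import re

# Tiny hypothetical vocabulary of verbs and named locations.
GRAMMAR = {
    "verbs": ["move", "go", "drive"],
    "locations": ["left corner", "right corner", "center", "exit"],
}

def parse_command(command: str):
    """Translate a natural-language command into an (action, target) pair,
    or return None if the command is not covered by the grammar."""
    text = command.lower()
    verb = next((v for v in GRAMMAR["verbs"]
                 if re.search(rf"\b{v}\b", text)), None)
    target = next((loc for loc in GRAMMAR["locations"]
                   if loc in text), None)
    if verb is None or target is None:
        return None
    return ("navigate", target)

# The example command from the text above:
print(parse_command("move to left corner in the maze"))
# ('navigate', 'left corner')
```

Even this toy version shows the limitation: every phrasing outside the fixed vocabulary fails, which is why the parser, not the motion planner, becomes the bottleneck.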
