Robotics and Artificial Intelligence: From heuristics to language guided planning

Heuristics are a well-known problem-solving technique in the history of Artificial Intelligence. Path planning heuristics like A* and heuristic evaluation functions for computer chess have been in common use since the 1980s and are well documented in the literature. Their main advantage over a vanilla sampling-based planner is better performance, which makes it possible to solve more complex problems.

Unfortunately, classical heuristic algorithms are not able to solve robotics problems like motion planning. Until 2010, it was common to apply heuristic planning algorithms like A*, RRT and the potential field method to path planning in robotics, but with little success. Either the algorithm ran too slowly or the generated trajectory was of low quality. That is, the robot was able to avoid the obstacle, but the path was clumsy.

A broadly accepted definition of a heuristic algorithm equates it with a heuristic cost function. Such a function assigns the current state a value from 0.0 to 1.0, which indicates how well the state fulfills the goal requirements. If the robot is directly on the goal or near it, the cost is low, and the farther the robot is from the goal, the higher the cost. Such a metric makes it possible to guide the search in the state space over a longer horizon, and it can also serve as the metric for model predictive control.
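As a minimal sketch of such a cost function, the following Python snippet maps the distance between robot and goal to a value between 0.0 and 1.0 and uses it to pick the most promising next state. The names heuristic_cost, best_next_state and max_distance are assumptions made for this illustration, and the Euclidean distance is only one possible choice.

```python
import math


def heuristic_cost(position, goal, max_distance):
    """Return a cost in [0.0, 1.0].

    0.0 means the robot is on the goal, 1.0 means it is at least
    max_distance away (an assumed normalization constant).
    """
    distance = math.dist(position, goal)  # Euclidean distance
    return min(distance / max_distance, 1.0)


def best_next_state(candidates, goal, max_distance):
    """Pick the candidate position with the lowest heuristic cost."""
    return min(candidates, key=lambda p: heuristic_cost(p, goal, max_distance))
```

A planner would call best_next_state repeatedly, or plug heuristic_cost into a search such as A*, to steer the robot toward states with a lower cost.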

Unfortunately, a cost function alone can't solve complex robot planning problems; it only hints that heuristics are here to stay. The question the AI community has researched since 2010 is how to extend the cost function into a more advanced description of the current state.

The logical next step after heuristic algorithms and dedicated cost functions is a natural language description of the current scene, which is also known as a visual question answering (VQA) problem. Instead of simply assigning a cost value to the scene, the idea is that a human operator can formulate information-gathering requests like "What is the current direction?", "How far is the goal?" or "Is there an obstacle?", which are answered by the robot. Such a natural language engine can capture more expert knowledge about a subject and makes it possible to solve more complex planning problems.
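The question-and-answer idea can be illustrated with a toy example. The sketch below is not a real VQA engine: it assumes the scene is already available as a small dictionary and matches keywords in the question. The function name answer and the scene keys (robot, goal, heading_deg, obstacles, sensor_range) are hypothetical and chosen only for this illustration.

```python
import math


def answer(question, scene):
    """Answer a few fixed question types about a scene dictionary."""
    q = question.lower()
    if "how far" in q:
        # Straight-line distance from robot to goal
        return f"{math.dist(scene['robot'], scene['goal']):.1f} m"
    if "direction" in q:
        return f"{scene['heading_deg']} degrees"
    if "obstacle" in q:
        # An obstacle counts only if it lies within the assumed sensor range
        near = [o for o in scene["obstacles"]
                if math.dist(o, scene["robot"]) <= scene["sensor_range"]]
        return "yes" if near else "no"
    return "unknown"


scene = {
    "robot": (0, 0),
    "goal": (3, 4),
    "heading_deg": 90,
    "obstacles": [(1, 1)],
    "sensor_range": 2.0,
}
```

With this scene, answer("How far is the goal?", scene) returns "5.0 m". A real system would replace the keyword matching with a language model and the dictionary with perception data.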

An early form of semantic description was already used for formulating cost functions in the past. A cost function usually consists of features. In computer chess, possible features are the number of black pieces or the position of the pawns. These features are hard-coded into the cost function to determine the current cost value. In contrast, a modern VQA engine makes it possible to formulate the features interactively. The human operator can ask the system about certain aspects of the game, which makes the cost function more efficient. It also allows non-programmers to modify the cost function, which was not possible with the chess engines of the 1990s.
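The difference between hard-coded and selectable features can be sketched in a few lines of Python. The board representation (a dictionary from square names to piece codes such as "wP" or "bK"), the feature names and the weights are assumptions made for this illustration and are not taken from any real chess engine.

```python
def count_pieces(board, color):
    """Count the pieces of one color ('w' or 'b') on the board."""
    return sum(1 for piece in board.values() if piece[0] == color)


# Named features: each maps a board to a number.
FEATURES = {
    "white_piece_count": lambda board: count_pieces(board, "w"),
    "black_piece_count": lambda board: count_pieces(board, "b"),
    # White pawns that have reached rank 5 or beyond
    "white_pawns_advanced": lambda board: sum(
        1 for square, piece in board.items()
        if piece == "wP" and int(square[1]) >= 5
    ),
}


def evaluate(board, weights):
    """Weighted sum of the features selected in the weights dictionary."""
    return sum(w * FEATURES[name](board) for name, w in weights.items())


board = {"e5": "wP", "a2": "wP", "e1": "wK", "e8": "bK", "d8": "bQ"}
```

A classical engine would fix the weights in source code. Here an operator can change the evaluation without programming, for example by passing {"white_piece_count": 1.0, "black_piece_count": -1.0} or by adding "white_pawns_advanced" with its own weight.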

February 11, 2025
