June 21, 2019

Plan recognition for a kitchen robot


The desired behavior of a kitchen robot is that the machine can do something useful on its own. The human only has to press the start button and the robot will cook something. Unfortunately, such an AI system isn't available yet. The overall architecture is complicated and many scientists have failed at building such robots. What can be realized is a weaker form of a kitchen robot, one that is teleoperated. Teleoperation means that the human operator has to cook the meal himself, and he does so with a dataglove-controlled robot.
From an economic standpoint, teleoperation is not very productive. The overall workflow will take longer than without the robot in the loop. But it helps to make some AI-related topics visible, especially the task of plan recognition. What does that mean? While the human operator cooks the meal he will perform actions such as “grasp the bottle” or “cut the apple with a knife”. Formalizing these actions into a plan description language is a first but important step towards robot autonomy. The sad news is that even a perfectly working plan recognition system isn't able to repeat the task on its own. The human operator still has to control the robot through the teleoperation interface. The extra service is that, at the same time, the executed plan is made visible on the screen.
The proposed plan description language consists of a plan library which holds the actions: grasp, open, close, ungrasp, cut and so forth. It also contains subplans; for example “open bottle” means to approach the object, put the finger on the top side and move the closure to the left. That means a plan consists of hierarchical actions which are taken from a library, as the sketch below illustrates.
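A minimal Python sketch of such a plan library could look like the following; the action names, the two-level nesting and the function name expand() are assumptions made for illustration only, not a fixed standard.

# Hypothetical plan library: high-level actions expand into
# sequences of lower-level actions from the same library.
PLAN_LIBRARY = {
    "primitives": ["approach", "grasp", "ungrasp", "open", "close", "cut", "move"],
    "subplans": {
        "open bottle": ["approach bottle", "grasp closure", "move closure left"],
        "cut apple": ["approach apple", "grasp knife", "cut apple with knife"],
    },
}

def expand(action, library=PLAN_LIBRARY):
    """Recursively expand a high-level action into its subactions."""
    steps = library["subplans"].get(action)
    if steps is None:
        return [action]              # treat unknown names as primitive actions
    result = []
    for step in steps:
        result += expand(step, library)
    return result

print(expand("open bottle"))
# ['approach bottle', 'grasp closure', 'move closure left']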
I'd like to describe such a system in action. The first thing the human operator does is put his hand into the dataglove. This gives him control over the robot hand. If the human operator opens his fingers, the robot does the same. All the dataglove does is transmit the actions to the robot arm as fast as possible. This allows the operator to manipulate the scene. The second element of the system is a plan recognition system in the background. It watches what the human is doing and matches the actions against the plan description language. The generated plan mirrors the real actions of the robot: the human operator does something, and at the same time the textual description is shown on the monitor.
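A rough sketch of such a background recognizer, reusing the hypothetical PLAN_LIBRARY from above, might look like this; matching the tail of an action buffer against each subplan is an assumption made for illustration, not a statement about how a production recognizer works.

# Hypothetical recognizer: it consumes the stream of actions coming from
# the dataglove and prints a high-level action as soon as one of the
# subplans from the library has been completed.
def recognize(action_stream, library=PLAN_LIBRARY):
    buffer = []
    for action in action_stream:
        buffer.append(action)
        for name, steps in library["subplans"].items():
            if buffer[-len(steps):] == steps:
                print("recognized:", name)   # shown on the operator's monitor
                buffer.clear()
                break

recognize(["approach bottle", "grasp closure", "move closure left"])
# recognized: open bottle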
The interesting point is that no further AI-related technology is needed for the moment. The operator can do every task, and with a bit of luck the system will recognize the subactions with a parser. The combination of a remote-controlled robot arm plus a plan recognizer is a useful introduction to the subject of robotics. It will not result in a fully autonomous household robot, but it can bring Artificial Intelligence forward. I believe it's important to identify such low-hanging fruits. They are located between a teleoperated system and a fully autonomous robot. Somewhere in between is the demand for AI-related research. The goal is to realize the steps in between in software. The fallback mode is always the teleoperated robot. Teleoperation is something which always works. Even if no Artificial Intelligence at all is available, it's possible to control a robot with a joystick or a dataglove. It's similar to playing a computer game the normal way, which means that the human presses buttons and moves the mouse.
Artificial Intelligence is everything which goes beyond this minimum requirement. It can be a plan recognition system, a learning from demonstration framework, or, in the maximum degree, an autonomous robot which can handle the task on its own.
The steps on this path are unknown, and so is the technology to realize them. That means it's unexplored land and the probability of failure is high. In most cases, an autonomous robot won't work. After starting the system with “run” nothing will happen, because something is wrong with the AI. This is a hint that a major step from teleoperation to full autonomy is missing. Engineers have to answer the question of which step in between is needed. This missing step explains the reason for failure. I would like to give a rough outlook on how the transition can be described in detail:
1. Teleoperation
1a: plan recognition
1b: hierarchical plan recognition with subactions
1c: learning from demonstration
1d: sketch based goal formulation
1e: plan creation and monitoring
2. Fully autonomous robot
The steps in between are not complete; it's only a general description of what the missing steps are. Most failed robot projects can be located on the coordinate system between step 1, teleoperation, and step 2, full autonomy. The overall task is very similar to building a bridge. The left side (teleoperation) is well known. The technology for doing so is available out of the box: a dataglove, a microcontroller and a robot hand are sold in most electronics stores. The other side of the bridge (the autonomous robot) is not available. It is only a vision, known from movies. The question is: what are the steps in between? How to connect the bridge?
The reason why teleoperation is the baseline is that it's reproducible. If somebody has made a YouTube video in which a teleoperated robot manipulator is shown, it's obvious how to build such a system from scratch. It's mostly a hardware problem of connecting the joystick to the robot gripper so that the signals are transmitted over the wire. There is no magic, just normal engineering.
In contrast, if somebody has shown a self-working robot which doesn't need teleoperation, it's a mystery, because this technology hasn't been invented yet. The engineer has invented something new. Such a system may or may not work. The details have to be figured out and perhaps the robot can't be reproduced.
The reason why there is a difference between teleoperation and fully autonomous robots is that in the first case, the data doesn't contain semantic information. Teleoperation usually works by transmitting raw signals from the input interface to the robot arm. That means the joystick is pushed upward, and the robot arm performs the same motion. The problem is that “upward” has no meaning. What is missing is a domain model in which a certain low-level action makes sense. A normal teleoperated robot doesn't have such a model, and this is the reason why each action has to be controlled by a human in the loop. The human has the overall plan and he knows what the current task is. The challenge is to make parts of, or even all of, the hidden knowledge of the human visible to the computer. This would allow teleoperation to be improved into something better.
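As a toy illustration of this difference, the following snippet contrasts a raw teleoperation signal with the same event annotated by a hypothetical domain model; all field names and values are assumptions made up for this example.

# A raw teleoperation signal: the robot only sees a direction and a magnitude.
raw_signal = {"axis": "y", "value": 1.0}          # "upward", meaning unknown

# The same event annotated by a hypothetical domain model: the low-level
# motion is now part of a named step inside a known task on a known object.
annotated_action = {
    "signal": raw_signal,
    "object": "bottle",
    "task": "open bottle",
    "step": "approach bottle",
}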
JSON Format
planlanguage = {
    "lowlevel actions": ["left", "right", "up", "down"],
    "highlevel actions": ["open gripper", "close gripper", "walkto"]
}
A convenient way to store the plan language grammar is a JSON dictionary. Such a data structure allows one to create a hierarchical string list. In the example only two layers are available and only a small number of skills. In contrast to the PDDL format and in contrast to a BNF grammar, a JSON dictionary can be parsed easily in most programming languages.
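For instance, in Python the standard json module is enough to read the grammar back in; the filename planlanguage.json is a placeholder, and the file is assumed to hold only the dictionary part after the "=" sign above.

import json

# Load the plan language from a hypothetical file on disk.
with open("planlanguage.json") as f:
    planlanguage = json.load(f)

for skill in planlanguage["highlevel actions"]:
    print("known skill:", skill)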
The purpose of the JSON dictionary is to restrict the allowed dataset which stores the game log. All the actions in the game log must belong to the JSON dictionary, and the actions of the human are stored in this predefined format in the log file. Somebody may argue that a game logfile, and a dictionary for parsing that file, are useless on their own, and he is right, because it's not possible to control a robot with such a logfile. The purpose is to define a standard for how a plan will look. The overall system is a teleoperated system which is enhanced by a plan specification language and a logfile.
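A short sketch of how the dictionary restricts the log could look like the following; the idea of one action name per log entry and the function name validate_log() are assumptions for illustration.

# Hypothetical validator: keep only those log entries whose action name
# appears in the plan language dictionary.
def validate_log(log_entries, planlanguage):
    allowed = set(planlanguage["lowlevel actions"]) | set(planlanguage["highlevel actions"])
    return [entry for entry in log_entries if entry in allowed]

game_log = ["left", "up", "open gripper", "fly"]   # "fly" is not part of the plan language
print(validate_log(game_log, planlanguage))        # ['left', 'up', 'open gripper']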