June 24, 2019

Plan based robot control


The classical understanding among robot builders is to focus on the software which controls the robot. The idea is to develop a so-called AI which is able to drive the machine on its own. What happens in reality is that these AI systems are usually broken. The user presses the start button, but the robot is not able to follow the line on the ground, and the assumption is that something is wrong with the software. But the real problem lies somewhere else: the idea of creating an AI itself is the problem. In the following blog post I'd like to describe an alternative development method in which the plan notation stands in the center.
Suppose the aim is to drive an RC car around a course under the constraint of not colliding with other cars and respecting the normal traffic rules. The best-practice method for building such a system is to start with a normal teleoperated robot. That means a human driver is in control, and he has to accelerate and steer the car. Such a human-controlled RC car will drive with nearly 100% accuracy; in other words, the task is fulfilled. What is different from normal RC cars is that all the actions are recorded. A motion tracking system records the car, the other cars, and the actions of the human. On top of the motion recording, additional features are calculated, for example the distance to another car, or the direction the car is moving.
The next step is to build a game-log parser. That is an engine which takes the input data and generates a symbolic plan description. That means it converts low-level information into a high-level description. For example, it prints out that an obstacle car is in front of the RC car, or that the human driver stops because of a red traffic light. Plan recognition is equal to sense-making: the game logs are interpreted semantically. The interesting point is that for classical robot engineers such a plan recognition system is not important, and in most cases it's not available. The recommendation is to give this software element a much greater priority. It's important to know that a plan recognizer is not able to control the car. Even if the software works great, it won't replace the human user; it will only track his actions in the current situation.
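To make the idea concrete, here is a minimal sketch of such a game-log parser in Python. All feature names, thresholds and event labels are assumptions for illustration; a real system would compute them from the motion tracking data.

```python
# Hypothetical game-log parser: maps one frame of low-level features
# (distances, speed, traffic-light state) to symbolic events.

def parse_frame(frame):
    """Return the symbolic events recognized in a single log frame."""
    events = []
    # An obstacle car closer than 2 meters in front is reported symbolically.
    if frame["distance_to_front_car"] < 2.0:
        events.append("obstacle-ahead")
    # A stop at a red light: speed near zero while the light is red.
    if frame["speed"] < 0.1 and frame["traffic_light"] == "red":
        events.append("stopped-at-red-light")
    return events

# Two frames of a recorded teleoperation session (invented values).
log = [
    {"distance_to_front_car": 10.0, "speed": 5.0, "traffic_light": "green"},
    {"distance_to_front_car": 1.5, "speed": 0.0, "traffic_light": "red"},
]
symbolic = [parse_frame(f) for f in log]
```

Note that the parser only describes the situation; it never sends a command back to the car.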
The funny thing is that plan recognition and model tracking are the precondition for constructing any sort of robot control system. If no plan recognizer is available, it makes no sense to talk about a control system. The idea is to invent some sort of plan notation language. Such a language is described with a grammar or ontology similar to a domain-specific language. It consists of functions (stop, slow down, steer left) and objects (own car, other car, crossing, traffic light). If the human operator starts a new circuit, the plan notation formalizes the situation. It is a handy way of storing motion capture data on the hard disk.
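Such a plan notation language can be sketched with a few lines of Python. The action and object vocabulary below mirrors the examples from the text; everything else (class names, structure) is an assumption, not a finished design.

```python
# Hypothetical plan-notation sketch: a tiny domain-specific vocabulary
# of functions (actions) and objects, plus a validated plan step.
from dataclasses import dataclass

ACTIONS = {"stop", "slow-down", "steer-left"}
OBJECTS = {"own-car", "other-car", "crossing", "traffic-light"}

@dataclass
class PlanStep:
    action: str
    target: str
    def __post_init__(self):
        # Only words from the fixed vocabulary are legal plan steps.
        if self.action not in ACTIONS or self.target not in OBJECTS:
            raise ValueError("unknown action or object")

# A recorded circuit formalized as a sequence of plan steps.
plan = [PlanStep("slow-down", "crossing"), PlanStep("stop", "traffic-light")]
```

Because the vocabulary is closed, a plan stored this way can later be parsed, searched and compared like any other formal language.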
A language for describing robot plans is sometimes called an interface. That means it can't control a robot, but it is a communication device for human-robot interaction. Sure, there are many robot languages described in the literature. The only problem is that the number of languages is too low. That means the concept of a language is the right idea, and only detail improvements are necessary to invent the perfect robot language.
In the literature the concept is described under the keyword “learning from demonstration”, but it's important to know that such systems are not able to learn and that the system cannot repeat the action by itself. The more interesting part of LfD is the plan notation. Before learning from demonstration can be realized, a plan language has to be invented first which converts the raw mocap demonstration into symbolic plans. This kind of interface is the interesting part of the overall architecture. Such a system can be imagined similar to a computer-language parser: it's based on a context-free grammar and a formal domain description. The human user is allowed to do some tasks, and the parser is able to identify these tasks. Constructing the parser is more complicated than building a C++ parser, which is the reason why the amount of literature is low.
A mocap-to-plan parser takes the raw data of a camera as input and, as output, prints to the screen what the robot is doing.[1]
Trajectory encoding
A more general description of what learning from demonstration is was given in the paper [2]. On the lower level there is a trajectory encoding language, and on the higher level a symbolic plan encoding which is equal to the PDDL syntax.[2] The term learning shouldn't be interpreted as machine learning; instead, the idea is that a plan notation language is available which acts as an interface. The aim is that, during execution of a task by a human operator, the AI system is able to recognize the actions and match them with the plan language.
There is a bit of confusion about what LfD really is. In most projects around the topic, the demonstration phase is extended with an autonomous robot which does the task alone. But the ability to replay an action is not the most important one. Sure, in the long run the aim is to program a robot, but this feature can be set aside as a minor problem. The more important aspect is that the plan notation and the plan recognizer are working. That means a normal teleoperated robot can be realized as a learning-from-demonstration framework.
Let us analyze the precondition for so-called robot teaching. Before the human operator can guide the robot arm to do an action, a task model is needed. That is a plan description language which converts low-level actions into semantic descriptions. The goal is not that the human operator moves the robot arm; the goal is to invent a robot planning language which can store these movements. It's correct to say that LfD is the same as “game-log recording”.
Another synonym is trace annotation.[3] The actions on the screen are recorded to a log file, and the log file is extended with additional information. Nevertheless, the term “trace annotation” isn't used very frequently in the literature; the more common keyword is “learning from demonstration”.
The number of techniques for realizing a plan notation is huge. In the simplest case, it is done with a path notation. That means the trajectory is stored as a list of waypoints. More sophisticated forms of notation are PDDL, grammar-based domain-specific languages and ontologies, and for the low-level trajectory the dynamic movement primitive concept is often referenced. All these techniques have in common that a plan which is executed by a human is stored on the hard drive. They convert the pure mocap information into a semantic description. Such a parser is an add-on to an existing teleoperated robot. That means the robot is under the control of a human, and in the background the parser is tracking what the human is doing.
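The simplest of these techniques, the waypoint list, can be sketched directly. The field names and the JSON serialization are assumptions; the point is only that a demonstrated path becomes a plain data structure that can be written to the hard drive and read back.

```python
# Minimal path-notation sketch: a demonstrated trajectory stored as a
# list of timestamped waypoints, serialized as JSON for the hard drive.
import json

trajectory = [
    {"t": 0.0, "x": 0.0, "y": 0.0},
    {"t": 0.5, "x": 1.2, "y": 0.1},
    {"t": 1.0, "x": 2.5, "y": 0.3},
]

# Round-trip through the serialized form, as a file on disk would do.
encoded = json.dumps(trajectory)
decoded = json.loads(encoded)
```

The higher-level notations (PDDL, grammars, ontologies) add semantics on top of exactly this kind of raw record.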
The importance of such a plan interface can't be overstated. It can be used for analyzing the performance of humans, but it's also the starting point for constructing, on top of the plan notation, an automatic solver which can replay the action on its own.
Trajectory annotation
A teleoperated robot will in most cases produce a trajectory. A car-like robot will drive in a 2D state space, while a robot arm will also generate a trajectory. Recording the raw data is a normal task which can be fulfilled with well-known techniques. The more demanding problem is to annotate the given trajectory. Annotation means enriching the raw data with additional information. The GPS trajectory of a car alone makes no sense, but if the trajectory is drawn onto a map, and if important waypoints are marked in the trajectory, it will provide a lot of information.
Unfortunately, trajectory annotation is not defined precisely; there are lots of possibilities for doing so. In the example of a car, the trajectory results in a speed profile, which is equal to the speed over the time axis, and different segments can be identified, for example:[4]
- car exits highway
- loop
- left turn
- traffic light
Like I mentioned before, annotation software is not able to drive the car on its own. The trajectory is the result of a human operator who drives the car manually. The trajectory parser is able to explain the human behavior. It identifies a segment in the trajectory and adds the information that the car was in a traffic-light situation, which results in a certain profile of speed, steering wheel and obstacle detection. A well-annotated trajectory can be stored in a database for further investigation. For example, it is possible to search for all “traffic light” situations. The process of trajectory annotation is equal to model building; that means the driving patterns are interpreted in an abstract description. This description consists of words and parameters, and is equal to a plan notation.
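A toy version of such a trajectory annotator can be written in a few lines. The threshold and the two labels are assumptions; the paper [4] uses a considerably richer pattern vocabulary.

```python
# Hedged sketch of trajectory annotation: segment a speed profile
# (speed over the time axis) into labelled driving events.

def annotate(speeds, threshold=0.5):
    """Label each sample as 'stopped' (e.g. at a traffic light) or 'driving'."""
    return ["stopped" if v < threshold else "driving" for v in speeds]

# A speed profile with a braking phase in the middle (invented values).
profile = [8.0, 4.0, 0.2, 0.0, 0.3, 6.0]
labels = annotate(profile)
```

Runs of the same label form the segments; combined with map context (is there a traffic light at this position?), the "stopped" run becomes a searchable "traffic light situation" in the database.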
Sources
[1] Yang, Yezhou, et al. "Robot learning manipulation action plans by" Watching" unconstrained videos from the world wide web." Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.
[2] Cubek, Richard, and Wolfgang Ertel. "Learning and application of high-level concepts with conceptual spaces and pddl." PAL 2011 3rd Workshop on Planning and Learning. 2011.
[3] Mehta, Manish, et al. "Authoring behaviors for games using learning from demonstration." Proceedings of the Workshop on Case-Based Reasoning for Computer Games, 8th International Conference on Case-Based Reasoning (ICCBR 2009), L. Lamontagne and PG Calero, Eds. AAAI Press, Menlo Park, California, USA. 2009.
[4] Moosavi, Sobhan, et al. "Annotation of car trajectories based on driving patterns." arXiv preprint arXiv:1705.05219 (2017).