June 26, 2019

Introduction to Learning from Demonstration


I would like to introduce the topic with a small example. Suppose an Artificial Intelligence should play Lemmings; after all, “The game of Lemmings has been offered as a new Drosophila for AI research” [1]. The naive approach would be to treat the Lemmings game as a kind of search problem in which the solver has to find the actions for winning the level. Because the state space is very large, this attempt will fail. The better alternative, in short, is to invent a plan language which guides the search process. But let us go into the details.
A simple level of Lemmings is shown in the figure.

The entrance is at the top in the middle. The lemmings must first master the falling step, then a stopper has to be set on the right side, and the wall needs a digger before all the lemmings can reach the exit at the bottom left. The arrows in the map show the solution. The interesting point is that these markers are equivalent to a walkthrough tutorial; that is, the level is already solved and the lemmings only have to follow the guidance. Exactly this aspect is typical for learning from demonstration: before the software starts, the walkthrough is already available, formalized as a plan.
Learning from demonstration means, at its core, formalizing a plan language. If the plan language exists, a solver can calculate the subactions. Now let us imagine what happens in a different Lemmings map for which no plan is available: the lemmings don't know what to do. The remarkable thing is that the Artificial Intelligence is no longer forced to control the game in all details; the only missing piece is a plan. The overall pipeline consists of two stages: finding a plan for a map, and using the plan to control the lemmings, as the sketch below illustrates.
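To make the pipeline concrete, here is a minimal Python sketch. The plan format, the coordinates, and the game API are my own illustration for the example level, not an existing standard.

def find_plan(level_map):
    """Stage 1: obtain a plan for the map. Here it is hard-coded, but the
    source could be a human annotator or an abstract path planner."""
    return [
        ((14, 2), "parachute"),  # master the falling step
        ((22, 9), "stopper"),    # block the right side
        ((8, 9), "digger"),      # dig through the wall
        ((2, 11), None),         # walk into the exit
    ]

def control_lemmings(plan, game):
    """Stage 2: a low-level solver turns each plan step into game inputs."""
    for waypoint, skill in plan:
        game.steer_to(waypoint, use_skill=skill)  # hypothetical game API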
The plan source can be a human, but it can also be a path planner which works on an abstract level. Even if a human provides the plan, the system works somewhat autonomously, because the human only has to draw the plan into the map and the system does the rest. From a certain standpoint this can be called cheating, because the AI only follows a walkthrough tutorial which is already there.
And exactly here “learning from demonstration” comes into the game. The main principle is to invent a plan notation for a domain. This plan notation is used for recording and playing back demonstrations. The sad news is that no standard plan language is available: it can be a graphical notation, a text-based language, or a trajectory produced by dynamic movement primitives. What is important to know is that LfD is located between a teleoperated robot and an autonomous robot; the link between both is an abstract plan language, which supports a record/playback loop like the one sketched below.
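The record/playback idea can be expressed in a few lines; joystick.read(), game.abstract(), game.execute() and game.finished() are assumed placeholders for whatever interface the concrete domain offers.

def record_demonstration(joystick, game):
    # Recording: log the operator's commands, lifted into the plan notation.
    plan = []
    while not game.finished():
        action = joystick.read()            # human teleoperates the system
        plan.append(game.abstract(action))  # store it as an abstract plan step
    return plan

def playback(plan, game):
    # Playback: the same notation drives the system without the human.
    for step in plan:
        game.execute(step)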
Let us observe what a Lemmings AI does if the plan is known. The input for the system is the figure which contains the arrows. Internally, the plan is stored as a waypoint list with small annotations: for example, the first step in the level is annotated with “parachute” while the second step is annotated with “stopper”. The AI solver takes this plan and converts it into low-level actions, and it has to make sure that each plan step is fulfilled. Each step is equal to a subgoal: the solver is given a subproblem which is only a few seconds long and in which the general idea is provided by the plan, so it only has to figure out the detail adjustments. This can be done on a standard PC without much effort, for example with a bounded search like the one sketched below.
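Here is a minimal sketch of such a bounded subgoal search. The functions legal_actions() and simulate(), and the attributes of the resulting state, are assumptions about the game interface; a real solver would replace the brute-force enumeration with something smarter.

import itertools

def solve_step(state, waypoint, skill, horizon=120):
    # Iterative deepening over a few seconds of game time: the plan supplies
    # the general idea, the search only finds the detail adjustments.
    for depth in range(1, horizon + 1):
        for actions in itertools.product(legal_actions(state), repeat=depth):
            end = simulate(state, actions)  # forward-simulate the game
            if end.at(waypoint) and end.skill_used == skill:
                return list(actions)        # subgoal reached
    return None  # this plan step is not achievable from the current state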
Remarkably, the same principle can be transferred to any game. No matter whether it is called Sokoban, grasping robot, Lemmings, or RC car control, in all these domains a plan notation is used as an intermediate layer between the walkthrough knowledge provided by humans and the low-level solver which executes the plan.
Human-guided teleoperation
The term “learning from demonstration” is a bit misleading because it suggests that some sort of machine learning algorithm takes place. The more exact terminology is to call such systems human-guided teleoperation. Teleoperation means that, in the basic setup, a human operator controls the robot with a joystick. Plan-guided teleoperation replaces the joystick with a plan notation which allows the system to be controlled from an abstract level. In both cases the knowledge of how to solve a robot problem comes from the outside: either from direct human commands, or from an abstract plan, also provided by a human.
The overall system doesn't have much in common with a classical AI planner; it is more of a human-to-robot interface. The plan notation is, similar to a joystick, an input device which transmits human knowledge into machine-readable information.
[1] Kendall, Graham, and Kristian Spoerer. "Scripting the game of Lemmings with a genetic algorithm." Proceedings of the 2004 Congress on Evolutionary Computation (IEEE Cat. No. 04TH8753). Vol. 1. IEEE, 2004.