June 23, 2019

The essence of Learning from demonstration


In the literature, Learning from demonstration is not precisely defined. Instead, a mixture of dynamic movement primitives, reinforcement learning, and direct manipulation of robot grippers is presented under this term. The first step is to define what LfD means at its core. The basic idea can be summarized as “plan following”. A plan is fed into the system in a high-level language. One option (but not the only one) is to create the plan by teaching: the human moves the robot gripper to a goal. But the plan can be created with a text interface as well.
Let us take a step back and start with a system which is easier to describe. A teleoperated robot lacks any kind of AI. The system is controlled with a joystick, and it can't do anything autonomously. To improve the system, a plan formalization is needed, that is, a plan language and a concrete plan in that language. All learning from demonstration systems are based on a plan. Possible notations are natural language vocabulary, waypoint trajectories, photographed keyframes, or a function in the state space. The idea of a plan is to reduce the state space: it explains to the robot what to do next.
Learning from demonstration usually means tracking a plan, thinking about the plan language, replaying a given plan autonomously, and annotating a demonstration with plan elements. The simplest form of a plan notation is a waypoint list, for example (100,100), (100,150), (200,100). The plan is an abstract description for solving a task. That means the AI isn't able to find the solution on its own; instead, the robot has a walkthrough tutorial and executes it.
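The waypoint-list plan from the example can be sketched in a few lines of Python. The `replay` function and the `move_to` callback are hypothetical names, not from any concrete robot framework; a real system would pass in its actual motion controller.

```python
# A plan as a waypoint list, following the example in the text.
plan = [(100, 100), (100, 150), (200, 100)]

def replay(plan, move_to):
    """Execute a waypoint plan by driving to each point in order."""
    for waypoint in plan:
        move_to(waypoint)

# Usage with a stand-in controller that just records the motion:
visited = []
replay(plan, visited.append)
print(visited)  # [(100, 100), (100, 150), (200, 100)]
```

The point of the sketch is that replaying a plan is trivial once the plan notation exists; all the difficulty sits in producing and recognizing the plan, not in executing it.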
The open question is how exactly a plan language should look. In the given example, the plan language is a simple point list. But more demanding tasks like grasping will need a more elaborate formalization which goes in the direction of a domain-specific language. Somebody may argue that a plan is fixed and isn't flexible enough. That is correct: Learning from demonstration is restricted to a concrete domain. If the situation changes, the plan becomes useless. But that is not a problem, because the idea is that a robot handles a narrow AI task, like opening a bottle, which is always the same.
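A minimal sketch of such a domain-specific plan language: each step is an action name plus an argument, and a dispatcher maps names to handlers. The actions (`moveto`, `grasp`, `release`) and the handler setup are illustrative assumptions, not part of any real robot API.

```python
# A plan in a small domain-specific language: (action, argument) pairs.
plan = [
    ("moveto", (100, 100)),
    ("grasp", "bottle"),
    ("moveto", (200, 100)),
    ("release", "bottle"),
]

def execute(plan, handlers):
    """Run each plan step by dispatching to the matching handler."""
    for action, arg in plan:
        handlers[action](arg)

# Stand-in handlers that log what a real controller would do:
log = []
handlers = {
    "moveto": lambda p: log.append(f"drive to {p}"),
    "grasp": lambda o: log.append(f"close gripper on {o}"),
    "release": lambda o: log.append(f"open gripper, drop {o}"),
}
execute(plan, handlers)
```

The design choice is that the vocabulary of the language fixes the domain: a plan containing only `moveto`, `grasp`, and `release` can describe bottle opening, but nothing outside that narrow task.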
Plan recognition for a line following robot
Most line following robot competitions are created with autonomy as the goal. The idea is that after pressing the run button, the robot drives on its own along the line on the ground. The more interesting way to fulfill the challenge is to let a human operator control the robot and track whether he is able to follow the line. That means there is a line given, and the operator has to move along it. Does this have anything to do with robotics at all? Yes, because it's a plan recognition challenge. The plan is given by the line, and the system has to track whether the human operator fulfills the plan or not.
Such a system starts with the assumption that no autonomous robot control system is available and that teleoperation is the only working technology. On top of the working teleoperation controller, an activity tracker / plan recognition system is added with the aim of improving the overall software. The idea is that the transition from a teleoperated system into a fully autonomous one contains many steps in between, and the quest is to explore them slowly.
From a formal perspective, the line on the ground is equal to a plan. It explains to the robot and to the human what the goal is. The plan is equal to a 2D trajectory: a spline which goes through different waypoints. The robot can move on the line or outside of it.
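The on-line / off-line check can be sketched as a distance test against the trajectory. This is a simplifying assumption: the spline is approximated by straight segments between waypoints, and `on_plan` with its tolerance parameter is a hypothetical name, not from the competition rules.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the line segment from a to b."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamped to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def on_plan(position, waypoints, tolerance=5.0):
    """True if the robot position is within tolerance of the trajectory."""
    return min(point_segment_distance(position, a, b)
               for a, b in zip(waypoints, waypoints[1:])) <= tolerance

# The line on the ground, given as waypoints:
line = [(0, 0), (100, 0), (100, 100)]
print(on_plan((50, 2), line))   # True: the operator is on the line
print(on_plan((50, 40), line))  # False: the operator left the plan
```

Run repeatedly against the robot's current position, this check is the whole plan recognition system for the line follower: it doesn't control the robot, it only judges whether the teleoperating human fulfills the plan.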