March 01, 2020

Learning from demonstration with Karel the robot

The programming game “Karel the robot” is a runtime engine that executes short scripts. The user types in a list of commands, and the robot behaves according to these statements. A sample program consists of:

right;

forward;

forward;

stop;
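
To make the sample concrete, here is a minimal sketch of such a runtime engine in Python. It is not taken from any particular Karel implementation: the function name run_script, the grid with y growing upward, the starting heading (north) and the step size of one cell are all assumptions made for illustration.

```python
# Headings in clockwise order: north, east, south, west.
# Assumption: y grows upward and one "forward" moves exactly one cell.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def run_script(script, position=(0, 0), heading=0):
    """Execute a semicolon-separated script; return (final position, final heading)."""
    x, y = position
    for raw in script.split(";"):
        command = raw.strip()
        if not command:
            continue  # skip empty pieces, e.g. after the last semicolon
        if command == "right":
            heading = (heading + 1) % 4  # turn 90 degrees clockwise
        elif command == "left":
            heading = (heading - 1) % 4  # turn 90 degrees counterclockwise
        elif command == "forward":
            dx, dy = HEADINGS[heading]
            x, y = x + dx, y + dy
        elif command == "stop":
            break  # ignore anything after stop
        else:
            raise ValueError(f"unknown command: {command!r}")
    return (x, y), heading


final_position, final_heading = run_script("right; forward; forward; stop;")
```

Under these assumptions the sample program turns the robot east and moves it two cells, so it ends at (2, 0).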

The main reason Karel the robot appears in the literature is that it can be used to teach programming skills. The first task is to program the robot simulator itself, and the second task is to write the script that gets executed in the simulator. Unfortunately, most tutorials don't provide further hints on how to improve the setup, so it's up to the user to invent an additional challenge on top of the Karel game.

Let us imagine how to combine the technique “learning from demonstration” with “Karel the robot”. Learning from demonstration (LfD) works by defining skills that are stored in a database together with their preconditions. Learning means that these skills are generated on the fly from user demonstrations. As a first step, the user operates the Karel robot with the keyboard instead of a program: he presses the arrow keys and records a motion over a longer time. Then he demonstrates a second motion, and so on.
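
The recording step could be sketched roughly as follows. The key names, the mapping KEY_TO_ACTION and the function record_demonstration are hypothetical, and the key presses are passed in as a plain list so the sketch does not depend on any GUI or keyboard library.

```python
# Hypothetical mapping from key names to robot commands (an assumption).
KEY_TO_ACTION = {
    "Up": "forward",
    "Left": "left",
    "Right": "right",
    "Escape": "stop",
}


def record_demonstration(start_position, key_presses):
    """Turn a sequence of key presses into one recorded motion.

    The start position of the robot is stored as the precondition.
    """
    actions = []
    for key in key_presses:
        action = KEY_TO_ACTION.get(key)
        if action is None:
            continue  # ignore keys that have no meaning for the robot
        actions.append(action)
        if action == "stop":
            break  # the demonstration ends with stop
    return {"precondition": start_position, "action": actions}


motion0 = record_demonstration((100, 100), ["Right", "Up", "Up", "Escape"])
```

Each call produces one entry, and a second demonstration from another start position would produce the next one.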

The idea of LfD is to store the motion recordings in a database.


motion0={
precondition=(100,100),
action=[right, forward, forward, stop]
}

motion1={
precondition=(200,100),
action=[left, forward, stop]
}

motion2={
precondition=(100,200),
action=[forward, right, forward, stop]
}


In the playback phase, suppose the Karel robot is located at position (200,100). This matches the precondition of motion1, so the actions from that skill are executed. Sounds pretty easy, doesn't it? The idea is not to write a normal computer program that consists of loops and if statements, but to create a database of recorded motions. If the database is large enough, every possible situation is covered in advance. The LfD playback engine searches the list of all cases and then executes the matching motion. Learning from demonstration is sometimes described as similar to case-based reasoning, because in both approaches the database holds the information about what to do next.
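
A minimal sketch of the playback search in Python, using the three motions from the database. The function name find_motion is hypothetical, and picking the case with the nearest precondition (instead of requiring an exact match) is an assumption borrowed from case-based reasoning, not something the setup prescribes.

```python
# The three recorded motions, stored as a list of cases.
DATABASE = [
    {"name": "motion0", "precondition": (100, 100),
     "action": ["right", "forward", "forward", "stop"]},
    {"name": "motion1", "precondition": (200, 100),
     "action": ["left", "forward", "stop"]},
    {"name": "motion2", "precondition": (100, 200),
     "action": ["forward", "right", "forward", "stop"]},
]


def find_motion(position, database=DATABASE):
    """Return the case whose precondition is closest to the robot position."""
    def squared_distance(motion):
        px, py = motion["precondition"]
        return (px - position[0]) ** 2 + (py - position[1]) ** 2

    return min(database, key=squared_distance)


selected = find_motion((200, 100))
# The actions of the selected case are then sent to the robot one by one.
actions_to_execute = selected["action"]
```

With the robot at (200,100) the search returns motion1. A position that is not in the database, such as (110,190), falls back to the nearest recorded case, here motion2.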

What is missing in the pipeline is a cost function. A cost function allows the Karel robot to adapt to new situations more easily. If the robot collides with an obstacle, the cost function detects it. As a result, a trajectory from the database can be evaluated as useful or not. The planner searches for a matching case and tries to reduce the costs.
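
One possible shape for such a cost function is sketched below. The collision penalty of 100, the cost of 1 per step, the grid model and the names trajectory_cost and cheapest_motion are all assumptions chosen for illustration.

```python
# Headings in clockwise order: north, east, south, west (y grows upward).
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]
COLLISION_PENALTY = 100  # hypothetical weight for entering an obstacle cell


def trajectory_cost(start, actions, obstacles, heading=0):
    """Simulate the actions and return step cost plus collision penalties."""
    x, y = start
    cost = 0
    for action in actions:
        if action == "right":
            heading = (heading + 1) % 4
        elif action == "left":
            heading = (heading - 1) % 4
        elif action == "forward":
            dx, dy = HEADINGS[heading]
            x, y = x + dx, y + dy
            cost += 1  # every step costs one unit
            if (x, y) in obstacles:
                cost += COLLISION_PENALTY
        elif action == "stop":
            break
    return cost


def cheapest_motion(start, candidates, obstacles):
    """Pick the candidate action list with the lowest simulated cost."""
    return min(candidates, key=lambda actions: trajectory_cost(start, actions, obstacles))


obstacles = {(1, 0)}
candidates = [
    ["right", "forward", "forward", "stop"],  # drives east into the obstacle
    ["forward", "right", "forward", "stop"],  # goes north first, then east
]
best = cheapest_motion((0, 0), candidates, obstacles)
```

In this example the first trajectory enters the obstacle cell and costs 102, while the second costs 2, so the planner prefers the second one.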

A second option to improve the system is to create high-level skills. Skill1 brings the robot to the middle of the map from different starting positions, while skill2 brings the robot from the middle to the exit of the map. When the script executes a skill, no single concrete action is executed; instead, the skill is itself a database that contains possible movements. These cases are searched on the fly.
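
The skill idea can be sketched as a dictionary that maps each skill name to its own small case database. The contents of SKILLS, the coordinate (150, 150) for the middle of the map and the function resolve_skill are hypothetical, and the nearest-precondition lookup is again an assumption.

```python
# Each skill is a small database of cases (contents are made up for illustration).
SKILLS = {
    "skill1": [  # bring the robot to the middle of the map
        {"precondition": (100, 100), "action": ["right", "forward", "forward", "stop"]},
        {"precondition": (100, 200), "action": ["forward", "right", "forward", "stop"]},
    ],
    "skill2": [  # bring the robot from the middle to the exit
        {"precondition": (150, 150), "action": ["left", "forward", "stop"]},
    ],
}


def resolve_skill(skill_name, position):
    """Search the cases of one skill and return the actions of the nearest case."""
    def squared_distance(case):
        px, py = case["precondition"]
        return (px - position[0]) ** 2 + (py - position[1]) ** 2

    nearest = min(SKILLS[skill_name], key=squared_distance)
    return nearest["action"]


# A high-level script names skills; the concrete actions are looked up on the fly.
first_actions = resolve_skill("skill1", (100, 190))
second_actions = resolve_skill("skill2", (150, 150))
```

The high-level script stays the same for every starting position; only the case selected inside each skill changes.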