Robotics and Artificial Intelligence: GOAP planning for Sokoban

July 29, 2019

The game of Sokoban consists of a robot, a box and a goal. The robot has to execute a sequence of actions that pushes the box to the goal. Instead of building the full game tree, the better idea is to describe the problem from an abstract perspective. In the GOAP (goal-oriented action planning) technique the robot has the following actions:

- movetobox (precondition: robot far from box, effect: robot is at box)

- adjustposatbox (precondition: robot is at box, effect: robot has push position)

- pushbox (precondition: robot has push position, effect: box is at goal)

The planner can generate an action sequence from these actions by chaining the effect of one action to the precondition of the next. A sample plan would look like: movetobox, adjustposatbox, pushbox.
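The chaining of preconditions and effects can be sketched in a few lines of Python. This is only a minimal illustration, not a full GOAP implementation: the names ACTIONS, apply and plan are made up for this example, the search is a plain breadth-first search, and it assumes that the effect of an action replaces its precondition in the state.

```python
from collections import deque

# Action table taken from the list above: name -> (precondition, effect).
ACTIONS = {
    "movetobox": ("robot far from box", "robot is at box"),
    "adjustposatbox": ("robot is at box", "robot has push position"),
    "pushbox": ("robot has push position", "box is at goal"),
}

def apply(state, action):
    """Return the successor state, or None if the precondition does not hold."""
    precondition, effect = ACTIONS[action]
    if precondition not in state:
        return None
    # Assumption: the effect fact replaces the precondition fact.
    return (state - {precondition}) | {effect}

def plan(start, goal):
    """Breadth-first search over abstract states; returns action names or None."""
    start = frozenset(start)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, steps = queue.popleft()
        if goal in state:
            return steps
        for name in ACTIONS:
            successor = apply(state, name)
            if successor is not None and successor not in visited:
                visited.add(successor)
                queue.append((successor, steps + [name]))
    return None

print(plan({"robot far from box"}, "box is at goal"))
# -> ['movetobox', 'adjustposatbox', 'pushbox']
```

Breadth-first search is chosen here only because it returns the shortest plan and needs no cost function. GOAP planners in games often use A* with action costs instead.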

This is the basic idea behind GOAP. But perhaps it is possible to simplify the overall pipeline with conditional planning. Conditional planning means creating a more abstract plan which works in different situations.

The main idea behind the GOAP problem-solving technique is to translate a domain into a language. In the Sokoban example, three actions are available plus some world descriptions like “robot is at box”. This new kind of problem is a text adventure: it is played interactively and the visual representation is no longer needed. A pleasant side effect is that the state space is reduced dramatically. Even if the Sokoban map has a size of 100x100 fields, the planning space for the solver is reduced to that of the text adventure.
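To get a feeling for the size of this reduction, here is a rough back-of-the-envelope calculation. The assumptions are mine and not part of the game definition above: a single robot, a single box, no walls, and robot and box never on the same field. On the abstract side it assumes that exactly one of the four world descriptions holds at a time.

```python
# Assumptions for illustration only: one robot, one box, no walls,
# robot and box never share a field.
cells = 100 * 100
grid_states = cells * (cells - 1)  # choices for the robot field times the box field

# World descriptions that appear in the GOAP action list.
abstract_facts = [
    "robot far from box",
    "robot is at box",
    "robot has push position",
    "box is at goal",
]
# Assumption: exactly one fact holds at a time, so one state per fact.
abstract_states = len(abstract_facts)

print(grid_states)      # 99990000
print(abstract_states)  # 4
```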

The open question is whether the problem space can be reduced further. A possible idea would be to use a constructive grammar which produces the text adventure. In the literature this is called "procedural grammar". A GOAP model has the syntax:

rulename, if feature then feature
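A rule written in this syntax can be read into a small data structure. The following is a sketch under the assumption that a rule name is a single word and that the words "if" and "then" separate the two features. The names Rule, RULE_PATTERN and parse_rule are hypothetical helpers for this example.

```python
import re
from typing import NamedTuple

class Rule(NamedTuple):
    name: str
    precondition: str
    effect: str

# Assumed shape of a rule line: "<name>, if <feature> then <feature>".
RULE_PATTERN = re.compile(r"^\s*(\w+)\s*,\s*if\s+(.+?)\s+then\s+(.+?)\s*$")

def parse_rule(line):
    """Turn one line of rule text into a Rule, or raise ValueError."""
    match = RULE_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a valid rule: {line!r}")
    return Rule(*match.groups())

rule = parse_rule("movetobox, if robot far from box then robot is at box")
print(rule.name, "|", rule.precondition, "|", rule.effect)
```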

Rule induction

If the GOAP action model is not known in advance, it makes sense to collect a dataset and try to determine the rules with a machine learning algorithm such as the decision tree learner C4.5. At first, we need a large amount of data:

case id	name	precondition	effect
0	movetobox	robot far from box	robot is at box
1	movetobox	robot far from box	robot is at box
2	adjustposatbox	robot is at box	robot has push position
3	adjustposatbox	robot is at box	robot has push position
4	pushbox	robot has push position	box is at goal
5	pushbox	robot has push position	box is at goal

The table records some cases. They were acquired from plan traces in a learning-from-demonstration scenario. To keep things simple, the precondition and effect values are the same for every case of an action, which means every action was successful. The idea is to first record all interactions with the Sokoban game, extract features, group these features into actions and then give the actions a name. The result is a GOAP forward model which predicts the outcome of an action in terms of feature values.
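As a very reduced illustration of the induction step, the six cases from the table can be turned back into rules by counting. Note that this is not C4.5: it is a simple majority vote per action name, used here as a stand-in, and it assumes the actions are already named. The identifiers CASES and induce_rules are made up for this sketch.

```python
from collections import Counter, defaultdict

# The six cases from the table: (case id, name, precondition, effect).
CASES = [
    (0, "movetobox", "robot far from box", "robot is at box"),
    (1, "movetobox", "robot far from box", "robot is at box"),
    (2, "adjustposatbox", "robot is at box", "robot has push position"),
    (3, "adjustposatbox", "robot is at box", "robot has push position"),
    (4, "pushbox", "robot has push position", "box is at goal"),
    (5, "pushbox", "robot has push position", "box is at goal"),
]

def induce_rules(cases):
    """For each action name, keep the most frequent (precondition, effect) pair."""
    counts = defaultdict(Counter)
    for _case_id, name, precondition, effect in cases:
        counts[name][(precondition, effect)] += 1
    return {name: counter.most_common(1)[0][0] for name, counter in counts.items()}

for name, (precondition, effect) in induce_rules(CASES).items():
    print(f"{name}, if {precondition} then {effect}")
```

With noisy traces, where an action sometimes fails, the majority vote would still return the dominant outcome. A decision tree learner would be the next step once the outcome depends on more than one feature.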
