July 29, 2019

GOAP planning for Sokoban


The game of Sokoban consists of a robot, a box and a goal. The robot has to execute a sequence of actions that pushes the box to the goal. Instead of building the full game tree, the better idea is to describe the problem from an abstract perspective. In the GOAP (goal-oriented action planning) technique, the robot has the following actions:
- movetobox (precondition: robot far from box, effect: robot is at box)
- adjustposatbox (precondition: robot is at box, effect: robot has push position)
- pushbox (precondition: robot has push position, effect: box is at goal)
The planner can generate an action sequence. A sample plan would look like: movetobox, adjustposatbox, pushbox.
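To make the planning step concrete, here is a minimal sketch of such a planner. It assumes the world state can be summarized by a single symbolic fact that each action replaces, which is a simplification for this three-action example. The names ACTIONS and plan are made up for illustration, and the search is a plain breadth-first search, not a particular GOAP library.

```python
from collections import deque

# Action table from the list above: name -> (precondition, effect).
# Assumption: the state is one symbolic fact that an action replaces.
ACTIONS = {
    "movetobox": ("robot far from box", "robot is at box"),
    "adjustposatbox": ("robot is at box", "robot has push position"),
    "pushbox": ("robot has push position", "box is at goal"),
}


def plan(start, goal):
    """Breadth-first search over symbolic facts.

    Returns a list of action names leading from start to goal,
    or None if the goal cannot be reached.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        fact, steps = queue.popleft()
        if fact == goal:
            return steps
        for name, (precondition, effect) in ACTIONS.items():
            # An action is applicable only if its precondition holds.
            if precondition == fact and effect not in visited:
                visited.add(effect)
                queue.append((effect, steps + [name]))
    return None


if __name__ == "__main__":
    print(plan("robot far from box", "box is at goal"))
```

Because the search only touches symbolic facts, its cost does not depend on the size of the Sokoban map. A real domain would need a set of facts per state instead of a single one.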
This is the basic idea behind GOAP. But perhaps it is possible to simplify the overall pipeline with conditional planning. Conditional planning means creating a more abstract plan that works in different situations.
The main idea behind the GOAP problem-solving technique is to translate a domain into a language. In the Sokoban example, three actions are available plus some world descriptions like “robot is at box”. This new kind of problem is a text adventure: it is played interactively, and the visual representation is no longer needed. A pleasant side effect is that the state space is reduced dramatically. Even if the Sokoban map has a size of 100x100 fields, the solver only has to search the few symbolic states of the text adventure.
The open question is whether the problem space can be reduced further. A possible idea would be to use a constructive grammar which produces the text adventure. In the literature this is called "procedural grammar". A GOAP model has the syntax:
rulename, if feature then feature
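The rule syntax above can be read mechanically. The following sketch parses one line of that form into its three parts. The Rule type, the parse_rule function and the regular expression are assumptions made for illustration. The pattern also assumes that the rule name is a single word and that the feature text does not itself contain the word "then".

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    precondition: str
    effect: str


# Matches "rulename, if <feature> then <feature>".
RULE_PATTERN = re.compile(r"^\s*(\w+)\s*,\s*if\s+(.+?)\s+then\s+(.+?)\s*$")


def parse_rule(line):
    """Split one rule line into name, precondition and effect."""
    match = RULE_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a valid rule: {line!r}")
    return Rule(*match.groups())


if __name__ == "__main__":
    print(parse_rule("movetobox, if robot far from box then robot is at box"))
```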
Rule induction
If the GOAP action model is not known in advance, it makes sense to collect a dataset and try to determine the rules with a machine learning algorithm such as the decision tree learner C4.5. At first, we need a vast amount of data:
case id | name           | precondition            | effect
0       | movetobox      | robot far from box      | robot is at box
1       | movetobox      | robot far from box      | robot is at box
2       | adjustposatbox | robot is at box         | robot has push position
3       | adjustposatbox | robot is at box         | robot has push position
4       | pushbox        | robot has push position | box is at goal
5       | pushbox        | robot has push position | box is at goal
In the table, some cases are recorded. They were acquired from plan traces in a learning-from-demonstration scenario. To keep things simple, every case of an action has the same precondition and effect values, which means every action was successful. The idea is to first record all interactions with the Sokoban game, extract features, group these features into actions, and then give the actions a name. The result is a GOAP forward model which predicts the outcome of an action in terms of feature values.
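The grouping step can be sketched with the six recorded cases. Note that this is not C4.5: as a deliberately simple stand-in, the sketch counts how often each (precondition, effect) pair occurs per action and keeps the most frequent one. The names CASES, induce_rules and predict are hypothetical.

```python
from collections import Counter, defaultdict

# The recorded cases: (case id, name, precondition, effect).
CASES = [
    (0, "movetobox", "robot far from box", "robot is at box"),
    (1, "movetobox", "robot far from box", "robot is at box"),
    (2, "adjustposatbox", "robot is at box", "robot has push position"),
    (3, "adjustposatbox", "robot is at box", "robot has push position"),
    (4, "pushbox", "robot has push position", "box is at goal"),
    (5, "pushbox", "robot has push position", "box is at goal"),
]


def induce_rules(cases):
    """Keep the most frequent (precondition, effect) pair per action name."""
    counts = defaultdict(Counter)
    for _case_id, name, precondition, effect in cases:
        counts[name][(precondition, effect)] += 1
    return {name: c.most_common(1)[0][0] for name, c in counts.items()}


def predict(rules, name, state):
    """Forward model: return the state that follows the action.

    If the precondition does not hold, the state stays unchanged.
    """
    precondition, effect = rules[name]
    return effect if state == precondition else state


if __name__ == "__main__":
    rules = induce_rules(CASES)
    for name, (precondition, effect) in rules.items():
        print(f"{name}, if {precondition} then {effect}")
```

With noisy traces, where the same action sometimes fails, a frequency count like this would hide the failures. That is the situation in which a real decision tree learner becomes useful.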