July 12, 2019

Sokoban as a testbed for plan recognition



The easiest way to teach Artificial Intelligence is a game in which a task has to be fulfilled. In contrast to a real robotics problem, the position of the player is available by default and there is no image recognition problem to solve. What is not available is an algorithm for playing the game autonomously. The screenshot shows a simplified version of the Sokoban game. Sokoban is more demanding than merely following a line, but the game itself is not too complex to program: the underlying physics engine, which calculates whether a box can be pushed or not, can be implemented in under 20 lines of code.
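To make that claim concrete, here is a minimal sketch of such a push rule in Python. The representation (walls and boxes as sets of (x, y) tuples) is an assumption for illustration, not the actual code of the game:

    # Minimal Sokoban push physics. Walls and boxes are sets of (x, y)
    # grid cells, the player is a single (x, y) cell.
    DIRS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def step(walls, boxes, player, action):
        """Apply one move; push a box if it blocks the player and the
        cell behind it is free. Returns the new (boxes, player)."""
        dx, dy = DIRS[action]
        target = (player[0] + dx, player[1] + dy)
        if target in walls:
            return boxes, player                   # blocked by a wall
        if target in boxes:
            behind = (target[0] + dx, target[1] + dy)
            if behind in walls or behind in boxes:
                return boxes, player               # box can't be pushed
            boxes = (boxes - {target}) | {behind}  # push the box
        return boxes, target
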
What makes the game interesting is that, despite its simple GUI, the underlying state space is huge. The player can push the boxes into thousands of positions on the screen, and he can push the first box first or execute the task in a different sequence.
The interesting question is how to solve the game. A naive approach would follow the tic-tac-toe principle: the game tree is generated and then a goal node in the graph is identified. Let us apply this strategy to the Sokoban domain. According to the screenshot, both boxes start at the top left and should be pushed to the bottom right. To do so, the robot can move in four directions. Figuring out the next actions doesn't look complicated. Or rather, it becomes complicated once we go into the details.
A simple calculation shows that if we want to plan 8 steps ahead, the total number of possible action sequences is 4^8 = 65536. To bring both boxes to the goal position, a human player needs around 60 steps for executing the task, so the resulting game tree contains 4^60 ≈ 1.33e+36 nodes. 10^36 is too much even for the latest generation of Nvidia graphics cards. The insight is that Sokoban can't be solved with brute-force AI planning.
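For illustration, a depth-limited brute-force search over the step function above would look like the sketch below. Because it enumerates every action sequence without pruning duplicate states, it visits up to 4^depth nodes, which is exactly the blow-up described above (the goal test on box positions is a hypothetical simplification):

    # Naive game-tree search over the four primitive moves. No duplicate
    # detection, so the tree has up to 4**depth nodes.
    ACTIONS = ("up", "down", "left", "right")

    def brute_force(walls, boxes, player, goals, depth):
        if boxes == goals:
            return []                             # all boxes on their goals
        if depth == 0:
            return None                           # search horizon reached
        for action in ACTIONS:
            new_boxes, new_player = step(walls, boxes, player, action)
            plan = brute_force(walls, new_boxes, new_player, goals, depth - 1)
            if plan is not None:
                return [action] + plan
        return None
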
To overcome the bottleneck of the exploding state space, something different from AI planning has to be utilized, which is called plan recognition. The idea is to invent a plan notation first, track the human's actions while he fulfills the plan, and then use the model for hierarchical planning. Let us describe the details. Plan notation means defining a language for human-machine communication in the Sokoban game. The goal of the game is to push the boxes to certain positions on the map, so we need a command like “pushbox(boxid,target)”. We can simplify the syntax by defining two separate skills: “pushbox1tobottomright” and “pushbox2tobottomright”. To fulfill such macro-actions, a helper skill is needed which handles the movement of the robot, so we define two additional skills: “movetobox1” and “movetobox2”. Now we can formulate the plan in the plan notation:
“movetobox1”
“pushbox1tobottomright”
“movetobox2”
“pushbox2tobottomright”
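In code, such a plan is nothing more than an ordered list of skill names. The encoding below is one possible choice, not prescribed by the game:

    # The plan from above as plain data: an ordered list of skill names.
    plan = [
        "movetobox1",
        "pushbox1tobottomright",
        "movetobox2",
        "pushbox2tobottomright",
    ]
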
The next step is to ground the skills. Grounding means that the game log of the Sokoban game is annotated with skill names; this is the core idea of plan recognition. To be more specific: if the player moves the robot to box1, then the parser should print out the skill name. To realize such a requirement, the physics engine of Sokoban gets extended with new action names. That means we can send the high-level command “movetobox1” to the physics engine, and as a result the absolute position of the robot gets modified.
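One way to realize such a parser is to pair every skill name with a postcondition over the game state and scan the game log for the moment it becomes true. The goal square and the state layout below are assumptions for illustration only:

    # Hypothetical grounding: each skill name is paired with a
    # postcondition; the parser prints a skill name as soon as its
    # postcondition holds. Here boxes is an ordered tuple of positions.
    GOAL1 = (7, 7)  # assumed target square for box 1

    def adjacent(a, b):
        """True if two grid cells touch horizontally or vertically."""
        return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

    SKILLS = {
        "movetobox1":            lambda boxes, player: adjacent(player, boxes[0]),
        "pushbox1tobottomright": lambda boxes, player: boxes[0] == GOAL1,
    }

    def annotate(gamelog):
        """gamelog: list of (boxes, player) snapshots, one per move."""
        for t, (boxes, player) in enumerate(gamelog):
            for name, done in SKILLS.items():
                if done(boxes, player):
                    print(f"step {t}: {name}")
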
To make a long story short, the idea is to construct the game tree with the help of the newly defined high-level skills. This transforms the Sokoban game into a symbolic game with a much smaller state space: a plan over the four macro-actions is only four steps deep, so the symbolic game tree has at most 4^4 = 256 nodes instead of 4^60.
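A sketch of such a symbolic search is shown below. It assumes that every macro-action can be executed by the grounded skills from the previous step; the apply_macro functions are hypothetical stand-ins for that machinery:

    # Depth-limited search over macro-actions instead of primitive moves.
    # macros maps a skill name to a function that executes the skill on a
    # state and returns the successor state.
    def plan_symbolic(state, goal_test, macros, depth=4):
        if goal_test(state):
            return []
        if depth == 0:
            return None
        for name, apply_macro in macros.items():
            successor = apply_macro(state)
            rest = plan_symbolic(successor, goal_test, macros, depth - 1)
            if rest is not None:
                return [name] + rest
        return None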