July 20, 2019

Lessons learned from implementing RRT for Sokoban


RRT (Rapidly-exploring Random Tree) is often presented as the path planning algorithm of choice, one that scales to larger maps and works for real-time purposes. Implementing the algorithm in software is not that hard. It is essentially a class consisting of methods like “comparewithgoal”, “addnode”, “getparentid”, and “showpath”. When the user starts the algorithm, the software grows a tree, carefully adding new nodes one at a time. This amounts to a fast sampling of the state space.
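To make the class structure concrete, here is a minimal sketch in Python using the method names from above. The grid representation, the one-cell extension step, and the iteration budget are illustrative assumptions, not a reference implementation:

```python
import random

class RRT:
    """Minimal RRT sketch on a small 2D grid (illustrative assumptions)."""

    def __init__(self, start, goal, obstacles, size=20):
        self.goal = goal
        self.obstacles = obstacles
        self.size = size
        self.nodes = [start]   # node positions
        self.parent = [-1]     # parent index for each node

    def comparewithgoal(self, pos):
        return pos == self.goal

    def addnode(self, pos, parentid):
        self.nodes.append(pos)
        self.parent.append(parentid)
        return len(self.nodes) - 1

    def getparentid(self, nodeid):
        return self.parent[nodeid]

    def showpath(self, nodeid):
        # walk the parent pointers back to the root
        path = []
        while nodeid != -1:
            path.append(self.nodes[nodeid])
            nodeid = self.getparentid(nodeid)
        return list(reversed(path))

    def grow(self, iterations=5000):
        for _ in range(iterations):
            # sample a random state, extend the nearest node one cell toward it
            sample = (random.randrange(self.size), random.randrange(self.size))
            nearest = min(range(len(self.nodes)),
                          key=lambda i: abs(self.nodes[i][0] - sample[0]) +
                                        abs(self.nodes[i][1] - sample[1]))
            x, y = self.nodes[nearest]
            if sample[0] != x:
                new = (x + (1 if sample[0] > x else -1), y)
            elif sample[1] != y:
                new = (x, y + (1 if sample[1] > y else -1))
            else:
                continue
            if new in self.obstacles or new in self.nodes:
                continue
            nodeid = self.addnode(new, nearest)
            if self.comparewithgoal(new):
                return self.showpath(nodeid)
        return None
```

A run then looks like `RRT(start=(0, 0), goal=(19, 19), obstacles={(5, 5)}).grow()`, which returns a list of grid cells from start to goal, or None if the budget is exhausted.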
For smaller problems it works great. A small problem is a 20x20 map in which the robot starts at the top left and has to find a path to a goal while avoiding some obstacles. Running RRT on such a problem takes around 1 second, which is fast enough. If the programming language is chosen carefully, for example C++, the speed is much better and the planner can be used under real-time constraints.
The problem begins when we modify the setup slightly. In the Sokoban domain, the robot does not merely move to a goal; it has to push a box, and the goal function has to check whether the box has reached the goal. Used for this kind of task, RRT's performance degrades badly. On my computer it takes many minutes until a solution is found. A simple request to the solver like “put the box to goal position A” turns into a large processing task which occupies lots of CPU resources and takes 2 minutes. A more complicated task in which two boxes should be moved to goal positions results in a runtime of many hours, which in practice means that the RRT algorithm doesn't find a solution.
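The shift is visible in what the state and the goal test have to contain. A hedged sketch of the box-pushing variant, with illustrative names: the state must now track both the robot and the box, and the goal test inspects the box rather than the robot.

```python
def is_goal(state, goal):
    """In the box-pushing variant the goal test checks the box, not the robot."""
    robot, box = state
    return box == goal

def successors(state, obstacles, size):
    """Next states: the robot moves one cell; if it walks into the box,
    the box is pushed one cell in the same direction (Sokoban rule)."""
    robot, box = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        new_robot = (robot[0] + dx, robot[1] + dy)
        if not (0 <= new_robot[0] < size and 0 <= new_robot[1] < size):
            continue
        if new_robot in obstacles:
            continue
        if new_robot == box:
            new_box = (box[0] + dx, box[1] + dy)
            if new_box in obstacles:
                continue
            if not (0 <= new_box[0] < size and 0 <= new_box[1] < size):
                continue
            yield (new_robot, new_box)
        else:
            yield (new_robot, box)
```

Because the state is now a pair of positions instead of a single one, the space a sampler has to cover is roughly squared, before the second box is even considered.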
The reason has to do with the state space. The robot needs around 10 steps to reach the box, another 15 steps to push the box to a goal, then again 10 steps to move to box 2, and so on. The number of possible action sequences of this length is enormous, and the resulting graph grows far too quickly for the RRT algorithm.
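A back-of-the-envelope calculation with the article's rough numbers (4 moves per step, a 35-step sequence for the box task) shows the gap; the figures are only illustrative:

```python
# Illustrative branching-factor estimate using the article's step counts.
branching = 4            # up / down / left / right
plain_path = 10          # steps to a goal on the small map
box_task = 10 + 15 + 10  # reach the box, push it, walk to box 2 = 35 steps

print(f"plain path: {branching ** plain_path:,} action sequences")
print(f"box task:   {branching ** box_task:,} action sequences")
```

The first number is about a million; the second is on the order of 10^21, which is why a constant-factor speedup of the sampler cannot close the gap.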
The problem is that this cannot be fixed by improving the performance a bit: to solve the box-pushing task the algorithm would have to run 1000x faster. RRT is not the answer to the problem; it actually prevents the solver from finding a solution. Sure, RRT is one of the fastest sampling algorithms available, but state-space sampling is not able to solve complex problems. I think the problem is located within planning itself. The idea of providing a start position and letting the solver find its way to the goal position no longer works. Only for simple toy problems will the algorithm find the steps in between.
The good news is that this insight does not only apply to the RRT algorithm: similar techniques like PRM, A* and reinforcement learning show the same poor performance. The only way to overcome the difficulty is to switch to a different kind of search strategy, one that does not sample the state space.
Alternatives
Possible alternatives to RRT which work faster are easy to name. The first is a quadtree path planner: the map is divided into submaps of different sizes, which allows longer routes to be planned in a single step. The second technique is symbolic planning, in which the problem is converted into a STRIPS domain file, which amounts to hierarchical actions. In both cases the original problem is ignored: the raw state space is no longer relevant, and a different layer is built on top of the original game.
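As a rough illustration of the first alternative, a quadtree decomposition recursively splits the map into the largest squares that are uniformly free or uniformly blocked, so that one planning step can cross a large region. This is a minimal sketch under the assumption of a power-of-two map size; the splitting criterion is illustrative:

```python
def quadtree(x, y, size, obstacles):
    """Split a square region into the largest uniform cells.
    Returns a list of (x, y, size, is_free) leaves.
    Assumes size is a power of two so halving stays clean."""
    cells = [(cx, cy) for cx in range(x, x + size)
                      for cy in range(y, y + size)]
    blocked = [c in obstacles for c in cells]
    if all(blocked) or not any(blocked) or size == 1:
        return [(x, y, size, not any(blocked))]
    half = size // 2
    leaves = []
    for nx, ny in ((x, y), (x + half, y), (x, y + half), (x + half, y + half)):
        leaves += quadtree(nx, ny, half, obstacles)
    return leaves
```

On an 8x8 map with a single obstacle in one corner, three quadrants survive as single 4x4 leaves, so a planner moving between leaf centers covers them in one step instead of many grid moves.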