September 27, 2022

From referee-based evaluation to parametric cost maps

A referee is an instance in a sports game who evaluates the behavior of the players. The referee isn't part of the game; his task is to judge how well the players perform. A typical situation is that the referee decides whether the ball has entered the goal.

The limitation of a referee is that his role is static. A soccer referee judges by a different ruleset than a basketball referee. In one of the games it is allowed to touch the ball with the hand, while in the other it is not. In a more flexible, general game, the referee would be allowed to modify the rules during play. In such a hypothetical game he can advise the same players to behave like soccer players, basketball players or participants in any other sport.

Such a flexible role switch doesn't make sense for real sports, but it might be interesting in the robotics domain. The idea is that the robot isn't playing a certain game; instead, it has to obey the referee. The referee can say that the ball should be kicked into the left goal or into the right goal. Which action is correct doesn't depend on the game but on the decision of the referee.

In such a use case, the referee has much more power than in a normal sports game. It is no longer a classical sports game with fixed rules, but a sort of training game in which the coach invents the rules and the player has to fulfill them.

From a technical perspective, such an interaction can be modeled with a parametric cost map. The objective for the critic is to provide a cost map which can be adapted, while the objective for the actor is to minimize the costs in that map. The natural-language commands formulated by the coach are translated into a grounded cost map, and the cost map judges whether the actions of the player are right or wrong. Such a universal game is limited only by the critic's ability to imagine a cost map and by the actor's ability to parse it.
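The idea of a cost map with a parameter that the coach can change can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field is reduced to a one-dimensional row of cells, the only parameter is the cell of the target goal, and the names `make_cost_map` and `actor_step` are hypothetical.

```python
from typing import List

FIELD_LENGTH = 11  # assumed toy field: cells 0..10, goals at both ends


def make_cost_map(target_cell: int, length: int = FIELD_LENGTH) -> List[int]:
    """Critic: build a cost map whose single parameter is the target cell.

    The cost of a cell is its distance to the target, so the target
    itself has cost 0.
    """
    return [abs(cell - target_cell) for cell in range(length)]


def actor_step(ball_cell: int, cost_map: List[int]) -> int:
    """Actor: move the ball to the cheapest cell among stay, left, right."""
    candidates = [
        cell
        for cell in (ball_cell - 1, ball_cell, ball_cell + 1)
        if 0 <= cell < len(cost_map)
    ]
    return min(candidates, key=lambda cell: cost_map[cell])


# The coach changes the rule by changing the parameter, not the actor.
left_goal_map = make_cost_map(target_cell=0)
right_goal_map = make_cost_map(target_cell=FIELD_LENGTH - 1)

print(actor_step(5, left_goal_map))   # ball moves toward cell 0
print(actor_step(5, right_goal_map))  # ball moves toward cell 10
```

The same actor produces opposite moves from the same position, because only the cost map differs.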

The term actor-critic has already been introduced, and it describes the division of tasks very well. One instance imagines what the cost map is, while the other instance has to provide concrete actions. It is important to know that both instances are needed, so the artificial intelligence can't be realized with a single module; at least two different modules are needed. The module that is easier to realize is surely the actor. A computer program which gets a cost map as input is a normal numerical solver. The objective function is provided as input, and the task is to search for a node in the game tree which minimizes the costs. A concrete algorithm would be depth-first search, which is described in most entry-level computer science textbooks.
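A depth-first search of this kind might look like the following sketch. It assumes a toy game that is not taken from any real system: the ball sits on a row of cells, each action moves it one cell left or right, and the search depth is limited. The function name `dfs_min_cost` and the constants are hypothetical.

```python
from typing import Callable, List, Tuple

ACTIONS = (-1, +1)  # assumed actions: move the ball one cell left or right
FIELD_LENGTH = 11   # assumed toy field: cells 0..10


def dfs_min_cost(
    position: int, cost: Callable[[int], int], depth: int
) -> Tuple[int, List[int]]:
    """Depth-limited depth-first search over the game tree.

    Returns the lowest cost found in the subtree below `position`
    together with the action sequence that leads to that node.
    """
    best_cost, best_plan = cost(position), []
    if depth == 0:
        return best_cost, best_plan
    for action in ACTIONS:
        next_position = position + action
        if not 0 <= next_position < FIELD_LENGTH:
            continue  # the move would leave the field
        child_cost, child_plan = dfs_min_cost(next_position, cost, depth - 1)
        if child_cost < best_cost:
            best_cost, best_plan = child_cost, [action] + child_plan
    return best_cost, best_plan


# The objective function is handed to the solver as input.
def right_goal_cost(cell: int) -> int:
    return abs(cell - 10)


print(dfs_min_cost(7, right_goal_cost, depth=3))  # (0, [1, 1, 1])
```

The solver itself knows nothing about goals or balls. It only receives a cost function and returns the cheapest reachable node.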

The more complicated problem is located in the coach instance. The critic has to invent a cost function in such a way that a computer can understand it. For example, a command like "kick the ball into a goal" is too vague. The average computer doesn't know what kick, ball and goal mean. So a more precise formulation in terms of constraints is needed.
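One possible way to make such a command precise is a grounding table that maps phrases to coordinates. The sketch below is only an assumption about how this could be done: the landmark names, the coordinates and the function `ground_command` are invented for illustration, and real language grounding is far harder than a substring lookup.

```python
from dataclasses import dataclass
from math import hypot
from typing import Callable


@dataclass(frozen=True)
class Point:
    x: float
    y: float


# Hypothetical grounding table: phrase -> position on an assumed 20 x 10 field.
LANDMARKS = {
    "left goal": Point(0.0, 5.0),
    "right goal": Point(20.0, 5.0),
}


def ground_command(command: str) -> Callable[[Point], float]:
    """Translate a command into a cost function over ball positions.

    The cost is the Euclidean distance from the ball to the named
    landmark. A command that names no known landmark is rejected.
    """
    text = command.lower()
    for name, target in LANDMARKS.items():
        if name in text:
            return lambda ball: hypot(ball.x - target.x, ball.y - target.y)
    raise ValueError(f"command is too vague to ground: {command!r}")


cost = ground_command("Kick the ball into the left goal")
print(cost(Point(3.0, 9.0)))  # distance from the ball to the left goal
```

With this table, "kick the ball into a goal" raises an error, because it does not say which goal is meant.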