February 20, 2022

3a1f Reward maximization in a production line

 < 3a1e Improved production line robot



The production line game was introduced in a previous post. Even though a human operator can play the game, the more exciting task is to program a robot that solves it. Because the game is already grounded in an elaborate reward function, such a robot solver is easy to realize. For the concrete game, only 30 lines of Python code were needed to create such a robot.
What the machine does after it is started is simple: it maximizes the reward. That means the solver has a clear understanding of the goal of the game and can plan the optimal action sequence. The resulting performance is much higher than what a human operator has to offer. Not a single mistake was made, and the number of picks per second is high.
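A minimal sketch of such a reward maximizing loop is given below. The Game interface with its legal_actions(), simulate() and apply() methods is a hypothetical placeholder, not the interface from the original post; the point is only to show the greedy planning idea.

    def solve(game, steps=100):
        """Greedy reward maximization over a hypothetical game object."""
        total_reward = 0
        for _ in range(steps):
            # evaluate every legal action and pick the one with the
            # highest predicted reward
            best_action = max(game.legal_actions(),
                              key=lambda a: game.simulate(a))
            # execute the chosen action and collect the reward
            total_reward += game.apply(best_action)
        return total_reward

Called as solve(game), the loop simply repeats "predict reward, take the best action" until the step budget is used up.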
The reward grows constantly over time, and the robot also visits the charging station regularly. This sensible behavior is not the result of the onboard AI itself, but of the reward function. The reward function defines which actions are good and which are not, and this guides the search in the state space.
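As an illustration, such a reward function could look like the following sketch. The state fields and the concrete reward values are assumptions made up for the example, not the original implementation.

    from dataclasses import dataclass

    @dataclass
    class State:
        battery: int          # hypothetical battery level in percent
        part_available: bool  # is there a part on the line to pick?

    def reward(state, action):
        # the reward function defines which actions are good and which are not
        r = 0
        if action == "pick" and state.part_available:
            r += 10    # picking a part from the line is rewarded
        if action == "charge" and state.battery < 20:
            r += 5     # visiting the charging station pays off when the battery is low
        if state.battery <= 0:
            r -= 100   # an empty battery is punished heavily
        return r

With a function like this, the charging behavior does not have to be programmed explicitly; it emerges because the solver searches for the actions with the highest reward.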
Perhaps the most surprising insight is that it is not the robot that is intelligent; the intelligence is built into the game itself. The opposite case is a game without a reward function. Without a reward, it is not possible to play such a game automatically.