< 3a1f Reward maximization in a production line
Similar to most AI domains, the game of tetris is an np hard optimization problem, which means, that a desktop PC will need to process the information for over 1 million years, until the next action can be determined ... To overcome the bottleneck a heuristic is needed. To be more precisely, it is about a heuristic cost function or short “a cost function” which is used to play Tetris by an AI.
Such an attempt to solve the game might sound a bit surprising, because most programmers assume that the game of Tetris provides a cost function as default and there is no need to define the rewards. The problem with the built in default cost function is, that it works with a delay. The player will recognize that he has lost if it is too late. So what is needed is a modified cost function which provides a continuous very precise feedback signal if a certain action makes sense or not.
In the game of Tetris, the human player can take two decision: at which position the next block is placed and at which rotation. The interesting situation is, that it is possible to judge about the decision even before the block has fallen downwards. This allows to determine a score before the official score was determined.
On base of this cost score, a receding horizon planner can determine the optimal action. What the planner is doing is to maximize the reward. The open question is, what costs exactly should be calculated for a certain game situation? This depends from the heuristics. It is possible to invent a simplified cost model or a more advanced scoring system. A slightly accurate cost function allows a desktop PC to play the game of tetris with super human level performance without occupying too much CPU ressources.
No comments:
Post a Comment