February 20, 2022

3a1e Improved production line robot

 < 3a1d Programming an assembly line robot



Compared to the previous simulation game, the GUI was improved a lot. The robot still has to sort the tokens, but this time the task is more complicated. The performance evaluation was also refined, so the robot needs to adjust its movements.
One thing remains the same: the robot isn't controlled by an AI; the game is played by a human. The operator presses the arrow keys, and this triggers the actions of the robot. The AI is located in the referee, which means the box with the performance information in the lower left is generated by software on the fly.
For example, if the robot worker places a token into the wrong bin, its reward is reduced by 4 points. Likewise, if the robot keeps walking even though the energy level is low, a negative reward is the result. This performance evaluation is needed for grounding: the task of navigating a robot in a factory is translated into a score. It is not important what exactly the robot is doing; the only thing that counts is the measured reward.
At the end of the game it is easy to judge what the robot has done: for example, it executed 16 picks, reached an overall score of 11, and needed 92 seconds. All quality criteria are available as integer values, which means they can be stored in a computer program very easily. In contrast, the original domain, which is about sorting tokens by color and placing them at the correct position, is hard or even impossible for a computer to understand. So we can say that the shown game is an example of grounded Artificial Intelligence.
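To make the scoring rules concrete, here is a minimal Python sketch of such a referee. Only the penalty of 4 points for a wrong bin and the negative reward for walking on low energy come from the description above; the class name, the method names, the reward for a correct place, and the energy threshold are assumptions for illustration.

class Referee:
    """Hypothetical referee that translates robot actions into a score."""

    def __init__(self):
        self.picks = 0
        self.total_reward = 0
        self.elapsed_seconds = 0

    def on_place(self, token_color, bin_color):
        # Placing a token into the wrong bin costs 4 points.
        self.picks += 1
        if token_color == bin_color:
            self.total_reward += 1   # assumed reward for a correct place
        else:
            self.total_reward -= 4

    def on_walk(self, energy):
        # Walking while the energy level is low yields a negative reward.
        if energy < 10:              # assumed low-energy threshold
            self.total_reward -= 1

    def summary(self):
        # End-of-game judgment as plain integers,
        # e.g. 16 picks, score 11, 92 seconds.
        return {"picks": self.picks,
                "score": self.total_reward,
                "seconds": self.elapsed_seconds}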
 
Let us take a look at the game itself. The incoming line on the left side delivers new tokens in random order. The task for the robot worker is to sort them by putting them on the outgoing lines on the right side, so it is a kind of pick-and-place task.
The interesting point is that the robot worker isn't controlled by a sophisticated Artificial Intelligence but works in teleoperated mode. In other words, the shown simulation is a normal computer game: after a key press, the robot does something. The new and advanced element is that the virtual referee determines very precisely whether an action makes sense or not. Every action of the robot is tracked, monitored, and translated into a reward score. The robot worker is under total surveillance and has to account for everything it does. The result is a highly accurate scoring system for determining the performance of the robot.
At first glance such a game doesn't look very pleasant, because human workers don't like the idea of being monitored. But from the perspective of Artificial Intelligence it makes a lot of sense, because such a domain is a great testbed for an optimal control algorithm. The domain offers possible actions (up, down, left, right) and provides feedback stored in the total reward. The feedback is a numerical value, and the task for the model predictive control algorithm is simple: maximize the reward, which is the same as winning the game.
Here is the pipeline in short:
Teleoperation -> simulation -> features -> cost function
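As a rough illustration, this pipeline can be written down in Python. The following is a sketch under stated assumptions: the state fields mirror the info box, the concrete effect of an action is only a placeholder for the real game logic, and all names are invented.

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    elapsed_time: int
    energy: int
    picks: int
    total_reward: int

def simulate(state, action):
    # Teleoperation/simulation step: apply one arrow-key action.
    # The effect shown here (time passes, energy drains) stands in
    # for the real game logic.
    return replace(state,
                   elapsed_time=state.elapsed_time + 1,
                   energy=state.energy - 1)

def features(state):
    # Feature extraction: the four variables of the info box.
    return [state.elapsed_time, state.energy,
            state.picks, state.total_reward]

def cost(feats):
    # Cost function: lower is better, so negate the total reward.
    return -feats[3]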
 
Playing the game with an algorithm
How humans play such games is easy to describe: they look at the monitor and decide which action to take next. With a bit of training a human operator will reach a better performance, although he will still make some minor mistakes after repeating the actions over and over again.
The more interesting question is how an optimal control algorithm will play the game. The algorithm doesn't have any sort of human intelligence, so it has to focus on the information box in the lower left. The box contains the variables: elapsed time, energy, picks, total reward.
From a computer's perspective these variables are stored in an integer array with four elements. In addition, the computer needs another variable to encode the possible actions of the robot (0=left, 1=up, and so on). The interesting thing is that with this minimal information a computer is able to play the game. Many existing algorithms are available to determine the optimal action sequence; most of them work with graph search guided by the reward information.
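A minimal brute-force planner along these lines, reusing the State, simulate, features, and cost names from the pipeline sketch above, could enumerate all action sequences up to a small horizon and keep the cheapest one. This is only a sketch of the idea, not a tuned search algorithm.

import itertools

ACTIONS = [0, 1, 2, 3]   # assumed encoding: 0=left, 1=up, 2=right, 3=down

def plan(state, horizon=4):
    # Try every action sequence of the given length, simulate it,
    # and keep the sequence with the lowest cost.
    best_cost, best_seq = float("inf"), []
    for seq in itertools.product(ACTIONS, repeat=horizon):
        s = state
        for action in seq:
            s = simulate(s, action)
        c = cost(features(s))
        if c < best_cost:
            best_cost, best_seq = c, list(seq)
    return best_seq

In a receding-horizon loop, as in model predictive control, the solver would execute only the first planned action, read the updated info box, and replan.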
 
The inner working of the AI
In this concrete example it is possible to explain how the AI works. The usual conception is that the robot is doing something and should determine with its own intelligence which action is the best one. So the question is: how exactly does the AI know that the robot has to go to the charging station or place the token at the correct position?
The surprising answer is that the robot doesn't know. What is available instead is a virtual referee. The referee determines the score during the game; it decides whether an action was good or not. Such a referee is used in most video games, for example to detect the collision of a player with walls. In the production line simulation the referee is more advanced and checks many other details. The interesting point is that the referee can judge human players and AI-controlled robots alike. So the new understanding is that the robot doesn't need an onboard AI, but the game needs a virtual referee.
This referee makes it possible to ground a game. Grounding means converting the 320x200 pixel map into the small list of variables shown in the lower left. This small set of variables is then used by a solver to play the game automatically.
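Continuing the earlier sketches, the grounding step and the referee's judgment might look as follows. The game object and the low-energy rule are again assumptions; the point is only that the same code serves a human player and a solver alike.

def ground(game):
    # Collapse the 320x200 pixel map into the lower-left variables.
    # The solver never sees the pixels, only these four integers.
    return [game.elapsed_time, game.energy,
            game.picks, game.total_reward]

def referee_step(state, action):
    # The referee judges the action itself; it does not matter
    # whether a human key press or the planner selected it.
    next_state = simulate(state, action)
    penalty = 1 if next_state.energy < 10 else 0   # assumed rule
    return replace(next_state,
                   total_reward=next_state.total_reward - penalty)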