Recently I uploaded a new paper about reward functions: https://www.academia.edu/50944920/Formalizing_knowledge_with_an_evaluation_function_for_the_snake_game_and_other_examples
The topic seems relevant for robot control. In short, the idea is to map measured feature values to a numerical reward value and then use this value to steer the robot towards the highest reward. The reward function acts as a kind of layer between the robot and the game it is playing.
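To make the idea concrete, here is a minimal sketch of such a feature-to-reward mapping for the snake game mentioned in the paper's title. The feature names and the weights are my own illustrative assumptions, not values taken from the paper:

```python
# Sketch: map measured game features to a single numerical reward.
# Feature names and weights are illustrative assumptions.

def snake_reward(distance_to_apple: float, snake_length: int,
                 crashed: bool) -> float:
    """Map measured feature values to one scalar reward."""
    if crashed:
        return -100.0                    # hitting a wall or the body is fatal
    reward = 0.0
    reward += -1.0 * distance_to_apple   # closer to the apple is better
    reward += 10.0 * snake_length        # eating apples grows the snake
    return reward

# The controller only sees this scalar, never the game itself:
print(snake_reward(distance_to_apple=3.0, snake_length=5, crashed=False))
```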
What the paper doesn't answer is how to create reward functions. There are two opposite approaches. The first is to create reward functions algorithmically, mainly with neural networks, Q-tables and reward automata. The second is to treat reward design as a collaborative social activity, realized with examples from previous projects and a code repository that holds concrete reward functions. The DeepRacer project from Amazon goes in this direction, as the sketch below shows.
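To give an impression of the second approach, here is a hand-written reward function in the AWS DeepRacer style. DeepRacer expects a Python function `reward_function(params)` that returns a float, and `track_width` and `distance_from_center` are among the documented keys of its input dictionary; the thresholds below follow the well-known "stay near the center line" example, so treat the exact numbers as illustrative:

```python
# Sketch of a hand-written reward function in the AWS DeepRacer style.
# The thresholds are illustrative, not official tuning values.

def reward_function(params):
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Reward the car for staying close to the center line.
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.25 * track_width:
        reward = 0.5
    elif distance_from_center <= 0.5 * track_width:
        reward = 0.1
    else:
        reward = 1e-3    # likely off track

    return float(reward)
```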
The interesting point about reward functions is that they answer the question of how to control a robot. Robot control is nothing else than navigating the robot on the reward map, which is an artificially created mathematical model. The advantage is that the details of games like Tetris, car driving or biped robot simulators can be ignored. The robot doesn't know which game it is playing, because it sees only the reward map.
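A minimal sketch of what "navigating on the reward map" can mean: a greedy hill climber that only ever queries the reward function, so the same loop works regardless of which game generated the map. The grid, the action set and the example reward function are my own assumptions for illustration:

```python
# Sketch: navigate on a reward map with greedy hill climbing.
# The controller queries only the reward function, never the game.
# Grid states, actions and the example map are illustrative assumptions.

def hill_climb(reward, start, steps=20):
    """Repeatedly move to the neighboring state with the highest reward."""
    x, y = start
    for _ in range(steps):
        neighbors = [(x + dx, y + dy)
                     for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]]
        best = max(neighbors, key=reward)
        if reward(best) <= reward((x, y)):
            break            # local maximum of the reward map reached
        x, y = best
    return x, y

# Example reward map: highest value at the goal cell (5, 5).
goal = (5, 5)
reward = lambda s: -abs(s[0] - goal[0]) - abs(s[1] - goal[1])
print(hill_climb(reward, start=(0, 0)))   # -> (5, 5)
```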