April 03, 2022

Data driven model for robotics control

 

Abstract

Mocap data are stored in a pose database. The human operator queries the database, and the retrieved keyframe is shown on the screen.

TOC

6a Sprite sheet
▸ 6a1 From a model to a robot
▸ 6a2 Abstraction mechanism
▸ 6a3 Rapid prototyping
  ▸ 6a3a Problem grounding
6b data driven models
        Animation sequences
▸ 6b1 Data vs programming
▸ 6b2 examples for data models
▸ 6b3 data driven task models
6c more modeling tools
▸ 6c1 Model based tracking
  ▸ 6c1a Interactive animation
  ▸ 6c1b Motion retrieval
▸ 6c2 Activity recognition but why?
  ▸ 6c2a Game log recording
6d Bibliography

6a Sprite sheet

The paradoxical situation is that robotics control is perceived as a complicated task because no well-defined problem is available. This sounds contradictory: if there is no problem, why should it be hard to solve? The answer is that computers can solve only well-defined tasks. For example, a computer can determine what “78.0/3.14” is. The result is very precise, and a computer gets there much faster than any human.
The real obstacle for computer control appears when the computer should execute a task like “move the robot out of the maze”. Such a problem is defined on a high level, and to execute the task the computer needs background knowledge about how to parse an English sentence, about spatial navigation and about robotics control. It is hard or even impossible to program computers in such a way, and for this reason robotics remains an unsolved problem.
The good news is that we can formulate a thesis about why exactly computers have failed at such tasks. The reason is a missing model. A model abstracts from a problem, and without such an abstraction mechanism the problem remains vague.
Let us go a step back and ask the simple but provocative question: what is a model? From a programming perspective, a model is a simulation realized in an object-oriented programming language. It allows us to predict future situations and helps to understand a domain. So a model is the same as well-written source code, for example in the Python programming language, which consists of differential equations, a list of possible actions and a cost function.
Unfortunately, it is hard to program such a model for a concrete task. In the case of a simple line-following robot it may be within reach to create a model, aka a simulator, in Python, but for more complex domains like human animation such models become very complicated.
An alternative approach to model design works with the data-driven paradigm. The idea is to first create a database with information about the domain, and in a second step to convert this database into an executable model which can predict something. For convenience, we focus here on how to create the data-driven model.
Perhaps the most obvious technique for doing so is motion capture with markers. The raw data are stored in an SQL database and can be analyzed later. A less common but also very powerful technique is to use a sprite sheet as input data. A sprite sheet is a frequently used tool in game programming. As the name suggests, it is a sheet which holds pictures, so it is basically a .PNG file. Such a file can't be executed on a computer; it is a data format.
Sprite sheets are able to model an entire domain. They hold information about what a walk animation looks like and which other actions are possible, for example idle and jump. From an animation perspective, a sprite sheet has much in common with a body pose taxonomy. A taxonomy is a sort of table which also holds information about keyframes.
The interesting point is that the sprite sheet doesn't solve an existing problem but creates a new one. Once the sprite sheet is there, the programmer has to implement a slicing algorithm. Slicing means extracting single images and painting them on the screen.
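The slicing step can be sketched without any graphics library: for a sheet laid out as a regular grid, slicing reduces to computing the pixel rectangle that belongs to each sprite. The grid layout, the 32 pixel tile size and the function name `sprite_rect` are assumptions made for illustration; a real game would hand the rectangle to the crop or blit routine of its image library.

```python
def sprite_rect(sprite_id, columns, tile_w, tile_h):
    """Return (left, top, right, bottom) of a sprite in a grid-shaped sheet.

    IDs are assumed to start at 0 and run row by row, left to right.
    """
    if sprite_id < 0:
        raise ValueError("sprite_id must be non-negative")
    row, col = divmod(sprite_id, columns)
    left, top = col * tile_w, row * tile_h
    return (left, top, left + tile_w, top + tile_h)


# Hypothetical sheet: 4 columns of 32x32 pixel tiles.
for sid in (0, 3, 5):
    print(sid, sprite_rect(sid, columns=4, tile_w=32, tile_h=32))
```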
The main reason why sprite sheets are used so frequently by game programmers is that they help to understand a domain. According to the sprite sheet, the world, aka the game, looks a certain way. There are some actions and some keyframes, and this information is everything that is important for a domain. A sprite sheet can be used to model very different problems like sports games, jump'n'run games and even racing games.
see also: 6c more modeling tools

6a1 From a model to a robot

Sprite sheets have been used by game programmers for decades. Even the old NES games used this technique. It remains an open problem how to use this information to create an Artificial Intelligence.
The answer is that a sprite sheet is a certain form of model, and if a model is available, a computer can use it to solve a problem. Let me give an example. Suppose there is a graph and the computer has to find the shortest path in the graph. Such a problem can be solved easily with an algorithm. There are hundreds of tutorials available which explain the details.
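To make the graph example concrete, here is a minimal sketch using breadth-first search, which finds a shortest path when all edges count the same. The example graph and the function name `shortest_path` are made up for illustration; weighted graphs would need a different algorithm such as Dijkstra's.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list of a shortest path or None."""
    queue = deque([start])
    parent = {start: None}  # also serves as the visited set
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical example graph as an adjacency list.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": ["F"], "F": []}
print(shortest_path(graph, "A", "F"))
```

The point is that the algorithm is short and generic; all domain knowledge sits in the `graph` data.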
The only real problem within Artificial Intelligence is a situation which contains neither a graph nor a model. Then it is impossible to solve the domain. All a programmer has to ensure is that at least a data-driven model is there. In the easiest case this is a sprite sheet; in a more advanced setting an additional motion graph is available.
So the question is not how to solve a certain problem but how to invent a problem. If a problem is there, an algorithm can find the optimal actions within it.
Let us go a step back to understand the situation. A sprite sheet by itself won't help to implement an Artificial Intelligence, because the two things seem to be opposites. The common understanding is that a robot, aka an artificial agent, is able to do something meaningful, for example to find a path in a maze, while sprite sheets are used to create such games. The source of confusion is the algorithm perspective. Computer programmers and AI engineers are usually searching for an algorithm, because an algorithm allows them to solve a problem. But AI problems don't fit this approach. AI is about defining the problem first; this is called modelling or grounding. Only if the problem is grounded can it be solved.
So the idea is to use sprite sheets, motion graphs, rules and cost functions for inventing a game. Only if the game is there can a computer program solve it. This results in Artificial Intelligence. Perhaps an example makes the situation easier to grasp:
Suppose there is a simple sprite sheet which consists of two actions: walk and jump. A level map is provided as well. In addition, the rule is that the robot in the game has to reach the exit of the level. All these ingredients can be combined into a model which is realized with a physics engine. Once the game is available, it can be solved with model predictive control or reinforcement learning. That means the computer takes the physics engine as a simulator and tries to figure out the optimal trajectory within this simulator.
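The walk/jump example can be sketched in a few lines. Note the simplifications, which are assumptions of this sketch and not part of the description above: the physics engine is replaced by a hand-written `step` function on a one-dimensional map, and the solver is a plain breadth-first search over action sequences instead of model predictive control or reinforcement learning. The map string and the jump length of two tiles are invented for illustration.

```python
from collections import deque

LEVEL = "S._.._.E"          # hypothetical map: S start, . floor, _ pit, E exit
ACTIONS = {"walk": 1, "jump": 2}


def step(position, action):
    """Toy forward model: return the new position, or None if the move fails."""
    new_pos = position + ACTIONS[action]
    if new_pos >= len(LEVEL) or LEVEL[new_pos] == "_":
        return None  # left the map or landed in a pit
    return new_pos


def solve(start=0):
    """Search the forward model for the shortest action sequence to the exit."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        position, plan = queue.popleft()
        if LEVEL[position] == "E":
            return plan
        for action in ACTIONS:
            new_pos = step(position, action)
            if new_pos is not None and new_pos not in seen:
                seen.add(new_pos)
                queue.append((new_pos, plan + [action]))
    return None


print(solve())
```

The solver knows nothing about walking or jumping; it only queries the model, which is exactly the division of labour described above.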
 

6a2 Abstraction mechanism

A common understanding is that a sprite sheet is a simple .png file which is used to animate characters in a game. This description is correct from a technical perspective, but the inner workings go beyond it. From a general perspective, sprite sheets are used as an abstraction mechanism. The idea is to convert a problem into a model.
The main advantage of such a sheet of sprites is that each sprite gets a unique ID. The sprite with ID #1 stands for a certain picture, while the sprite with ID #4 stands for a different one. Instead of describing a picture by its details, which means its colors, its drawing and its size, the programmer can refer to the number alone. That means there are nodes from ID #1 to #20, and each number represents a small graphic.
Such an abstract description makes it possible to program very advanced games. For example, the ID can be used in an animation routine, or it can be used to react to keyboard input. If the user presses the left key, a certain sprite ID is shown on the screen.
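A minimal sketch of the keyboard case: once sprites are referenced by ID, reacting to input is a dictionary lookup. The key names, the concrete ID numbers and the idle fallback are assumptions chosen for illustration.

```python
# Hypothetical mapping from keyboard input to sprite IDs.
KEY_TO_SPRITE = {"left": 4, "right": 5, "space": 9}
IDLE_SPRITE = 1


def sprite_for_key(key):
    """Return the sprite ID to draw for a key press; unknown keys show idle."""
    return KEY_TO_SPRITE.get(key, IDLE_SPRITE)


for key in ("left", "space", "x"):
    print(key, "->", sprite_for_key(key))
```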
 

6a3 Rapid prototyping

The reason why sprite sheets, level maps and motion graphs are used so frequently in game design is that they reduce the time until the game is finished. If someone takes an existing sprite sheet which consists of 200 keyframes, he won't need to paint the animation manually. And if someone uses a map created by someone else in the past, there is no need to reinvent it from scratch.
If, in addition, advanced GUI tools and game engines are used, it is possible to create an entire game just by dragging and dropping the resources into the main window. The precondition for this highly efficient workflow is to divide complex games into sub-elements, which are the mentioned sprite sheet, level map, game engine and so on.
Game development doesn't mean programming something, for example in assembly or the C language; it is about aggregating existing content into something new.

6a3a Problem grounding

Within the robotics and AI community there is a sort of unsolved mystery. It is about which algorithms or libraries can solve AI problems. What computer programmers frequently do is ask for an algorithm, because an algorithm is a tool to solve a problem. Therefore it sounds logical to ask which sort of software or algorithm will provide artificial intelligence. The sad answer is that no software and no algorithm is available which fits into this category.
But there must be a way to create robots, otherwise the subject of AI could be dismissed altogether. The answer to this complicated problem is not to search for an algorithm but to describe the process from a design perspective. Designing something means that ideas from the outside are converted into software. This is sometimes called the grounding problem.
To understand how design works differently from algorithmic thinking, we have to observe how new games are created. What the average game designer does is start a level editor, which is of course tile based, and drag and drop icons into the map. After a while he saves the result, and this forms the basis of a new game.
The interesting point is that the process of creating a new map isn't based on an algorithm; it is a creative process. The game designer uses a tool for rapidly prototyping a game. That means level design has nothing to do with game programming in the sense that someone opens an IDE and programs the game in C++ against a game library. Instead, game design has to do with converting external knowledge into the game. A source of inspiration for a new level might be a good map in another game or older projects of the game designer. He brings his personal knowledge to the level editor and creates the map for a certain objective. [Engstrom2018]
Formalizing such a process is not possible. The reason is that game design doesn't solve an optimization problem and has nothing to do with programming at all. Even the beginner-friendly Python language is off-topic for level design. Instead, the computer is used only as a tool, similar to a pen, to write something down.
The chance is high that programming robots works with the same paradigm. It is not about writing source code in Python or C++; it has to do with level design and GUI interfaces. This makes it hard to apply a certain programming language or algorithm, because in the design process such tools are not useful at all.
What makes sense instead are rapid prototyping tools. These programs support the design process much better.



6b data driven models

A model is an abstract description of a problem. There are two ways of creating models:
1. Data driven models
2. Source code driven models
Programming a model in source code is the goal in robotics development. It is equal to creating a forward model, which has much in common with a physics engine. Such a model is realized in Python or in C/C++ and can be used by a solver to determine the optimal actions.
The disadvantage of source code driven models is that somebody has to program them. Writing source code is known to be a complicated task. Especially if the code consists of differential equations to simulate physical systems, it is very demanding to write realistic models.
The alternative is to focus on data driven models. As the name suggests, data driven means the opposite of programming source code; it is equal to painting a picture, writing a text file or creating a database. Such models can be created much more easily. In the easiest case, a game log is recorded into a CSV file, and the CSV file is then treated as a model of the game.
The disadvantage of data driven models is that they can't be executed directly. For example, a game log in a CSV file can't predict future states on its own. An additional algorithm is needed which turns the data into a concrete simulation. But this is not a real disadvantage, because data driven models can be seen as a first step in a modeling workflow.
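One possible form of such an additional algorithm is a lookup table built from the log. The sketch below assumes a log with the columns `state`, `action` and `next_state`; the column names, the state names and the helper functions are invented for illustration, and the log is embedded as a string so the example runs without a file on disk.

```python
import csv
import io

# Hypothetical game log; in practice this would be a CSV file on disk.
GAME_LOG = """state,action,next_state
start,walk,corridor
corridor,walk,door
corridor,jump,ledge
door,walk,exit
"""


def build_model(csv_text):
    """Turn logged (state, action, next_state) rows into a lookup table."""
    model = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        model[(row["state"], row["action"])] = row["next_state"]
    return model


def predict(model, state, action):
    """Predict the next state; None means the log never recorded this case."""
    return model.get((state, action))


model = build_model(GAME_LOG)
print(predict(model, "corridor", "jump"))
print(predict(model, "door", "jump"))
```

The second query returns `None`: a purely data driven model can only predict what has been recorded, which is why it is treated as a first step.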
Animation sequences
Let us try to explain how abstraction works for creating a longer animation sequence. Without any abstraction, the user has to create the animation manually. That means he has to figure out the keyframes and paint them on the screen. This is a very complicated task which can take hours.
In contrast, an existing data driven model makes it much easier to animate a character. All the user has to do is provide a sequence of sprite IDs, for example [4,5,6,7], and press the run button. The software looks up the images in the database and creates the animation on the screen. This drastically simplifies the creation of longer animation sequences.
With this understanding in mind, animation no longer has to do with painting graphics or figuring out the correct pose. It has to do with entering a sequence in the format [a,b,c,d], and the software model does everything else. This powerful principle is called an abstraction because it transforms a problem into an easy interaction.
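The lookup behind the run button might be sketched as follows. The database contents and the function name `play` are hypothetical, and file names stand in for the actual pixel data.

```python
# Hypothetical sprite database: sprite ID -> image name (stand-in for pixel data).
SPRITE_DB = {4: "walk_0.png", 5: "walk_1.png", 6: "walk_2.png", 7: "walk_3.png"}


def play(sequence, database):
    """Resolve a list of sprite IDs into the frames that would be drawn."""
    missing = [sid for sid in sequence if sid not in database]
    if missing:
        raise KeyError(f"unknown sprite IDs: {missing}")
    return [database[sid] for sid in sequence]


frames = play([4, 5, 6, 7], SPRITE_DB)
print(frames)
```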
see also: 6c1 Model based tracking 6c2a Game log recording


6b1 Data vs programming

The natural way of interacting with a computer is to program the machine. Many programming languages like Python, Java or C/C++ are available, and at first glance a programming language is the right tool for creating a model.
System identification usually means converting a problem into a simulation, and a simulation is always a piece of computer software which calculates something. The problem is that programming a simulation is very complicated, and even object oriented software engineering has not increased productivity that much.
The alternative to object oriented programming is a data driven approach. The idea is to model a domain in a database, which can be realized as a NoSQL database, an XML file or a JSON taxonomy. [Kopp2018] [Baak2013] What all these data formats have in common is that they can't be executed, because they are data rather than program code. The advantage is that such files can be created much faster than normal computer code.
According to different studies, a highly skilled programmer can write only about 10 new lines of code per day. This makes it unlikely that a single person is able to create a complex model in such a way. But a single person can create a database of textual information much faster, especially if the raw data are provided by a motion capture system, which is basically a data logger. So the simple idea is to see modeling only from the perspective of a database and to ignore that a model can be realized as a computer program as well.
Let us try to describe how data driven models are created. The main idea is that there is an empty JSON / XML / NoSQL database which is populated with textual information. Such information consists of tables, plain text and numerical data. All the information combined is equal to the problem's model. For example, there is a 50 kB JSON file on the hard drive, and this JSON file holds the model of a walking robot in a maze. The JSON file consists of the level map, the body of the robot and a list of possible events. Everything is stored in ASCII format, which means it can be displayed with a text editor. The model doesn't hold executable programs written in Lisp, Python or Java.
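A heavily shortened sketch of what such a JSON model might look like, loaded with Python's standard `json` module. The key names (`level_map`, `robot_body`, `events`) and all values are assumptions for illustration, not a fixed schema.

```python
import json

# Hypothetical, heavily shortened model file for a walking robot in a maze.
MODEL_JSON = """
{
  "level_map": ["#####", "#S.E#", "#####"],
  "robot_body": {"legs": 2, "height_cm": 40},
  "events": ["hit_wall", "reached_exit"]
}
"""

model = json.loads(MODEL_JSON)

# The model is pure data: it can be inspected but not executed.
start_col = model["level_map"][1].index("S")
print("start column:", start_col)
print("known events:", model["events"])
```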

6b2 examples for data models

From a technical perspective, a data driven model is stored in a database. Possible file formats are JSON, XML or plain text (see 6b1 Data vs programming). The problem with this understanding is that such file formats are trivial. Creating ANSI, Unicode or even JSON files on a hard drive is nothing which can be improved that much.
The more interesting question is which content is provided. Some examples of data driven models:
• level map, created with a level editor
• body pose taxonomy, created with an XML editor
• sprite sheet, created with a graphics program (see 6a Sprite sheet)
• game log stored in a CSV file which captures the keyframes (see 6c2a Game log recording)
• motion graph [Kovar2008]


6b3 data driven task models

A task model is by definition a high level abstraction mechanism. The idea is to hide the details and focus on a long-term planning horizon. Creating a task planner is easy from a technical perspective because, like most planners, it has to do with searching for a goal in the state space. There is a model and a number of actions, and the planner has to find the shortest path in the game.
The bottleneck is that for most robotics problems no task model is available, and therefore it doesn't make sense to plan anything. A method for creating models from scratch was provided in section 6b2 examples for data models, so it is likely that task models can be created in the same way.
A good method for creating such models is a task taxonomy. This is a hierarchical dictionary of all the important words of a problem. For example, a task taxonomy for a household robot would contain places like “kitchen, bathroom, floor” and provide actions like “goto, pickup, place”. From a programming perspective, such a data model is stored as plain text in a JSON file. Such a model can't be executed and can't be used by a task planner directly, but it is a good starting point for creating a task simulation.
Let us investigate how to convert a task taxonomy into a task simulator. In contrast to a data driven model, a simulator can be executed. Such a system works like any other computer program. The open question is how to convert data into a program.
The easiest way of creating a task simulator is manual programming. That means a programmer takes the task model as a prototype and writes executable source code around the data structure. For example, the programmer defines that after executing the action “goto” the robot's position has changed to the new position. Such a mapping can be realized easily in a language like Python. What is needed is of course a variable for storing the current location of the robot and a method which changes the position.
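A minimal sketch of such a hand-written simulator, built around the household taxonomy mentioned above. The class name `TaskSimulator`, the `holding` attribute and the behaviour of `pickup` and `place` are assumptions for illustration; only the location variable and the `goto` method follow directly from the description.

```python
# Hypothetical task taxonomy, as it might be loaded from a JSON file.
TAXONOMY = {
    "places": ["kitchen", "bathroom", "floor"],
    "actions": ["goto", "pickup", "place"],
}


class TaskSimulator:
    """Hand-written executable wrapper around the data-only taxonomy."""

    def __init__(self, taxonomy, start):
        self.taxonomy = taxonomy
        self.location = start   # current location of the robot
        self.holding = None     # object carried by the robot, if any

    def goto(self, place):
        if place not in self.taxonomy["places"]:
            raise ValueError(f"unknown place: {place}")
        self.location = place

    def pickup(self, item):
        self.holding = item

    def place(self):
        """Put the carried object down; returns (item, location)."""
        item, self.holding = self.holding, None
        return item, self.location


sim = TaskSimulator(TAXONOMY, start="floor")
sim.goto("kitchen")
sim.pickup("cup")
sim.goto("bathroom")
print(sim.place())
```

The taxonomy stays a plain data structure; the class only adds the state and the transition rules that make it executable.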

6c more modeling tools

Apart from sprite sheets (see 6a Sprite sheet) there are many other candidates for creating a model of a problem. A lot of robotics literature has been published on learning from demonstration (LfD), while other papers were written about animation languages.
In both cases the idea is to reduce the state space and solve the grounding problem. Animation languages as well as LfD are seen as abstraction mechanisms.
The problem is that it is hard to realize such principles in practice. For example, the LfD idea sounds great to the untrained ear. The idea is to record a motion trajectory and then use the recording to determine the parameters of dynamic movement primitives. [Kirk2016] [Zhu2018] But it remains unclear how to do so exactly.
The same problem exists for dedicated animation languages. [Webber1990] An animation language is a great tool once it has been created, but generating a new language from scratch is a complicated task. Tools for modeling domain specific languages are available, but they are not working well enough for practical applications.
Of all the existing tools, a vanilla sprite sheet, or a data driven model in general, works best. The idea is that in the first step the model is equal to a database which holds information about the problem. More advanced elements of a model, like parametric movement primitives, an animation language or a prediction model, are created on top of the database.
With this strict definition, a data driven model can be a .PNG file storing a sprite sheet, a JSON file which stands for a database, or maybe a CSV file storing game log information. What a data driven model is not is a neural network, Python source code or a mathematical equation.
These advanced abstraction tools are created in a later step. That means there are simple data-only models and more advanced source code oriented models available (see 6b data driven models).




6c1 Model based tracking

Most existing robot projects try to create so-called robot control systems. The idea is that the software generates the signals for a robot arm and the arm then does something useful. The opposite of producing actions is perceiving actions. The following section analyzes the tracking of activities in detail.
Activity tracking assumes that meaningful actions are available already. They are mostly produced by humans, and what the computer has to do is recognize these movements in space. One example is hand gesture recognition. [De Smedt2017]
The most advanced form of action recognition works with a model in the loop. A model is used to interpret features. Let me give an example. Suppose there is a robot arm which consists of 4 elements connected by joints. An intelligent vision system takes this prior information as a template to interpret the movements much better. The structure of the robot (4 elements) is the model, and the movements are parsed with this knowledge. [Filippi2007]
Another example would be a pose recognition system. The idea is that a human body can take different poses like walking, standing and sitting, and what the computer has to do is determine the correct pose ID. For example, the human is doing something, and the computer prints on the screen that the human is in pose #3.
All these perception techniques have in common that an underlying model is used to interpret the video signal. This model is able to annotate raw data with a meaning. In most cases the meaning is stored in natural language. That means pose #3 is labeled with a textual string.
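One simple way to realize such a pose recognizer is nearest-neighbour matching against the stored poses. This is an assumption of the sketch, not a method prescribed above; the pose table, the two joint angles per pose and the function name `recognize` are invented for illustration.

```python
import math

# Hypothetical pose model: pose ID -> (label, joint angles in degrees).
# Two angles (hip, knee) stand in for a full skeleton.
POSE_MODEL = {
    1: ("standing", (0.0, 0.0)),
    2: ("walking", (30.0, 20.0)),
    3: ("sitting", (90.0, 90.0)),
}


def recognize(observed_angles):
    """Return (pose ID, label) of the stored pose closest to the observation."""
    def distance(pose_id):
        return math.dist(POSE_MODEL[pose_id][1], observed_angles)

    best_id = min(POSE_MODEL, key=distance)
    return best_id, POSE_MODEL[best_id][0]


print(recognize((85.0, 95.0)))
```

The raw observation is a pair of numbers; the model turns it into an ID and a textual label, which is the annotation step described above.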
In most cases the tracking is realized with motion capture, and the models are based on data (see 6b data driven models). An existing model is used to track movements. For example, a body pose model is able to track the pose, while a task model can track only high level tasks. From the perspective of a model the world looks a certain way. That means the model defines which parts of reality are important, and then the model tries to match this bias with the raw data from the video signal.




6c1a Interactive animation

Apart from model based tracking there is another strategy for creating model based robotics. The idea is not to program the robot itself but to design a human computer interface. Basically, the human operator clicks somewhere on the screen, and this animates the robot on the screen.
The concept has much in common with teleoperation. A model is used to increase the automation level. But let us start from the beginning. Suppose the idea is to teleoperate a pick&place robot. To do so, the joystick is mapped to the servo motors of the robot. Perhaps an additional GUI allows selecting the concrete servo motor on the screen. Such an interface works reasonably well, but it will take many seconds until the robot arm can grasp an object.
The more advanced interaction technique is to define a handful of keyframes and use the mouse to browse through them. For example, keyframe #1 stands for ungrasp, while keyframe #2 stands for grasp. So the human operator doesn't control the servo motors directly but decides which system state he prefers at a certain moment. [Geijtenbeek2012]
The requirement for realizing such an advanced control system is a model. The keyframes, aka body poses, are stored in the model so that the human operator can select one of the predefined IDs. The interaction with such a system is trivial, because the human operator can decide on a higher task level which action comes next.
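The keyframe interface can be sketched as a table of servo targets indexed by keyframe ID. The servo names, the angles and the `send_servo` callback are hypothetical; the callback stands in for whatever driver a real robot would use.

```python
# Hypothetical keyframe model for a gripper arm: ID -> (name, servo angles).
KEYFRAMES = {
    1: ("ungrasp", {"shoulder": 45, "elbow": 90, "gripper": 60}),
    2: ("grasp", {"shoulder": 45, "elbow": 90, "gripper": 5}),
}


def select_keyframe(keyframe_id, send_servo):
    """Send every servo target of the chosen keyframe; returns its name."""
    name, targets = KEYFRAMES[keyframe_id]
    for servo, angle in targets.items():
        send_servo(servo, angle)
    return name


# Stand-in for a real servo driver: record the commands instead of moving.
sent = []
print(select_keyframe(2, lambda servo, angle: sent.append((servo, angle))))
print(sent)
```

The operator picks a single ID; the model expands it into one command per servo.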

6c1b Motion retrieval

The concept of a sprite sheet was discussed already (see 6a Sprite sheet). Game designers did not use the principle from the very beginning, but today it is used frequently to simplify in-game sprite animation. The idea is to take a PNG file with the walk cycle and use this memory map in the game to animate the main character.
The interesting point is that sprite sheets can be improved. The resulting motion retrieval system is used to query a mocap database. [Sakamoto2004] Similar to a sprite sheet, some poses are stored in the database, and the software searches the database for the next pose. This makes it possible to create realistic motions with only a small amount of CPU load.
The concept has much in common with model based tracking. The underlying mocap database is the model, and only body poses from this database can be drawn to the screen. The reason why this technique is highly efficient is that every pose has a unique number. Instead of adjusting all 17 DOF joints of a skeleton, the algorithm only needs to know a single reference number. For example, the sequence [4,1,19,2] is used to create a longer motion with the help of the underlying data model.
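A rough sketch of how a pose ID sequence could be expanded into a motion. The linear blending between consecutive poses is an assumption added here to make the result look continuous; it is not taken from the cited work. The database is also hypothetical and uses three joint angles per pose instead of 17 to stay readable.

```python
# Hypothetical mocap database: pose ID -> joint angles (3 joints instead of 17).
MOCAP_DB = {
    1: (0.0, 10.0, 0.0),
    2: (20.0, 0.0, 10.0),
    4: (10.0, 30.0, 0.0),
    19: (40.0, 20.0, 20.0),
}


def retrieve_motion(pose_ids, steps_between=2):
    """Look up each pose ID and linearly blend between consecutive poses."""
    frames = []
    poses = [MOCAP_DB[pid] for pid in pose_ids]
    for start, end in zip(poses, poses[1:]):
        for step in range(steps_between):
            t = step / steps_between
            frames.append(tuple(a + t * (b - a) for a, b in zip(start, end)))
    frames.append(poses[-1])
    return frames


motion = retrieve_motion([4, 1, 19, 2])
print(len(motion), "frames, first:", motion[0], "last:", motion[-1])
```

The caller only supplies four numbers; all joint values come from the database.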

6c2 Activity recognition but why?

In section 6c1 Model based tracking the idea of using a model to recognize existing actions was introduced briefly. The question left open is why. Suppose it is possible to recognize that a robot in a maze has hit the wall: does this fact have any value?
The reason why model based activity understanding is crucial in robotics is that it helps to create an abstraction mechanism (see 6a2 Abstraction mechanism). The idea is that there is a layer between the problem and the computer. The layer itself, which is the model, has no importance, but it can be used for many purposes.
So the underlying problem can be summarized as mapping or grounding, and it has to do with reducing the state space of a problem. In the mentioned example of a robot which hits a wall, the original state space has to do with objects which can do something. There are colors, lots of pixels and an endless number of events. Such a state space is far too complex for a computer to understand, and the only way forward for an Artificial Intelligence is to translate the problem space into a model first.
The interaction with a simplified model is much easier for a computer and can be realized within an existing paradigm, for example with a programming language. It is possible to write a statement like “if robot hits the wall, then stop motor” in a Python program. But the precondition is that an abstraction mechanism is available first which reduces the state space to a small list of possible events.
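The two layers can be sketched separately: one function reduces the raw state to a list of events, and a second one applies the rule to those events. The one-dimensional corridor, the wall position and the robot radius are assumptions made to keep the example small.

```python
# Hypothetical 1-D corridor: the wall is at x = 10.
WALL_X = 10.0


def detect_events(robot_x, robot_radius=0.5):
    """Abstraction layer: reduce the raw position to a small list of events."""
    events = []
    if robot_x + robot_radius >= WALL_X:
        events.append("hit_wall")
    return events


def control(events, motor_speed):
    """Rule on top of the abstraction: if robot hits the wall, stop motor."""
    if "hit_wall" in events:
        return 0.0
    return motor_speed


for x in (3.0, 9.6):
    print(x, control(detect_events(x), motor_speed=1.0))
```

The rule in `control` never sees coordinates or pixels; it only sees event names.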

6c2a Game log recording

Before it is possible to program a certain sort of software, a programmer first has to define the objective. Suppose the idea is to program an activity recognition engine: where to start?
From a general perspective, it can be realized by recording a game log. What all game logs have in common is that the keyframes are stored with a timestamp in a directory. For example:

The filename (0-5) represents the timestamp, and the file itself holds the screenshot at this moment. So the game log is basically a frame-accurate video of the game. Suppose a larger amount of such data has been created; in the next step the goal is to parse the information.
A keyframe consists of possible actions, events and states. Each event has a unique number. For example, event #2 means “robot collides with the wall in the maze”. The compilation of all possible events, actions and states is stored in a taxonomy as a data driven model (see 6b data driven models). That means there is a hierarchical table somewhere which holds all possible events. [Karpov2013]
The game log parser has the obligation to match the taxonomy with the recorded game log. The result produces meaning. That means the keyframes are grounded and annotated. There are many ways of doing the matching, for example neural networks, but in the easiest case a hand-written Python script can do the task. That means the classification module isn't trained by a learning algorithm but is hand-crafted.
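A hand-crafted parser of this kind might look as follows. To keep the sketch self-contained, it assumes that the robot's grid cell has already been extracted from each screenshot; the maze, the log entries and the events other than #2 are invented for illustration.

```python
# Hypothetical event taxonomy: event ID -> description.
EVENTS = {
    1: "robot moves freely",
    2: "robot collides with the wall in the maze",
    3: "robot reaches the exit",
}

MAZE = ["#####", "#..E#", "#####"]  # hypothetical map: # wall, . floor, E exit

# Hypothetical log: timestamp -> robot cell (row, col), already extracted
# from the screenshot of that frame.
GAME_LOG = {0: (1, 1), 1: (1, 2), 2: (0, 2), 3: (1, 3)}


def classify(cell):
    """Hand-crafted rule set that maps one keyframe to an event ID."""
    row, col = cell
    tile = MAZE[row][col]
    if tile == "#":
        return 2
    if tile == "E":
        return 3
    return 1


annotated = {t: EVENTS[classify(cell)] for t, cell in GAME_LOG.items()}
for timestamp, label in annotated.items():
    print(timestamp, label)
```

Each timestamp ends up with a textual label from the taxonomy, which is the grounding step described above.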

6d Bibliography

Baak, Andreas, et al. "A data-driven approach for real-time full body pose reconstruction from a depth camera." Consumer depth cameras for computer vision. Springer, London, 2013. 71-98.
De Smedt, Quentin, et al. "Shrec'17 track: 3d hand gesture recognition using a depth and skeletal dataset." 3DOR-10th Eurographics Workshop on 3D Object Retrieval. 2017.
Engstrom, Henrik, Jenny Brusk, and Patrik Erlandsson. "Prototyping tools for game writers." The Computer Games Journal 7.3 (2018): 153-172.
Filippi, Hannes. "Wireless teleoperation of robotic arms." (2007).
Geijtenbeek, Thomas, and Nicolas Pronost. "Interactive character animation using simulated physics: A state‐of‐the‐art review." Computer graphics forum. Vol. 31. No. 8. Oxford, UK: Blackwell Publishing Ltd, 2012.
Karpov, Igor V., Jacob Schrum, and Risto Miikkulainen. "Believable bot navigation via playback of human traces." Believable bots. Springer, Berlin, Heidelberg, 2013. 151-170.
Kirk, James, Aaron Mininger, and John Laird. "Learning task goals interactively with visual demonstrations." Biologically Inspired Cognitive Architectures 18 (2016): 1-8.
Kopp, Oliver, Anita Armbruster, and Olaf Zimmermann. "Markdown Architectural Decision Records: Format and Tool Support." ZEUS. 2018.
Kovar, Lucas, Michael Gleicher, and Frédéric Pighin. "Motion graphs." ACM SIGGRAPH 2008 classes. 2008. 1-10.
Sakamoto, Yasuhiko, Shigeru Kuriyama, and Toyohisa Kaneko. "Motion map: image-based retrieval and segmentation of motion data." Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation. 2004.
Webber, Bonnie, and Barbara Di Eugenio. "Free adjuncts in natural language instructions." COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics. 1990.
Zhu, Zuyuan, and Huosheng Hu. "Robot learning from demonstration in robotic assembly: A survey." Robotics 7.2 (2018): 17.