A modern robot control system is not grounded in a mathematical understanding of the world; it is rooted in the arts. The purpose is to create an animation [4b5], so a keyframe generator is the core of the system. [Mi2021] [Th1989]
A common issue is how to generate the animation for a variety of domains. The motion needed for a self-driving car looks different from the motion of a pick-and-place robot. The answer is to use rapid prototyping GUI tools. [4b4b] The idea is that the user clicks on buttons in a GUI and, in under 5 minutes, the animation model is created from scratch. [Do2003] [Ch2012]
To plan inside a model, a cost function specifies the transitions between the keyframes. [4b2a] It also takes care of obstacles in the environment. Cost constraints can be specified in natural language commands. [Ko2011] [Ya2021] [Ko2014]
4c1 Robotics automation 1x1
Understanding robotics means first and foremost knowing which sorts of problems can be solved easily with a computer and which ones are harder to realize. This understanding makes it possible to set priorities and discuss open problems in depth.
First, we have to know two easy-to-automate tasks. The first is a teleoperated robot and the second is a model-based planner. Creating a teleoperated robot means providing hardware and software which control the robot with a joystick. Such a pipeline is documented very well all over the internet. The second technique, model-based planning, is less common but can also be realized easily.
Model-based planning means that for a certain domain, for example the 15 puzzle, a forward model is available which includes the cost function. The planner then determines how to solve the puzzle. The algorithm is mostly realized in C/C++ and will find the optimal trajectory very fast.
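To make this concrete, here is a minimal sketch of a forward model plus a planner. The smaller 3x3 variant (the 8 puzzle) is used for brevity, and the function names and the breadth-first strategy are illustrative choices, not a reference implementation:

```python
from collections import deque

def moves(state):
    """Forward model: yield successor states of a 3x3 sliding puzzle."""
    i = state.index(0)                 # position of the blank tile
    r, c = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            j = nr * 3 + nc
            s = list(state)
            s[i], s[j] = s[j], s[i]    # slide the neighbouring tile
            yield tuple(s)

def plan(start, goal):
    """Breadth-first planner: returns the shortest move sequence."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))

goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
start = (1, 2, 3, 4, 5, 6, 7, 0, 8)   # one slide away from the goal
print(len(plan(start, goal)))          # → 1
```

Because the model is explicit, the planner itself is generic: swapping in a different `moves` function retargets the same search to a different domain.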
Even if both techniques work quite well, something is missing in the domain of robotics. And because of this missing part, it remains a challenge to realize a robot. What is not available is a more general approach in which a robot is able to solve a new problem for which no model exists yet.
For example, there is a robot arm and the robot arm should grasp a bottle. Such a problem is difficult to realize because it has nothing to do with teleoperation and also nothing to do with model-based planning. To be more specific, the robot should work autonomously, and no model is available for the grasping problem.
Because of this combination it is difficult or even impossible to write an algorithm for this issue. So we have found a hard-to-solve problem which stays outside of today's capabilities.
After this introduction it is clear what the challenge in modern robotics is, and now we can try to solve the issue. The reason why the grasping problem is difficult is the contradiction between what today's AI can solve and what the grasping challenge is about. Today's engineers can create teleoperated robots for unknown domains and they can write solvers for well-defined models. But the grasping challenge is located within autonomous robots and has no model at all.
Let us go a step backward to understand the situation in depth. Creating a model for a task is called system identification. A more programming-oriented term for the same issue is game design. Game design means to create a model, and a model is some sort of simulation. So what is the model for grasping a bottle? Right, nobody knows. Even the more advanced literature has no answer to this problem. It is located in advanced robotics and hasn't been researched yet.
What is available are some techniques which go in the direction of system identification. One powerful technique is motion capture. Motion capture means recording an animation. This is usually done with markers which have a position in 3D space. The location is recorded over time. In theory such a log file can be converted into a model later.
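The shape of such a log file is simple. A sketch, with invented marker names and coordinate values, of writing a recording to CSV and reading it back as the raw material for a later model:

```python
import csv, io

# Hypothetical marker samples: (time in s, marker id, x, y, z) in metres.
samples = [
    (0.00, "wrist", 0.10, 0.42, 0.95),
    (0.04, "wrist", 0.11, 0.40, 0.93),
    (0.08, "wrist", 0.13, 0.37, 0.90),
]

# Write the recording to a CSV log (an in-memory file stands in for disk).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["t", "marker", "x", "y", "z"])
writer.writerows(samples)

# Later, the log can be read back and processed into a model.
buf.seek(0)
rows = list(csv.DictReader(buf))
print(len(rows), rows[0]["marker"])    # → 3 wrist
```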
The most recent approach in motion capture is model-based motion capture tracking. Explaining this technique in detail is very complicated. It tries to solve a chicken-and-egg problem: even if no model is available, the assumption is that there is a model available, and this model allows the motion capture data to be parsed.
Let us describe the paradoxical situation briefly. The hope is to convert motion capture data into a model. For doing so we first need a model. Then we can track the information. But we have no model yet, so we can't parse the information, right?
It was mentioned before that the issue is highly complicated. One possible idea is to take a look at an existing technique which may help to understand the situation. Suppose the idea is to create an animated film on the computer. Then the open source software Synfig may help to create such an animation. Synfig allows the user to create a stop motion animation: the user defines some keyframes, and the software renders the resulting movie.
The Synfig software was never developed for the purpose of robotic grasping, but it might be utilized for this purpose. A Synfig file is basically a model. The model consists of keyframes which are located on a time axis. Question: is it possible to create a short grasping animation in Synfig? Oh yes, this is an easy task. If such an animation was created, the keyframes can be converted into a model. And this model may help to parse a motion capture demonstration.
The screenshot shows a simple model for a grasping task. It consists of 4 pictures which are provided as index cards. Such a model can be converted into a computer program, and then it helps to parse a motion capture demonstration. Then, a more elaborate model can be created.
4c1a Why modeling is important
Computer engineers have developed a variety of tools for solving all sorts of problems. Neural networks, databases and compiled programming languages make it possible to create software which runs efficiently on recent multi-core processors. In addition, lots of input and output devices, for example sensors, high resolution monitors and actuators, are available. From a pure computational perspective, lots of progress was made in the last decades.
The interesting situation is that all these gadgets are useless, or have only a minor priority, for solving robotics problems. Basically spoken, it is possible to use the mentioned tools correctly and the robot still won't do anything useful. This mismatch is caused by the grounding problem. If the domain wasn't grounded yet, there is a gap between the available computing tools and the robotics task.
The grounding problem can be solved by modeling. Modeling means to create a simulation for a certain problem. In most cases this amounts to a short-term motion model and a long-term task model. Once these models are created, they can be run on existing computer hardware and combined with databases and input devices.
Because the topic of modeling is important for robotics, let us describe the issue in detail. From a technical perspective a model can be realized in software. It is source code which encodes a certain issue; or to be more specific, the code formulates the challenge in a machine-readable way. The most basic form of a model is a constraint. For example, the inequality “x+2<5” is such a constraint. It formalizes a mathematical problem, and then it is possible to solve it.
The interesting situation is that solving the inequality is easy: a short look shows that x is smaller than 3. This can be determined by a human mathematician or a computer program as well. The more demanding task is to define the constraints in the first place. That means someone has to invent a mathematical problem, and solving it is the easier part.
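Once the constraint is written down, checking it is mechanical. A toy brute-force solver over a small integer range (the range bounds are an arbitrary choice for illustration):

```python
# The constraint x + 2 < 5, expressed as a predicate.
constraint = lambda x: x + 2 < 5

# Enumerate a small integer range and keep the satisfying values.
solutions = [x for x in range(-10, 10) if constraint(x)]
print(max(solutions))    # → 2, i.e. every integer x < 3 satisfies it
```

The point is the division of labour: inventing the predicate is the modeling step; the enumeration afterwards is trivial.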
In contrast, computer programming works the other way around. The self-understanding of computer engineers is to find an algorithm which solves a problem. Lots of books and libraries were created for implementing all sorts of algorithms. The problem is that if no problem is there, any algorithm is useless. And because of this situation, robotics is a complicated domain.
Let us take a look back at how robotics has evolved over the decades. The most important breakthrough in robotics was not the invention of neural networks, and it wasn't the advent of the RRT sampling-based algorithm. The revolutionary milestone was the invention of the Micromouse challenge. The Micromouse challenge specifies a robot problem which has to be solved by different participants. It explains to the audience that there is a maze and a robot, and that the robot should do something which is measured in seconds. So we can say that the Micromouse competition is some sort of model, or a constraint, and then the challenge has to be solved. Solving the challenge, which includes building the hardware and writing the software for the robot, is comparatively trivial; anybody can do so, and lots of examples from the past are available. The harder-to-grasp problem is to invent competitions similar to Micromouse; this is the real grounding problem.
The task of creating a model for a domain is a bit esoteric, but it is not completely unknown in traditional mathematics. There is a large amount of tutorials available for the “game design” problem and also for the problem of converting a textual problem into a mathematical formula. [Br1983]
Word problems are usually treated with a mathematical model (https://en.wikipedia.org/wiki/Mathematical_model). A mathematical model doesn't answer the problem itself, but it converts natural language into a mathematical equation. Then the problem is solved with mathematical tools.
This principle can be utilized for the symbol grounding problem too, or at least it is related to it.
The interesting situation is that there is no single strategy for creating a mathematical model. The reason is that grounding is located outside of mathematics. What might help to understand the situation is a missing model. Suppose a robot should do something, but there is no model available for the domain. That means it wasn't specified yet what the goal is, what the subgoals are and which states are possible in the robot game. How can the robot be programmed to solve the task? Right, there is no way of doing so. Without a precise model, the optimal trajectory can't be found. That means the automation attempt will fail for sure.
According to the more recent robotics literature from the last 20 years, there are some strategies available which partly solve the grounding problem, namely motion capture, teleoperation and manual game design. All these strategies can produce a model. They are located neither in mathematics nor in computer science itself, but have more in common with a workflow which is executed by humans: not the robot has to do something, but the human has to run a motion capture demonstration and manually convert the raw data into a model.
4c1b Well defined problems
The main reason why robotics is hard to realize is that the problems aren't specified well enough. Let me give an example. Suppose there is a graph available which consists of some nodes and edge costs. The goal is to find a path in the graph from the current node to the goal node while minimizing the costs. Such a task is well defined and can be solved easily with a computer.
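A sketch of this well-defined case, using Dijkstra's algorithm on an invented example graph:

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm on a dict-of-dicts cost graph."""
    queue = [(0, start, [start])]      # (cost so far, node, path)
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, step in graph[node].items():
            if nxt not in visited:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))

graph = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}
print(shortest_path(graph, "A", "D"))  # → (4, ['A', 'B', 'C', 'D'])
```

Everything the solver needs, nodes, costs and the goal, is given up front; this is exactly the part that is missing in a typical robotics task.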
The problem can be formulated in Matlab or in Python, and if the programmer struggles with some detail, he will surely find help in online forums. Unfortunately, most robotics problems are not from this category. A typical robot problem looks the following way:
1. there is no graph and no other data structure available
2. what is available is a robot and a task given in natural language
3. after pressing the start button the robot should clean up the kitchen or drive on a road in real traffic
Solving such a problem with a computer, or even formulating the solving algorithm, is not possible. In contrast to the previously mentioned problem, it is located outside of well-defined problems. It is an AI problem not specified well enough. If the programmer tries to solve it, or even asks in an online forum for help, he won't be successful. That means other programmers don't know how to solve it either.
The reason is a bit complicated to explain. It is not about solving an existing mathematical problem; the problem is that the problem wasn't formulated precisely. A precisely defined problem is usually equal to a model. If a model is available which includes constraints, then the problem can be solved with mathematical tools. And in addition it is possible to ask other programmers how to implement the solver in detail.
What is needed for most robotics tasks is to define the problem as a model. A model is always an invented situation. It is not available by default; modeling is located on a higher layer.
4c1c From a problem to the model
The main reason why it is hard to program robots is a missing understanding of what the objective is. Today's engineers have access to the latest hardware and software, which includes modern operating systems, programming languages and an endless amount of disc storage, but they have no idea how to apply this technology to robotics problems.
To describe the AI problem in detail we have to take a more abstract look at the problem of robotics. What all robotics problems have in common is that a domain is mapped to a model. The domain can be a self-driving car or a soccer-playing robot. The interesting situation is that a domain can't be programmed or solved directly. The missing connection between the domain and the model is called the grounding problem or the abstraction mechanism. If this link is not there, or broken, then computer engineers struggle to program a robot, which is the default situation today.
In contrast, a model for a domain can be treated easily in software. Models have the advantage that they can be simulated in software, future states can be predicted and, very important, they can be solved. A good and easy example for a model is the 15 puzzle. The model for this problem fits in under 30 lines of code. It consists of the possible movements in the puzzle and the objective, i.e. what the goal is.
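The size claim can be checked directly. A sketch of the 15 puzzle model alone, states, legal moves and goal test, without any solver; the names are illustrative:

```python
# A complete model of the 15 puzzle: states, legal moves, goal test.
SIZE = 4
GOAL = tuple(list(range(1, 16)) + [0])   # 0 marks the blank square

def legal_moves(state):
    """All states reachable by sliding one tile into the blank."""
    i = state.index(0)
    r, c = divmod(i, SIZE)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < SIZE and 0 <= nc < SIZE:
            j = nr * SIZE + nc
            s = list(state)
            s[i], s[j] = s[j], s[i]
            result.append(tuple(s))
    return result

def is_goal(state):
    return state == GOAL

print(len(legal_moves(GOAL)))   # blank in the corner → 2 moves
```

Once this model exists, predicting future states is just repeated application of `legal_moves`, which is why a modeled domain is so much easier to automate than an unmodeled one.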
In general, models are located within computer science, while raw domains are located outside of computing. So the question is how to map a domain to a model? Nobody knows, because the grounding problem remains unsolved.
But even if it is impossible to create models, the previous description gives an idea why robotics projects in the past have failed. The explanation why a certain robot collides with the wall in a maze is that something with the model is wrong. From a support perspective it is possible to advise any robotics engineer to improve his model.
Most techniques in robotics have to do with modeling, even if it is mostly not described this way. For example, the engineers are talking about motion capture, neural networks or natural language. But what the debate is really about is how to map a domain to a model. This debate is highly complex, and it is much harder than solving an ordinary puzzle. A normal puzzle has the advantage that the model was defined already. For example, in a chess puzzle there is a board which consists of 8x8 squares, and then a certain task has to be fulfilled.
In contrast, in model building and game design no such puzzle is available. That means the problem is located on a higher abstraction level which remains invisible to the researcher.
[Erkut2000] [Gervautz1994]
4c1d Abstraction mechanism 1x1
The main reason why it is hard to program robots is that the words for describing the issue are missing. The existing computing terms, which were created with the advent of microelectronics, don't fit the AI domain. Terms like operating system, transistor and 8-bit CPU, and even theoretical terms like hash table, sorting and search algorithm, don't fit well the needs of robotics programming. Without a language for describing AI it is not possible to figure out possible answers, and thus AI remains unexplored.
So let us go a step backward and ask the bold question: what are we talking about? Robotics programming is mostly about finding an abstraction mechanism. Abstraction mechanism means converting problems into models. If the model is available, then it is possible to discuss the model with computational terms which have to do with programming, fast compilers and algorithms. The problem is that what is called an abstraction mechanism is a very vague description of what AI is about. It is some sort of mapping, and the amount of literature about the subject is low.
So basically spoken, the topic wasn't invented yet, and because of this reason the terms are not available. Let us search for an academic subject which comes close to the abstraction idea. Under the term “model predictive control” lots of books are available. MPC has to do with control theory, and the idea is to use a model in the loop to improve the control of a system. What remains open is how to create such a model. Creating a model has much in common with game design; that means it is not available in nature, but an artist has to create it from scratch.
What model predictive control discusses in depth is how to control a system if the model is already there; without a model no MPC controller can be created. Because of this limitation, MPC is not the core of an abstraction mechanism but something which is understood already.
Another existing topic which comes close to the abstraction idea is the differential equation used in mathematics. Differential equations are used by physics students to describe real-world systems. With such an equation it is possible to model the dynamics of a car or describe the weather. A differential equation and a model are the same thing. But modeling has more options than only a mathematical description of the world.
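A sketch of what such a differential-equation model looks like in practice: a toy longitudinal car model v' = (F - b·v)/m, integrated with the Euler method. All parameter values are invented for illustration.

```python
# Euler integration of a toy car model: v' = (F - b*v) / m.
m, b, F = 1200.0, 50.0, 3000.0     # mass [kg], drag coeff., drive force [N]
dt, v = 0.1, 0.0                   # time step [s], initial speed [m/s]

for _ in range(1000):              # simulate 100 seconds
    a = (F - b * v) / m            # acceleration from the differential equation
    v += a * dt                    # Euler step

print(round(v, 1))                 # approaches the steady state F/b = 60 m/s
```

The equation plays exactly the role of a forward model: given the current state, it predicts the next one, which is all a planner or an MPC controller needs.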
4c2 Mapping motion capture to a pose taxonomy
A motion capture recording produces numerical information about the markers. The position of each marker is tracked in realtime and can be stored in a machine-readable CSV file. Even if the information is accurate on a millimeter level, something important is missing: there is no meaning attached to the poses, and the trajectory remains ungrounded.
To overcome the obstacle, the mocap data has to be matched with a model. The model is mostly a pose taxonomy. In the pose taxonomy some keyframes are stored, for example 24 of them. With this model in the background, each mocap picture can be assigned to one pose in the taxonomy.
This allows the raw marker data to be converted into meaningful, grounded information. The capture data creates a pose sequence over the time axis, for example:
[pose #2 “standing”, pose #18 “sitting”, pose #15 “sitting”]
Instead of analyzing a state space which can have millions of possible poses, the state space is reduced to 24 possible poses which are given by a template database. In addition, the taxonomy is of course structured in a hierarchical fashion, which allows the categories to be reduced further. In each situation the mocap data can be labeled with one of the four main categories, and then subcategories can be determined.
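The assignment step can be sketched as a nearest-neighbour lookup. The taxonomy entries, pose ids and joint-angle values below are invented for illustration; only the principle, matching each raw frame to its closest template, is taken from the text:

```python
import math

# Hypothetical taxonomy: pose id → (label, reference joint angles in degrees).
taxonomy = {
    2:  ("standing", (175, 178, 90)),
    15: ("sitting",  ( 95,  92, 85)),
    18: ("sitting",  ( 90,  88, 80)),
}

def classify(frame):
    """Assign a raw mocap frame to the nearest taxonomy pose."""
    pose_id = min(taxonomy, key=lambda pid: math.dist(frame, taxonomy[pid][1]))
    return pose_id, taxonomy[pose_id][0]

# A recording becomes a grounded pose sequence:
recording = [(174, 176, 88), (92, 90, 82), (96, 93, 84)]
print([classify(f) for f in recording])
# → [(2, 'standing'), (18, 'sitting'), (15, 'sitting')]
```

With the hierarchical version, the same lookup would first pick one of the four main categories and then repeat within the matching subtree.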
see also:
- [4b3] The grounding problem in detail
- [4b6] Shadowing movements of a robot
4c2a From a body pose taxonomy to a keyframe generator
Instead of analyzing what a taxonomy is in its own terms, let us show how to use it in a concrete situation. Suppose the user would like to create an animation for a biped walking robot. The goal is to create a sequence of 4 single pictures. Such a task can be realized easily with a body pose taxonomy.
What the user has to do is execute a method in the program, for example “generationanimation(8,9,4,5)”. The numbers given as parameters are the keyframes from the pose taxonomy. Behind each number a concrete pose is stored. After executing the method, the program delivers the pictures, which are stored as lists of points. These point lists are rendered to a graphical representation in a video.
The pose taxonomy is basically a short notation for specifying an animation. The idea is that the user enters only a list of IDs, and these IDs are converted automatically into full-blown keyframes which consist of many single points. These points are located in 2D or 3D space and represent the pose.
So the main task of a taxonomy is to assign a pose number to a concrete pose. In addition, the pose numbers are ordered in a hierarchical table of contents. In advanced taxonomies there is even a transition matrix available; for example, the sequence from #8 to #9 is allowed while the sequence from #8 to #1 is forbidden. If the user tries to render such a sequence, he gets an error in return.
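A minimal sketch of such a generator, with invented pose data, a snake_case variant of the method name from the text, and a toy transition matrix:

```python
# Hypothetical pose taxonomy: id → keyframe as a list of 2D points.
poses = {
    4: [(0, 0), (0, 2), (1, 3)],
    5: [(0, 0), (1, 2), (1, 3)],
    8: [(0, 0), (0, 2), (0, 3)],
    9: [(0, 0), (1, 2), (0, 3)],
}
# Transition matrix: which pose may directly follow which.
allowed = {(8, 9), (9, 4), (4, 5)}

def generate_animation(*ids):
    """Expand pose ids into keyframes, rejecting forbidden transitions."""
    for a, b in zip(ids, ids[1:]):
        if (a, b) not in allowed:
            raise ValueError(f"transition {a} -> {b} is forbidden")
    return [poses[i] for i in ids]

frames = generate_animation(8, 9, 4, 5)
print(len(frames))    # → 4 keyframes, ready for rendering or interpolation
```

An illegal call such as `generate_animation(9, 8)` raises the error the text mentions, because (9, 8) is not in the transition matrix.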
4c2b Pose database as an abstraction mechanism
The term abstraction mechanism describes a workflow in which a problem is converted into a model. For example, there is a line-following robot in the real world, and the goal is to simulate the robot in a computer program.
Creating such models is one of the hardest problems within Artificial Intelligence, and missing models are the main reason why today's robots struggle to solve tasks. In contrast, an existing, well-grounded model results in a working robot control system, similar to a 15 puzzle solver program which is able to solve the task much faster than any human can.
So the basic question is how to create a model. A model, aka a simulation, is equal to a physics engine; this is a part of the software which can predict future system states. Unfortunately, physics engines are hard to program. Most existing physics engines like Box2D work with differential equations and lots of mathematics in general. A lot of experience is needed to create a forward model for a certain domain.
A possible way to overcome the obstacle is to first create only a data-driven model. Data-driven means that the model can't be executed because it doesn't consist of source code. In contrast, the model is equal to a database file. The idea is that in a second step the database is converted into an executable physics engine.
Creating a database file for a domain is much easier than programming a full-blown physics engine in C/C++. A database is mostly stored in a plain text format or as a table. The information is taken from motion capture devices and game logger tools. A typical example is a body pose database. The idea is to create a list of 24 predefined poses and store this list in a database file.
Then a ranking algorithm is needed to retrieve the information. For example, the user defines a certain pose, and the 3 most similar entries from the database are returned in an ordered fashion. Such a system is a basic example for a data-driven model.
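A sketch of such a ranking query. The database entries and angle values are invented; the similarity measure (Euclidean distance over joint angles) is one plausible choice among many:

```python
import math

# Data-driven model: a flat database of named poses (joint angles, degrees).
database = [
    ("standing",  (175, 178, 90)),
    ("sitting",   ( 92,  90, 82)),
    ("crouching", ( 60,  55, 70)),
    ("walking",   (160, 140, 95)),
    ("lying",     ( 10,  12,  5)),
]

def rank(query, k=3):
    """Return the k database entries most similar to the query pose."""
    scored = sorted(database, key=lambda entry: math.dist(query, entry[1]))
    return [name for name, _ in scored[:k]]

print(rank((170, 165, 92)))    # → ['standing', 'walking', 'sitting']
```

Note that no physics or dynamics is involved; the "model" is nothing more than stored examples plus a distance function, which is exactly what makes the data-driven approach so cheap to build.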
The original idea was to create an abstraction mechanism. A database can be seen as a model of reality. That means everything that is stored in the database is part of the domain. Instead of analyzing the domain itself, the interaction takes place with the database. That means, from the computer's perspective, the world consists of a database which has 24 body poses stored. The main advantage is that this world understanding reduces the complexity dramatically. It helps to understand which sort of software has to be programmed next. The software doesn't need to become intelligent in the sense of Artificial Intelligence; the software is simply a database query algorithm. [Ch2012]