Abstract
TOC
6a Sprite sheet
▸ 6a1 From a model to a robot
▸ 6a2 Abstraction mechanism
▸ 6a3 Rapid prototyping
▸ 6a3a Problem grounding
6b data driven models
Animation sequences
▸ 6b1 Data vs programming
▸ 6b2 examples for data models
▸ 6b3 data driven task models
6c more modeling tools
▸ 6c1 Model based tracking
▸ 6c1a Interactive animation
▸ 6c1b Motion retrieval
▸ 6c2 Activity recognition but why?
▸ 6c2a Game log recording
6d Bibliography
6a Sprite sheet
The paradoxical situation is that robotics control is perceived as a complicated task because no well-defined problem is available. This understanding contradicts itself: if no problem is there, why should it be hard to solve? The answer is that computers can solve only well-defined tasks. For example, a computer can determine what "78.0/3.14" is. The found solution is very precise, and a computer arrives at it much faster than any human.
The only real obstacle for computer control arises if the computer is supposed to execute a task like "move the robot out of the maze". Such a problem is defined on a high level, and to execute the task the computer needs background knowledge about how to parse English sentences, about spatial navigation and about robotics control. It is hard or even impossible to program computers in such a way, and for this reason robotics remains an unsolved problem.
The good news is that we can formulate a thesis about why exactly computers have failed at solving tasks. The reason is a missing model. A model abstracts from a problem, and without such an abstraction mechanism the problem remains vague.
Let us take a step backward and ask the simple but provocative question: what is a model? From a programming perspective, a model is a simulation realized in an object-oriented programming language. It allows predicting future situations and helps to understand a domain. So a model is the same as well-written source code in the Python programming language which contains differential equations, a list of possible actions and a cost function.
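Such a source-code model can be sketched in a few lines of Python. The cart dynamics, the action list and the goal position below are invented placeholders, not taken from any concrete project:

```python
# Minimal source-code model: a state-update rule (a discretized
# differential equation), a list of possible actions and a cost
# function.  The 1D cart dynamics and the goal x = 5 are invented.

DT = 0.1  # simulation time step in seconds

def step(state, action):
    """Euler-integrate x' = v, v' = a for one time step."""
    x, v = state
    return (x + v * DT, v + action * DT)

ACTIONS = [-1.0, 0.0, 1.0]  # accelerate left, coast, accelerate right

def cost(state):
    """Distance of the cart to the hypothetical goal position x = 5."""
    x, _ = state
    return abs(5.0 - x)
```

A solver can now call step repeatedly for every candidate action and prefer the sequence with the lowest cost, which is exactly what the text means by a model being a simulation plus actions plus a cost function.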
Unfortunately, it is hard to program such a model for a concrete task. In the case of a simple line-following robot it is perhaps within reach to create a model, i.e. a simulator, in Python, but for more complex domains like human animation such models become very complicated.
An alternative approach in model design works with the data-driven paradigm. The idea is to first create a database with information about the domain; in a second step this database is converted into an executable model which can predict something. For reasons of convenience we can focus on how to create a data-driven model.
The perhaps most obvious technique for doing so is motion capture with markers. The raw data is stored in a SQL database and can be analyzed later. A less common but also very powerful technique is to use a sprite sheet as input data. A sprite sheet is a frequently used tool in game programming. As the name suggests, it is a sheet which holds pictures, so it is basically a .PNG file. Such a file can't be executed on a computer, but it is a data format.
Sprite sheets are able to model an entire domain. They hold information about what a walk animation will look like and which other sorts of actions are possible, for example idle and jump. From an animation perspective, a sprite sheet has much in common with a body pose taxonomy. A taxonomy is a sort of table which also holds information about keyframes.
The interesting situation is that the sprite sheet doesn't solve an existing problem but creates a new one. Once the sprite sheet is there, the programmer has to implement a slicing algorithm. Slicing means extracting the single images and painting them on the screen.
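A slicing algorithm can be sketched without any graphics library by computing the pixel rectangle of each sprite ID. The 4-column grid and the 32-pixel tile size are assumptions for the example:

```python
TILE = 32      # sprite width/height in pixels (assumed)
COLUMNS = 4    # sprites per row in the sheet (assumed)

def slice_rect(sprite_id):
    """Return the (left, top, right, bottom) pixel rectangle of a sprite.

    Sprites are numbered row by row, starting with id 0 in the
    upper-left corner of the sheet.
    """
    row, col = divmod(sprite_id, COLUMNS)
    left, top = col * TILE, row * TILE
    return (left, top, left + TILE, top + TILE)
```

With an image library such as Pillow, the returned rectangle could be passed to Image.crop to extract the single picture before painting it on the screen.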
The main reason why sprite sheets are used frequently by game programmers is that they help to understand a domain. According to the sprite sheet, the world, i.e. the game, looks a certain way. There are some actions and some keyframes, and this information is everything that is important for the domain. A sprite sheet can be used to model very different problems like sports games, jump'n'run games and even racing games.
see also: 6c more modeling tools
6a1 From a model to a robot
Sprite sheets have been used by game programmers for decades. Even the old NES games used this technique. It remains an open problem how to use this information to create an Artificial Intelligence.
The answer to the issue is that a sprite sheet is a certain form of a model, and if a model is available, a computer can utilize this model for solving a problem. Let me give an example. Suppose there is a graph and the computer has to find the shortest path in the graph. Such a problem can be solved easily with an algorithm. There are hundreds of tutorials available which explain the details.
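For an unweighted graph the shortest path can be found with a standard breadth-first search. The small maze graph below is made up for illustration:

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list of a shortest path."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # goal not reachable

# A made-up maze graph: rooms A..E and their connections.
maze = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
```

The point of the example is that once the maze exists as a graph, i.e. as a model, the solving step is routine.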
The only real problem within Artificial Intelligence is a situation which contains neither a graph nor a model. Then it is impossible to solve the domain. All a programmer has to ensure is that at least a data-driven model is there. In the easiest case this is equal to a sprite sheet; in a more advanced setting an additional motion graph is available.
So the question is not how to solve a certain problem, but how to invent a problem. If a problem is there, an algorithm can find the optimal actions within the problem.
Let us take a step backward to understand the situation. A sprite sheet by itself won't help to implement an Artificial Intelligence. The reason is that both things are opposites. The common understanding is that a robot, i.e. an artificial agent, is able to do something meaningful, for example find a path in the maze, while sprite sheets are used to create such games. The source of confusion is the algorithm perspective. Computer programmers and AI engineers usually search for an algorithm, because an algorithm allows solving a problem. But AI problems are different. AI is about defining the problem first; this is called modelling or grounding. Only if the problem is grounded can it be solved.
So the idea is to use sprite sheets, motion graphs, rules and cost functions for inventing a game. Only when the game is there can a computer program solve it. This results in Artificial Intelligence. Perhaps an example makes the situation easier to grasp:
Suppose there is a simple sprite sheet which contains two actions: walk and jump. Also, a level map is provided. In addition, the rule is that the robot in the game has to reach the exit of the level. All these ingredients can be combined into a model which is realized with a physics engine. Once the game is available, it can be solved with model predictive control or reinforcement learning. That means the computer takes the physics engine as a simulator and tries to figure out the optimal trajectory within this simulator.
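The pipeline of simulator plus solver can be sketched with a toy model-predictive controller. The one-dimensional level, the effect of the walk and jump actions and the exit position are all invented:

```python
import itertools

EXIT = 4  # goal x-position in the toy level (assumed)

def simulate(x, action):
    """Toy physics engine: 'walk' advances 1 unit, 'jump' advances 2."""
    return x + (2 if action == "jump" else 1)

def plan(x, horizon=3):
    """Model-predictive control in miniature: try every action
    sequence up to the horizon inside the simulator and return the
    sequence that ends closest to the exit."""
    best, best_cost = None, float("inf")
    for seq in itertools.product(["walk", "jump"], repeat=horizon):
        pos = x
        for action in seq:
            pos = simulate(pos, action)
        cost = abs(EXIT - pos)
        if cost < best_cost:
            best, best_cost = list(seq), cost
    return best
```

The exhaustive search stands in for a real MPC solver; the important part is that the physics engine is queried as a simulator, exactly as described above.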
6a2 Abstraction mechanism
A common understanding is that a sprite sheet is a simple .png file which is used to animate characters in a game. This description is correct from a technical perspective, but the inner working goes beyond this understanding. From a general perspective, sprite sheets are used as an abstraction mechanism. The idea is to convert a problem into a model.
The main advantage of such a sheet of sprites is that each sprite gets a unique ID. The sprite with id #1 stands for a certain picture, while the sprite with id #4 stands for a different one. Instead of describing a picture by its details, which means its colors, paintings and size, the programmer can reference only the number. That means there are ids from #1 to #20, and each number represents a small graphic.
Such an abstract description allows programming very advanced games. For example, the id can be used in an animation routine, or it can be used to react to keyboard input. If the user presses the left key, a certain sprite id is shown on the screen.
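The ID-based abstraction can be illustrated with a small lookup table. The key names and the concrete sprite ids are invented for the example:

```python
# Map keyboard input to the sprite id that should be shown.
# The concrete ids (#1 walk-left, #4 walk-right, #7 jump) are invented.
KEY_TO_SPRITE = {
    "left": 1,   # walk-left frame
    "right": 4,  # walk-right frame
    "space": 7,  # jump frame
}

def sprite_for_key(key, idle_id=0):
    """Return the sprite id for a pressed key, or the idle sprite."""
    return KEY_TO_SPRITE.get(key, idle_id)
```

The game loop never touches pixels directly; it only juggles the small numbers, which is the whole point of the abstraction.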
6a3 Rapid prototyping
The reason why sprite sheets, level maps and motion graphs are used frequently in game design is that they reduce the time until the game is created. If someone takes an existing sprite sheet which contains 200 keyframes, he won't need to paint the animation manually. And if someone uses a map created by someone else in the past, there is no need to reinvent the property from scratch.
If, in addition, advanced GUI tools and game engines are used, it is possible to create an entire game only by dragging and dropping the resources into the main window. The precondition which allows this highly efficient workflow is to divide complex games into sub-elements, which are the mentioned sprite sheet, level map, game engine and so on.
Game development doesn't mean programming something, for example in assembly or C; game development is about aggregating existing content into something new.
6a3a Problem grounding
Within the robotics and AI community there is some sort of unsolved mystery. It concerns possible algorithms or libraries for solving AI problems. What computer programmers frequently do is ask for an algorithm, because an algorithm is a tool to solve a problem. Therefore it sounds logical to ask which sort of software or algorithm will provide artificial intelligence. The sad answer is that no software and no algorithm is available which fits into this category.
But there must be a way to create robots, otherwise the subject of AI as a whole could be dismissed. The answer to this complicated problem is not to search for an algorithm but to describe the process from a design perspective. Designing something means that ideas from the outer perspective are converted into software. This is sometimes called the grounding problem.
To understand how design works differently from algorithmic thinking, we have to observe how new games are created. What the average game designer does is start a level editor, which is of course tile-based, and in the level editor he drags and drops icons into the map. After a while the user saves the result, and this forms the basis of a new game.
The interesting situation is that the process of creating a new map isn't based on an algorithm; it is a creative process. What the game designer does is use a tool for rapid prototyping of a game. That means level design has nothing to do with game programming in the sense that someone opens an IDE and then programs the game in C++ against a game library; rather, game design has to do with converting external knowledge into the game. A source of inspiration for a new level might be a good map in another game, or older projects from the game designer. That means he comes with his personal knowledge to the level editor and creates the map for a certain objective.[Engstrom2018]
Formalizing such a process is not possible. The reason is that game design doesn't solve an optimization problem, and it has nothing to do with programming at all. Even the beginner-friendly Python language is off-topic for level design. Instead, the computer is used only as a tool, similar to a pen used to write something down.
The chance is high that programming robots works with the same paradigm. It is not about writing source code in Python or C++; it has to do with level design and GUI interfaces. This makes it hard to use certain programming languages or algorithms, because in the design process such tools are not useful at all.
What makes sense instead are rapid prototyping tools. These programs support the design process much better.
6b data driven models
A model is an abstract description of a problem. There are two ways for creating models:
1. Data driven models
2. Source code driven models
Programming a model in source code is the goal in robotics development. It is equal to creating a forward model which has much in common with a physics engine. Such a model is realized in Python or in C/C++ and can be used by a solver to determine the optimal actions.
The disadvantage of source-code-driven models is that somebody has to program them. It is known that writing source code is a complicated task. Especially if the code contains differential equations to simulate physical systems, it is very demanding to write realistic models.
The alternative is to focus on data-driven models. As the name suggests, data-driven means the opposite of programming source code; it is equivalent to painting a picture, writing a text file or creating a database. Such models can be created much more easily. In the easiest case, this is realized by recording a game log into a CSV file; the CSV file is then treated as a model for the game.
The disadvantage of data-driven models is that they can't be executed directly. For example, a game log in a CSV database can't predict future states on its own. An additional algorithm is needed which turns the data into a concrete simulation. But this is not a real disadvantage, because data-driven models can be seen as a first step in a modeling workflow.
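The two-step workflow, CSV log first and prediction algorithm second, can be sketched as follows. The tiny game log and the "most frequent successor" prediction rule are assumptions for the example:

```python
import csv
import io
from collections import Counter

# A tiny hypothetical game log: each row is (timestep, state).
GAMELOG = "t,state\n0,idle\n1,walk\n2,walk\n3,jump\n4,walk\n5,jump\n"

def build_model(log_text):
    """Count which state follows which state in the recorded log."""
    rows = list(csv.DictReader(io.StringIO(log_text)))
    return Counter((a["state"], b["state"]) for a, b in zip(rows, rows[1:]))

def predict(transitions, state):
    """Predict the most frequently observed successor of a state."""
    candidates = {b: n for (a, b), n in transitions.items() if a == state}
    return max(candidates, key=candidates.get) if candidates else None
```

The CSV file alone predicts nothing; only the added algorithm turns it into an executable model, which is exactly the division of labor described above.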
Animation sequences
Let us try to explain how the abstraction works for creating a longer animation sequence. Without any abstraction, the user has to create the animation manually. That means he has to figure out the keyframes and paint them on the screen. This is a very complicated task which can take hours.
In contrast, an existing data-driven model makes it much easier to animate a character. All the user has to do is provide a sequence of sprite IDs, for example [4,5,6,7], and then press the run button. The software will search the database for the images and create the animation on the screen. This simplifies the creation of longer animation sequences drastically.
With such an understanding in mind, animation no longer has to do with painting graphics or figuring out the correct pose; it has to do with entering a sequence in the format [a,b,c,d], and then the software model does everything else. This powerful principle is called an abstraction because it transforms a problem into an easy interaction.
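A minimal sketch of this interaction: a sequence of sprite IDs is resolved against an image database. The database contents are invented placeholders:

```python
# Hypothetical image database: sprite id -> filename of the keyframe.
DATABASE = {4: "walk1.png", 5: "walk2.png", 6: "walk3.png", 7: "walk4.png"}

def render_sequence(ids):
    """Resolve a sprite-id sequence like [4,5,6,7] into the image
    files an animation player would paint on the screen, in order."""
    return [DATABASE[i] for i in ids]
```

The user's entire contribution is the short id list; the lookup does the rest.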
see also: 6c1 Model based tracking 6c2a Game log recording
6b1 Data vs programming
The natural way of interacting with a computer is to program the machine. Many programming languages like Python, Java or C/C++ are available, and at first glance a programming language is the right tool for creating a model.
System identification usually means converting a problem into a simulation. And a simulation is always computer software which calculates something. The problem is that programming a simulation is very complicated; even object-oriented software engineering hasn't increased productivity that much.
The alternative to object-oriented programming is a data-driven approach. The idea is to model a domain in a database, which can be realized as a NoSQL database, an XML file or a JSON taxonomy.[Kopp2018][Baak2013] What all these data formats have in common is that they can't be executed because they are not programs; the advantage is that such files can be created much faster than normal computer code.
According to different studies, a highly skilled programmer can write only about 10 new lines of code per day. This makes it unlikely that a single person is able to create a complex model in that way. But a single person can create a database of textual information much faster, especially if the raw data is provided by a motion capture system, which is basically a data logger. So the simple idea is to see modeling only from the perspective of a database and ignore that a model can be realized as a computer program as well.
Let us try to describe how data-driven models are created. The main idea is that there is an empty JSON / XML / NoSQL database which is populated with textual information. Such information consists of tables, plain text and numerical data. The idea is that all the information combined is equal to the problem's model. For example, there is a 50 kB JSON file on the hard drive, and this JSON file holds the model for a walking robot in a maze. The JSON file contains the level map, the body of the robot and a list of possible events. Everything is stored in ASCII format, which means it can be shown with a text editor on the screen. The model doesn't hold executable programs written in Lisp, Python or Java.
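Such a JSON model might look like the following sketch. The map layout, robot description and event list are invented for illustration:

```python
import json

# A hypothetical data-driven model for a walking robot in a maze.
model = {
    "level_map": [[1, 1, 1, 1],
                  [1, 0, 0, 1],
                  [1, 1, 1, 1]],          # 1 = wall, 0 = free cell
    "robot": {"joints": 4, "start": [1, 1]},
    "events": ["collision", "goal_reached", "timeout"],
}

text = json.dumps(model, indent=2)  # plain ASCII, readable in any editor
restored = json.loads(text)         # the model round-trips losslessly
```

Nothing in the file is executable; it is pure data, and that is precisely what makes it cheap to create.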
6b2 examples for data models
From a technical perspective, a data-driven model is stored in a database. Possible file formats are JSON, XML or plain text (see 6b1 Data vs programming). The problem with this understanding is that such file formats are trivial. Creating ANSI, Unicode or even JSON files on a hard drive is nothing which can be improved that much.
The more interesting question is which content is provided. Some examples of data-driven models:
• level map, created with a level editor
• body pose taxonomy, created with an XML editor
• sprite sheet, created with a graphics program (see 6a Sprite sheet)
• game log stored in a CSV file which captures the keyframes (see 6c2a Game log recording)
• motion graph [Kovar2008]
6b3 data driven task models
A task model is by definition a high-level abstraction mechanism. The idea is to hide the details and focus on a long-term planning horizon. Creating a task planner is technically easy because, similar to most planners, it has to do with searching for a goal in the state space. There is a model and a number of actions, and the planner has to find the shortest path in the game.
The bottleneck is that for most robotics problems no task model is available; therefore it doesn't make sense to plan anything. A method for creating models from scratch was provided in section 6b2 examples for data models, so it is likely that task models can be created in the same way.
A good method for creating such models is a task taxonomy. This is a hierarchical dictionary of all the important words of a problem. For example, a task taxonomy for a household robot would contain places like "kitchen, bathroom, floor" and provide actions like "goto, pickup, place". From a programming perspective, such a data model is stored in a JSON file as plain text. Such a model can't be executed and it can't be utilized by a task planner directly, but it is a good starting point for creating a task simulation.
Let us try to investigate how to convert a task taxonomy into a task simulator. In contrast to a data-driven model, a simulator can be executed. Such a system works like any other computer program. And the open question is how to convert data into a program.
The easiest way of creating a task simulator is by manual programming. That means a programmer takes the task model as a prototype and writes executable source code around the data structure. For example, the programmer defines that after executing the action "goto" the robot position has changed to the new position. Such a mapping can be realized easily in a language like Python. What is needed is, of course, a variable for storing the current location of the robot and a method which changes the position.
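A hand-written mapping from the "goto" action to a position change might look like this minimal sketch; the place names are taken from the household example above:

```python
class TaskSimulator:
    """Executable wrapper around a task taxonomy: the places come from
    the data model, the state transition itself is hand-coded."""

    PLACES = ["kitchen", "bathroom", "floor"]  # from the taxonomy

    def __init__(self, start="floor"):
        self.position = start  # current location of the robot

    def goto(self, place):
        """Execute the 'goto' action: the robot position changes."""
        if place not in self.PLACES:
            raise ValueError("unknown place: " + place)
        self.position = place
```

The taxonomy stays pure data; only this thin wrapper makes it executable, which is the conversion step the text describes.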
6c more modeling tools
Apart from sprite sheets (see 6a Sprite sheet) there are many other possible candidates for creating a model of a problem. A lot of robotics literature has been published on the topic of learning from demonstration (LfD), while other papers have been written about animation languages.
In both cases the idea is to reduce the state space and solve the grounding problem. Animation languages as well as LfD are seen as abstraction mechanisms.
The problem is that it is hard to realize such principles in reality. For example, the LfD idea looks great to the untrained ear. The idea is to record a motion trajectory and then use the recording to determine the parameters for dynamic movement primitives.[Kirk2016][Zhu2018] But it remains unclear how to do so exactly.
The same problem exists for dedicated animation languages.[Webber1990] An animation language is a great tool if it has already been created, but generating a new language from scratch is a complicated task. Existing tools for modeling domain-specific languages are available, but they are not working well enough for practical applications.
Of all the existing tools, a vanilla sprite sheet, or a data-driven model in general, works best. The idea is that in the first step the model is equal to a database which holds information about the problem. More advanced elements of a model, like parametric movement primitives, an animation language or a prediction model, are created on top of the database.
With this strict definition, a data-driven model can be a .PNG file storing a sprite sheet, a JSON file which stands for a database, or maybe a CSV file storing game log information. What a data-driven model is not: a neural network, Python source code or a mathematical equation.
These advanced abstraction tools are created in a later step. That means there are simple data-only models and more advanced source-code-oriented models available (see 6b data driven models).
6c1 Model based tracking
Most existing robot projects try to create so-called robot control systems. The idea is that the software generates the signal for a robot arm and then the arm does something useful. The opposite idea to producing actions is to perceive actions. The following section analyzes the tracking of activities in detail.
Activity tracking assumes that meaningful actions are already available. They are created mostly by humans, and what the computer has to do is recognize these movements in space. One example is hand gesture recognition.[De Smedt2017]
The most advanced form of action recognition works with a model in the loop. A model is used to interpret features. Let me give an example. Suppose there is a robot arm which consists of 4 elements connected together with joints. What an intelligent vision system has to do is take this pre-information as a template to interpret the movements much better. The structure of the robot (4 elements) is the model, and the movements are parsed with this knowledge.[Filippi2007]
Another example would be a pose recognition system. The idea is that a human body can hold different poses like walking, standing and sitting, and what a computer has to do is determine the correct pose ID. For example, the human does something, and the computer prints on the screen that the human is in pose #3.
All these perception techniques have in common that an underlying model is used to interpret the video signals. This model is able to annotate raw data with a meaning. In most cases the meaning is stored in natural language. That means pose #3 is labeled with a textual string.
In most cases the tracking is realized with motion capture, and the models are data-based (see 6b data driven models). An existing model is used to track movements. For example, a body pose model is able to track the pose, while a task model can track only high-level tasks. From the perspective of a model, the world looks a certain way. That means the model defines which parts of reality are important, and then the model tries to match this bias with the raw data from the video signal.
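A nearest-neighbor matcher against a pose model conveys the flavor of model-based tracking. The two-joint poses and their labels are invented for the example:

```python
# Hypothetical pose model: pose id -> (joint angles, text label).
POSE_MODEL = {
    1: ((0.0, 0.0), "standing"),
    2: ((1.2, 0.3), "walking"),
    3: ((1.5, 1.5), "sitting"),
}

def classify(observation):
    """Return the pose id whose stored joint angles are closest to
    the observed angles (squared Euclidean distance)."""
    def dist(pose_id):
        angles, _ = POSE_MODEL[pose_id]
        return sum((a - b) ** 2 for a, b in zip(angles, observation))
    return min(POSE_MODEL, key=dist)
```

The model defines in advance which poses exist; the raw observation is only matched against that bias, as the paragraph above describes.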
6c1a Interactive animation
Apart from model-based tracking, there is another strategy for creating model-based robotics. The idea is not to program the robot itself but to design a human-computer interface. Basically, the human operator clicks somewhere on the screen, and this animates the robot on the screen.
The concept has much in common with teleoperation. A model is used to increase the automation level. But let us start the subject from the beginning. Suppose the idea is to teleoperate a pick&place robot. For doing so, the joystick is mapped to the servo motors of the robot. Perhaps an additional GUI will allow selecting the concrete servo motor on the screen. Such an interface works reasonably well, but it will take many seconds until the robot arm can grasp objects.
The more advanced interaction technique is to define a handful of keyframes and use the mouse to browse through them. For example, keyframe #1 stands for ungrasp, while keyframe #2 stands for grasp. So the human operator doesn't control the servo motors directly; he decides which system state he prefers at a certain moment.[Geijtenbeek2012]
The requirement for realizing such an advanced control system is a model. In the model, the keyframes, i.e. body poses, are stored, so the human operator can select one of the predefined IDs. The interaction with such a system is trivial, because the human operator can decide on a higher task level what action comes next.
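A sketch of the keyframe interface: the operator browses between stored poses instead of driving the servos. The servo angle values are invented:

```python
# Hypothetical keyframe model: id -> servo angles of the gripper arm.
KEYFRAMES = {
    1: {"elbow": 90, "gripper": 45},   # ungrasp pose
    2: {"elbow": 90, "gripper": 0},    # grasp pose
}

def blend(frame_a, frame_b, t):
    """Interpolate between two keyframes; t=0 gives frame_a, t=1
    gives frame_b.  This lets the operator browse smoothly with the
    mouse between grasp and ungrasp."""
    a, b = KEYFRAMES[frame_a], KEYFRAMES[frame_b]
    return {joint: a[joint] + t * (b[joint] - a[joint]) for joint in a}
```

The operator's input is reduced to two IDs and a slider position; the model supplies all servo targets.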
6c1b Motion retrieval
The concept of a sprite sheet was discussed already (see 6a Sprite sheet). The principle is used by game designers, not from the beginning but frequently, to simplify in-game sprite animation. The idea is to store a PNG file with the walk cycle and use this image in the game to animate the main character.
The interesting situation is that sprite sheets can be improved. The resulting motion retrieval system is used to query a mocap database.[Sakamoto2004] Similar to a sprite sheet, there are some poses stored in the database, and the software then searches for the next pose in the database. This allows creating realistic motions with only a small amount of CPU load.
The concept has much in common with model-based tracking. The idea is that the underlying mocap database is the model, and only body poses from this database can be drawn on the screen. The reason why this technique is highly efficient is that every pose has a unique number. Instead of adjusting all 17 DOF joints of a skeleton, the algorithm only needs to know a single reference number. For example, the sequence [4,1,19,2] is used to create a longer motion with the help of the underlying data model.
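The query step of such a retrieval system can be sketched as follows; the 2-DOF pose database and the nearest-pose transition rule are invented:

```python
# Hypothetical mocap database: pose id -> joint-angle vector.
POSES = {4: (0.0, 0.1), 1: (0.2, 0.1), 19: (0.4, 0.2), 2: (0.6, 0.2)}

def next_pose(current_id):
    """Query the database for the pose closest to the current one
    (squared distance), excluding the current pose itself."""
    cur = POSES[current_id]
    def dist(pid):
        return sum((a - b) ** 2 for a, b in zip(POSES[pid], cur))
    return min((p for p in POSES if p != current_id), key=dist)
```

Chaining such queries produces a pose-id sequence like [4,1,19,2]; only the small numbers are handled at runtime, which is why the technique is cheap.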
6c2 Activity recognition but why?
In section 6c1 Model based tracking, the idea of using a model to recognize existing actions was introduced briefly. The question which was left open is the reason why. Suppose it is possible to recognize that a robot in a maze has hit the wall; does this fact have any value?
The reason why model-based activity understanding is crucial in robotics is that it helps to create an abstraction mechanism (see 6a2 Abstraction mechanism). The idea is that there is a layer between a problem and the computer. The layer itself, which is the model, has no importance on its own, but it can be utilized for many purposes.
So the underlying problem can be summarized as mapping or grounding and has to do with reducing the state space of a problem. In the mentioned example of a robot which hits a wall, the original state space has to do with objects which can do something. There are colors, lots of pixels and an endless number of events. Such a state space is far too complex for a computer to understand, and the only way for an Artificial Intelligence is to first translate the problem space into a model.
The interaction with a simplified model is much easier for a computer and can be realized with existing paradigms, for example with a programming language. It is possible to write down in a Python program a statement like "if robot hits the wall, then stop motor". But the precondition is that first an abstraction mechanism is available which reduces the state space to a small list of possible events.
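Once the state space is reduced to a short event list, the control logic really does shrink to a one-liner. The event numbering is an invented example:

```python
# Invented event table produced by the abstraction layer.
EVENTS = {2: "robot collides with the wall"}

def controller(event_id, motor_on=True):
    """High-level rule: if the robot hits the wall, stop the motor."""
    if EVENTS.get(event_id) == "robot collides with the wall":
        return False  # stop the motor
    return motor_on   # otherwise keep the current motor state
```

All the hard work is hidden in whatever produces the event ids; the rule itself is trivial, which is the whole point of the abstraction.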
6c2a Game log recording
Before it is possible to program a certain sort of software, a programmer has to define the objective first. Suppose the idea is to program an activity recognition engine; where to start?
From a general perspective, it can be realized by recording a game log. What all game logs have in common is that the keyframes are stored with a timestamp in a directory, for example in files named 0 to 5. The filename (0-5) represents the timestamp, and the file itself holds the screenshot at this moment. So the game log is basically a frame-accurate video of the game. Suppose a larger amount of such data was created; in the next step the goal is to parse the information.
A keyframe contains possible actions, events and states. Each event has a unique number. For example, event #2 means "robot collides with the wall in the maze". The compilation of all possible events, actions and states is stored in a taxonomy as a data-driven model (see 6b data driven models). That means there is somewhere a hierarchical table which holds all possible events.[Karpov2013]
The game log parser has the obligation to match the taxonomy with the recorded game log. The result produces meaning. That means the keyframes are grounded and annotated. There are many ways of doing the matching, for example with neural networks, but in the easiest case a manually created Python script can do the task. That means the classification module isn't trained by a learning algorithm; it is hand-crafted.
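A hand-crafted matcher, with no learning involved, might annotate the log like this. The frame format and the collision rule are assumptions for the sketch:

```python
# Hypothetical taxonomy: event id -> plain-text meaning.
TAXONOMY = {2: "robot collides with the wall in the maze"}

def annotate(frames):
    """Hand-crafted classifier: a frame in which the robot position
    coincides with a wall position is labeled with event #2."""
    events = []
    for t, frame in enumerate(frames):
        if frame["robot"] in frame["walls"]:
            events.append((t, 2, TAXONOMY[2]))
    return events
```

The rule is written by hand against the taxonomy; nothing is trained, which matches the hand-crafted approach described above.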
6d Bibliography
Baak,
Andreas, et al. "A data-driven approach for real-time full body pose
reconstruction from a depth camera." Consumer depth cameras for computer
vision. Springer, London, 2013. 71-98.
De Smedt, Quentin, et al.
"Shrec'17 track: 3d hand gesture recognition using a depth and skeletal
dataset." 3DOR-10th Eurographics Workshop on 3D Object Retrieval. 2017.
Engstrom,
Henrik, Jenny Brusk, and Patrik Erlandsson. "Prototyping tools for game
writers." The Computer Games Journal 7.3 (2018): 153-172.
Filippi, Hannes. "Wireless teleoperation of robotic arms." (2007).
Geijtenbeek,
Thomas, and Nicolas Pronost. "Interactive character animation using
simulated physics: A state‐of‐the‐art review." Computer graphics forum.
Vol. 31. No. 8. Oxford, UK: Blackwell Publishing Ltd, 2012.
Karpov,
Igor V., Jacob Schrum, and Risto Miikkulainen. "Believable bot
navigation via playback of human traces." Believable bots. Springer,
Berlin, Heidelberg, 2013. 151-170.
Kirk, James, Aaron Mininger, and
John Laird. "Learning task goals interactively with visual
demonstrations." Biologically Inspired Cognitive Architectures 18
(2016): 1-8.
Kopp, Oliver, Anita Armbruster, and Olaf Zimmermann.
"Markdown Architectural Decision Records: Format and Tool Support."
ZEUS. 2018.
Kovar, Lucas, Michael Gleicher, and Frédéric Pighin. "Motion graphs." ACM SIGGRAPH 2008 classes. 2008. 1-10.
Sakamoto,
Yasuhiko, Shigeru Kuriyama, and Toyohisa Kaneko. "Motion map:
image-based retrieval and segmentation of motion data." Proceedings of
the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation.
2004.
Webber, Bonnie, and Barbara Di Eugenio. "Free adjuncts in
natural language instructions." COLING 1990 Volume 2: Papers presented
to the 13th International Conference on Computational Linguistics. 1990.
Zhu, Zuyuan, and Huosheng Hu. "Robot learning from demonstration in robotic assembly: A survey." Robotics 7.2 (2018): 17.