Title: Natural language interface for solving the frame-problem with a layered game-engine

Author: Manuel Rodriguez

Date: 13. June 2018

Abstract: Planning algorithms like A* and RRT work efficiently only for small problem spaces, while most robotics problems have a huge problem space. The answer to this mismatch is a domain model which is enriched with heuristics. In the early history of AI this issue was discussed under the term “frame problem”, and it amounts to describing the action model in an object-oriented programming language. Using the examples of a textadventure and parking a car, a game-engine is presented which uses natural language commands for storing domain-specific knowledge.

Table of Contents

1 AI Planning
1.1 Knowledge based planning
1.2 The GIVE challenge
1.3 Automatic creation of a PDDL file
1.4 Theorem proving for beginners
1.5 Storing domain knowledge in a game-engine
1.6 Plan stack in HTN-planning
1.7 Building a simulator for HTN-planning
1.8 Combining autonomous and remote control
2 Example
2.1 HTN Planner
2.2 Car physics
References

1 AI Planning

1.1 Knowledge based planning

Artificial Intelligence in simple games like chess, and in more complex robotics domains, can be realized with AI planning:

“Planning is the most generic AI technique to generate intelligent behaviour for virtual actors.” [8]

What a planning system looks like for a chess-like game is already known: it is a brute-force search in the game tree. More complex games can be planned too, but the algorithm is more complicated. The standard procedure is a combination of a hierarchical planning system with a machine-readable knowledge representation. It means storing the game on different layers in a formal model, so that it can be searched in real time. Well-known forms of storing knowledge are STRIPS and PDDL. Both are languages for formulating a symbolic game engine. They are used for the high-level layer of a planning domain.

But let us take a step back. The only known algorithm for solving a planning task is a search in the game tree. The algorithm is called A*, or in newer literature, RRT. RRT alone is not able to solve complex tasks; it must be implemented on different layers at the same time. The layers depend on the game: they represent the domain-specific knowledge.

The bottleneck is that for most games no machine-readable domain description is available. For example, in a pick&place task for a robot, it is unknown what the story looks like and what the outcome of an action is. The programmer is not searching for a solving strategy; instead he is searching for the game. That means he must enrich the given game with detailed rules. Only these rules can be solved by the planner.

[12] calls the process “domain knowledge engineering” and describes a software tool (GIPO) which makes it easier to create a planning game from scratch. On page 4 an example is given. The original game is the “docker worker robot”, and the programmer must define potential states of the robot like “load”, “unload”, “at-base”, “busy” and so forth to enrich the game with knowledge. The result is a PDDL file which can be used in a planning task.

LfD & HTN

A human operator has an implicit domain model: he knows that before he can grasp an object, the gripper first has to be opened. This domain model is not available in source code and has to be programmed first. To overcome the gap, “learning from demonstration” is the right choice for constructing a “hierarchical task network” from scratch.[3][6] The human operator has a GUI in which he records a manual motion, and from this demonstration a machine-readable ontology is created. This task model can be used by the HTN planner for handling the task autonomously.

The aim is to transfer the knowledge from the human operator to the agent. The knowledge is similar to a walkthrough tutorial for games: it is a description of how to play a certain game. In a hierarchical task network the knowledge is formalized as a symbolic game engine. That is a software module which can predict future game states, e.g. robot-grasp-object -> object-isin-hand. Usually the description is based on natural language. Instead of using simple variables in the game engine like:

bool a,b,c
int d,e,f

the variables have meaningful names like “bool object-isin-hand true/false” or “int distance-between-object-and-gripper”. The reason is that the domain model is not primarily programmed for a machine, but as a help for the software engineering process. Creating a domain model is not a mathematical algorithm like A*, but a software engineering task like UML and agile development. The result of this step is not a PDDL file or executable code; it is a human-readable paper which is called the specification of the game. The algorithm doesn't learn the game by itself; instead a version control system like git is used to bring the domain model to the next version.
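
For example, a minimal C++ sketch of the same state with descriptive names (the identifiers and the unit are illustrative, not taken from a real system):

bool object_is_in_hand = false;
int distance_between_object_and_gripper = 120;  // hypothetical unit: millimeters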

Grounding means to combine tracking control with natural language.[9] The idea is not only to describe a task with “subtask1”, “subtask2” and so on, but with the correct words, e.g. “heating the water”, “fill the cup”. It is not possible to describe a domain without using natural language. From a machine perspective it perhaps is, because all the words have no meaning for the computer; they are only labels. But human engineers need natural language for maintaining a task model. That means a task model is foremost a dictionary and not a mathematical algorithm.

Language model

Under the assumption that every task model is based on natural language, the question to investigate is what the language model for a certain domain will look like. The GIVE challenge is trying to answer this by generating natural language instructions which guide a human through a task.[2] The idea is that, on the one hand, a dictionary is coded in computer code and can be executed like a game engine, and the output of the engine is used to solve a task.

Perhaps an example will make the point clear. In [7, page 6] the map of a game is shown at the top of the page. It is a normal maze game, in which the player moves around and can press buttons. Below the image the PDDL description is given, which can be seen as a natural language game engine. It provides commands like “move”, “turn-left” and “manipulate-button”. The PDDL description contains two important aspects:

• natural language words, for example “turn-left” instead of a simple “action2”
• state-action pairs, that means by activating an action, the system is in a new state

The overall system can be seen as a living dictionary. It contains, on the one hand, words and action names, and these can be executed by the user, which brings the system into a new state. The PDDL file contains knowledge about the domain.
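
As a rough illustration, the following C++ sketch mimics such a natural-language game engine for the maze game; the state variables and effects are invented here and are not the original PDDL file from [7]:

#include <string>

class MazeEngine {
public:
  int x = 0, y = 0;          // player position on the map
  int heading = 0;           // 0=north, 1=east, 2=south, 3=west
  bool buttonPressed = false;
  // each command is a state-action pair: parsing the word changes the state
  void action(const std::string& name) {
    if (name == "move") {
      if (heading == 0) y += 1;
      if (heading == 1) x += 1;
      if (heading == 2) y -= 1;
      if (heading == 3) x -= 1;
    }
    if (name == "turn-left")
      heading = (heading + 3) % 4;
    if (name == "manipulate-button")
      buttonPressed = true;
  }
};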

1.2 The GIVE challenge

The term “Generating Instructions in Virtual Environments” (GIVE) describes a programming challenge with the aim of generating natural language for a domain.[5] The output itself is usually produced by a PDDL solver; that means for a given current/goal state pair the solver is trying to find a plan through the domain. The solver needs the PDDL domain model as a precondition.

A more colloquial description of the challenge is to compare it with a textadventure game which is enriched by a 3D map on top of the GUI. It is comparable to the early point&click adventures of the 1990s, in which the human operator has some actions like “go north” and must reach a certain point on the map while doing subtasks. The interesting aspect of the GIVE challenge is that from a graphical point of view everything is minimalistic; instead, the focus is the task model and its potential application to Artificial Intelligence. Another important aspect of the challenge is that natural language is at the center of focus. That means the idea is not only to have an agent play a game, but to generate natural language which guides a human who is already capable of understanding English.

From a programming point of view, the GIVE challenge is one of the easier tasks. It needs less energy to solve than Starcraft AI or Robocup. The task is not as easy as programming a normal 2D computer game, but it can be mastered by beginners in AI.

1.3 Automatic creation of a PDDL file

A domain model consists of natural language. From a technical side, the domain model is stored in a symbolic planning language like PDDL, OPL or ABPL. The first one (PDDL) is a classical language, while the other two have object-oriented features. The easiest way of creating a domain model is to program the domain description by hand. It is the same technique by which a textadventure is created. A more sophisticated idea is to create the language model automatically from event logs.[13]

The idea is to record all the events in a section called “agent memory” and then construct the PDDL / ABPL description out of the relationships. That means at the beginning the agent has no domain model; he has to build one from scratch while he gains new experiences. In the literature the term “action model learning” is used. An action model is a symbolic game engine which can be stored in a PDDL file.

Instead of explaining how to realize such a system, we must first describe how to evaluate an action model. At first, we need a working symbolic game engine, for example an instance of a textadventure. This game engine produces a stream of natural language. The user can input text, and the dialogue system gives feedback, also in natural language. On the right screen there is an empty prototype. The prototype has the obligation to emulate the working game engine. That means the prototype observes the events, stores everything, and after a while it acts in the same way. The goal is to reverse engineer a symbolic game engine.
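
A minimal sketch of such an agent memory, under the assumption that the simplest possible representation is enough: the prototype stores (state, action, successor state) triples, and the PDDL / ABPL description would later be constructed from this log.

#include <string>
#include <vector>

struct Transition {
  std::string state;    // e.g. "door=close position=outside-room"
  std::string action;   // e.g. "open-door"
  std::string next;     // e.g. "door=open position=outside-room"
};

class AgentMemory {
public:
  std::vector<Transition> log;
  // called by the observer for every event of the working game engine
  void observe(const std::string& s, const std::string& a, const std::string& s2) {
    log.push_back({s, a, s2});
  }
};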

1.4 Theorem proving for beginners

In the early history of Artificial Intelligence, theorem proving played an important role. In the context of the STRIPS planning system, such systems were capable of proving mathematical statements. But what is theorem proving exactly? Why is it so hard?

Theorem proving basically means to take a puzzle, for example Rubik's cube, and search for a sequence of moves. The cube has a starting pattern, certain operations are possible, and the goal is to bring the system into a goal situation; for example, make all sides uniform in color, or bring only one side into the desired condition.

Mathematical theorem proving works with the same idea in mind. There is a starting equation, a number of allowed operations and a goal situation. As in the Rubik's cube example, the idea is to search for a plan; if such a sequence is found, the theorem is proven. In reality, theorem proving is equal to game playing. A game is a system which has allowed moves, and it is up to the player to decide which moves he wants to execute. Automatic theorem proving works surprisingly simply: the so-called SAT solver uses brute-force search, and that's all. If the problem is small, like in the Rubik's cube example, the STRIPS program is successful; in a much larger state space, for example “a theorem prover for chess”, it is much more difficult to realize, because the number of possible plans is higher.

In the historic paper [4] from 1971, Fikes and Nilsson touch on the so-called “frame problem”. This basically means that the STRIPS language is only a simple planning language and has no object-oriented features for describing more complex problems. More recent planning languages like “A Better Planning Language” (ABPL) can overcome the frame problem.

Example

From school mathematics, equations are known, plus rules which can be applied to these equations:

a+4=7
a+4=7 |-4
a=7-4
a=3

The starting situation was an equation, and we applied an operator to it. After the action, the equation is in a new condition. In the example we selected the action manually, but it is also possible to formulate the problem in the STRIPS language. Such a feature is integrated in most computer algebra systems: the so-called solver plays around with the equation to fulfill a certain condition.
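
A minimal sketch of this idea in C++, with the equation a + b = c encoded as two integers and “subtract k on both sides” as the only allowed operator (a toy encoding, not how a real computer algebra system represents terms):

#include <iostream>

struct Equation { int addend; int rhs; };   // encodes: a + addend = rhs

int main() {
  Equation eq{4, 7};                        // a + 4 = 7
  while (eq.addend != 0) {                  // goal condition: a stands alone
    int k = eq.addend;                      // operator selection; trivial here
    eq.addend -= k;                         // apply |-k to the left side
    eq.rhs -= k;                            // ... and to the right side
  }
  std::cout << "a = " << eq.rhs << "\n";    // prints: a = 3
  return 0;
}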

The “frame problem”

With STRIPS, a powerful planning language is available for proving theorems. The problem now is: how to formulate a robotics problem in the STRIPS syntax? The question is discussed in the literature as the frame problem, because the assumption is that frame-based, a.k.a. object-oriented, programming is part of the solution.

A more precise formulation of the problem is given under the term “General game playing”. The challenge here is to invent a game from scratch. That means the system gets as input a plan trace of checkers, or of a textadventure, and the system should be able to construct all the game rules and code them into the Game Description Language. The sad news is that until now “General game playing” hasn't worked very well. It is not possible to construct real games from scratch. But that is not a real bottleneck, because it is always possible to manually program in STRIPS, GDL or any other language. It is not necessary to use automatic programming for realizing a robot.

In reality, the frame problem is equal to a software crisis. That describes a situation in which a demand for software exists but no source code is available, for many reasons. The software crisis in the area of operating systems was solved with Open Source software, and the software crisis for the special domain of game playing and planning languages will be solved with Open Science. For example, if programmer A describes in a paper a UML chart, an ontology and executable code for implementing a pick&place robot, then programmer B is able to reproduce the result and use it as a basis for a more sophisticated system.

The frame problem, the grounding problem and the general game playing challenge can all be solved with better science communication, which works manually with Open Access papers and Open Source software.

1.5 Storing domain knowledge in a game-engine

Domain knowledge has to be stored in machine-readable form. Most literature asks how exactly the data storage should look, for example in the PDDL format, in ontologies or in semantic networks. More important is the question of what the result will look like once the domain knowledge is available. It will look like the game engine of a textadventure. That means it is possible to send a command to the engine, and the engine will output the future state.

A symbolic game engine which is controlled by natural language is able to predict future states. For example, a command like “grasp apple” results in the output “apple is in hand”. This logical reasoning has to be implemented in a part of the software called the game engine. A game engine can be programmed in a textadventure markup language, in Javascript and even with ontologies. In the easiest form a textadventure is programmed in Python with object-oriented features. But in general that is only a minor problem; more important is the insight that domain knowledge and a working game engine are equivalent.
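
A minimal sketch of such a predictive engine, with invented state and command names:

#include <string>

class GraspEngine {
public:
  std::string apple = "on-table";
  // logical reasoning: predict the successor state for a natural language command
  std::string predict(const std::string& command) const {
    if (command == "grasp apple") return "apple is in hand";
    if (command == "place apple") return "apple is on table";
    return "unknown command";
  }
};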

Let us describe what is usually done at the so-called ICAPS conference. In most cases a domain like Blocksworld is converted into a PDDL description. But that is not the real goal. The more precise description is that a textadventure for the Blocksworld domain was created, and apart from PDDL this can be done in any other programming language. In the end, a game engine must be implemented which can be fed with user commands.

But why is a textadventure so important; isn't it possible to write a normal game engine with a graphical interface? The term grounding means to connect natural language with actions. Grounding means that the engine can parse a command like “grasp apple”. Every grounding results in a textadventure, because a textadventure is about the understanding of natural language. The only open question is how to program such an adventure for a certain domain.

Existing textadventures, which have been programmed since the 1980s, contain domain knowledge in a machine-readable form. They have a clean interface: it is possible to send a request to them, and the game engine calculates the successor state. A textadventure can be seen as the inner core of a working robot control system. It is the part in which the domain knowledge is stored.

In the context of Artificial Intelligence, the general name for domain knowledge which is stored in a textadventure is “dialogue system”. Dialogue systems like TRAINS and FrOz are usually programmed to support the human player. They have a planning feature out of the box; that means the engine was programmed with the goal that a solver can find a path through it.[1]

1.6 Plan stack in HTN-planning

[10] describes on page 8 the plan stack of an HTN planning system. A so-called plan stack is a list of high-level commands:

1. task1
2. task2
3. task3

and so forth. The term “stack” is correct, but it is a bit misleading, because the data structure for storing a plan is not very important. Any other data structure, for example a SQL table or a CSV file, would work just as well. The more important aspect is that before building the plan stack, the programmer needs to know the names of the tasks. In the cited example, the tasks relate to the soccer domain. Their names represent elements of the soccer game, for example “pass-the-ball” or “shoot”. The knowledge is not stored in the HTN planner directly, but in the domain model which is used by the HTN planner.

I would guess that the stack and the HTN solver are the least important parts of the AI system. That means storing a list of task names in computer memory is a trivial programming task and can be implemented easily. The bottleneck in hierarchical task networks is somewhere else. As mentioned above, it is the domain model. In general we can say that the domain model is equal to a high-level game engine. It is some kind of textadventure: the user has commands which he can send to the game engine, and the engine executes them. The game engine has the obligation to predict future game states. That means after the command “pass-the-ball”, the game engine changes the position of the ball to the receiver.
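
A minimal sketch of this division of labor in C++, with invented task names: the plan stack is an ordinary vector, while all the knowledge sits in the domain model which predicts the successor state.

#include <iostream>
#include <string>
#include <vector>

class SoccerDomain {                    // the domain model: a high-level game engine
public:
  std::string ball = "at-player1";
  void execute(const std::string& task) {
    if (task == "pass-the-ball") ball = "at-player2";   // predict the follow state
    if (task == "shoot")         ball = "in-goal";
  }
};

int main() {
  std::vector<std::string> planStack = {"pass-the-ball", "shoot"};
  SoccerDomain domain;
  for (const auto& task : planStack) {  // executing the stack is the easy part
    domain.execute(task);
    std::cout << task << " -> ball " << domain.ball << "\n";
  }
  return 0;
}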

Basically, the domain model of an HTN planner is equal to a computer game. Because of its high-level nature it is often realized as a textadventure, but it can have additional graphics to visualize the internal states. Let us go into the details for the Robocup case. A game for playing Robocup can be programmed on many levels. It is possible to play the game physically with real robots, while in the 3D Simulation league only a physics engine is used. In the 2D Simulation a different kind of physics engine (perhaps a 2D one) is in the loop, and it is also possible to simplify the engine further into a more abstract domain model, which works without a realistic engine but only with an idealized physics engine.

For example, it is possible to use the SFML graphics library to program a soccer game which is not very accurate. The game is not realistic but similar to a Pacman clone. That means the players have limited action possibilities and the simulation would look like an Atari 2600 game. The trick with an HTN planner is to combine different game engines in layers: for example, on top a high-level textadventure, in the middle a graphical representation and on the bottom a realistic 3D physics engine. Optimizing these different layers is the key factor for a successful HTN planner.

Model acquisition

Surprisingly many papers discuss the automatic creation of action models for HTN planning. The idea is to use plan traces and a very complicated algorithm which generates the action model. To be honest, automatic generation isn't working. If the aim is to get a real system, for example to play a game, only handcrafted models can be used. Realizing an action model for an HTN planner has to be done like programming in general. That means lots of man-years have to be invested, and some kind of version control system is needed. The action model cannot be generated autonomously; it has to be programmed by hand.

There are many working examples available, for example from the gaming industry or from the Robocup challenge. In all cases the workflow was handcrafted. That means a team of 10 programmers has written the documentation, painted UML charts and implemented the action model in a simulator. The process can be called a “software engineering task”. Plan traces may be useful, but there is no algorithm, only a project, which can transform them into an executable action model.

What can be done automatically is to use a given action model with an automatic solver. If it is clear that the task contains the subtasks “pass” and “shoot”, and if it is clear what the successor state is, then an automatic solver is able to generate the best plan. It is the same strategy which is used in computer chess to generate the next move. It is done entirely by the computer, and human intervention is not needed.

What is possible is to use an existing serious game (which has an API) and run an HTN planner on top of it. If the game contains the action model, the tasks and the events, it is possible to calculate the plan. The programming effort is only minimal in such cases. But here the programming effort has been invested by somebody else, who programmed the original game. That means the action model is not generated from scratch; it was programmed in an external software engineering project.

1.7 Building a simulator for HTN-planning

HTN planning itself is easy to understand: a given “action model” is sampled by a solver, and the found plan brings the system into the goal state. The more demanding task is to create the action model. Let us describe in detail what it looks like.

On a programming level, every “action model” is realized with an ontology. That means there are some C++ classes which contain methods and attributes and can call each other. The purpose of the ontology is to realize a simulator, that is, a piece of software which mimics reality. A well-known simulator used in the Robocup domain is a physics-realistic simulator. It is realized with a dedicated physics engine and calculates what will happen if the player kicks the ball. But a physics simulation is not enough; in the context of hierarchical task networks, many layers of simulators are combined together. There is also a need for a high-level simulator, a tactical simulator and so on. All of these software modules can be realized with ontologies, better known as UML classes. The question is only how to transfer a certain domain, for example a dexterous grasping task, into a simulator.

Again, let us imagine what the benefits are. Suppose we have a layered simulator for a dexterous grasping task. On the low level it is a physics engine, on the mid level a simplified 2D game and on the high-level layer a textadventure which can be controlled in natural language. If such a detailed model is available, it is very easy to solve the game. All the HTN planner has to do is generate random actions on different levels and search for a certain situation which is called the goal. The only problem is that for most domains such a layered simulator isn't available. And without such a system, the HTN planner has nothing to do.

The open question is: how to program, for a certain domain, a layered simulator in an object-oriented programming language? Programming only a realistic physics engine is not enough; what an HTN solver needs is a mixture of different simulators, because this will speed up the search process. Let us make an example for the Robocup domain.

In the high-level simulator we can execute a command like “move-to-ball”. After executing the command, the engine places the player directly at the ball. The CPU consumption for doing so is practically zero. That means the game engine simply moves the player position, and that is all. There is no collision detection, no path planning and no check whether the battery is empty. On a second layer, this command is resolved into detail commands which are more realistic, and only on the low-level layer is a realistic 3D physics engine, including collision detection, needed. Such a layered, ontology-based simulator can be called the main part of an HTN planner. It is a useful tool for generating a plan for an agent.

Layered abstraction

Because the layer architecture is fundamental for HTN planning, I want to give an example. Let us suppose we are again in the Robocup domain. A normal simulator would implement a physics engine in 3D. In HTN planning this is not enough. The idea is to search for a plan on different abstraction levels, and especially on layers which are computationally inexpensive. An accurate 3D physics simulation is very CPU-demanding; it is the worst choice for testing out random plans. The better approach is to first construct an artificial game on top of the simulator. We can call this a simplified soccer simulator. It is only 2D-based and has no dedicated physics engine. Instead it works like a board game. That means the pieces on the table can be moved freely without any constraint. If agent #1 should go to a certain place, we can manipulate his position directly. That means the action “move” needs no CPU consumption; it is executed directly. On this simplified board game, it is possible to figure out basic strategy. For example, if we want 3 players to be at the same place, we must move all of them to that place. The abstract game engine can answer certain questions, all of them of an abstract nature. It is a game on top of the original game.

Now we can combine both layers: on top the 2D abstract game with a simplified mechanic, and on the bottom the elaborated 3D simulator with an accurate physics engine. Before we ever try out a move in the 3D simulator, we first ask the high-level layer what the result will be. The overall domain must be implemented in at least 2 layers of different game engines. That is the basic concept of an action model behind an HTN planner.
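
A minimal sketch of the two-layer idea in C++, with invented names; the physics layer is only a stub where a real project would call into an engine like Box2D:

// layer 1: abstract board game, moves are executed directly and cost nothing
class AbstractSoccer {
public:
  double playerX = 0, playerY = 0;
  void moveTo(double x, double y) { playerX = x; playerY = y; }
};

// layer 2: the accurate and expensive physics simulation (stub)
class PhysicsSoccer {
public:
  bool simulateMove(double x, double y) { (void)x; (void)y; return true; }
};

// the planner consults the cheap layer first; only promising moves
// are handed down to the expensive physics layer for verification
bool planMove(AbstractSoccer& high, PhysicsSoccer& low, double x, double y) {
  high.moveTo(x, y);
  return low.simulateMove(x, y);
}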

The high-level layer doesn't replace the former 3D physics engine. An accurate physics engine is great for planning low-level actions; for example, if we need to know how strongly we must kick the ball until it reaches a certain velocity, there is no alternative to a physics engine. It gives back the exact result. But only a small fraction of the questions of an autonomous agent deal with such detail questions. In many other cases the agent needs advice on whether he should kick the ball to position A or B, no matter how exactly this is done. This high-level decision can't be answered by an accurate physics engine.

In reality, a domain is modelled by different game engines which are arranged in layers. At least 2 layers are needed, but more layers are better. This raises the problem of how exactly the different layer engines have to be implemented. Manual programming is the only working choice, because the question is similar to any other software engineering project. As input we have a system specification, for example “program a software module which simulates strategy aspects of a soccer play”, and the result, after some man-years, is working code which fulfills the specification.

The concept of “action model learning” is sometimes discussed in the literature, but it is only a future vision, not a technology which is available today. The hope that an action model can be derived from plan traces without handwritten code will be disappointed.

1.8 Combining autonomous and remote control

It is obvious that writing a software package which controls a robot is possible, because a robot is a machine which can be driven by commands, and a software can generate these commands. It is only a detail question how exactly such a program will look. In real robotics projects this is the number one bottleneck. That means on a theoretical level it is often clear what the robot should do, but the users are not good at programming or have not enough time to realize the software themselves.

A possible answer is a semi-autonomous control system. The human operator first takes a joystick for controlling an underwater vehicle, and only in the second step does the robot follow the demonstration.[11] The system described in the paper is at first hand a teleoperated underwater vehicle. According to the definition that is not a real robot, but more a remote-controlled toy boat. The advantage is that writing the software for a human-in-the-loop simulation is relatively easy. The second part of the project (autonomous control) is postponed to step #2. That means even if this subproject fails, it is always possible for a human to control the system manually.

The second advantage of this stepwise programming is that a concept like “learning from demonstration” answers the question of how the Artificial Intelligence will work. The task can be specified under the term tracking. That means the robot has the obligation to reproduce the human task. Implementing this in software is not easy, but it is possible. In the above-cited paper the authors use the PDDL language for the high-level planner and parameterized dynamic movement primitives (DMPs) for the low-level part.

But let us go into the details. The boat can be controlled manually. That means the human operator has a joystick and lots of keys he can press, and the boat has a certain task, for example “docking”. The process is the same as playing a computer game: the operator must press some keys, the robot reacts, and the water in the tank increases the difficulty because the fluid produces disturbances. It usually takes more than one attempt until the human operator has mastered the task.

The strategy of the autonomous system can be loosely described as a decision support system. That means the AI is not really capable of controlling the robot; instead it is a support for the human operator. Sometimes the concept is called supervised autonomy, because the human operator is always in the loop and must observe what the robot is doing. How exactly does the system work? It is mostly a controller: a piece of software which generates control signals for the robot. The controller consists of a low-level and a high-level part. It is similar to a GOAP planner, which is used in computer games for controlling non-player characters, but is integrated into the overall robot monitoring system. The basic idea behind a controller is a planning task. That means from the current situation a plan is generated to bring the system into a goal state. It is the same principle by which a chess engine works: there is a game tree, different branches and an evaluation criterion. The difference is that a robot controller is more complicated than a chess-playing software. It contains more submodules, which do the hierarchical planning.

Another interesting feature of mixed teleoperation is that from the software engineering side the project is relatively easy. That means a teleoperated robot is mostly a hardware problem, which is understood very well. If hardware is available, like a robot arm, a camera, an object and a joystick, all the devices have to be connected and the system is ready. The joystick can be replaced by a data glove and the manipulator by a larger model. That means it is not necessary to write complicated control software for the task, or to have a theoretical understanding of robotics, in order to pick up the ball. The main idea is to postpone the complicated task of writing the software to a later stage, namely the learning from demonstration. It is mostly the result of playing around with the system and trying to improve its functionality a bit. Or to make the point clear: it is hard to fail at a telerobotics project. The reason is that a human operator is capable of executing nearly any task. Remote controlling a machine is a robust way of interacting with the environment.

Suppose the manual teleoperation works; what is the next step? The next, more advanced form is to utilize a PDDL planner in the loop. A task model is formulated in the PDDL language, and this calculates some decisions in realtime. Such a system is not a real robot system, because PDDL is only a basic form of planning, but it is a good transition from a purely human-controlled system to a semi-autonomous system.

2 Example

2.1 HTN Planner

The difference between a hierarchical task network and a behavior tree is not easy to understand. Both concepts are about Artificial Intelligence and game programming, but in general an HTN planner is superior. Perhaps a simple example in source code helps to grasp the idea.

The figure [fig:Textadventure] shows a compact C++ class which implements a textadventure. The user can send different commands to the engine, like “init”, “open-door” and so on. After the command is parsed, the engine changes internal variables. In the concrete example, the user must first open the door before he can enter the room. The C++ class is not a behavior tree which says which commands must be executed in sequence; instead it is a game engine which accepts commands and prints out the internal state. The user can play around with the game and send different commands to the engine in the hope that he will reach the goal. It is not the HTN planner itself, but the domain model which can be used by an HTN planner. The question is: which commands must be executed to bring the system into a goal state?

#include <iostream>
#include <string>

class Textadventure {
public:
  std::string door;
  std::string position;
  void action(std::string name) {
    std::cout << name << "\n";        // echo the parsed command
    if (name=="open-door") door="open";
    if (name=="close-door") door="close";
    if (name=="go-in") {
      if (door=="open")
        position="inside-room";
    }
    if (name=="go-out") {
      if (door=="open")
        position="outside-room";
    }
    if (name=="init") {
      door="close";
      position="outside-room";
    }
    if (name=="show") {
      std::cout << "* ";
      std::cout << "door=" << door << " position=" << position;
      std::cout << "\n";
    }
  }
  Textadventure() {
    action("init");
    action("show");
    action("go-in");      // fails silently: the door is still closed
    action("show");
    action("open-door");
    action("show");
    action("go-in");      // succeeds now
    action("show");
  }
};


Figure [fig:Textadventure]: Textadventure visual

The interesting aspect is the hierarchy of actions. On the lower level the player can open and close the door, and on the high-level layer he can enter and leave the room. Let us make a practical example. The game starts with a fresh instance and the agent wants to enter the room. He types in “action("go-in");” but it doesn't work, because the game engine prevents the player from directly manipulating his position. Instead there is a built-in game mechanic; that means on the low level the player must first open the door before he can execute the high-level “go-in” command.

The problem is not to solve the game automatically; this can be done by a brute-force sampler in under a second. The problem is to describe the domain in a machine-readable form. That means to program the game engines on the lower level and on the higher level. A game engine is a software module which specifies what will happen if the agent executes an action.
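
A minimal sketch of such a brute-force sampler, assuming the Textadventure class from the figure above is in scope: it enumerates short command sequences until the goal state “inside-room” is reached.

#include <iostream>
#include <string>
#include <vector>

bool reachesGoal(const std::vector<std::string>& plan) {
  Textadventure game;                  // note: the constructor replays the demo
  game.action("init");
  for (const auto& a : plan) game.action(a);
  return game.position == "inside-room";
}

int main() {
  std::vector<std::string> ops = {"open-door", "close-door", "go-in", "go-out"};
  for (const auto& a : ops)            // all plans of length 2 -- enough here
    for (const auto& b : ops)
      if (reachesGoal({a, b})) {
        std::cout << "plan found: " << a << ", " << b << "\n";
        return 0;
      }
  return 1;
}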

2.2 Car physics

Suppose we want to program a car parking controller; what is the best-practice method? At first we need a simulator with a top-down physics engine. The idea is not to use a real car, but to test the controller in a computer game. A realistic physics engine is also helpful because it can detect collisions. But there is one problem: a working simulator isn't equal to a working controller. A simulator only means that we can control the car with a joystick, but the aim was to get an autonomous car.

Let us investigate what the problem is. If the car is visible in the simulator, we can test out different plans, for example a trajectory for parking. The question is: what does the trajectory for reaching a certain goal look like? This is usually answered with a brute-force sampling planner; that means we test out a million trajectories and evaluate what will happen. In this way we can generate the parking trajectory in the manner of a chess engine. There is only a minor problem: the CPU consumption would be very high. Even if we take a modern, efficient physics engine like Box2D or Bullet, a normal computer is not able to evaluate more than 100 trajectories per second. That means testing out millions of trajectories is not feasible; it would take far too long.

Figure [fig:Car-parking-scene]: three scenes of a parking maneuver


The answer to the problem is called “hierarchical task network”. It is a planning technique which constructs a layered simulator and plans on different levels. Let us make an example. Figure [fig:Car-parking-scene] contains three scenes of a parking maneuver. The standard programming technique for implementing the game is a realistic physics engine, because it is highly accurate. The alternative is to program a simplified physics engine from scratch. This engine has no collision check or sophisticated force calculation. Instead it works on a symbolic level.

The game starts with the init keyframe. That means there is a car and a parking lot. The user has different commands he can enter. The command “u-turn” does not activate a complex AI controller which performs the u-turn maneuver with a detailed trajectory. No, “u-turn” only means changing the direction of the car directly. That means in the game the position of the lamps of the car is changed to the new orientation, that is all. If the user enters “u-turn”, the game engine simply flips the car. The next command the user can enter is “parking”. Like in the example before, it is a very basic command: if the user enters “parking”, the position of the car is simply changed to the parking lot. This physics engine works in a reduced form; it is similar to a PDDL specification.

As a consequence we have a car parking game, but a very abstract kind of game. Playing around with the game doesn't generate the detailed trajectory, only the subgoals. It is a way of storing knowledge and enabling high-level planning. Let us now investigate how “parking” works. The user enters two commands: “u-turn”, “parking”. That's all. With the first command he flips his car, and with the second command he moves the car into the lot.
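
A minimal sketch of this reduced, symbolic engine in C++, with invented state names:

#include <iostream>
#include <string>

class CarGame {
public:
  std::string direction = "north";
  std::string location = "street";
  void action(const std::string& name) {
    if (name == "u-turn")                       // flip the car directly, no trajectory
      direction = (direction == "north") ? "south" : "north";
    if (name == "parking" && direction == "south")
      location = "parking-lot";                 // teleport into the lot, no collision check
  }
};

int main() {
  CarGame game;
  game.action("u-turn");                        // subgoal 1: flip the car
  game.action("parking");                       // subgoal 2: move into the lot
  std::cout << game.direction << " " << game.location << "\n";  // south parking-lot
  return 0;
}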

Until now the idea may look a bit useless, because we need a detailed motion controller and not a subgoal generator. But the described abstract game is an important step in this direction. We can use the high-level game for constructing the low-level controller. If we know what the subgoal is, we can do the planning on the realistic physics engine. The question is no longer “what is the overall trajectory?”; the question is only “how to realize a u-turn?”. That means the high-level commands are equal to skill primitives. Calculating the correct trajectory for executing these skills is easier than planning the complete domain. It is possible to use a realistic physics engine like Box2D for calculating such subgoals.

References

[1] Luciana Benotti, "DRINK ME: Handling actions through planning in a text game adventure", XI ESSLLI Student Session  (2006), pp. 160--172. https://cs.famaf.unc.edu.ar/~luciana/content/papers/files/benotti06.pdf

[2] Donna Byron, Alexander Koller, Kristina Striegnitz, Justine Cassell, Robert Dale, Johanna Moore, and Jon Oberlander, "Report on the first NLG challenge on generating instructions in virtual environments (GIVE)", in Proceedings of the 12th European Workshop on Natural Language Generation (2009), pp. 165--173. http://www.aclweb.org/anthology/W09-0628

[3] Aaron St Clair, Carl Saldanha, Adrian Boteanu, and Sonia Chernova, "Interactive hierarchical task learning via crowdsourcing for robot adaptability", in Refereed workshop Planning for Human-Robot Interaction: Shared Autonomy and Collaborative Robotics at Robotics: Science and Sys… (2016). https://people.csail.mit.edu/cdarpino/RSS2016WorkshopHRcolla/abstracts/RSS16WS_17_InteractiveHierarchicalTask.pdf

[4] Richard E Fikes and Nils J Nilsson, "STRIPS: A new approach to the application of theorem proving to problem solving", Artificial intelligence  2, 3-4 (1971), pp. 189--208. http://ai.stanford.edu/~nilsson/OnlinePubs-Nils/PublishedPapers/strips.pdf

[5] Andrew Gargett, Konstantina Garoufi, Alexander Koller, and Kristina Striegnitz, "The GIVE-2 Corpus of Giving Instructions in Virtual Environments.", in LREC (2010). http://cs.union.edu/~striegnk/papers/striegnitz_conference_lrec_2010.pdf

[6] Andrew Garland and Neal Lesh, "Learning hierarchical task models by demonstration", Mitsubishi Electric Research Laboratory (MERL), USA--(January 2002)  (2003). http://www.merl.com/publications/docs/TR2002-04.pdf

[7] Alexander Koller and Ronald Petrick, "Experiences with planning for natural language generation", Computational Intelligence  27, 1 (2011), pp. 23--40. http://www.coli.uni-saarland.de/~koller/papers/ci-crisp-11.pdf

[8] Miguel Lozano, Steven J Mead, Marc Cavazza, and Fred Charles, "Search-based planning for character animation", in 2nd International Conference on Application and Development of Computer Games (2003). https://pdfs.semanticscholar.org/cba3/a6527f7a4e7209b463cc70bc300cda01b171.pdf

[9] Dipendra K Misra, Jaeyong Sung, Kevin Lee, and Ashutosh Saxena, "Tell me dave: Context-sensitive grounding of natural language to mobile manipulation instructions", in RSS (2014). https://www.cs.stanford.edu/people/asaxena/papers/misra_sung_saxena_rss14_tellmedave.pdf

[10] Oliver Obst, Anita Maas, and Joschka Boedecker, "HTN planning for flexible coordination of multiagent team behavior", Fachberichte Informatik  (2005), pp. 3--2005. ftp://ftp.uni-koblenz.de/pub/outgoing/Reports/RR-3-2005.pdf

[11] Narcís Palomeras, Arnau Carrera, Natàlia Hurtós, George C Karras, Charalampos P Bechlioulis, Michael Cashmore, Daniele Magazze…, "Toward persistent autonomous intervention in a subsea panel", Autonomous Robots 40, 7 (2016), pp. 1279--1306. https://www.researchgate.net/profile/Charalampos_Bechlioulis/publication/283339628_Toward_persistent_autonomous_intervention_in_a_subsea_panel/links/587574d108ae8fce492823bc/Toward-persistent-autonomous-intervention-in-a-subsea-panel.pdf

[12] RM Simpson and W Zhao, "Gipo graphical interface for planning with objects", International Competition on Knowledge Engineering for Planning and Scheduling  (2005), pp. 34--41. https://pdfs.semanticscholar.org/5ef6/d24c8f496963a40b4939b5f3e012c977edca.pdf

[13] Qingxiaoyang Zhu, Vittorio Perera, Mirko Wächter, Tamim Asfour, and Manuela Veloso, "Autonomous narration of humanoid robot kitchen task experience", in Humanoid Robotics (Humanoids), 2017 IEEE-RAS 17th International Conference on (2017), pp. 390--397. http://h2t.anthropomatik.kit.edu/pdf/Zhu2017.pdf