A common problem with neural networks is that newcomers are not sure for what purpose they can use the technology in a meaningful way. The reason is that neural networks are usually described from their own perspective, which has to do with a certain neuron layout, the weights, a training algorithm, or a certain software framework that can be programmed in Python or Java. But this description doesn't answer the question of what neural networks are actually for.
With a bit of research in the existing papers it's possible to reduce neural networks to their core feature, which is called “regression analysis”. A regression analysis is usually done with the SPSS statistics package. The idea is that a dataset consists of input and output variables, and the mathematical model is able to predict the outcome for interpolated input values. The simplest form, a linear regression model, can be realized in MS-Excel; here the model consists of a simple line which goes through the existing datapoints. Let me give an example:
In the MS-Excel sheet the dataset 2, 4, 6, 8, 10 is given. If we visualize the information in a plot and draw a line through the dots, it's possible to interpolate missing points. The model can answer the question of what the output for a new input signal will be. It can predict that between 4 and 6 the number 5 is located, and that on the right side the next datapoint will be 12.
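The same line fit can be sketched in a few lines of plain Python. The x values 1 to 5 are an assumption, because the sheet only lists the output column:

```python
# Least-squares fit of a line y = a*x + b through the datapoints.
xs = [1, 2, 3, 4, 5]       # assumed row numbers from the sheet
ys = [2, 4, 6, 8, 10]      # the given dataset

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

def predict(x):
    return a * x + b

print(predict(2.5))  # interpolated point between 4 and 6 -> 5.0
print(predict(6))    # extrapolated next datapoint -> 12.0
```

The slope comes out as 2 and the intercept as 0, so the model reproduces exactly the interpolation described above.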
This kind of regression analysis can be solved with neural networks very well, much better than with MS-Excel or SPSS. The advantage is that neural networks fit nonlinear data with more than a single input variable. The training process means that the existing dataset is converted into the model. Then the neural network is able to predict new output values for unseen input values. It's a kind of advanced interpolation mechanism.
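To illustrate the nonlinear case, here is a minimal sketch of a neural network trained for regression, written in plain Python without any framework. The dataset (y = x² on [-1, 1]), the layer size and the learning rate are illustrative choices:

```python
import math, random

random.seed(0)

# Training data: a nonlinear function, here y = x^2 on [-1, 1].
data = [(x / 10.0, (x / 10.0) ** 2) for x in range(-10, 11)]

# One hidden layer with 8 tanh neurons and one linear output neuron.
H = 8
w1 = [random.uniform(-1, 1) for _ in range(H)]  # input -> hidden weights
b1 = [0.0] * H
w2 = [random.uniform(-1, 1) for _ in range(H)]  # hidden -> output weights
b2 = 0.0

def forward(x):
    h = [math.tanh(w1[i] * x + b1[i]) for i in range(H)]
    return h, sum(w2[i] * h[i] for i in range(H)) + b2

# Plain stochastic gradient descent on the squared error.
lr = 0.05
for epoch in range(3000):
    for x, y in data:
        h, out = forward(x)
        err = out - y                             # dLoss/dout
        for i in range(H):
            grad_h = err * w2[i] * (1 - h[i] ** 2)
            w2[i] -= lr * err * h[i]
            w1[i] -= lr * grad_h * x
            b1[i] -= lr * grad_h
        b2 -= lr * err

print(forward(0.5)[1])   # approximately 0.25 after training
```

After training, the network interpolates the curve for inputs it has never seen, which is exactly the regression behavior described above.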
This understanding of neural networks is important, because in the task “using neural networks for regression analysis” no or only a few problems will become obvious. That means, no matter which kind of dataset is given, the neural network is able to do the regression task for all of them. This is a kind of core functionality which fits well with what neural networks can provide to the user.
Sometimes neural networks are used for tasks outside of regression analysis, and in this domain most of the problems become obvious. It becomes unclear how to do the task exactly, and even after the model is trained, it doesn't fulfill the needs of the user.
August 31, 2019
What can we learn from node.js and golang?
Every programming language which is developed from scratch tells a story to its audience. In the case of mainstream languages like C++ and Python the plot is well known: C++ is the de facto standard for creating highly efficient software, while Python can be used for prototyping new source code. The story of some recent languages like node.js and golang is not told very often, but they have contributed a lot to the overall domain of computing.
Node.js and golang have become famous because they support the idea of creating a RESTful API. RESTful is a network interface for connecting existing programming infrastructure. It basically solves the problem of how different language ecosystems like C++, Java, Python, PHP and Lua can communicate with each other without convincing the other side to switch programming languages.
To understand why this is important, it makes sense to describe the typical programming dialect controversy. The standard problem for newbies is that there are around 50 different languages available and it's not possible to become familiar with all of them. So they have to pick one and hope that their language of choice will solve their problem. Because most programmers have chosen a different language, there is a need to compare languages, which results in questions like “is C++ or C# the better language?”. The problem is that even if somebody is able to answer the question, he will fail in predicting the future development. It's unlikely that C++ will displace C#, because both languages have a certain niche.
Instead, the number of languages has grown rapidly. In contrast to the homecomputer scene of the 1980s, which evolved into a single standard called the IBM PC, there is no sign that a single programming language will dominate all others. In terms of performance, C++ is often called the queen of programming, but C++ is not strong enough to convince the other communities to switch to the language.
And exactly this is the reason why a network protocol like RESTful is an option. It helps to connect all the different languages through a single standard. The most interesting feature of RESTful is that no one is forced into a single programming language; it's fine to combine different languages. That means user1 prefers Windows 10 together with C#, user2 has chosen Ubuntu with Python, user3 has installed Objective-C on a Mac OS X system, and all the written code is able to talk to each other over the RESTful interface.
Sure, it's possible to create a REST server without node.js and without golang as well. Most languages like Java or Python provide a library for this purpose. But node.js and go introduced the concept first, and that is the story they are selling to the public. And it makes sense to follow the idea.
Literature
In the manual of the “8th programming language”, there is a short notice about using RESTful words, https://8th-dev.com/manual.pdf (page 58)
Motivation
Why does it make sense to send RESTful JSON-formatted data over a network instead of using a single programming language for all purposes? To answer this question we have to understand what makes C# different from Python, and what the difference is between C++ and Javascript. All of these languages were developed for a certain need. In the case of C++ the motivation is well documented, because C++ is one of the oldest languages available. The idea behind C++ was that the user gets the maximum performance of a compiled language. In theory, everything can be written in C++, and the language is used very often today.
But if C++ is so great, why is the C# alternative available? The needs behind C# are different from C++. C# was developed by Microsoft as a centralized platform for running applications. From the perspective of Microsoft it makes sense to promote C# over C++. And this results in a paradoxical situation: it makes sense to program in C# and C++ at the same time, which results in a complex infrastructure in which community A has written wonderful libraries and compilers, but community B has written different libraries and compilers.
But this is not all. In addition to C# and C++ there is also space for another programming language called Python. The Python VM works differently from the C# one and brings its own libraries and documentation. It is easier to build different programming communities than to convince the existing programmers to stay in the same ecosystem. That means Python programmers are not interested in writing C# libraries, and C# compiler developers have no motivation to contribute to the upcoming C++20 standard. The prediction is that this kind of heterogeneous infrastructure will grow in the future. That means, in the next 10 years new programming languages will arise, but existing ones will be used more often as well.
An unofficial clearing house which mirrors the existing programming ecosystem is https://repl.it/ It's a website which allows running source code in different programming languages. Today, around 60 languages are supported. It starts with some oldies like BASIC, Forth and APL, continues with mainstream languages like C++, C#, Java, Python and Javascript, and also supports specialized languages like F#, NodeJS, Go and Haskell.
Many important languages like Matlab and assembly language are missing from the list, but it shows very well what programming is about. Not a single language but around 50 different languages are used in reality. The problem is that it's not possible to delete entries from the list, because each of them fulfills a certain need. The only working mode which makes sense is to add more languages to the list. This will increase the confusion and make it harder to unify the existing source code infrastructure.
REST in Assembly language
As a proof of concept it might be interesting to write a RESTful server in assembly language. Not because it's superior to an alternative written in C, but to demonstrate how to interconnect assembly language programs with high-level programs.
How can an assembly program and a Python program interact without RESTful? There is no way of doing so, because both languages are quite different. They have no common standard. Python can't execute assembly, while assembly can't parse Python code. Neither of them is wrong: assembly language is here to stay, but so is Python. The problem is that in general it's not possible for two languages to interact with each other. The same problem exists between Turbo Pascal and C, between Java and Lua, and between AutoIt and C++. The problem gets more complicated if the source code runs under different operating systems. For example, a Java program runs on a smartphone while the server database runs on a headless Linux server.
August 26, 2019
Testing the performance of a RESTful API with Flask
At first glance, the idea of using network sockets for communicating between different programs sounds unusual. Instead, the literature recommends using language bindings like SWIG or Python ctypes to use an existing C library from within Python. But suppose the idea is to use the network protocol as a middle layer: how good is the performance if all the data is sent over the network card to the localhost device?
At first, we need a running Flask server plus a Python client which gets access to the data.
# server.py
from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hallo Welt"

if __name__ == "__main__":
    app.run()

# client.py
import requests, time
for i in range(1000):
    res = requests.get("http://127.0.0.1:5000/")
    print(i, res, res.text)
    time.sleep(1)

After running server.py it's possible to retrieve the information from a web browser or by starting the curl command on the command line: curl -v http://127.0.0.1:5000/. In both cases, the response is a simple hello world which is sent as a network message.
The next step is to make a small performance test. We can remove the time.sleep(1) statement, which means that the for loop of the client tries to retrieve the information in the shortest amount of time. On my standard notebook the measured time was 4.8 seconds for 1000 requests, which equals 208.3 requests per second. To put the numbers in perspective: a normal computer game runs at 30 fps, which means that in a single second 30 requests are answered by the game engine. 208.3 requests per second from the Flask server is equal to 200 frames per second, which is fast enough for most games.
It's important to note that this speed was reached not over a wired connection but over the localhost interface. It measures the time to communicate between two local apps on the same computer.
The conclusion is that Flask, and RESTful in general, which is available for C++ as well, is fast enough as a communication interface between different applications. What does that mean for programmers? It means that it becomes very easy to connect different programming languages. No matter if the source code was written in Python, C++, C#, Java, Go, plain C, Forth or whatever, with the help of a RESTful API it's possible to send data between all the programs.
Details
If server.py runs in the background, every request is written to the command line:
127.0.0.1 - - [25/Aug/2019 16:35:54] "GET / HTTP/1.1" 200

This means that the server.py Flask instance has received a request and delivered the “Hallo Welt” message back. On the other side, the client.py application produces the load on the server and prints:

0 <Response [200]> Hallo Welt
1 <Response [200]> Hallo Welt
2 <Response [200]> Hallo Welt

It sends a GET request and retrieves the textual information in exchange. Why is this important? Because the normal case is that different programming languages can't communicate with each other. It's hard to embed Python code in a C application, or to transmit a string from a Java application to a C++ library. In some cases wrappers are available, but these tools do not work very robustly, and the common understanding is that the problem is located within a certain programming language. That means Python gets the blame, because it's not possible to import an existing C++ library.
The problem is not located within a certain programming language, because apart from Python all the other languages like C++, Java or C# have the same problem communicating with source code written in a different language. The problem is that an individual language like C# has a certain environment. For example, it comes with a virtual machine, some preprogrammed code and a pipeline for converting the source code into a binary file. This workflow makes it hard for a different programming language to get access.
What most programmers do is stay within a single language. They are using Python or C++ for creating an application, but it's not allowed to mix both languages. And exactly this problem can be solved with Flask-like RESTful services. On the higher instance of localhost it's possible to communicate between different programming languages back and forth. The only precondition is that a given language is able to send data according to the RESTful standard.
Speed up with crow and C++
RESTful services are available outside the Python ecosystem as well. In the C++ domain, the crow package tries to imitate the Flask library. The source code for running a simple hello-world server has to be compiled with the -I option, which includes the path to the crow include directory. The client software is the same as in the Flask example, except that the URL has a different port number.
// g++ -std=c++14 -lboost_thread -lboost_system -pthread -Ipath/to/crow hello.cpp
#include "crow.h"

int main() {
    crow::SimpleApp app;
    CROW_ROUTE(app, "/")([]() {
        return "Hello world!";
    });
    app.port(18080).run();
}

# client.py
import requests, time
for i in range(1000):
    res = requests.get("http://127.0.0.1:18080/")
    print(i, res, res.text)
    time.sleep(1)

After running both programs in parallel it makes sense to test the performance again. This time, the C++ crow service is much faster than the previous Flask server. On the same standard notebook, 1000 requests are handled in 2.64 seconds, compared to 4.8 seconds in the previous trial. Put the other way around: the C++ based RESTful server provides 379 requests per second. Whether the performance would be even greater if both server and client were written in C++ is unclear.
What makes the situation comfortable is that the same technique can be used with different programming languages. It's possible to combine Java, Python, C++, C, PHP, Javascript and C#. That means it's possible to write the backend in Python and the frontend in Java, or the backend in C++ and the frontend in C#. What makes this technique so powerful is that there is no need to convince a certain programmer to leave his usual programming language and switch to a different one which works better. The reality is that a real Python programmer would never migrate to C++, and an expert Java coder is not interested in learning Python. All of these programmers are right: their home language is the best one for them, and there is no need to switch to a different computer language.
The disadvantage is that, contrary to the self-description, RESTful and the C++ crow implementation are not lightweight. The resulting a.out binary file contains a complete webserver and has a size of 1.4 megabytes.
The interesting question is how much performance is needed in reality. Suppose the idea is to write a computer game which consists of a frontend and a backend. The interesting point is that even if the frontend renders at 60 fps, there is no need to send data back and forth at this speed. In most cases it's enough to update the game state at 5-10 frames per second. This is much faster than a human player can react. This kind of latency is reached easily by all RESTful implementations.
What's wrong with creating a library?
An often-told best practice for creating more efficient software is to write a library instead of a normal application. Instead of typing in a new Python program from scratch, the better alternative is to encapsulate the code so that it can be imported easily with an “import library” statement. At first glance this allows reusing existing code, but there is a big disadvantage: somebody who wants to use the existing code has to use the same programming language, which is Python.
In most cases this results in a discussion about which programming language is the right one. Is Python well suited for writing a library, especially if the code should run very fast? Many people would argue for Python while others are against it, and both are right. But if Python is a bad idea, what is the more appropriate language for creating libraries? At first glance, C++ or Java is a much better choice, but the problem remains the same: after creating a library in C++, only other C++ programmers get access to the functionality.
This is not a specifically C++ problem but is visible in all major languages. No matter if the code was written in C++, Java, C#, Python or Go, it becomes difficult to use the code from a different programming language. And this issue can't be fixed with new wrappers like SWIG or new kinds of programming languages. What all programming languages have in common is that they do not work very well with other languages.
A typical technique from the past to overcome the issue is called standardization. The idea is to make a library compatible with the ABI standard on a binary level, so that it can be included from within other languages. The problem is that even within the C/C++ community this principle doesn't work. For example, in Python it's possible to include C libraries with the ctypes interface, but the approach fails if the aim is to include C++ libraries. Does that mean that C is superior to C++, because it's the lowest common standard? No, because C doesn't support classes, and classes are needed to create more complex applications. The problem is that in general it's a bad idea to connect two different programming languages. It's very easy to include C code in a C application, and it's easy to include Java code in Java, but it's hard to bridge from Python to C#, from Java to C++, and from C to LISP.
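To make the ctypes point concrete, here is a minimal sketch which calls the C function strlen from Python. It assumes a Unix-like system where the C runtime can be located:

```python
import ctypes
import ctypes.util

# Locate and load the C standard library.
# find_library may return e.g. "libc.so.6" on Linux; passing None to CDLL
# falls back to the symbols already linked into the running process.
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path)

# Declare the C signature so ctypes converts the arguments correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"Hallo Welt"))  # -> 10
```

This works because strlen follows the plain C calling convention; a C++ function with mangled names and classes could not be reached this way, which is exactly the limitation described above.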
In most cases the communication between different programming languages works through a wrapper, which is a higher instance on top of both applications. It moves the problem to a higher level, on which both applications speak the same language. There are some techniques available for doing so: wrappers, pipes and RESTful.
RESTful is the most promising technique to connect different applications, because it's not located within an existing programming language like the SWIG wrapper, nor is it tied to a certain operating system like UNIX pipes. REST is an internet-based message protocol which can be handled by all programming languages. The idea is to convert an application into a webserver which sends and receives data from its environment. The concept is quite new and advanced, and it was invented at first for internet services. Programming languages like Java and PHP are very RESTful-friendly, but the concept is known in Python, C++ and C# as well.
August 25, 2019
Comparing C++ with C# for library creation
The normal comparison of C# vs. C++ asks which language is the better choice for creating a desktop application. In theory, both languages are a great choice, because they have lots of built-in features, and it's pretty easy to write working code in either language. Instead of asking if C# is superior to C++, we should first redirect the question toward which language is more suitable for creating a programming library. What programmers are really doing is not writing yet another text editor; if the aim is to maximize productivity, the idea is to create a library which can be used by more than a single application.
Unfortunately, none of the common high-level languages like C#, C++, Java or Python can be recommended for creating a library. The problem with C# and the mono ecosystem is that it doesn't run well under Linux. Java has the problem that it's a bit slow, C++ has the problem that the compilers are heavyweight, and Python is the slowest language ever invented. The most interesting fact in modern computer programming is that only a single programming language is widely accepted as library-friendly. The C programming language is still the number one for maintaining old libraries and creating new ones.
Every language (C#, C++ or Python) provides an interface to run existing C code. The reason is that the underlying routines of the operating system are mostly written in C, and there is no alternative available which can replace that standard. In contrast to the high-level programming languages, no one will argue against the C language as a low-level library language.
Using many programming languages at the same time is a mess
The reason why Python is so popular is that it's a prototyping language. The programmer is no longer forced to use a complicated syntax or to adapt to the needs of the machine; he can type in pseudo code and focus on the application itself. But Python has a major disadvantage which makes it unusable for production code: it's slow and hard to deploy across different platforms.
The more performant languages compared to Python are C#, Java, Go or C++. They are compiled or semi-compiled languages; especially C++ is recognized as a very efficient programming language. The problem is how to connect existing Python code with a C++ application. I have tried out some techniques like Boost.Python, embedded Python, ctypes and SWIG, but none of them can be recommended. The problem is that even if the programmer manages to write an interface for using Python together with C++, the technique can't be reused for other purposes. In most cases the problem is not only to communicate between Python and C++; perhaps the user also wants to send data from a Java application to a C# application.
A more recent approach is RESTful, which is a network communication standard. At first look it doesn't sound very attractive to send JSON data over the localhost interface of the network card just to let two applications communicate. On the other hand, the RESTful idea is a very general approach which fits all programming languages.
In theory, it's possible to start the Python app together with the C++ app and let both communicate with each other over a socket. I haven't tried it out in reality, but the promise is that this allows the programmer to use more than a single programming language in a project.
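To make the idea concrete, here is a minimal sketch in which one process exposes a RESTful endpoint over the loopback interface and a client queries it. Both sides are Python here for brevity; the point is that the client could equally be written in C++, Java or C#, since only HTTP and JSON are involved. The /add route is an invented example.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class AddHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # expects a path like /add/3/4 and answers with JSON
        parts = self.path.strip("/").split("/")   # e.g. ['add', '3', '4']
        result = int(parts[1]) + int(parts[2])
        body = json.dumps({"result": result}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # suppress request logging

# start the "application server" on a free loopback port
server = HTTPServer(("127.0.0.1", 0), AddHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# the client side: any language able to speak HTTP could do this
with urlopen("http://127.0.0.1:%d/add/3/4" % port) as response:
    answer = json.loads(response.read())["result"]

print(answer)  # 7
server.shutdown()
```

The overhead per call is higher than a direct function call, which is the price for being language neutral.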
Is there a need for doing so? Yes, because it makes sense to use the strengths of each programming language instead of arguing which one is the better choice. For example, Python is by far the most efficient way of creating a prototype. Source code can be written faster in Python than in C++, Java or C#. On the other hand, Python source code has the tendency that it can't be reused. It's an antipattern to write a library in Python, because the performance is not good enough.
One possible answer to the problem is to invent yet another programming language which combines the strengths of C++, Python and Java. Such a language could be learned easily, would run very fast and would allow writing a prototype with a small amount of code. But the prediction is that such a magical language won't be available in the near future. Instead, C++ will always be a compiled language, while Python will always be a scripting language. Java will always be a system independent language, while Javascript will remain useful for internet scripting. I don't think it's possible to replace all these languages with a single one. Even C++ is not powerful enough to become a legitimate replacement for Python, Javascript and C#.
No, the answer to the problem of communicating between existing apps is located above the ABI (the binary interface). It has nothing to do with executing compiled code, and it has nothing to do with virtual machines. It has to do with sending data back and forth, similar to UNIX pipes. Perhaps this comparison is a good introduction to RESTful: a UNIX pipe allows connecting two different programs which were created in different languages. It's possible to pipe the output of a Perl script into the input of a C binary. The question is not whether Perl or C is the better language; the problem is located in the transit between both programs.
August 21, 2019
C++ is a great language, but heavy complicated to program
An experiment with programming a mathematical subroutine has shown that, in theory, C++ is a great programming language which can be used in place of Python. C++ provides classes, lists, high level functions, and the resulting binary is ultrafast. On my computer, the C++ version of the program was around 40x faster.
The disadvantage is that programming even a simple routine was very complicated. At least at the beginning, lots of compilation errors show up. Somebody who comes to C++ with a Python background will need some time to become familiar with the language. After a while the situation becomes more relaxed, because the programmer understands that the auto keyword can't always be used, that float literals need an "f" suffix, and that each statement has to end with a semicolon. Sure, it's possible to learn the new syntax, but in contrast to Python it takes much longer to write a new program.
Sometimes it makes sense to convert a given program from Python into C++; the speed advantage is great. But there are reasons to avoid C++ in most cases. Using C++ on a daily basis doesn't make much sense: it simply takes too much time to test out new programming routines, and Python is the more productive way of writing code. On the other hand, Python can't be executed efficiently on modern hardware, so there is a need to convert existing Python code into C++. I would guess that in a practical project it makes sense to manage two repositories at the same time: Python for prototyping the application and C++ for creating super fast routines.
Writing C++ code is a bit different from writing Python code. C++ can be seen as a classical programming language: the programmer has to convert a problem into a machine readable format. He has to decide which datatypes are needed, for example string or float, he has to manage pointers, and he has to figure out why a compilation error occurs. Because C++ is a mature language, all these problems can be solved. That means any problem can be handled within the C++ universe; there is no program which can't be realized in the language.
On the other hand, coding a program in C++ can become a bit exhausting. The advantage of Python is that the programmer doesn't have to specify the details. Python hides the datatypes, and if the programmer sees a pointer address in the console, he has made a serious error. Python's self-understanding is that the programmer should focus on the problem and the algorithm, not on the machine level. Python can be compared with Matlab: it's a very high level language which hides all the implementation details.
Somebody could argue that even C++ isn't a truly low level language: similar to Python, it provides an abstraction layer to the programmer. Maximum control over the machine is only reached with assembly language, which allows writing programs that are even more efficient than C/C++.
Car parking explained easily
The amount of existing videos and academic papers about automated car parking is endless, and many demonstrations have shown that the technical side can be mastered with the right programming effort. What is missing right now are not "self parking cars", but a description of how to reproduce the magic trick.
Let us take a look at some descriptions from the past. What all the algorithms have in common is that they work as waypoint followers. The software takes a list of waypoints, and the algorithm drives the car to each of them. Car parking is equal to waypoint following. Usually the waypoints are fixed; some advanced algorithms calculate the waypoints on the fly, but normal software doesn't provide this feature.
The interesting question is: where does the software get the list of points in 2d space, for example p1=(100,100), p2=(150,100) and so on? The answer is that the points are generated by the programmer. He has put the points as fixed constants into the program. It is important to separate the task of the algorithm itself, which is to execute predefined steps, from the task not provided by the algorithm. Or to put it shorter: every car parking software demands an input stream. Without a list of waypoints, the car parking system isn't able to do anything. That means the software isn't able to park a car; what the software can do is navigate the car through a waypoint list.
This is interesting to know, because otherwise the car parking task would become very complicated. Suppose we don't know what the algorithm is trying to achieve. The assumption would be that the AI is trying to solve the task itself, which is "car parking". The problem is that the amount of information and the possible amount of strategies is endless. A single lidar sensor produces a large amount of raw data, and it's possible to measure any distance around the car. But which of the features are important? Does the algorithm need to know the robot's angle, and is it more important than the distance to the front? Trying to solve the parking problem inside this hypothetical box is way too complicated. The better idea is to provide a clear strategy about what the algorithm's task is and what it is not. Surprisingly, its main task is not to park the car. On a high level, car parking has to do with all the literature about the subject written in English, and no algorithm in the world is able to parse natural language in that way. Only humans are able to talk about car parking itself, including all the details.
What computers can do is a low level park assistant. The disadvantage is that for the computer the task doesn't carry any meaning. It only tries to bring a robot to a waypoint, and that's all. For the robot it's not really a car parking task; it has to do with calculating distances in millimeters and steering the wheels in the desired direction. Even if the machine prints "parking complete" to the console, it doesn't understand a single word. The string was predefined in a subroutine, and the computer is not able to explain what it was doing.
The same situation is visible in the famous micromouse challenge. For the naive observer, the software running on the microcontroller steers the robot through the maze. But that is only what the humans think the robot is doing. In reality, the robot runs a program which includes waypoints, a pathplanning algorithm and an interrupt handler for the start-stop signal. If the programmer presses the start button, only the humans are excited because the robot drives very fast through the maze. For the microcontroller, the task has nothing to do with micromouse. The software doesn't even know what a collision is. All the machine can do is execute some behaviors which were provided by the human.
The more advanced way to understand a given AI software is to analyze the human machine interaction during the programming task. What exactly has the programmer done before the micromouse competition? Has he used an EPROM flashing device to put the C code onto the robot? Did he utilize a simulator to test the pathplanner in advance? Has he run some testing routines to test the bot under different conditions? This kind of description explains much better what the competition is about. It's not enough to take a look at the microcontroller and the software which runs on the device; the whole picture has to be analyzed to reproduce the setup.
Unfortunately, the look backstage of the competition is not allowed. Most programmers prefer not to explain how they have programmed the device. This is a kind of social role play comparable to what magicians did in the 18th century. The main idea was that the entertainer is not allowed to reveal his tricks, because then they could be reproduced. And indeed, this is the aim of reverse engineering existing robotics software.
Let us go back to the human machine interaction. Every robot was programmed by a human. He has implemented the algorithm, he has learned theoretical knowledge at the university, and he has tested the software before the public run. If we want to understand what a robot is doing, we do not have to talk to the robot. The robot itself can't answer why it works so well. Even if the software runs without any bug, it won't tell the audience how the underlying algorithm works in detail. Even if the source code is available, the amount of information provided by the machine itself is limited. The more interesting approach is to analyze the steps before the program was created. It's important to backtrack the entire project to its beginning. In most cases, this has nothing to do with Artificial Intelligence or computer science but with the project itself, which includes the amount of invested time, the tools which were used, and the books which are referenced.
Let us go back to the initial example with the car parking robot. In most cases, these projects evolved from simpler line following challenges. At first, the robot is programmed to follow a black line on the ground. The next step is to follow a list of waypoints which are marked on the ground, and then the waypoints are placed in a parking spot, which drives the robot into the lot.
Practical example
In a parking situation the robot is on the left of the screen and should navigate into the base on the right. The task shouldn't be solved by a human operator but by the AI system autonomously. At first look the task is very complicated to handle, because the robot's movements follow the laws of the Box2d physics engine and it's hard to plan a path from start to goal. But is the parking situation really a control problem in which input information has to be analyzed and converted into a steering command over time? Sure, it's a rhetorical question, because the average parking algorithm doesn't work in such a way.
Instead, the easier to implement technique consists of subbehaviors. In the first step, the robot drives to a waypoint in front of the base, and in the second step it drives backward into the base. Or to explain it more colloquially: the robot is cheating. It doesn't negotiate with the environment about the next steering angle; the robot has its own plan. According to the plan, the robot has to reach two waypoints in serial order. That means the car parking problem can be reduced to a waypoint following task.
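Such a waypoint follower fits in a few lines. The sketch below is a hypothetical example with two hand-picked waypoints (one in front of the base, one inside it); the physics of a real engine like Box2d is replaced by simple straight-line motion.

```python
import math

# waypoints provided by the programmer, not computed by any AI:
# first in front of the base, then inside the base
waypoints = [(150.0, 100.0), (150.0, 60.0)]

def step_towards(pos, target, speed=5.0):
    """Move pos one step of length <= speed towards target."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist <= speed:
        return target  # snap to the waypoint on the final step
    return (pos[0] + speed * dx / dist, pos[1] + speed * dy / dist)

pos = (100.0, 100.0)  # start position of the robot
for wp in waypoints:
    while pos != wp:
        pos = step_towards(pos, wp)

print(pos)  # (150.0, 60.0)
```

Everything that looks like "parking intelligence" lives in the waypoint list, not in the follower routine.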
But how does the robot know which waypoints to follow? The answer is surprisingly easy: it's not encoded in the program, and it's not part of the AI. The robot doesn't even have an AI subsystem; the only routine it has is a waypoint following routine. The AI functionality of reaching the base is the result of the interaction between the human programmer, who provides the two waypoints, and the robot's software, which executes the behaviors. That means the car parking problem is only formalized for the human programmer, not for the robot's AI.
Let us describe the paradoxical situation in detail. Even if the robot is able to navigate into the base, it hasn't solved the car parking problem. The car parking problem is part of the AI curriculum, but there is no need to write AI software for the robot with the same purpose. It's interesting how this semantic gap is bridged in reality. One option is that the programmer tells a story to the audience:
“My robot is able to solve the car parking problem. It has built-in sensors and a brain which is able to move the robot into the lot.”
The funny thing is that this description doesn't fit reality. It's an interpretation from the outside to describe what the robot is doing. Indeed, the car navigates into the lot, but that doesn't mean it's an AI system. It only means that the programmer believes his robot is intelligent.
In a real line following robot project, the Python source code was given at the end: https://circuitdigest.com/microcontroller-projects/raspberry-pi-line-follower-robot Is the Python code equal to an AI which can think? No, it's not. The code provides only the line following functionality. The interesting point is that if the line on the ground goes in a certain direction, the result is that the robot moves intelligently. That means, if the line follows the maze, it seems that the robot is able to solve the maze. But it's the same Python source code, and in the code the maze is unknown.
Using Forth for float calculations
Some newbies may assume that Forth isn't able to handle float variables very well. The reason is that the normal stack only supports integer values, which can't express a fraction of a number. However, handling float variables is possible with the dedicated float stack. In Gforth an example session would look like:
3.e     \ put 3.0 to the float stack
7.e     \ put 7.0 to the float stack
f.s     \ show content of float stack
f/      \ 3.0/7.0
f.      \ retrieves the top element of float stack
0.428571428571429   \ the result of the calculation

\ it's possible to shorten up the calculation drastically
3e 7e f/ f.
0.428571428571429

\ let's try something more complicated
3.14e 3.e f* fsqrt f.   \ sqrt(3.14*3)
3.06920185064456        \ the result is correct
These examples show that Forth is ready for handling floating point operations.
Creating Artificial Intelligence with domain specific languages
Artificial Intelligence is sometimes described as a machine which is able to think on its own. This understanding is not completely wrong, because the robot doesn't need a remote control; the device finds its path autonomously. But the description is not complete, because it doesn't tell the audience how to build an AI from scratch. The more elaborate way of teaching Artificial Intelligence is to focus on human machine interaction. Before a computer can make decisions, it needs to be programmed first. The programmer defines, before the run of the robot, what the machine will do. He writes down a program, and the computer executes the plan.
The number of techniques for programming computers is endless. In modern times, different programming languages like Python, Java and C++ were invented; in most cases, the operating system of the computer is written in C. For describing robot movements, a different kind of notation is needed, which is usually called a behavior oriented language. The interesting point is that a behavior script is the opposite of "the robot acts autonomously". Instead, the script tells the robot for each situation what to do. That means the robot's movements are not the result of Artificial Intelligence; they were written down by the programmer in advance.
The trick to building advanced robotics systems is located within the domain specific language for describing the behavior. If the language is efficient, it's much easier to describe a complex behavior. This is the more realistic way of describing what AI is about.
A robotics script has two stakeholders. The first one is the computer, which has to execute the script; that means the high level behaviors are converted into machine language. The second stakeholder is the programmer, who likes to describe a certain problem within the notation. The domain specific language sits in the middle: it translates the intention of the programmer into a stream of machine language instructions.
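As a sketch of such a middle layer, here is a toy behavior language with two invented commands, forward and turn. The interpreter is the computer-side stakeholder; the short script is what the programmer-side stakeholder writes.

```python
import math

# a behavior script in a toy domain specific language (invented syntax)
script = """
forward 10
turn 90
forward 5
"""

def run(script):
    # interpreter: translates each behavior into state updates
    x, y, heading = 0.0, 0.0, 0
    for line in script.strip().splitlines():
        cmd, arg = line.split()
        if cmd == "forward":
            x += float(arg) * math.cos(math.radians(heading))
            y += float(arg) * math.sin(math.radians(heading))
        elif cmd == "turn":
            heading = (heading + int(arg)) % 360
    return round(x, 6), round(y, 6), heading

print(run(script))  # (10.0, 5.0, 90)
```

The script reads like the programmer's intention, while the interpreter deals with the machine-level state.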
August 15, 2019
How to start a robotics project?
Normal software engineering projects are grouped around certain technologies. For example, somebody can use a Linux server together with the PHP language to build a website, or it's possible to create a new computer game with the C++ language. If the programming environment is fixed, it's possible to figure out the details, and the number of options for creating a game in C++ is limited.
The situation in the case of a robotics project is a bit more difficult. There is no framework available. Sure, some libraries for robotics and even some programming languages are mentioned in the literature. Sometimes the ROS project is called a quasi standard, and embedded control is often handled with C. But these technologies are not used for creating the AI itself; they only make sense if it's already known how to realize the robot.
The better way to start a robotics project is based on the steps of human computer interaction. A new robotics project usually starts as a manual control system. That means the human operator gets a joystick and moves the robot arm remotely; that is the same as what a crane operator does. The second step is about reducing the workload for the human operator: the goal is to increase the automation level. In the case of a robot arm which grasps objects, this is done by automating the grasping step itself. That means the human operator controls the arm, but the robot decides when the right moment has come to close the gripper.
In the literature, this concept is called shared autonomy. It means that some tasks are done by the human and others by the Artificial Intelligence. The human operator controls the movement of the arm, and the vision system detects whether an object is in the hand and activates the grasping action. The advantage is that only subparts of the system get automated. That means only the software which executes the grasping action works autonomously, while the position of the gripper isn't controlled by the software. The overall pipeline can later be improved into a fully autonomous system; the next step would be that the AI controls both the grasping and the position of the robot hand.
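The division of labor can be sketched in a few lines. In this invented example, the human supplies gripper positions (the joystick input), while the software autonomously decides when to fire the grasp action based on the distance to the object; positions and the threshold are made-up values.

```python
def autograsp(gripper_pos, object_pos, threshold=2.0):
    # the autonomous part: close the gripper when near the object
    if abs(gripper_pos - object_pos) <= threshold:
        return "close_gripper"
    return "wait"

# the manual part: positions the human steered to, one per timestep
human_positions = [0.0, 4.0, 8.0, 9.5]
actions = [autograsp(p, object_pos=10.0) for p in human_positions]

print(actions)  # ['wait', 'wait', 'close_gripper', 'close_gripper']
```

Only the grasp decision is automated; the trajectory stays entirely under human control, which is the defining property of shared autonomy.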
Somebody may argue that the difference between a teleoperated robot arm and a robot arm which can grasp by itself is small. And indeed, in both cases the human operator is in the loop; he has to move the joystick to do the task. The advantage is that the human will notice the reduced workload. If he doesn't need to press the "grasp" button, it's a clear improvement.
Combining GOAP with a vision model
GOAP (Goal Oriented Action Planning) is a well known technique from game AI for building realistic AI characters. The idea is that the agent is in a worldstate and has a behavior library in the background. A solver tests out different behaviors to bring the agent to a goal. GOAP is equal to an automatic textadventure which takes an input worldstate and generates the next behaviors.
To use the concept for real robotics, a vision model is needed which provides the input worldstate. A vision model is, in the easiest case, a vision cone in front of the agent. This is sometimes described as spatial grounding in the literature, because it connects pixel coordinates like "object=(100,100)" to language, e.g. "object is at front".
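A minimal GOAP-style solver can be sketched as a search over behavior sequences. The worldstate keys and behavior names below are invented examples; a real implementation would add explicit preconditions and action costs.

```python
# behaviors map one worldstate to the next worldstate
behaviors = {
    "opendoor":  lambda s: {**s, "dooropen": True},
    "gotodoor":  lambda s: {**s, "atdoor": True},
    "enterroom": lambda s: {**s, "inroom": s["dooropen"] and s["atdoor"]},
}

def solve(start, is_goal, maxdepth=4):
    # breadth-first search for a behavior sequence reaching the goal
    frontier = [(start, [])]
    for _ in range(maxdepth):
        nextfrontier = []
        for state, plan in frontier:
            for name, behavior in behaviors.items():
                newstate = behavior(state)
                if is_goal(newstate):
                    return plan + [name]
                nextfrontier.append((newstate, plan + [name]))
        frontier = nextfrontier
    return None

start = {"dooropen": False, "atdoor": False, "inroom": False}
plan = solve(start, lambda s: s["inroom"])
print(plan)
```

With a vision model supplying the start worldstate, the same solver could drive a robot instead of a game character.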
August 13, 2019
Event processing in scripting AI
A common pattern in modern game AI development is to store events in variables. An example is to create a variable "robotatdoor=True", or "distance=100". The first example is a boolean event which is called a trigger; in the second case, an integer variable was introduced to store detailed information about a situation.
In a behavior based architecture, the AI script takes the world state as input and calculates the actions in response to the event. A typical script would look like:
if robotatdoor: opendoor()
if distance<50: stop()

Unfortunately, there is a problem with event processing which can be called a categorization problem. The good news is that, in contrast to the robotics domain, all the events are certain. In a computer game it is sure that the robot is really at the door, and that the distance is precisely 100 pixels. The categorization problem has to do with the program flow over a time period. Let us go into the details. A computer game consists of frame steps. In frame 0 the variable "robotatdoor" is False, in frame 10 the variable is False as well, and at frame 20 the trigger gets activated and switches the state to True. Over a longer timespan the event can become true or false, which is equal to a dual categorization. In the case of the distance variable, there are also two categories: in the first case the distance is smaller than 50, in the second it's greater. The robot behavior stop() is either activated or not. The usage of categories is equal to formalizing a situation: the problem is converted into a machine readable description which includes a decision making process. The algorithm for controlling the robot works deterministically, which means the robot knows how the world looks and what to do in each situation. Let us observe what happens if the situation is unclear. Suppose the distance variable has no value:
distance=None
if distance<50: stop()

The if statement can't be executed because the value of the variable isn't available; in Python 3, comparing None with a number raises a TypeError. This is equal to a programming error. To overcome the issue, the programmer has to make sure that each variable has a value. The "try except" statement in Python is a great help for doing so.
try:
    #distance=100
    distance=None
    if distance==None: raise ValueError
except ValueError:
    print("error")

The try except statement allows staying within the program even if the variable has an unknown value. It prevents the Python interpreter from exiting to the command line. The unclear situation can be caught and handled with a subroutine for error management. It's important to build an exception routine into an event processing system.
Measuring the worldstate

Suppose a game consists of 3 sensor values which all have the boolean type:
input1=true/false
input2=true/false
input3=true/false

The total amount of possibilities for the worldstate is 2^3=8. It is possible to react to each worldstate separately, for example:
if worldstate=(0,0,1) then action1
if worldstate=(1,0,1) then action2

What will happen if the input variables have a different type? A 16 bit integer value can store values from 0 to 65535; the statespace is 2^16. If three input variables are given:
input1=0..65535
input2=0..65535
input3=0..65535

... the needed amount of storage space in RAM is 3x16 bit = 48 bit, which can hold 2^48 worldstates = 2.81*10^14. It's not possible to decide for each worldstate which action is needed:
if worldstate=(0,0,65535) then action1
if worldstate=(65535,0,65535) then action2
Especially in the domain of Q-learning, the size of the input space becomes a problem, because the number of rows and columns in the table explodes quickly. The answer to the problem is to store the Q-table in a neural network. The neural network is able to compress the complex input space of 2.81*10^14 worldstates into a smaller one.
From an abstract point of view, it's important to know how many bits are needed to store the worldstate. In the first case, the entire worldstate can be stored in only 3 bits; in the second example with the integer values, 48 bits are needed.
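The two numbers from the text can be recomputed directly:

```python
# three boolean sensors: 3 bits of worldstate
bool_worldstates = 2 ** 3
print(bool_worldstates)  # 8

# three 16 bit integer sensors: 48 bits of worldstate
int_worldstates = 2 ** (3 * 16)
print(int_worldstates)   # 281474976710656, i.e. about 2.81*10^14
```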
August 11, 2019
Improved Heatmap in Python
In addition to a previous posting, an improved version of the heatmap source code is given. The source code was formatted in HTML mode with the "pre" tag.
import pygame

class Game:
    def __init__(self):
        self.pygamewindow = pygame.display.set_mode((700, 350), pygame.HWSURFACE | pygame.DOUBLEBUF)
        self.fps=5 # 20
        for i in range(1000000):
            self.pygamewindow.fill(pygame.Color(255,255,255))
            self.paintmap()
            pygame.display.update()
            pygame.time.wait(int(1000/self.fps))
    def heatmapcolor(self,value):
        # value 0..1, returns colorcode (r,g,b)
        # init gradient
        gradient=[] # (value,r,g,b)
        gradient.append((0.0, 0,0,1)) # blue
        gradient.append((0.25, 0,1,1)) # cyan
        gradient.append((0.5, 0,1,0)) # green
        gradient.append((0.75, 1,1,0)) # yellow
        gradient.append((1.0, 1,0,0)) # red
        gradient.append((1.0, 1,0,0)) # red extra
        # search base color
        for baseid in range(len(gradient)):
            diff=value-gradient[baseid][0]
            if diff>=0 and diff<0.25:
                break
        # relative color
        relvalue=(value-gradient[baseid][0])*1/0.25
        color=[] # (r,g,b)
        for i in range(1,4):
            temp=(gradient[baseid+1][i]-gradient[baseid][i])*relvalue # get difference
            temp=(temp+gradient[baseid][i])*255 # convert to 255 scale
            temp=int(round(temp)) # round
            color.append(temp)
        return color
    def paintmap(self):
        width=pygame.display.get_surface().get_width()
        grid_width,grid_height=20,300
        maxstep=int(round(width/grid_width))
        for i in range(maxstep):
            value=i/maxstep # 0..1
            temp=self.heatmapcolor(value)
            col=pygame.Color(temp[0],temp[1],temp[2])
            x=0+i*grid_width
            pygame.draw.rect(self.pygamewindow, col, (x,0,grid_width,grid_height))

mygame=Game()

The advantage is that the resolution can be adjusted easily by reducing the grid width.
The symbol grounding problem is overestimated
A normal expert system works great if the facts are defined precisely. An example of a fact is that the robot is near the box; another fact is that the box has an angle of 0 degrees. The expert system takes these facts as input and executes operators on them. Not all rules can be applied, but only a subset. The concept is known in game AI as a GOAP planner, because the solver is able to bring the system into a goal state.
According to some computer scientists, something is missing in that loop. They ask where the expert system gets all its facts from. In the literature this question is called the symbol grounding problem, because it's about the connection between the environment and the facts in the expert system. But is this problem really so important? In most cases the transition from perception to the fact database is not very complicated. The sensor measures a value, and the data is converted into a fact. Whether the robot is near the box or not can be determined by a single line of code. Calling this transition a bottleneck which prevents expert systems from becoming a useful tool is an exaggeration. The real problem is not converting a variable back and forth; the difficulty is making inferences from the given facts. Instead of focusing on the environment-to-sensor workflow, the more important part of the overall architecture is the expert system itself.
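The "single line of code" claim can be illustrated with a hypothetical sensor-to-fact conversion; the function name and the threshold value are invented for the example.

```python
def near_to_box(distance_cm, threshold_cm=30.0):
    # symbol grounding in one line: sensor value -> boolean fact
    return distance_cm < threshold_cm

# the fact database of the expert system receives the grounded fact
facts = {"robot_near_box": near_to_box(12.0)}
print(facts)  # {'robot_near_box': True}
```

The hard part starts after this step, when the inference engine has to reason over many such facts.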
Are all employees internal customers?
Quote: “It is recognized in the marketing literature that all employees of an organisation are internal customers. [...] Internal customers generate goods and services for the end customer” [1] page 2
This description is remarkably advanced, because in the common understanding of leadership the employee tries to satisfy his boss. The employee sees his boss as a customer who gives him orders. But it seems that the marketing literature, especially the newer one, has a different understanding of how management works. The idea is to flip the social roles: the boss tries to help the employees, and the employees help the external customers.
This is called total customer orientation, and it seems that, at least in the management literature, it's the quasi standard for organizing a modern business.
[1] Conduit, Jodie, and Felix T. Mavondo. "How critical is internal customer orientation to market orientation?." Journal of business research 51.1 (2001): 11-24.
August 10, 2019
Heatmap in Python
According to the website http://www.andrewnoske.com/wiki/Code_-_heatmaps_and_color_gradients a heatmap is created by a color gradient which goes from blue to cyan, then to green, over yellow to red. To realize a function which takes as input a value between 0 and 1 and returns the color code, the in-between values of two neighboring gradient stops need to be calculated. The Python function for doing so is a bit longer and takes 25 lines of code. In the referenced URL only the C++ version was given; I have reprogrammed the code in Python.
During the creation of the source code, a slider widget from Tkinter was a great help. It allows the user to set a value between 0 and 1 interactively and observe what the return value of the function is.
Update: Something was wrong with the embedded source code. It seems that the if statement was formatted by the blog engine a bit randomly.
import pygame

class Game:
    def __init__(self):
        self.pygamewindow = pygame.display.set_mode((500, 350), pygame.HWSURFACE | pygame.DOUBLEBUF)
        self.fps=20
        for i in range(1000000):
            self.pygamewindow.fill(pygame.Color(255,255,255))
            self.paintheatmap()
            pygame.display.update()
            pygame.time.wait(int(1000/self.fps))

    def heatmapcolor(self,value):
        # value 0..1, returns colorcode (r,g,b)
        # init gradient
        gradient=[] # (value,r,g,b)
        gradient.append((0.0, 0,0,1))  # blue
        gradient.append((0.25, 0,1,1)) # cyan
        gradient.append((0.5, 0,1,0))  # green
        gradient.append((0.75, 1,1,0)) # yellow
        gradient.append((1.0, 1,0,0))  # red
        gradient.append((1.0, 1,0,0))  # red extra
        # search base color
        for i in range(len(gradient)):
            diff=value-gradient[i][0]
            if diff>=0 and diff<0.25:
                break
        # relative color
        relvalue=(value-gradient[i][0])*1/0.25
        red=(gradient[i+1][1]-gradient[i][1])*relvalue
        red=(red+gradient[i][1])*255
        green=(gradient[i+1][2]-gradient[i][2])*relvalue
        green=(green+gradient[i][2])*255
        blue=(gradient[i+1][3]-gradient[i][3])*relvalue
        blue=(blue+gradient[i][3])*255
        # result
        result=(int(round(red)),int(round(green)),int(round(blue)))
        return result

    def paintheatmap(self):
        grid_width,grid_height=12,50
        maxstep=40
        for i in range(maxstep):
            value=i/maxstep # 0..1
            temp=self.heatmapcolor(value)
            col=pygame.Color(temp[0],temp[1],temp[2])
            x=0+i*grid_width
            pygame.draw.rect(self.pygamewindow, col, (x,3,grid_width,grid_height))

mygame=Game()
August 07, 2019
Creating a Task and motion planner
A so-called task and motion planner is very complicated to realize. From the description itself, it's a mixture of a high-level text adventure plus an underlying physics engine. The idea is that a solver determines in the text adventure which actions fulfill a goal, and then the motion planner converts the high-level tasks into concrete motions which are executed by the robot. The problem is to implement such an architecture in source code.
My project so far relies on the programming language Python. The easier part was to create the simulation itself; thanks to the libraries pygame, tkinter and box2d it was easy to do so. The resulting robot can be controlled with the keyboard by a human operator. The more complicated parts are the text adventure and the motion planner. The first idea was to utilize the STRIPS or the Prolog syntax, which amounts to storing facts and rules. In the literature the concept is explained in detail, but in reality the resulting text adventure was hard to maintain. The problem was that the rules have access to all the facts and no modules are available.
The better idea is to realize the text adventure with object-oriented programming techniques. That means every item in the game, like the robot, the box and the map, gets a separate class, and the methods in the class can only operate on the internal data structures. This time, the source code was easier to read, because it's compatible with the normal programming paradigm. That means, if somebody creates a standalone text adventure he will for sure use an object-oriented language, but not the STRIPS notation.
What is open right now is to combine all the modules into a runnable application. This makes it hard to predict whether the idea makes sense or not. Even though the example problem was a minimal one, the amount of needed source code is higher than usual. Especially the concept of running two simulations in parallel makes the code complicated. The problem is that the normal physics engine represents the game, but in the text adventure the same game is calculated in a different way.
Is there a need to create the text adventure at all? The answer is yes, because without a text adventure the solver can't determine the next step. The precondition for searching a tree for a node is that a forward model is available which can produce the game tree. Let us go a step back and describe what a GOAP solver is doing. The idea is to test out some actions randomly in the model. A random generator executes an action and then the result is stored in a graph. And exactly here is the problem: the action can only be executed inside a text adventure.
What will happen if no text adventure is available? Then the solver has to send random actions to the normal physics engine. The problem with Box2D, ODE and Bullet is that their performance is low. They provide the future state of a system, but doing so needs lots of CPU resources. It is not possible to plan longer sequences of around 1 minute with these engines. 1 minute is equal to 60 seconds = 1200 frames. If 100 actions are calculated, the amount of CPU computation is enormous.
Perhaps the term “task and motion planning” provides the description itself. A task is a high-level action, for example “bring the box to the goal”, while a motion is a low-level action, e.g. “move 20 pixels forward”. The normal physics engine works on the motion level; it has to do with a near time horizon of 1-2 seconds and detail movements. In contrast, a task planner has to provide the long-term strategy, which includes selecting waypoints and defining subgoals. On the task level a pick&place operation can be described with natural language:
1. moveto object
2. grasp object
3. moveto goal
4. ungrasp object
This short plan doesn't provide any details. It's not possible to execute the plan directly on a physics engine. A physics engine needs a concrete command, for example “left(-20)”. And that is the reason why task and motion planning are handled as different layers. There is a need to plan the actions in different hierarchies.
Practical example
For controlling a puck-collecting robot, the first thing to do is to create the motion planner. It works on a low level and affects the underlying physics engine. The motion planner consists of two subfunctions, “reach angle” and “forward”. The first one controls the direction of the robot, while the second one effects the forward motion. The details of implementing the motion primitives are up to the programmer; in most cases a simple difference calculation is sufficient. After the source code is written, it's possible to send the following plan to the motion planner:
1. reachangle(45)
2. forward((100,200))
The interaction with the robot works with these motion primitives. They provide an interface to control the robot movements. It's not possible to control complicated tasks with these primitives, only short-horizon issues. For longer plans a task planner is required. The task planner is equal to a text adventure and also provides some primitives. The task primitives are:
1. moveto(goal)
2. graspbox
3. ungraspbox
The task planner is not allowed to send commands directly to the robot; instead it sends commands to the motion planner. That means a high-level task like moveto() is decomposed into motion primitives like reachangle and forward.
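The two layers can be sketched in a few lines of Python. This is a minimal sketch, not the project's actual code: the robot pose, the step sizes and the decomposition of moveto() into reachangle and forward are simplifying assumptions, and the "simple difference calculationation" mentioned above is implemented as a capped proportional step.

```python
import math

class MotionPlanner:
    """Low level: executes motion primitives on a hypothetical robot pose."""
    def __init__(self):
        self.x, self.y, self.angle = 0.0, 0.0, 0.0  # position in pixels, heading in degrees

    def reachangle(self, target):
        # difference calculation: turn at most 5 degrees per step until the error is small
        while abs(target - self.angle) > 1.0:
            self.angle += max(-5.0, min(5.0, target - self.angle))

    def forward(self, goal):
        # drive straight toward the goal point in steps of at most 10 pixels
        gx, gy = goal
        while math.hypot(gx - self.x, gy - self.y) > 1.0:
            dist = math.hypot(gx - self.x, gy - self.y)
            step = min(10.0, dist)
            self.x += step * (gx - self.x) / dist
            self.y += step * (gy - self.y) / dist

class TaskPlanner:
    """High level: decomposes a task primitive into motion primitives."""
    def __init__(self, motion):
        self.motion = motion

    def moveto(self, goal):
        # moveto() is decomposed into reachangle + forward
        angle = math.degrees(math.atan2(goal[1] - self.motion.y,
                                        goal[0] - self.motion.x))
        self.motion.reachangle(angle)
        self.motion.forward(goal)

robot = MotionPlanner()
task = TaskPlanner(robot)
task.moveto((100, 200))
print(round(robot.x), round(robot.y))  # robot has arrived at the goal point
```

The important point is the direction of the calls: the task layer never touches the robot pose directly, it only emits motion primitives.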
Avoiding the task planner?
If motion primitives are able to control the robot, and it's possible to write a longer program which contains a sequence of motion primitives, why is there a need for a high-level task planner? Suppose the plan for the robot is to drive to the box, grasp the object, move to the goal and place the object at the position. All the motion primitives are executed in a linear fashion, and now an interruption takes place: the robot loses the box during the transit. The motion planner itself doesn't recognize the problem; only the higher instance will detect the issue.
A motion sequence should be tolerant of interruptions. And the task planner has to figure out the new motion sequence.
Semi autonomous control
Unfortunately, the amount of frameworks and algorithms to implement a task and motion planner is low. Creating such software is mostly an art, not an engineering discipline. A good starting point is to set a focus on manual control. If the robot is controlled manually, it's 100% sure that a task is fulfilled. A planner should be understood as optional. The idea is to start with a teleoperated robot and improve the system slowly into an autonomous system. From the programmer's perspective the question is how to improve the control of the robot in a way that the workload for the human gets lower.
A typical example for this transition is to replace a keyboard control with a mouse control. A normal robot arm, for example in an excavator, is controlled by different sliders. With slider 1 the operator controls motor 1, with slider 2 motor 2 and so on. The first step is to write software which takes a mouse as input and calculates the servo signals as the result. In the literature the concept is colloquially described as inverse kinematics, and it helps a lot to reduce the workload. Inverse kinematics doesn't mean that the robot works autonomously; it means that the human operator points with the mouse to a target and the robot arm reaches the point.
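For a planar two-link arm, the mouse-to-servo calculation can be sketched with the standard two-link solution. The link lengths are made-up values; a real arm would substitute its own geometry and clamp the angles to the servo limits.

```python
import math

def inverse_kinematics(x, y, l1=100.0, l2=80.0):
    """Two-link planar arm: return the joint angles (radians) that place the
    end effector at the mouse target (x, y). Link lengths l1, l2 are
    hypothetical. Returns None when the target is out of reach."""
    d = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(d) > 1:
        return None                      # target outside the workspace
    theta2 = math.acos(d)                # elbow angle
    theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                           l1 + l2 * math.cos(theta2))
    return theta1, theta2

angles = inverse_kinematics(120, 60)
print(angles)  # the two servo angles for the target point
```

The human still decides where to point; the software only removes the burden of moving every motor by hand.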
August 04, 2019
What is the technology behind expert systems?
The literature about expert systems and general problem solving is large, and many ideas are mentioned, for example the Lisp programming language, cognitive architectures and rule-based systems. The problem is to identify which subjects are important to realize Artificial Intelligence and which are not, but can be described as a fashion of the 1980s and subjective preferences of researchers.
The main problem with the early AI research in the 1960s was that no modern desktop computers were available. If somebody in the 1960s and early 1970s was interested in running software, he wasn't able to do so. Operating systems like Windows 95 were not invented, and interpreted languages like Python didn't exist. Most of the early AI literature is a mixture of AI principles and computer science in general. A typical example is the LISP language, which was used for anything and nothing. Lisp was an operating system, self-modifying code, a programming language and an interactive environment. The first thing to do is to sort one's own tools. So we should ask directly: what is the basic principle of an expert system?
Basically spoken, it's not an algorithm but a text adventure which simulates something. The programmer has to construct a game engine, which is equal to a rule engine, and then he can send commands to the text adventure, either manually or with an automatic solver. This brings the game into a goal state. The understanding of an expert system as a text adventure is the core idea of symbolic AI. It helps to simplify a problem into smaller tasks. The advantage is that any text adventure can be solved by a solver.
The next question is how to convert a given domain (for example a robot arm) into a text adventure. The answer has to do with human-machine interaction. A human operator is able to control the robot, and while he is doing so, it is possible to observe his actions in a psychological experiment. These studies go beyond computer science, because a psychological experiment has a lot to do with humans but only little with Turing machines. From a computer science perspective, such experiments are equal to generating a dataset: a database with the recorded game log of the experiment. And it's up to the AI engineer to convert the game log into an expert system, which is equal to a text adventure.
Unfortunately, the rules of a human-machine interaction task are hidden in the dataset. They are not available as machine-readable instructions but are based on experience, natural language instructions and general problem solving capabilities. Creating an expert system doesn't mean inventing an AI algorithm; the algorithm is available by default. The more important goal is to invent a space in which actions can be executed. From a technical point of view, the attempt to do so is called “model induction”; sometimes it is called a forward model because it describes how the system works. Using a solver to bring an existing forward model into a goal state is not very hard. The algorithms are known, and in most cases a simple graph search technique is fast enough. The more demanding problem is that for most domains the forward model is not known.
The process of converting a human demonstration into a text adventure is called grounding. It's the core problem in AI because it allows an improved human-machine communication. A grounded problem can be understood by both sides: for the computer the text adventure consists of symbols which can be stored and manipulated in memory, and for the human the game represents the reality.
At first glance, the most important question for expert system programmers is how the expert system works internally. This question has a surprisingly simple answer: the internal working is not important. It works with a graph search algorithm or a similar algorithm which brings the current state into the goal state. In a primitive expert system the search algorithm consists of only 20 lines of code which test out the entire state space, and programming such an algorithm is not very complicated. The more demanding question is how an expert system perceives the environment. That means, the human operator is doing a task with the robot and the expert system monitors the actions. How exactly does the expert system identify a subaction, and what is shown on the screen as the detected event? A well-programmed expert system is first and foremost an activity recognition engine. It translates human activities into a machine-readable description.
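The claim that the search algorithm itself is small can be illustrated with a sketch. The state encoding (frozensets of facts) and the mini pick&place domain are made up for the example; the point is that a plain breadth-first search over a forward model is already a complete solver.

```python
from collections import deque

def solve(start, goal, actions):
    """Breadth-first search over the state space of a forward model.
    actions: list of (name, preconditions, add effects, delete effects)."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, plan = queue.popleft()
        if goal <= state:                      # all goal facts are present
            return plan
        for name, pre, add, delete in actions:
            if pre <= state:                   # action is applicable
                succ = frozenset((state - delete) | add)
                if succ not in visited:
                    visited.add(succ)
                    queue.append((succ, plan + [name]))
    return None

# hypothetical mini domain: a robot grasps a box and carries it to the goal
actions = [
    ("moveto box",  frozenset(["at start"]), frozenset(["at box"]),  frozenset(["at start"])),
    ("grasp box",   frozenset(["at box"]),   frozenset(["holding"]), frozenset()),
    ("moveto goal", frozenset(["at box", "holding"]), frozenset(["at goal"]), frozenset(["at box"])),
    ("ungrasp box", frozenset(["at goal", "holding"]), frozenset(["box at goal"]), frozenset(["holding"])),
]
plan = solve(frozenset(["at start"]), frozenset(["box at goal"]), actions)
print(plan)  # ['moveto box', 'grasp box', 'moveto goal', 'ungrasp box']
```

Everything difficult lives in the action table, not in the solver: writing down the preconditions and effects is exactly the perception and grounding problem described above.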
Frameworks are not available
Even if some techniques are known to construct expert systems, for example the CLIPS shell, the LISP programming language, the PDDL domain definition standard and the means-end analysis for searching the state space, none of these techniques are needed in a robotics project. They can be called less important details, not here to stay. If Lisp, PDDL and all the other techniques are useless, which kind of framework, programming language or algorithm can be utilized for developing an expert system? Unfortunately, there is no such thing as a framework, but there is a software engineering workflow which consists of three simple steps:
1. create a simulation which is controlled by a human operator
2. create a plan recognition system
3. create a fully autonomous solver
The first step is easy to solve because it's equal to normal game programming. The idea is to use an existing programming language like C# and an existing game engine like Unity3d to create a standard game which takes the input of a human operator. Steps 2 and 3 are more complicated to realize. In most cases they have to do with observing humans who are doing a task and trying to formalize the steps in a text adventure. This text adventure is used to parse a demonstration, but it is the baseline for the automatic solver as well. Instead of recommending a concrete programming language or an algorithm, the better idea is to understand steps 2 and 3 as part of a software engineering process. They are handled with version control systems like git and get visualized with the UML notation.
Pros and cons of the Shakey the robot project
A while ago a paper was published which introduces the Shakey the robot project again and explains the advantages and disadvantages of the STRIPS planning system.[1] On page 1 it was also explained that Rodney Brooks wrote an anti-Shakey paper in which he explains that formalized planning is a dead end. It's important to focus first on the idea of a logical model of the environment. The Shakey robot has a preprogrammed environment model in which its own position, the allowed actions and other objects are formalized in the situation calculus. This model allows Shakey to plan from the current situation into any future goal state.
The disadvantage of the concept, and the reason why Brooks wrote a STRIPS critique, is that such a logical model is hard to program and doesn't fit the environment. In reality, it's not possible to reuse an existing STRIPS model. The Shakey description can't be utilized in a different robot, for example a modern Lego Mindstorms system. Instead the STRIPS model has to be programmed again, which takes a large amount of time.
To understand the problem we have to go a step backward. The normal interaction between a robot and a human operator is done via teleoperation: the human can move the robot by using a joystick. If the robot should drive autonomously, it needs a logical model. The basic question is how to come from a teleoperated robot to an autonomous robot system. The answer is called plan recognition and learning from demonstration. This is the step in between and means that the human interaction with the robot is tracked and converted into a model.
Plan recognition is equal to model tracking. The idea is not that Shakey should plan the next actions; the idea is to analyze whether the logical model of the environment is right. The idea of STRIPS and Shakey goes in the right direction; what is missing is the ability to analyze human interaction with a teleoperated robot.
In the literature the idea of plan recognition is a newer development, because it's hard to explain why this technique is needed. From a practical standpoint it means controlling a robot with a joystick while the software recognizes the actions. That means, the human operator lets Shakey collide with an obstacle, and on the screen it is shown “collision detected”. Because the human operator knows the information in advance, it seems that there is no need for such a message. But without a working plan recognition it's not possible to verify or build a logical representation of the environment. That's the reason why most STRIPS-based projects have failed.
Shakey the robot and STRIPS work great if the logical representation is there. The planner can take the model and plan the next steps to reach a goal. It's not very complicated to write such a planner, and it will run with maximum performance. The bottleneck appears if the logical model isn't correct or no such model is available. In such a case, the robot won't make any action.
Plan recognition is equal to human-machine communication.[2] The robot and the human operator speak the same language. The problem is not how the Shakey software works internally; the question is whether Shakey is able to understand the teleoperator.
The plan recognition problem is a relatively new development which was analyzed after Shakey was built:
Quote: “Schmidt, Sridharan and Goodson [1978, 1976] are the first to identify plan recognition as a problem in its own right” [3]
In contrast to robot control, plan recognition doesn't result in a working system. Instead the idea is to annotate the movements of a teleoperated robot. Somebody may argue that it has nothing to do with Artificial Intelligence because the robot is controlled by a human operator. Additionally, the detected events and activities are grounded in natural language and psychology, which is outside of computer science.
Debugging
Plan recognition can be seen as a model debugger. It is only successful if the plan library consists of predefined actions which are able to detect events in the environment.[4] This allows the programmer to implement and test new plan libraries, similar to writing computer code. He types in an action and uses the plan recognizer to verify whether the action makes sense.
Plan corpus
To simplify the process of plan recognition it's useful to build a plan corpus. That is a large plan library which contains action primitives and events for detecting and annotating raw data. It's not possible to generate a plan corpus automatically; it's a manual task similar to creating an English dictionary. A plan library is usually created by asking human participants to do a task, for example to walk on a line. Then the motion capture suit records all the information, and it is annotated manually. On top of the recorded trajectory a parser is programmed. The overall plan library project should be provided as an Open Science project on the internet, which allows other researchers to participate.
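Such a parser can be sketched as a toy example. Everything here is a simplifying assumption: the recording is a 1-D position trace, the plan library contains only two primitives ("standing", "walking"), and the speed threshold is made up; a real corpus would use full motion capture data and many more detectors.

```python
def annotate(trajectory, threshold=0.5):
    """Parse a recorded trajectory into a sequence of activity primitives.
    Returns a list of (label, start frame) events."""
    events = []
    for t in range(1, len(trajectory)):
        speed = abs(trajectory[t] - trajectory[t - 1])
        label = "walking" if speed > threshold else "standing"
        if not events or events[-1][0] != label:
            events.append((label, t))  # a new activity segment begins here
    return events

# hypothetical recording of a position over 8 frames
trajectory = [0.0, 0.0, 0.1, 1.0, 2.0, 3.0, 3.1, 3.1]
print(annotate(trajectory))  # [('standing', 1), ('walking', 3), ('standing', 6)]
```

The output is exactly the kind of annotated game log described above: raw sensor data on one side, natural language labels on the other.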
Examples of corpora from the past are: HASC corpus, USC-HAD, HuGaDB[5], PRAXICON and other datasets for activity recognition. Most of these projects were realized in the last 10 years.
[1] Shanahan, Murray. "Reinventing shakey." Logic-based artificial intelligence. Springer, Boston, MA, 2000. 233-253.
[2] Pollack, Martha E. "The uses of plans." Artificial Intelligence 57.1 (1992): 43-68.
[3] Mao, Wenji, and Jonathan Gratch. Decision-theoretic approach to plan recognition. ICT Technical Report ICT-TR-01-2004, 2004.
[4] Goultiaeva, Alexandra, and Yves Lespérance. "Incremental plan recognition in an agent programming framework." Working Notes of the AAAI Workshop on Plan, Activity, and Intention Recognition (PAIR). 2007.
[5] Chereshnev, Roman, and Attila Kertész-Farkas. "Hugadb: Human gait database for activity recognition from wearable inertial sensor networks." International Conference on Analysis of Images, Social Networks and Texts. Springer, Cham, 2017.
What is symbolic AI?
In the history of AI, the well-known General Problem Solver was the first example of an abstract planning system. Later examples were STRIPS, and a modern version is called GOAP (goal oriented action planning). What these systems have in common is that they use rules to solve a problem. A rule is part of a simulation, very similar to what a player activates in a text adventure to reach a goal.
The difference between rules and normal functions in a C program is that the order of the rules isn't fixed. It's the task of the solver to determine the sequence on its own. This provides a higher flexibility. The simulation can start at an init state and be transferred into a goal state, very similar to what a classical path planner is doing, except that the world isn't a 100x100 pixel map but a state space which consists of abstract actions like “take key”, “walk to location” and “open door”.
To get an idea how to use symbolic AI in reality: a new environment called TextWorld was published in the year 2018 by Microsoft Research, https://www.microsoft.com/en-us/research/project/textworld/ It is a text adventure and a solver in the same program. The text adventure simulates a world in which the player can do tasks which have to be entered on the command line. The solver is able to determine the actions autonomously to reach any state in the simulation.
The reason why robotics can profit from this idea is that the state space of a text adventure is much smaller than the geometrical state space. The amount of possible goals and actions in a symbolic simulation is not very great; a graph search solver will find the actions very fast. The interesting aspect is that GOAP-like solvers don't need a certain framework, nor a dedicated STRIPS-like programming language. The more important aspect is to convert a domain into a text adventure. This is the harder part.
In the famous monkey-banana problem the domain consists of only 3 actions and 2 objects. In a robotics domain, the amount of actions is higher. The AI engine for a computer game will need hundreds of rules and dozens of subgoals. This programming task can be handled with the normal software engineering paradigm, that means with the help of UML diagrams, git version control and bug testing. Or let me explain it from the other perspective. The question is: how does a text adventure look which is about a self-driving car, a biped robot, a Lemmings-playing AI or a Mario AI bot? If somebody has answered these questions he gets a fully working AI which plans very fast. Sometimes the concept is called “task and motion planning”. Task planning refers to symbolic AI planning, which is equal to GOAP, STRIPS and GPS. Motion planning is the low-level side, which includes inverse kinematics and path planning in a map. The more important part is the task planner, or to be more specific, the text adventure in which possible tasks are formalized.
Text adventure solver
The fascinating thing about text adventures is that they can be solved relatively easily automatically. In contrast to a normal computer game, the amount of possible actions is not billions but only a few thousand. The famous blocksworld example, in which a robot arm has to pick and place boxes, can be interpreted as a text adventure as well. The robot has certain commands like grasp, ungrasp, moveto, and the goal is to find the correct action sequence. It's some kind of puzzle game which can be solved with a graph search algorithm in under a minute.
Unfortunately, most games are not available as a text adventure. For example, Mario AI, Lemmings or Starcraft work with graphics, not with text, and the available actions are unknown. To overcome the bottleneck, one idea is to monitor whether an existing text adventure fits a game. That means, the original Mario AI game plus the Mario AI text adventure are started at the same time, and the question is whether both instances are in the same state. If not, the text adventure version has to be modified.
Simulation
The situation calculus is the theoretical background behind the STRIPS planning language. It describes a world which consists of a world state and actions to change the state. Running such a simulation is called a qualitative simulation because it's not based on numerical values but on natural language. For example, if the player types in “open door”, the string of the door variable gets changed into “door is open”. It's important to understand that a qualitative model isn't an algorithm but a text adventure. It forms an environment in which a human player can execute actions and observe the resulting game state. An algorithm is only needed on top of the simulation to bring the system into a goal state.
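The door example can be written down in a few lines. This is a minimal sketch, assuming a dict-based game state and a single hand-written rule; a real qualitative simulation would have many such rules.

```python
# qualitative simulation: the state is natural language, not numbers
state = {"door": "door is closed", "player": "player is in the hallway"}

def execute(command, state):
    # a rule for the "open door" command: rewrite the door variable as a string
    if command == "open door" and state["door"] == "door is closed":
        state["door"] = "door is open"
        return "ok"
    return "action not possible"

print(execute("open door", state))  # ok
print(state["door"])                # door is open
```

A solver on top of this would simply try commands and check whether the resulting state dict matches the goal.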
In the history of AI many techniques were developed to realize a qualitative simulation. One example are the planning languages ADL, STRIPS, Golog and PDDL. Another attempt are XML-based models, and the latest innovations are RDF triple stores and general description languages. All of these techniques try to make building a text adventure as easy as possible.