April 15, 2020

Building a modern robot from scratch

The main reason why Artificial Intelligence has failed in real robotics projects in the past is that it focuses on computer science rather than on the underlying domain. The unspoken assumption is that NP-hard problems have to be solved with a certain algorithm, which is then implemented in a programming language. Executing the program is supposed to solve a certain AI problem, for example grasping an object with a dexterous hand.

The reason this assumption does not lead to a grasping robot is that nobody knows how such an algorithm could solve the task. In contrast to sorting an array, so-called AI tasks have little to do with computing itself; they have to do with driving a car, the shape of objects, and communicating in natural language.

A better idea for realizing AI systems is to start with a teleoperated robot and extend it later with a database of trajectories. In the first step, the human operator controls the robot arm with a joystick, which allows him to grasp an object. In the second step, the pipeline is extended with grounded natural language and a learning-from-demonstration motion database. Neither module belongs to classical computer science or mathematics; both are applications of Artificial Intelligence.
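To make the two-step pipeline a bit more concrete, the following Python sketch shows one possible data structure for a single demonstration. The names (Demonstration, joint_trajectory, label) are placeholders invented for this post, not part of any existing framework.

    # A minimal sketch of the data a two-step pipeline has to store.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Demonstration:
        # one joystick-recorded demonstration: a sequence of joint configurations
        joint_trajectory: List[List[float]]   # e.g. 6 joint angles per timestep
        timestamps: List[float]               # seconds since start of recording
        label: Optional[str] = None           # grounded language tag, added in step 2

    # Step 1 only fills joint_trajectory and timestamps while the operator drives
    # the arm with the joystick; step 2 adds the label ("open gripper", ...).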

Perhaps it makes sense to go into the details. Suppose the human operator is able to grasp an object with the joystick. In theory he can do so hundreds of times, but the goal is to transfer the task into software for higher productivity. One important step in this direction is to repeat the same action and record the trajectory. The result is a motion capture database. If the scene in front of the robot matches a recorded scene, the recorded action is reproduced in playback mode. Interpolating between different trajectories increases the accuracy.
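One simple way to interpolate between two recorded demonstrations is to resample both to a common number of timesteps and take a weighted average of the joint angles. The numpy sketch below illustrates the idea; the function names and the fixed blending weight are assumptions for illustration only.

    # Hypothetical sketch: blend two recorded trajectories of different length
    # into one playback trajectory by resampling and averaging.
    import numpy as np

    def resample(trajectory, n_steps):
        """Resample a (T, n_joints) trajectory to n_steps timesteps."""
        trajectory = np.asarray(trajectory, dtype=float)
        old_t = np.linspace(0.0, 1.0, len(trajectory))
        new_t = np.linspace(0.0, 1.0, n_steps)
        return np.column_stack(
            [np.interp(new_t, old_t, trajectory[:, j])
             for j in range(trajectory.shape[1])]
        )

    def blend(traj_a, traj_b, weight=0.5, n_steps=100):
        """Interpolate between two demonstrations of the same action."""
        a = resample(traj_a, n_steps)
        b = resample(traj_b, n_steps)
        return (1.0 - weight) * a + weight * b   # played back step by step on the robot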

The next step towards advanced robotics is to tag the trajectory database with grounded language. That means the database is annotated with labels like “open gripper”, “close gripper” and “push object”. This makes the database easier to search. For example, if the next task is about “push object”, an SQL query to the motion database returns all the trajectories from this domain. The solver then selects some of them and creates an interpolated trajectory, which is executed on the robot.
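As a rough sketch, the tagged motion database could be a single SQLite table with a text label per trajectory. The table layout and column names below are invented for this example; a real system might store the data differently.

    # Sketch of a tagged motion database as one SQLite table.
    import json
    import sqlite3

    conn = sqlite3.connect("motions.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trajectories "
        "(id INTEGER PRIMARY KEY, label TEXT, data TEXT)"   # data holds JSON joint angles
    )

    def store(label, joint_trajectory):
        conn.execute(
            "INSERT INTO trajectories (label, data) VALUES (?, ?)",
            (label, json.dumps(joint_trajectory)),
        )
        conn.commit()

    def query(label):
        rows = conn.execute(
            "SELECT data FROM trajectories WHERE label = ?", (label,)
        ).fetchall()
        return [json.loads(row[0]) for row in rows]

    candidates = query("push object")   # the solver picks and interpolates from these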

The combination of teleoperated robotics, learning from demonstration, and natural language grounding is a powerful technique for realizing robotics projects which can be used in reality. That means the system is not only an academic project to teach students how to do something; the robot can be used to solve practical tasks.

The reason why this approach is nearly unknown in mainstream robotics and AI is that it is easy and very complex at the same time. The described method combines artifacts from different domains. It has to do with motion capture (which is used in movie production), with grounded language (which is used in natural language processing), and with spline interpolation (which comes from regression analysis). Combining all these subjects into a single project is not common in ordinary computer science. What computer scientists have done in the past is solve a single problem. For example, they want to search a database for a value. This limited problem is analyzed in depth, and the algorithm is implemented in a high-level programming language. Unfortunately, this problem-solving strategy fails in AI domains.

A good starting point for all sorts of AI applications is a teleoperated robot. Teleoperation means that the machine has human-level capabilities by default. The idea is that a human operator is in charge of the system all the time. He is not allowed to leave the joystick, because then the robot will fail to solve the task. Once this teleoperated paradigm is working, the next step is to think about how to reduce the workload of the operator, so that he can control the robot hand more easily and relax a bit.
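One possible shape of such a workload-reducing control loop is sketched below. The joystick and arm interfaces (read_joystick, send_velocity) and the assistance hook are placeholders; the only point is that the operator's command always dominates.

    # Hypothetical teleoperation loop: the operator stays in charge, an optional
    # assistance term only nudges the commanded velocity.
    import time

    def teleoperation_loop(read_joystick, send_velocity, assist=None, rate_hz=50):
        dt = 1.0 / rate_hz
        while True:
            command = read_joystick()            # operator input, e.g. Cartesian velocity
            if assist is not None:
                # step 2: blend in a suggestion from the motion database,
                # but the human command keeps the larger weight
                suggestion = assist(command)
                command = [0.8 * c + 0.2 * s for c, s in zip(command, suggestion)]
            send_velocity(command)
            time.sleep(dt)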

Trajectory replay

The interesting effect of a trajectory replay is that on the first trial it won't work. If the robot repeats the prerecorded trajectory in a new situation, it is unable to reach the goal. But this failure doesn't show that the idea is wrong; it shows that trajectory replay isn't the answer to the problem but the problem itself. The question is how to program a trajectory replay system that can adapt to different situations. Learning from demonstration is exactly the kind of challenge that has to be addressed with modern algorithms.
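One very simple adaptation strategy, shown as a sketch below, is to shift the recorded trajectory by the difference between the recorded and the current object position. The function names, the coordinate convention, and the linear ramp are assumptions for illustration; serious learning-from-demonstration methods warp the trajectory in more principled ways.

    # Why naive replay fails, and one possible minimal adaptation.
    import numpy as np

    def naive_replay(recorded_traj):
        return np.asarray(recorded_traj, dtype=float)     # ignores the new scene

    def adapted_replay(recorded_traj, recorded_goal, current_goal):
        traj = np.asarray(recorded_traj, dtype=float)     # shape (T, dims)
        offset = np.asarray(current_goal, dtype=float) - np.asarray(recorded_goal, dtype=float)
        # ramp the correction in over time so the start of the motion stays unchanged
        alpha = np.linspace(0.0, 1.0, len(traj))[:, None]
        return traj + alpha * offset

    # If the object moved 5 cm to the side, naive_replay still ends at the old
    # position, while adapted_replay ends 5 cm to the side as well.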