In the past, computer science operated with certain tools that were developed once and reused frequently. These tools were located in hardware, software and algorithms, and they are similar to a cooking recipe: a blueprint for how to build technology from scratch. For example, a 32-bit microchip is designed with a hardware description language like VHDL, while the Unix operating system is written in the programming language C. Anyone who has access to the VHDL file or the C source code can create a copy of the technology, because the source gives them a better understanding of how the system works.
A naive assumption is to transfer this model to the domain of Artificial Intelligence, which is a subfield of computer science. Under this computer science bias, AI has to be located in hardware, software or algorithms. So it's natural to ask for dedicated AI hardware and AI software, in the hope of getting access to advanced robotics technology.
Unfortunately, AI works quite differently from the computer science paradigm. Even if dedicated AI tools were developed, for example AI-related FPGA chips and AI-related software libraries, something is still missing to build a robot. This something was unknown, and because nobody was able to locate the magic ingredient, AI wasn't realized for decades. It was unknown what exactly an AI machine or tool has to look like, so it was impossible to find such tools.
With more advanced knowledge about the AI subject, it's possible to describe in more detail what the recipe for creating robots is. The missing ingredient is a multimodal dataset. That is a .csv file which contains motion capture data in combination with natural language annotations like “jump”, “walk” and “stand up”. Such a dataset can be used as the core element, and a robot control system can be developed around it.
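To make the idea of such a file more concrete, here is a minimal sketch in Python. The column names (frame, hip_y, knee_angle, label) and all numbers are invented for illustration; a real motion capture recording would have many more columns and frames.

```python
import csv
import io
from collections import defaultdict

# Hypothetical file content: one row per motion capture frame.
# Column names and all values are invented for illustration.
CSV_TEXT = """frame,hip_y,knee_angle,label
0,0.95,175.0,stand up
1,0.96,170.0,walk
2,0.95,160.0,walk
3,0.80,120.0,jump
4,1.20,178.0,jump
"""

def load_frames(text):
    """Parse the CSV text into a list of dicts with numeric sensor values."""
    frames = []
    for row in csv.DictReader(io.StringIO(text)):
        frames.append({
            "frame": int(row["frame"]),
            "hip_y": float(row["hip_y"]),
            "knee_angle": float(row["knee_angle"]),
            "label": row["label"],
        })
    return frames

def group_by_label(frames):
    """Collect the frame numbers that belong to each annotation."""
    groups = defaultdict(list)
    for f in frames:
        groups[f["label"]].append(f["frame"])
    return dict(groups)

frames = load_frames(CSV_TEXT)
groups = group_by_label(frames)
print(groups)  # {'stand up': [0], 'walk': [1, 2], 'jump': [3, 4]}
```

The point of the sketch is only that each numeric sensor row carries a word next to it. Everything else in a robot control system would be built around this pairing.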
This description of an AI tool looks a bit uncommon because it doesn't fit into the existing categories of computer science. It's not a hardware description, it's not an algorithm and it's not the source code for a computer program. Instead, a multimodal dataset is simply a sensor recording, similar to a temperature log file from a weather station. Its usefulness becomes visible only at second glance. A multimodal dataset creates a machine learning problem. The question is which sort of neural network can learn the data and how to interpolate the missing data. The attempt to answer these questions results in a robotics project.
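The interpolation question can be illustrated with a very small sketch. Note that it uses plain linear interpolation as a stand-in, not a neural network, and that the function name and the sample values are assumptions made up for this example.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between known neighbours.

    Gaps at the start or end are left as None because there is
    no known neighbour on one side.
    """
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant change per frame between two known samples.
        step = (result[right] - result[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = result[left] + step * (i - left)
    return result

# Hypothetical knee angle recording with dropped frames.
knee_angle = [170.0, None, None, 140.0, None, 120.0]
filled = interpolate_missing(knee_angle)
print(filled)  # [170.0, 160.0, 150.0, 140.0, 130.0, 120.0]
```

A learned model would replace the straight line with something that has seen many recordings of the same motion, but the task is the same: produce plausible values for the frames the sensor did not deliver.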
In contrast to a purely numerical dataset, a multimodal dataset contains natural language annotations, here in English. These annotations transform a robot control problem into a text adventure. The textual layer is the bird's-eye perspective on a problem. Every robot control problem can be interpreted as a text adventure which consists of nouns, verbs and adjectives. Natural language is the best tool to describe reality. Let me give an example:
A kitchen robot will perceive objects like table, plate, apple and bread. A kitchen robot can also perform actions like open, close, grasp and transfer. Natural language, including its vocabulary, is used to identify the parts of reality. The robot control system has to memorize the same words, otherwise the human-to-machine interaction will fail.
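The kitchen example can be written down as a tiny text adventure parser. The nouns and verbs are the ones from the example; the parsing rule (take the first known verb and the first known noun) and the function name are assumptions chosen to keep the sketch short.

```python
# Vocabulary taken from the kitchen example; the parsing rule itself
# (first known verb, first known noun) is an assumption for illustration.
NOUNS = {"table", "plate", "apple", "bread"}
VERBS = {"open", "close", "grasp", "transfer"}

def parse_command(sentence):
    """Return (verb, noun) if both are in the lexicon, otherwise None."""
    words = sentence.lower().replace(".", "").split()
    verb = next((w for w in words if w in VERBS), None)
    noun = next((w for w in words if w in NOUNS), None)
    if verb is None or noun is None:
        return None
    return verb, noun

print(parse_command("Grasp the apple"))   # ('grasp', 'apple')
print(parse_command("Peel the banana"))   # None
```

The second command fails because neither “peel” nor “banana” is part of the lexicon. This is the failure of human-to-machine interaction in miniature: the robot can only act on words it shares with the human.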
In classical computer science there is no need to utilize natural language. A pocket calculator and even advanced workstation computers work fine without knowing any vocabulary. Words from reality like “apple”, “bread” and so forth are never included in the VHDL hardware description, and they aren't stored in computer programs. Sometimes these words appear in comments, but they are not important for running the software itself. In other words, classical computers work fine without any interaction with a human.
In contrast, a robot control system operates with the opposite paradigm. A robot is mostly a user interface which uses the same lexicon as a human. It's impossible to build a kitchen robot without 100 or more kitchen words.