Traditionally, teleoperation was realized with a joystick. The human operator navigates the robot by pushing the joystick forward and backward, which allows precise movement and lets the robot perform very complex tasks. The same principle is used for construction cranes and for joystick-controlled UAVs.
Even though joystick-based teleoperation works well, it has a bottleneck: a human operator is needed the whole time. A single human can control a single robot; controlling two UAVs at the same time with one operator is difficult or even impossible. From a technical perspective, a drone could receive commands at a much higher frequency; the problem is that the human operator cannot generate the signals fast enough. To address this bottleneck, a different sort of teleoperation is needed, one that operates at a higher level of abstraction.
A slight improvement over joystick-based teleoperation is waypoint navigation. The human operator selects waypoints on a map and the robot moves along the resulting trajectory. This reduces the operator's workload: once the robot knows the next waypoint, it can navigate to the target by itself, as sketched below.
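To make this concrete, here is a minimal sketch of a waypoint follower for a differential-drive robot, assuming a simple unicycle motion model. The function names, gains, speeds and tolerance are illustrative assumptions, not the API of any particular robot.

```python
import math

def step_towards(pose, waypoint, v=0.5, k_turn=2.0, dt=0.1):
    """Advance the pose (x, y, heading) one time step towards a waypoint."""
    x, y, theta = pose
    wx, wy = waypoint
    heading_to_goal = math.atan2(wy - y, wx - x)
    # Smallest signed angle between current heading and the goal direction.
    error = (heading_to_goal - theta + math.pi) % (2 * math.pi) - math.pi
    omega = k_turn * error                  # proportional turn rate
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return (x, y, theta)

def follow_waypoints(pose, waypoints, tolerance=0.2):
    """Drive through each waypoint in order; return the final pose."""
    for wx, wy in waypoints:
        while math.hypot(wx - pose[0], wy - pose[1]) > tolerance:
            pose = step_towards(pose, (wx, wy))
    return pose

if __name__ == "__main__":
    route = [(2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
    print(follow_waypoints((0.0, 0.0, 0.0), route))
```

The operator only has to supply the route; the low-level steering loop runs without further human input.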
The next logical step after waypoint navigation is grounded language control. The human operator communicates with the robot in natural language and gives a command like "move ahead, then rotate left, then move ahead for 10 meters, then stop". This kind of language-based communication reduces the operator's workload even further. On the other hand, it is a demanding task to implement such an interface in software; a rough idea of the parsing step is sketched below.
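The following sketch turns the example sentence above into a list of symbolic primitives. The primitive names (move_ahead, rotate_left, stop), the default distance and the default turn angle are assumptions made for illustration only.

```python
import re

def parse_command(sentence, default_distance=1.0):
    """Split a sentence on 'then' and map each clause to a (primitive, value) pair."""
    plan = []
    for clause in re.split(r",?\s*then\s*", sentence.lower()):
        clause = clause.strip().rstrip(".")
        if not clause:
            continue
        distance = re.search(r"(\d+(?:\.\d+)?)\s*meter", clause)
        if clause.startswith("move ahead"):
            plan.append(("move_ahead",
                         float(distance.group(1)) if distance else default_distance))
        elif clause.startswith("rotate left"):
            plan.append(("rotate_left", 90.0))    # degrees, assumed default
        elif clause.startswith("rotate right"):
            plan.append(("rotate_right", 90.0))
        elif clause.startswith("stop"):
            plan.append(("stop", 0.0))
        else:
            raise ValueError(f"unknown clause: {clause!r}")
    return plan

print(parse_command("move ahead, then rotate left, then move ahead for 10 meters, then stop"))
# [('move_ahead', 1.0), ('rotate_left', 90.0), ('move_ahead', 10.0), ('stop', 0.0)]
```

A real interface would of course need far more robust language understanding; the keyword matching here only illustrates the shape of the problem.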
Language-based communication with robots is the answer to the teleoperation problem. It allows robots to be controlled remotely with a reduced mental workload. Language sits at a higher abstraction level than joystick control, and this abstraction must be translated into low-level servo commands for the robot, a step known as symbol grounding. A toy grounding step, continuing the sketch above, is shown below.
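Here, each symbolic primitive from the parser sketch is mapped to left/right wheel speeds and a duration. The speeds, turn rate and units are assumed values, not a real robot interface.

```python
LINEAR_SPEED = 0.5      # m/s, assumed
TURN_RATE = 45.0        # deg/s, assumed

def ground(primitive, value):
    """Translate one (primitive, value) pair into (v_left, v_right, seconds)."""
    if primitive == "move_ahead":
        return (LINEAR_SPEED, LINEAR_SPEED, value / LINEAR_SPEED)
    if primitive == "rotate_left":
        return (-0.2, 0.2, value / TURN_RATE)
    if primitive == "rotate_right":
        return (0.2, -0.2, value / TURN_RATE)
    if primitive == "stop":
        return (0.0, 0.0, 0.0)
    raise ValueError(f"cannot ground {primitive!r}")

for step in [("move_ahead", 10.0), ("rotate_left", 90.0), ("stop", 0.0)]:
    print(step, "->", ground(*step))
```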
Let me explain it from a different perspective. In classical joystick-based teleoperation there is no grounding problem. The robot doesn't know terms like obstacle, shelf, move_ahead or stop; it understands only the voltage signals transmitted from the remote control device. Such a robot can't parse natural language, it is a classical analog receiver. Of course, the human operator knows the words: he is aware that the robot enters a room and moves towards a shelf with a box. But this information is not relevant for the robot; it is enough to push the joystick forward to navigate the warehouse. A minimal sketch of this purely signal-level mapping follows.
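For contrast, two joystick axes can be mixed directly into left and right motor voltages for a differential-drive base, with no symbols involved at all. The 12 V supply and the [-1, 1] axis ranges are assumptions for this sketch.

```python
def joystick_to_voltages(forward, turn, supply=12.0):
    """Map stick axes in [-1, 1] to (left, right) motor voltages."""
    left = max(-1.0, min(1.0, forward + turn)) * supply
    right = max(-1.0, min(1.0, forward - turn)) * supply
    return left, right

print(joystick_to_voltages(0.8, 0.0))   # straight ahead: (9.6, 9.6)
print(joystick_to_voltages(0.5, 0.5))   # forward while turning right: (12.0, 0.0)
```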
In contrast, language-based teleoperation requires that the robot understands natural language. The robot parses natural-language commands and also gives its feedback in English.
The first electric RC toy cars have been available since the 1960s. To build and operate such a car, a certain amount of knowledge in mechanics and electronics is needed. What isn't required is linguistic knowledge, because an RC car is not an English dictionary; it is a technical machine built around a battery and analog circuits. It took many decades until more advanced language-controlled machines became available. One landmark project was the Ripley project at MIT in 2003, followed by MIT's voice-controlled forklift in 2010. Since the advent of vision-language models in 2023, even humanoid robots can be controlled with natural language.