December 20, 2025

Symbol grounding with tags

 

The screenshot shows a prototype for a maze game. Its dominant feature is a camera that determines semantic tags for the mouse position. Such a camera is a basic demonstration of grounded language: it doesn't capture the picture at the pixel level but instead captures the meaning of a cell. This meaning is encoded by tags like [wall], [junction], [robot] and [straight_right].
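To make the idea concrete, here is a minimal sketch of how such a semantic camera could map a mouse position to tags. It is not the prototype's actual code. The maze layout, the cell size of 32 pixels, and the function names are assumptions made for illustration, and the rule that a junction is a free cell with at least three free neighbours is also an assumption. The [straight_right] tag is left out because the post doesn't define it precisely.

```python
from typing import List, Tuple

# Hypothetical maze layout: '#' is a wall, '.' is a free cell, 'R' is the robot.
MAZE = [
    "#####",
    "#R..#",
    "#.#.#",
    "#...#",
    "#.###",
    "#####",
]
CELL_SIZE = 32  # assumed size of one maze cell in pixels


def is_free(maze: List[str], row: int, col: int) -> bool:
    """Return True if (row, col) lies inside the maze and is not a wall."""
    inside = 0 <= row < len(maze) and 0 <= col < len(maze[row])
    return inside and maze[row][col] != "#"


def cell_at(mouse_x: int, mouse_y: int) -> Tuple[int, int]:
    """Convert a pixel position into a (row, col) cell index."""
    return mouse_y // CELL_SIZE, mouse_x // CELL_SIZE


def semantic_tags(maze: List[str], row: int, col: int) -> List[str]:
    """Describe the meaning of a cell as a list of tags, not as pixels."""
    if not (0 <= row < len(maze) and 0 <= col < len(maze[row])):
        return []  # outside the maze there is nothing to describe
    if maze[row][col] == "#":
        return ["[wall]"]
    tags = []
    if maze[row][col] == "R":
        tags.append("[robot]")
    # Assumed rule: a free cell with three or more free neighbours is a junction.
    neighbours = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    free = sum(is_free(maze, row + dr, col + dc) for dr, dc in neighbours)
    if free >= 3:
        tags.append("[junction]")
    return tags


def tags_at_mouse(mouse_x: int, mouse_y: int) -> List[str]:
    """Semantic camera: pixel position in, tags out."""
    row, col = cell_at(mouse_x, mouse_y)
    return semantic_tags(MAZE, row, col)
```

With this layout, calling tags_at_mouse(40, 100) looks at the cell in row 3, column 1, which has three free neighbours and is therefore reported as a junction.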

Without the tagging mechanism it's impossible to communicate with the robot in natural language, and it's also impossible to execute tasks like instruction following or visual question answering. The detected tags are the precondition for human-to-robot interaction in natural language.

From a software engineering perspective, the semantic camera is a GUI widget. It consists of a rectangle shown in the video game and a textual output shown at the bottom of the window.
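The widget structure can be sketched without committing to a particular GUI toolkit. The class below is only an illustration under stated assumptions: the class name, the method names, the 32-pixel rectangle size, and the "camera:" text prefix are all hypothetical, and the tag detector is passed in as a plain function. A real implementation would hand the rectangle and the text to whatever drawing library the game uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SemanticCameraWidget:
    """Hypothetical widget: a rectangle in the game plus a line of text."""

    detect: Callable[[int, int], List[str]]  # maps a pixel position to tags
    size: int = 32  # assumed edge length of the camera rectangle in pixels
    x: int = 0
    y: int = 0
    tags: List[str] = field(default_factory=list)

    def on_mouse_move(self, mouse_x: int, mouse_y: int) -> None:
        """Center the rectangle on the mouse and refresh the detected tags."""
        self.x = mouse_x - self.size // 2
        self.y = mouse_y - self.size // 2
        self.tags = self.detect(mouse_x, mouse_y)

    def rectangle(self) -> Tuple[int, int, int, int]:
        """Return (x, y, width, height) for drawing inside the game view."""
        return (self.x, self.y, self.size, self.size)

    def status_text(self) -> str:
        """Return the text line shown at the bottom of the window."""
        return "camera: " + (" ".join(self.tags) if self.tags else "(no tags)")
```

Keeping the detector as a separate function means the same widget could display tags from any source, for example a grid lookup today and a learned classifier later.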
