February 28, 2026

Scene annotation with tags

The pictures shows a very simple landscape with a house, a lake and a sun. The problem for a robot is, that the information shown in the picture can't be parsed. The reason is that the picture is technically a 800x600 bitmap file and doesn't provide semantic meaning.

What a robot needs to understand a picture is a sequence of tags. For the concrete picture the tags would be: [sun], [house], [tree], [lake]. In a more elaborated setup the tags would be enhanced with additional informaiton about the position and the color, e.g.
- sun: top right, yellow
- house: center, red
- tree: left, green
- lake: bottom left, blue

These information can be stored in a database which is a json file. Such information can be parsed by a robot. So the missing part is a scene to tag converter which is the core element for a symbol grounding system.

No comments:

Post a Comment