March 28, 2026

Human to robot interaction with a DIKW pyramid

Maniac Mansion is a well-known point-and-click adventure. With the help of a walkthrough it's possible to win the game. A standard walkthrough consists of key points and full sentences written in English, which humans can read but a computer can't execute. With a converter from the high level layer to the low level layer, it's possible to transform the walkthrough into machine-readable commands. The process is demonstrated with the following JSON code addressing the kitchen scene:

{
  "card_id": "MM_KITCHEN_01",
  "scene_title": "The Mansion Kitchen - ScummVM Navigation",
  "content": {
    "textual_description": {
      "objective": "Enter the kitchen to retrieve the Small Key from the counter while staying alert for Nurse Edna.",
      "key_points": [
        "The kitchen is located through the first door on the right in the main hallway.",
        "Crucial Item: The Small Key is sitting on the counter near the sink.",
        "Hazard: Opening the refrigerator triggers a cutscene/event that can lead to capture.",
        "Exit Strategy: Use the door to the far right to enter the Dining Room if the hallway is blocked."
      ]
    },
    "low_level_representation": {
      "engine_context": "ScummVM - 320x200 Resolution (Original Scale)",
      "mouse_interactions": [
        {
          "step": 1,
          "verb_action": "PICK UP",
          "verb_coordinates": { "x": 40, "y": 175 },
          "target_object": "Small Key",
          "target_coordinates": { "x": 165, "y": 115 },
          "result": "Key added to character inventory."
        },
        {
          "step": 2,
          "verb_action": "WALK TO",
          "verb_coordinates": { "x": 10, "y": 165 },
          "target_location": "Dining Room Door",
          "target_coordinates": { "x": 305, "y": 110 },
          "result": "Character transitions to the next room."
        }
      ],
      "safety_note": "Avoid clicking 'OPEN' (x: 10, y: 175) on the Refrigerator (x: 240, y: 90) unless you have a specific distraction planned."
    }
  }
}
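To make the low level layer more concrete, here is a minimal Python sketch that flattens the steps of the card into an ordered list of mouse clicks: first the verb button, then the target on screen. The function name `to_click_sequence` is hypothetical, and the sketch only prints the clicks; sending them to ScummVM would need an input automation tool that isn't covered here.

```python
import json

# Trimmed copy of the low level layer from the kitchen card.
CARD_JSON = """
{
  "mouse_interactions": [
    {"step": 1, "verb_action": "PICK UP",
     "verb_coordinates": {"x": 40, "y": 175},
     "target_coordinates": {"x": 165, "y": 115}},
    {"step": 2, "verb_action": "WALK TO",
     "verb_coordinates": {"x": 10, "y": 165},
     "target_coordinates": {"x": 305, "y": 110}}
  ]
}
"""


def to_click_sequence(low_level):
    """Flatten the steps into an ordered list of (x, y) clicks.

    Hypothetical helper: each step becomes two clicks,
    the verb button first and the target second.
    """
    clicks = []
    steps = sorted(low_level["mouse_interactions"], key=lambda s: s["step"])
    for step in steps:
        for key in ("verb_coordinates", "target_coordinates"):
            point = step[key]
            clicks.append((point["x"], point["y"]))
    return clicks


clicks = to_click_sequence(json.loads(CARD_JSON))
for x, y in clicks:
    print(f"click at ({x}, {y})")
```

Two steps in the card result in four clicks, which shows how much more verbose the low level layer is compared to a single sentence in the walkthrough.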


Both layers (low level and high level) describe the same scene: entering the kitchen and fetching the key. The difference is that the layers have different abstraction levels. The high level layer is preferred by humans and mirrors how humans think and use language. In contrast, the low level layer is preferred by machines, which are programmed in a logic-oriented mathematical notation.

The converter has the task of translating between these layers, which is known as the symbol grounding problem. Solving the grounding problem means improving human-to-machine interaction.
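A very small part of such a converter can be sketched as a lookup table that grounds symbols in screen coordinates. The following Python code is only an illustration under strong assumptions: the tables `VERB_COORDS` and `OBJECT_COORDS` and the function `ground` are hypothetical, the coordinates are copied from the kitchen card, and the walkthrough sentence is assumed to be already reduced to a (verb, object) pair. Extracting that pair from free English text is the hard part and is not shown.

```python
# Hypothetical grounding tables; coordinates copied from the kitchen card.
VERB_COORDS = {
    "PICK UP": (40, 175),
    "WALK TO": (10, 165),
    "OPEN": (10, 175),
}
OBJECT_COORDS = {
    "Small Key": (165, 115),
    "Dining Room Door": (305, 110),
    "Refrigerator": (240, 90),
}


def ground(verb, obj):
    """Translate a symbolic (verb, object) command into two screen clicks."""
    if verb not in VERB_COORDS:
        raise KeyError(f"unknown verb: {verb}")
    if obj not in OBJECT_COORDS:
        raise KeyError(f"unknown object: {obj}")
    return [VERB_COORDS[verb], OBJECT_COORDS[obj]]


# High level plan for the kitchen scene, already reduced to symbols.
plan = [("PICK UP", "Small Key"), ("WALK TO", "Dining Room Door")]

# Low level result: one flat list of (x, y) clicks.
clicks = [click for verb, obj in plan for click in ground(verb, obj)]
print(clicks)
```

The table works only for a fixed screen layout. As soon as an object moves or a new room is entered, the coordinates have to be detected again, which is one reason why grounding is a problem and not just a dictionary.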
