March 30, 2026

Grounded text to action for playing Maniac Mansion

Building on the previous attempts to play the point-and-click adventure Maniac Mansion with a large language model, here is a more compressed representation. It is a three-column table with a timecode, a textual description, and low-level mouse actions.

The textual description sits on the information layer of the DIKW pyramid, while the mouse movements sit on the bottom data layer.

Timecode    Textual Description    ScummVM Mouse Movements / Interaction
00:05    Select Character: Bernard    Move cursor to Bernard's portrait (bottom right); Left-Click.
00:08    Move to Front Gate    Move cursor to far right of driveway; Left-Click.
00:15    Walk to Front Door    Move cursor to porch steps; Left-Click.
00:20    Action: "Pull" Door Mat    Click "Pull" (verb pane); Click "Door Mat" (on porch floor).
00:24    Action: "Get" House Key    Click "Get" (verb pane); Click "Entrance Key" (revealed on floor).
00:28    Action: "Use" Key on Door    Click "Use"; Click Key (inventory); Click "Front Door".
00:32    Enter Mansion (Main Hall)    Move cursor to open doorway; Left-Click.
00:40    Action: "Get" Flashlight    Walk to the small table near the stairs; Click "Get"; Click "Flashlight".
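
One way to hold such a table in a program is sketched below. This is only an illustration: the `Step` class, the `parse_row` helper, and the assumption that columns are separated by tabs or by runs of two or more spaces are not part of the original setup.

```python
import re
from dataclasses import dataclass


@dataclass
class Step:
    """One table row (hypothetical structure, for illustration only)."""
    timecode: str        # "MM:SS"
    description: str     # information layer: what to do
    mouse_actions: str   # data layer: how to do it with the mouse

    def seconds(self) -> int:
        """Convert the MM:SS timecode into seconds."""
        minutes, secs = self.timecode.split(":")
        return int(minutes) * 60 + int(secs)


def parse_row(line: str) -> Step:
    """Split a row into its three columns.

    Assumes columns are separated by a tab or by two or more spaces.
    """
    timecode, description, actions = re.split(r"\t+|\s{2,}", line.strip(), maxsplit=2)
    return Step(timecode, description, actions)


row = parse_row('00:20    Action: "Pull" Door Mat    Click "Pull" (verb pane); Click "Door Mat" (on porch floor).')
print(row.seconds(), "|", row.description, "|", row.mouse_actions)
```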

The main task for the software is translation. A high-level textual description gets converted into low-level actions. E.g.:
textual description = Action: "Use" Key on Door
mouse movement = Click "Use"; Click Key (inventory); Click "Front Door".
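
The translation step can be sketched as a plain lookup. The rows are copied from the table in this post, but the dictionary layout and the `translate` function are assumptions made for illustration; a language model would presumably fill in mappings that are not listed explicitly.

```python
# Rows copied from the walkthrough table; the dictionary layout is an assumption.
WALKTHROUGH = {
    "Walk to Front Door": "Move cursor to porch steps; Left-Click.",
    'Action: "Pull" Door Mat': 'Click "Pull" (verb pane); Click "Door Mat" (on porch floor).',
    'Action: "Use" Key on Door': 'Click "Use"; Click Key (inventory); Click "Front Door".',
}


def translate(description: str) -> list[str]:
    """Return the low-level mouse steps for one high-level description."""
    actions = WALKTHROUGH[description]
    # Steps are separated by semicolons; the trailing period is dropped.
    return [step.strip() for step in actions.rstrip(".").split(";")]


for step in translate('Action: "Use" Key on Door'):
    print(step)
```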

In other words, the DIKW pyramid is mostly an abstraction mechanism: it describes the same task at different levels of detail. The AI for playing Maniac Mansion doesn't decide anything; it takes a textual description as input and generates low-level mouse movements as output.

Here is the workflow for playing the game with an AI. The human user provides a textual description of what to do in each scene. For example, the human enters "walk to front door". The computer converts this command into mouse actions on the screen and executes them. So the Maniac Mansion game gets teleoperated through an advanced textual interface. This interface reduces the workload for the human operator, who no longer has to move the mouse onto the verbs and objects directly but instead types text into the command line.
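
The workflow above can be sketched as a small command dispatcher. Everything named here (`COMMANDS`, `normalize`, `teleoperate`, and the `execute` callback) is hypothetical. The sketch only records the mouse steps in a list; a real setup would hand each step to whatever tool actually drives the mouse inside ScummVM.

```python
import re

# Hypothetical command table; the steps are taken from the walkthrough rows.
COMMANDS = {
    "walk to front door": ["Move cursor to porch steps", "Left-Click"],
    "pull door mat": ['Click "Pull"', 'Click "Door Mat"'],
    "use key on door": ['Click "Use"', "Click Key (inventory)", 'Click "Front Door"'],
}


def normalize(text: str) -> str:
    """Lowercase the input, drop quotes and an optional 'Action:' prefix."""
    text = text.lower().replace('"', "").strip()
    text = re.sub(r"^action:\s*", "", text)
    return re.sub(r"\s+", " ", text)


def teleoperate(command: str, execute) -> bool:
    """Translate one typed command and pass each mouse step to `execute`.

    Returns False if the command is unknown, so the human can rephrase it.
    """
    steps = COMMANDS.get(normalize(command))
    if steps is None:
        return False
    for step in steps:
        execute(step)
    return True


log = []  # stand-in for a real mouse driver
teleoperate("Walk to Front Door", log.append)
print(log)
```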

It's a bit complicated to explain why such a DIKW workflow works in practice. From a technical perspective, natural language is used as an abstraction mechanism to reduce complexity. Instead of solving the original task of moving the mouse on the screen and clicking on items, the new task is to provide a textual walkthrough, which gets converted automatically into mouse movements.

This abstraction mechanism works only because natural language, here English, is a powerful tool. It provides all the needed vocabulary and grammar to formulate complex tasks. There is no need to develop new computer algorithms, neural networks, or cognitive architectures; natural language itself is the asset that enables artificial intelligence.
