Asynchronous CPUs are a holy grail of chip design. The well-known GA144 is based on a clockless design, but it has never entered mainstream computing because its Forth-based programming model differs from the existing software ecosystem. On the other hand, it is very difficult to invent a new processor design. Even the existing clock-driven CPUs from Intel, Arm and AMD have transistor counts in the billions.
To simplify the task drastically, it makes sense to simulate a CPU with a Python program in under 100 lines of code. It is a single Python class which consists of a main loop, a fetch_method and a decode_method. A typical program executed by the simulated CPU is:
program = [
('LOAD', 'A', 5),
('LOAD', 'B', 3),
('ADD', 'A', 'A', 'B'),
('HALT',)
]
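A minimal sketch of such a simulator, before any timing is added, could look like the following. Only the main loop, fetch_method and decode_method are named in the text; the class name SimpleCPU, the registers dictionary and the program counter pc are assumptions made for illustration.

```python
class SimpleCPU:
    """Sequential toy CPU: fetch one instruction, decode and execute it, repeat."""

    def __init__(self, program):
        self.program = program
        self.registers = {}   # assumed register file: name -> value
        self.pc = 0           # assumed program counter
        self.running = True

    def fetch_method(self):
        """Return the instruction at the program counter and advance it."""
        instruction = self.program[self.pc]
        self.pc += 1
        return instruction

    def decode_method(self, instruction):
        """Decode the opcode and execute the instruction."""
        opcode, *operands = instruction
        if opcode == 'LOAD':
            register, value = operands
            self.registers[register] = value
        elif opcode == 'ADD':
            dest, src1, src2 = operands
            self.registers[dest] = self.registers[src1] + self.registers[src2]
        elif opcode == 'HALT':
            self.running = False
        else:
            raise ValueError(f"unknown opcode: {opcode}")

    def run(self):
        """Main loop: runs until HALT or the end of the program."""
        while self.running and self.pc < len(self.program):
            self.decode_method(self.fetch_method())
        return self.registers


program = [
    ('LOAD', 'A', 5),
    ('LOAD', 'B', 3),
    ('ADD', 'A', 'A', 'B'),
    ('HALT',),
]

cpu = SimpleCPU(program)
print(cpu.run())  # {'A': 8, 'B': 3}
```

In this version fetch and decode run strictly one after the other, so there is nothing yet that distinguishes a fast module from a slow one.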
The task of the Python program is to parse commands such as LOAD, ADD and HALT. To investigate a clockless CPU design, one modification of the simulator is needed: each method runs at a different execution speed. The fetch_method contains a time.sleep() call of 1 second, while the decode_method contains a time.sleep() call of 4 seconds. This ensures that the modules of the CPU run at different speeds, which is the core element of a clockless architecture. The overall system can be synchronized with a shared bus, i.e. a message queue. The queue provides a put and a get command for writing data to the queue and reading data from it.
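One possible way to sketch this arrangement is with two threads and Python's queue.Queue as the shared bus. Several details here are assumptions rather than part of the original simulator: the class name AsyncCPU, the use of threads, the one-slot buffer (maxsize=1), and the fetch_delay and decode_delay parameters. The sketch also assumes a well-formed program. The demo scales the 1 second and 4 second delays down by a factor of 100 so that it finishes quickly.

```python
import queue
import threading
import time


class AsyncCPU:
    """Toy CPU whose fetch and decode modules run at different speeds."""

    def __init__(self, program, fetch_delay=1.0, decode_delay=4.0):
        self.program = program
        self.fetch_delay = fetch_delay
        self.decode_delay = decode_delay
        self.registers = {}
        # Shared bus: an assumed one-slot buffer between the two modules.
        self.bus = queue.Queue(maxsize=1)

    def fetch_method(self):
        """Copy instructions from memory onto the bus, one at a time."""
        for instruction in self.program:
            time.sleep(self.fetch_delay)      # simulated fetch latency
            self.bus.put(instruction)         # blocks (idles) while the slot is full
            if instruction[0] == 'HALT':
                return
        # Program ended without HALT: stop the decoder anyway.
        self.bus.put(('HALT',))

    def decode_method(self):
        """Take instructions from the bus and execute them."""
        while True:
            opcode, *operands = self.bus.get()  # blocks while the bus is empty
            time.sleep(self.decode_delay)       # simulated decode latency
            if opcode == 'LOAD':
                register, value = operands
                self.registers[register] = value
            elif opcode == 'ADD':
                dest, src1, src2 = operands
                self.registers[dest] = self.registers[src1] + self.registers[src2]
            elif opcode == 'HALT':
                return
            else:
                print(f"skipping unknown opcode: {opcode}")

    def run(self):
        """Main loop: start both modules and wait until both have stopped."""
        threads = [
            threading.Thread(target=self.fetch_method),
            threading.Thread(target=self.decode_method),
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        return self.registers


program = [
    ('LOAD', 'A', 5),
    ('LOAD', 'B', 3),
    ('ADD', 'A', 'A', 'B'),
    ('HALT',),
]

# Delays scaled down by 100 (0.01 s and 0.04 s) to keep the demo short.
cpu = AsyncCPU(program, fetch_delay=0.01, decode_delay=0.04)
print(cpu.run())  # {'A': 8, 'B': 3}
```

No clock signal coordinates the two modules. The only synchronization point is the blocking put and get on the queue, so each module simply waits when the other is not ready.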
Let us look at an example. The fetch_method copies the next command, e.g. “LOAD A 5”, from memory into the buffer and needs 1 second for the task. Then the command gets executed by the decode_method, which takes 4 seconds. In the meantime, the fetch_method is idle.
At first glance it doesn't make sense to run each module at a different speed, because the fetch_method has to wait until the decode_method is ready. The advantage is that the bottleneck becomes visible. The diagram shows clearly that the slowest part of the system is the decode module, so the problem can be located.
If the CPU were implemented as a single method, it would remain unclear why an operation takes so long to complete. It makes sense to group the CPU into modules according to their running speed. This makes it possible to focus on the slowest module and try to improve its operation.
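The busy and idle time of each module can also be estimated without actually sleeping. The following sketch is a simplified timing model, not part of the original simulator: the function name pipeline_timeline and the assumption of a one-slot buffer between fetch and decode are introduced here for illustration only.

```python
def pipeline_timeline(n_instructions, fetch_delay=1.0, decode_delay=4.0):
    """Estimate total runtime and per-module busy/idle time.

    Assumed model: fetch needs fetch_delay per instruction, then hands the
    instruction over through a one-slot buffer; decode takes it as soon as
    it is free and needs decode_delay to execute it.
    """
    put_time = 0.0      # when fetch placed the previous instruction in the buffer
    get_time = 0.0      # when decode took the previous instruction out
    decode_done = 0.0   # when decode finished the previous instruction

    for _ in range(n_instructions):
        fetch_ready = put_time + fetch_delay
        # Fetch can only hand over once decode has emptied the buffer slot.
        put_time = max(fetch_ready, get_time)
        # Decode can only take the instruction once it is free again.
        get_time = max(put_time, decode_done)
        decode_done = get_time + decode_delay

    total = decode_done
    busy = {
        'fetch': n_instructions * fetch_delay,
        'decode': n_instructions * decode_delay,
    }
    report = {
        name: {'busy': busy_time, 'idle': total - busy_time}
        for name, busy_time in busy.items()
    }
    return total, report


total, report = pipeline_timeline(4)  # the four-instruction example program
print(total)                          # 17.0
for name, times in report.items():
    print(name, times)
# fetch {'busy': 4.0, 'idle': 13.0}
# decode {'busy': 16.0, 'idle': 1.0}

slowest = max(report, key=lambda name: report[name]['busy'])
print(slowest)                        # decode
```

Under these assumptions the decode module is busy for 16 of the 17 seconds, while the fetch module is idle for 13 of them, which points to decode as the module worth optimizing first.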
