Classical CPU design is based on a synchronized pipeline (see the figure at the top). The modules of the CPU, such as fetch, decode and execute, all need the same amount of time to process an instruction, which makes it possible to parallelize the workflow. Let us follow a single instruction through the pipeline.
The instruction enters the pipeline at the left and needs 4 seconds for the first stage. Then it is handed over to the decode stage, where it needs another 4 seconds until it is processed. The last stage on the right needs an additional 4 seconds. In total, the instruction takes 4+4+4=12 seconds.
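The advantage of the pipeline is that multiple instructions are processed in parallel. If the program consists of 10 instructions, the CPU won't take 10x12 seconds = 120 seconds. Once the pipeline is full, one instruction finishes every 4 seconds, so the processing needs only about 10x12/3=40 seconds. Counting the extra 8 seconds the first instruction needs to fill the pipeline, the exact figure is 12+9x4=48 seconds.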
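The arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and it assumes an ideal pipeline in which every stage takes exactly the same time and nothing ever stalls.

```python
def sequential_time(n_instructions, n_stages, stage_time):
    """Time if each instruction passes all stages before the next one starts."""
    return n_instructions * n_stages * stage_time


def pipelined_time(n_instructions, n_stages, stage_time):
    """Time for an ideal synchronized pipeline without stalls.

    The first instruction needs n_stages slots to pass through,
    every further instruction finishes one slot later.
    """
    if n_instructions == 0:
        return 0
    return (n_stages + n_instructions - 1) * stage_time


print(sequential_time(10, 3, 4))  # 120
print(pipelined_time(1, 3, 4))    # 12
print(pipelined_time(10, 3, 4))   # 48
```

For a long program the fill time no longer matters and the cost approaches 4 seconds per instruction, which is where the estimate of one third of the sequential time comes from.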
The disadvantage of the pipeline shown has to do with the synchronization of the stages. Each stage has to be timed so that it takes exactly 4 seconds. In contrast, a clockless CPU has an incoming queue for each stage. There is no centralized, synchronized pipeline anymore. Instead, the CPU modules are connected like agents in a multi-agent system. Let us follow the execution of a single instruction.
The instruction is delivered to the incoming queue of the first stage, which is “fetch”. It is processed within 1 second and then delivered to the incoming queue of the decode stage. Because this stage needs much longer to process a single instruction, it is likely that the instruction has to wait until all instructions ahead of it in the queue have been processed. This is the FIFO principle. Once the instruction has been decoded, it is delivered to the third stage, which is the execute module.
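The walk through the queues can be sketched as a small simulation. The following is an illustration under assumptions, not a model of real hardware: only the 1 second for fetch comes from the text, while the 3 seconds for decode and the 1 second for execute are invented values, the queues are assumed to be unbounded, and the function name `simulate` is hypothetical.

```python
def simulate(n_instructions, stage_times):
    """Return finish[i][s]: the time at which instruction i leaves stage s.

    Each stage handles one instruction at a time and serves its
    incoming queue in FIFO order. Queues are assumed to be unbounded.
    """
    finish = []
    for i in range(n_instructions):
        row = []
        for s, duration in enumerate(stage_times):
            # Time at which the instruction enters this stage's queue.
            arrival = row[s - 1] if s > 0 else 0
            # Time at which the stage is done with the previous instruction.
            stage_free = finish[i - 1][s] if i > 0 else 0
            start = max(arrival, stage_free)
            row.append(start + duration)
        finish.append(row)
    return finish


# fetch = 1 s (from the text), decode = 3 s and execute = 1 s (assumed)
print(simulate(3, [1, 3, 1]))  # [[1, 4, 5], [2, 7, 8], [3, 10, 11]]
```

The second instruction leaves fetch at second 2 but cannot enter decode before second 4, so it spends 2 seconds waiting in the decode queue.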
Even if the clockless architecture is more complicated to build, it has a great advantage: it makes it possible to locate a bottleneck. In the initial synchronized pipeline it is impossible to find the slowest module, because all the stages need the same amount of time. It makes no sense to optimize a single module, because every stage is timed to a fixed slot of 4 seconds.
In contrast, in the clockless version it is possible to measure and improve the performance of each individual stage. The slowest module is easy to spot, because its incoming queue is filled all the time.
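One way to make this measurement concrete is to add up how long instructions wait in front of each stage. The sketch below rests on assumptions: the stage times of 1, 3 and 1 seconds are illustrative, the function name `total_queue_wait` is hypothetical, the queues are unbounded, and the fetch stage is assumed to pull the next instruction from memory on demand, so nothing waits in front of it.

```python
def total_queue_wait(n_instructions, stage_times):
    """Sum up the time instructions spend waiting in each stage's queue."""
    n_stages = len(stage_times)
    wait = [0] * n_stages
    prev_finish = [0] * n_stages  # finish times of the previous instruction
    for _ in range(n_instructions):
        finish = []
        for s, duration in enumerate(stage_times):
            stage_free = prev_finish[s]
            # Assumption: the first stage pulls instructions on demand,
            # so an instruction "arrives" there exactly when it is free.
            arrival = finish[s - 1] if s > 0 else stage_free
            start = max(arrival, stage_free)
            wait[s] += start - arrival
            finish.append(start + duration)
        prev_finish = finish
    return wait


stage_names = ["fetch", "decode", "execute"]
wait = total_queue_wait(10, [1, 3, 1])
bottleneck = stage_names[wait.index(max(wait))]
print(wait)        # [0, 90, 0]
print(bottleneck)  # decode
```

With these numbers all the waiting happens in front of decode, which marks it as the stage worth optimizing.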
