The effect of memory wait states
This is probably the most important factor to consider, as it can have the biggest impact on the performance of any system. Most high performance CPUs, such as RISC and DSP processors, offer single cycle performance where one or more instructions are executed on every clock edge, but it is important to remember the conditions under which this is possible:
• Instruction access is wait state free
To achieve this, the instructions are fetched either from internal on-chip memory (usually wait state free but not always), or from internal caches. The problem with caches is that they only help when instructions are reused, as in a loop: the first time through the loop, there is a performance impact while the instructions are fetched from external memory. Once the instructions have been fetched, they are available in the cache for any further execution, and it is here that the performance starts to improve.
• Data access is wait state free
If an instruction manipulates data, then the time taken to execute the instruction must include the time needed to store the results of any data manipulation or to access data from memory or a peripheral I/O port. Again, if the processor has to wait (known as stalling) while data is stored or read, then performance is lost. If an instruction modifies some data and it takes five clocks to store the result, the processor can potentially lose those clocks to stalling. In many cases, processor architectures assume that data access is wait state free by using local memory, registers or cache memory to hold the information.
• There are no data dependencies outstanding
This leads on from the previous discussion and concerns the ability of an instruction to immediately use the result from a previous instruction. In many cases, this is only permitted if there is a delay to allow the processor to synchronise itself. The single cycle delay then has the same effect as a single cycle wait state, and thus the performance is degraded.
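The effect of outstanding data dependencies can be illustrated with a small model. The following Python sketch is not taken from any particular processor: the function name `count_dependency_stalls`, the register names and the `result_delay` parameter are all hypothetical, and the pipeline is assumed to issue one instruction per cycle and to stall whenever a source register is not yet ready.

```python
def count_dependency_stalls(program, result_delay=1):
    """Count stall cycles caused by read-after-write dependencies.

    program      -- list of (destination, sources) tuples in issue order
    result_delay -- extra cycles before a result can be used by a later
                    instruction (a hypothetical pipeline parameter)
    """
    ready_at = {}  # register name -> first cycle at which its value is usable
    cycle = 0      # cycle in which the next instruction would issue
    stalls = 0
    for dest, sources in program:
        # An instruction cannot issue before all of its sources are ready.
        earliest = max((ready_at.get(src, 0) for src in sources), default=0)
        if earliest > cycle:
            stalls += earliest - cycle
            cycle = earliest
        ready_at[dest] = cycle + 1 + result_delay
        cycle += 1
    return stalls


# Each instruction uses the result of the one immediately before it.
dependent = [("r1", ["r0"]), ("r2", ["r1"]), ("r3", ["r2"])]

# The same chain with unrelated instructions placed between the dependent ones.
interleaved = [
    ("r1", ["r0"]),
    ("r4", ["r5"]),
    ("r2", ["r1"]),
    ("r6", ["r7"]),
    ("r3", ["r2"]),
]

print(count_dependency_stalls(dependent))    # 2
print(count_dependency_stalls(interleaved))  # 0
```

Under these assumptions the back-to-back chain loses two cycles, while the interleaved ordering loses none, which is why compilers and programmers try to place independent work between an instruction and the first use of its result.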
As a result of all these conditions, it should not be assumed that an 80 MHz single cycle instruction processor, such as a DSP- or RISC-based machine, can provide 80 MIPS of processing power. It can, provided the conditions are correct and there are no wait states, data dependencies and so on. If there are, then the performance will be degraded. This problem is well recognised, and many DSP and processor architectures devote a lot of silicon to clever mechanisms that reduce the performance impact. However, the next question that must be answered is how do you determine the performance degradation, and how can you design the code to use these features to minimise any delay?
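As a rough first estimate of the degradation, the three conditions can be folded into an average number of stall cycles per instruction. The Python sketch below is a simplified model and not a measurement technique: the function name `effective_mips` and its parameters are assumptions made for illustration, and the stall figures in the examples are invented values, not data for any real device.

```python
def effective_mips(clock_mhz, fetch_stalls=0.0, data_stalls=0.0,
                   dependency_stalls=0.0):
    """Estimate delivered MIPS for a nominally single cycle processor.

    Each *_stalls argument is the average number of extra cycles per
    instruction lost to that cause (assumed values supplied by the caller).
    """
    if min(fetch_stalls, data_stalls, dependency_stalls) < 0:
        raise ValueError("stall counts cannot be negative")
    cycles_per_instruction = 1.0 + fetch_stalls + data_stalls + dependency_stalls
    return clock_mhz / cycles_per_instruction


# Ideal case: every condition is met, so 80 MHz delivers 80 MIPS.
print(effective_mips(80))                     # 80.0

# One instruction in four waits one extra cycle for data.
print(effective_mips(80, data_stalls=0.25))   # 64.0

# Every instruction fetch incurs one wait state.
print(effective_mips(80, fetch_stalls=1.0))   # 40.0
```

Even a single wait state on every instruction fetch halves the delivered performance in this model, which shows how far the headline MIPS figure can be from what the system actually achieves.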