The Stanford RISC model
This model uses a smaller number of registers (typically 32) and relies on software techniques to allocate registers during procedure calls. Instruction execution order is optimised by the compiler to provide the most efficient way of performing the software task. This allows pipelined execution units to be used within the processor design which, in turn, allow more powerful instructions to be used.
However, RISC is not a panacea for all performance problems within computer design. Its performance is extremely dependent on very good compiler technology to provide the correct optimisations and keep track of all the registers. Many of the early M68000 family compilers could not track all 16 data and address registers and therefore would only use two or three. Some compilers even reduced register usage to one register and effectively based everything on stacks and queues. Secondly, the greater number of instructions that a RISC program needed increased code size dramatically at a time when memory was both expensive and low in density. Without the compiler technology and cheap memory, a RISC system was not very practical and the ideas were effectively put back on the shelf.
The MPC601 was the first PowerPC processor available. It has three execution units: a branch unit to resolve branch instructions, an integer unit and a floating point unit.
The floating point unit supports the IEEE 754 floating point format. The processor is superscalar: it can dispatch up to two instructions and process three every clock cycle. Running at 66 MHz, dispatching two instructions per clock gives a peak performance of 132 million instructions per second.
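The peak figure is simply the dispatch width multiplied by the clock rate. A minimal Python sketch of that arithmetic follows; the function name peak_mips is purely illustrative, and the result is an upper bound that assumes the maximum number of instructions is dispatched on every clock.

```python
def peak_mips(clock_mhz, dispatched_per_clock):
    """Peak millions of instructions per second.

    Assumes the maximum number of instructions is dispatched on every
    clock, which real code rarely achieves.
    """
    return clock_mhz * dispatched_per_clock


# Figures from the text: 66 MHz clock, up to two instructions per clock.
print(peak_mips(66, 2))  # 132
```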
The branch unit supports both branch folding and speculative execution. With speculative execution, the processor predicts which way the program flow will go when a branch instruction is encountered and starts executing down that route while the branch instruction is still being resolved.
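The benefit of speculating can be illustrated with a toy cycle count. The sketch below is not a model of the MPC601 pipeline: the two-cycle resolution delay, the "always guess taken" policy and both function names are assumptions chosen only to show why a correct guess hides the cost of resolving a branch.

```python
def cycles_lost_when_stalling(outcomes, resolve_cycles):
    """Without speculation, every branch waits until it is resolved."""
    return len(outcomes) * resolve_cycles


def cycles_lost_when_speculating(outcomes, guesses, resolve_cycles):
    """With speculation, only wrong guesses cost cycles.

    Simplification: a wrong guess is charged the same delay as a stall,
    because the speculated work has to be discarded.
    """
    wrong = sum(1 for taken, guess in zip(outcomes, guesses) if taken != guess)
    return wrong * resolve_cycles


# A loop branch that is taken nine times and then falls through.
outcomes = [True] * 9 + [False]
guesses = [True] * 10   # assumed policy: always guess "taken"
RESOLVE_CYCLES = 2      # assumed delay, not an MPC601 figure

print(cycles_lost_when_stalling(outcomes, RESOLVE_CYCLES))              # 20
print(cycles_lost_when_speculating(outcomes, guesses, RESOLVE_CYCLES))  # 2
```

In this example only the final, mispredicted iteration pays the resolution delay.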
The general-purpose register file consists of 32 separate registers, each 32 bits wide. The floating point register file also contains 32 registers, each 64 bits wide, to support double precision floating point. The external physical memory map is organised as a linear 32 bit address space and is 4 Gbytes in size.
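The sizes quoted here follow directly from the register counts and widths and from the 32 bit address. A short Python check of the arithmetic is given below; the constant names are illustrative, and a Gbyte is taken to mean 2^30 bytes.

```python
GPR_COUNT, GPR_BITS = 32, 32   # general-purpose register file
FPR_COUNT, FPR_BITS = 32, 64   # floating point register file
ADDRESS_BITS = 32              # physical address width

gpr_file_bytes = GPR_COUNT * GPR_BITS // 8
fpr_file_bytes = FPR_COUNT * FPR_BITS // 8
address_space_bytes = 2 ** ADDRESS_BITS
address_space_gbytes = address_space_bytes // 2 ** 30  # assumes 1 Gbyte = 2**30 bytes

print(gpr_file_bytes)        # 128
print(fpr_file_bytes)        # 256
print(address_space_gbytes)  # 4
```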
The MPC601’s memory subsystem consists of a unified memory management unit and on-chip cache which communicates with external memory via a 32 bit address bus and a 64 bit data bus. At its peak, this bus can fetch two instructions or 64 bits of data per clock. It also supports split transactions, where the address bus can be used independently of, and simultaneously with, the data bus to improve its utilisation. Bus snooping is also provided to ensure cache coherency with external memory.
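The "two instructions per clock" figure follows from the bus width, since PowerPC instructions are 32 bits wide. The Python sketch below works this through and also estimates a peak transfer rate; the 66 MHz bus clock is an assumption made for illustration (the bus may well be clocked more slowly than the processor), and the constant names are hypothetical.

```python
DATA_BUS_BITS = 64
INSTRUCTION_BITS = 32   # PowerPC instructions are a fixed 32 bits wide
BUS_CLOCK_MHZ = 66      # assumption: bus runs at the 66 MHz processor clock

# Two 32 bit instructions fit in one 64 bit transfer.
instructions_per_transfer = DATA_BUS_BITS // INSTRUCTION_BITS

# Peak rate in millions of bytes per second, one transfer on every clock.
peak_mbytes_per_second = (DATA_BUS_BITS // 8) * BUS_CLOCK_MHZ

print(instructions_per_transfer)  # 2
print(peak_mbytes_per_second)     # 528
```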
The cache is 32 kbytes and supports both data and instruction accesses. It is accessed in parallel with any memory management translation. To speed up the translation process, the memory management unit keeps translation information in one of three translation lookaside buffers.
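The role of a translation lookaside buffer can be sketched in a few lines of Python. The class below is a generic illustration and not the MPC601's actual organisation: the 4 Kbyte page size, the dictionary-based page table and the name TinyTLB are assumptions, there is no capacity limit or replacement policy, and the parallel cache access is not modelled.

```python
PAGE_BITS = 12  # assumption: 4 Kbyte pages


class TinyTLB:
    """Hypothetical TLB: remembers recent page translations."""

    def __init__(self):
        self.entries = {}  # virtual page number -> physical page number
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr, page_table):
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        if vpn in self.entries:
            self.hits += 1                       # fast path: already cached
        else:
            self.misses += 1                     # slow path: consult the table
            self.entries[vpn] = page_table[vpn]
        return (self.entries[vpn] << PAGE_BITS) | offset


page_table = {0x12345: 0x00ABC}   # made-up mapping for the example
tlb = TinyTLB()
print(hex(tlb.translate(0x12345678, page_table)))  # 0xabc678 (miss)
print(hex(tlb.translate(0x12345FFC, page_table)))  # 0xabcffc (hit)
print(tlb.hits, tlb.misses)                        # 1 1
```

The second access falls in the same page as the first, so it is translated from the buffer without consulting the page table again.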