Until 1986, the expected answer to the question ‘which processor offers the most performance’ would have been the MC68020, the MC68030 or even the 386! Without exception, CISC processors such as these had established the highest perceived performance. There were more esoteric processors, like the transputer, which offered large MIPS figures from parallel arrays, but these were often considered suitable only for niche markets and applications. However, around this time, interest grew in an alternative approach to microprocessor design, which seemed to offer more processing power from a simpler design using fewer transistors. Performance increases of over five times the then-current CISC machines were suggested. These machines, such as the Sun SPARC architecture and the MIPS R2000 processor, were the first of a modern generation of processors based on a reduced instruction set, generically called reduced instruction set computers (RISC).
The 80/20 rule
Analysis of the instruction mix generated by CISC compilers is extremely revealing. Such studies for CISC mainframes and minicomputers show that about 80% of the instructions generated and executed used only 20% of an instruction set. It was an obvious conclusion that speeding up this 20% of instructions would bring far greater performance benefits than improving the rest. Further analysis shows that these instructions tend to perform the simpler operations and use only the simpler addressing modes. Essentially, all the effort invested in processor design to provide complex instructions and thereby reduce the compiler workload was being wasted. Instead of using the complex instructions, compilers synthesised their operation from sequences of simpler instructions.
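The kind of instruction-mix analysis described above can be sketched in a few lines of Python. This is only an illustration: the ten-entry instruction set and the execution trace are invented so that they reproduce the 80/20 split quoted in the text, and are not measurements from any real machine. The function name `coverage_of_top_fraction` is likewise hypothetical.

```python
import math
from collections import Counter


def coverage_of_top_fraction(trace, instruction_set, fraction=0.2):
    """Share of executed instructions accounted for by the most frequently
    used `fraction` of the instruction set."""
    counts = Counter(trace)
    top_n = max(1, math.ceil(len(instruction_set) * fraction))
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(trace)


# Hypothetical instruction set (10 entries) and a made-up execution trace.
INSTRUCTION_SET = ["move", "branch", "add", "cmp", "mul",
                   "sub", "div", "bcd_add", "string_copy", "poly_eval"]
TRACE = (["move"] * 50 + ["branch"] * 30 + ["add"] * 10 +
         ["cmp"] * 6 + ["mul"] * 3 + ["string_copy"] * 1)

share = coverage_of_top_fraction(TRACE, INSTRUCTION_SET)
print(f"Top 20% of the instruction set covers {share:.0%} of executed instructions")
```

With a real trace from a profiler or simulator in place of `TRACE`, the same calculation shows how unevenly a compiler uses the instruction set.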
This has another implication. If only the simpler instructions are required, the processor hardware required to implement them could be reduced in complexity. It therefore follows that it should be possible to design a higher-performance processor with fewer transistors and at lower cost. With a simpler instruction set, it should be possible for a processor to execute its instructions in a single clock cycle and synthesise complex operations from sequences of instructions. If the number of instructions in a sequence, and therefore the number of clocks to execute the resultant operation, was less than the cycle count of its CISC counterpart, higher performance could be achieved. With many CISC processors taking 10 or more clocks per instruction on average, there was plenty of scope for improvement.
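The argument reduces to simple arithmetic, sketched below. The 10-clock CISC figure comes from the text. The four-instruction RISC sequence is a hypothetical example, and the comparison assumes both processors run at the same clock rate.

```python
def risc_speedup(cisc_clocks, risc_instruction_count, risc_clocks_per_instruction=1):
    """Ratio of CISC clocks to RISC clocks for the same operation,
    assuming both processors run at the same clock frequency."""
    risc_clocks = risc_instruction_count * risc_clocks_per_instruction
    return cisc_clocks / risc_clocks


# A 10-clock CISC instruction replaced by a hypothetical sequence of
# four single-cycle RISC instructions.
print(risc_speedup(cisc_clocks=10, risc_instruction_count=4))  # 2.5
```

The RISC sequence wins whenever its instruction count is below the CISC cycle count. A sequence of ten or more single-cycle instructions would give no benefit over the 10-clock CISC instruction.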
The initial RISC research
The computer giant IBM is usually acknowledged as the first company to define a RISC architecture in the 1970s. This research was further developed by the Universities of Berkeley and Stanford to give the basic architectural models. RISC can be described as a philosophy with three basic tenets:
1. All instructions will be executed in a single cycle
This is a necessary part of the performance equation. Its implementation calls for several features: the instruction op code must be of a fixed width which is equal to or smaller than the width of the external data bus, additional operands cannot be supported, and the instruction decode must be simple and orthogonal to prevent delays. If the op code is wider than the data bus or additional operands must be fetched, multiple memory cycles are needed, increasing the execution time.
2. Memory will only be accessed via load and store instructions
This naturally follows from the above. If an instruction manipulates memory directly, multiple cycles must be performed to execute it. The instruction must be fetched and memory manipulated. With a RISC processor, the memory-resident data is loaded into a register, the register manipulated and, finally, its contents written out to main memory. This sequence takes a minimum of three instructions. With register-based manipulation, large numbers of general-purpose registers are needed to maintain performance.
3. All execution units will be hardwired with no microcoding
Microcoding requires multiple cycles to load sequencers and so on, and therefore cannot be easily used to implement single-cycle execution units.
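The first two tenets can be illustrated with a toy model. The sketch below is not any real processor: `fetch_cycles` and `LoadStoreMachine` are hypothetical names, the 32-register file is an arbitrary choice, and the immediate value in `add_immediate` is assumed to be encoded inside the fixed-width instruction word so that no extra operand fetch is needed.

```python
import math


def fetch_cycles(instruction_bits, bus_bits):
    """Memory cycles needed to fetch one instruction over the external bus."""
    return math.ceil(instruction_bits / bus_bits)


class LoadStoreMachine:
    """Toy register machine in which only load and store touch memory."""

    def __init__(self, memory, num_registers=32):
        self.memory = dict(memory)
        self.regs = [0] * num_registers
        self.executed = 0  # number of instructions executed so far

    def load(self, rd, addr):
        self.regs[rd] = self.memory[addr]
        self.executed += 1

    def add_immediate(self, rd, rs, value):
        # Operates on registers only; never touches memory.
        self.regs[rd] = self.regs[rs] + value
        self.executed += 1

    def store(self, rs, addr):
        self.memory[addr] = self.regs[rs]
        self.executed += 1


# Incrementing a memory location takes three instructions.
machine = LoadStoreMachine({0x100: 41})
machine.load(1, 0x100)
machine.add_immediate(1, 1, 1)
machine.store(1, 0x100)
print(machine.memory[0x100], machine.executed)  # 42 3

# A 32-bit op code fits a 32-bit bus in one cycle; a 48-bit one needs two.
print(fetch_cycles(32, 32), fetch_cycles(48, 32))  # 1 2
```

The three-instruction sequence is the minimum mentioned above. Keeping intermediate values in registers between operations avoids repeating the load and store each time, which is why a large register file matters.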
Two generic RISC architectures form the basis of nearly all the current commercial processors. The main differences between them concern register sets and usage. They both have a Harvard external bus architecture consisting of separate buses for instructions and data. This allows data accesses to be performed in parallel with instruction fetches and removes any instruction/data conflict. If these two streams compete for a single bus, any data fetches stall the instruction flow and prevent the processor from achieving its single cycle objective. Executing an instruction on every clock requires an instruction on every clock.
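The effect of bus contention can be shown with a deliberately crude model. It assumes that every instruction needs one clock for its fetch, and that on a shared bus each data access costs one extra clock during which no instruction can be fetched. The function name and the five-instruction program are hypothetical.

```python
def total_clocks(instructions, shared_bus):
    """Clocks needed to run a program.

    `instructions` is a list of booleans: True if that instruction
    performs a data access (a load or a store), False otherwise.
    """
    clocks = 0
    for needs_data in instructions:
        clocks += 1  # one clock to fetch (and execute) the instruction
        if needs_data and shared_bus:
            clocks += 1  # the data access occupies the bus, stalling the next fetch
    return clocks


# Hypothetical program: two of its five instructions access data.
program = [True, False, True, False, False]
print(total_clocks(program, shared_bus=True))   # 7
print(total_clocks(program, shared_bus=False))  # 5
```

With separate instruction and data buses the program completes in one clock per instruction. With a single bus the average rises to 1.4 clocks per instruction in this example.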