One important category of PE for embedded multiprocessors is the accelerator. An accelerator is attached to CPU buses to quickly execute certain key functions. Accelerators can provide large performance increases for applications with computational kernels that spend a great deal of time in a small section of code. Accelerators can also provide critical speedups for low-latency I/O functions.
The design of accelerated systems is one example of hardware/software co-design—the simultaneous design of hardware and software to meet system objectives. Thus far, we have taken the computing platform as a given; by adding accelerators, we can customize the embedded platform to better meet our application’s demands.
As illustrated in Figure 4.1, a CPU accelerator is attached to the CPU bus. The CPU is often called the host. The CPU talks to the accelerator through data and control registers in the accelerator. These registers allow the CPU to monitor the accelerator’s operation and to give the accelerator commands.
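The register interface described above might look roughly like the following sketch. The register layout, the bit names, and the squaring operation are all hypothetical, chosen only for illustration; a real accelerator defines its own registers at a fixed bus address. So that the sketch runs on an ordinary host, the registers are modeled as a plain struct and a small function, accel_model_step, stands in for the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register layout; a real device defines its own. */
typedef struct {
    volatile uint32_t control;   /* CPU writes commands here */
    volatile uint32_t status;    /* CPU reads accelerator state here */
    volatile uint32_t data_in;   /* operand supplied by the CPU */
    volatile uint32_t data_out;  /* result produced by the accelerator */
} accel_regs_t;

#define ACCEL_CTRL_START  (1u << 0)  /* assumed "start" command bit */
#define ACCEL_STATUS_DONE (1u << 0)  /* assumed "done" status bit */

/* Host side: load the operand, then issue the start command. */
void accel_start(accel_regs_t *regs, uint32_t operand) {
    regs->data_in = operand;
    regs->control = ACCEL_CTRL_START;
}

/* Host side: monitor the accelerator through its status register. */
bool accel_done(const accel_regs_t *regs) {
    return (regs->status & ACCEL_STATUS_DONE) != 0;
}

/* Host side: fetch the result from the data register. */
uint32_t accel_result(const accel_regs_t *regs) {
    return regs->data_out;
}

/* Software stand-in for the hardware: on a start command it squares
   the operand, clears the command, and raises the done bit. How a real
   device clears the done bit between runs is not modeled here. */
void accel_model_step(accel_regs_t *regs) {
    if (regs->control & ACCEL_CTRL_START) {
        regs->data_out = regs->data_in * regs->data_in;
        regs->control &= ~ACCEL_CTRL_START;
        regs->status |= ACCEL_STATUS_DONE;
    }
}
```

On real hardware the host would obtain the register pointer from the device's bus address and would poll accel_done (or wait for an interrupt) instead of calling the model function.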
The CPU and accelerator may also communicate via shared memory. If the accelerator needs to operate on a large volume of data, it is usually more efficient to leave the data in memory and have the accelerator read and write memory directly rather than to have the CPU shuttle data from memory to accelerator registers and back.
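As a minimal sketch of the shared-memory style, the host below hands the accelerator only a small descriptor saying where the data lives and how long it is; the data itself stays in memory. The descriptor type accel_job_t, the scaling operation, and the function names are assumptions made for illustration, and the accelerator is again a software stand-in so the example runs on a host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical job descriptor: the CPU passes the location of the
   data, not the data itself. */
typedef struct {
    int32_t *buffer;  /* shared-memory region both sides can access */
    size_t   length;  /* number of elements in the buffer */
} accel_job_t;

/* Software stand-in for the accelerator: it reads and writes the
   shared buffer directly, scaling every element in place. */
void accel_model_scale(const accel_job_t *job, int32_t factor) {
    for (size_t i = 0; i < job->length; i++) {
        job->buffer[i] *= factor;
    }
}

/* Host side: only the descriptor crosses the CPU/accelerator
   interface, however large the buffer is. */
void host_scale(int32_t *data, size_t n, int32_t factor) {
    accel_job_t job = { data, n };
    accel_model_scale(&job, factor);
}
```

The point of the design is that the CPU's cost is a few descriptor writes regardless of n, whereas shuttling each element through a data register would cost the CPU at least one write and one read per element.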
An accelerator is not a co-processor. A co-processor is connected to the internals of the CPU and processes instructions as defined by opcodes.
An accelerator, by contrast, interacts with the CPU through its registers, which are part of the programming model; it does not execute instructions. Its interface is functionally equivalent to that of an I/O device, although it usually does not perform input or output.
Both CPUs and accelerators perform computations required by the specification; at some level we do not care whether the work is done on a programmable CPU or on a hardwired unit.
The first task in designing an accelerator is determining that our system actually needs one. We have to make sure that the function we want to accelerate will run more quickly on our accelerator than it will as software on the CPU.
If our system CPU is a small microcontroller, the race may be easily won, but competing against a high-performance CPU is a challenge. We also have to make sure that the accelerated function will speed up the system. If some other operation is in fact the bottleneck, or if moving data into and out of the accelerator is too slow, then adding the accelerator may not be a net gain.
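Both checks can be put in rough numerical form. The sketch below is one simple way to do so under stated assumptions, not a complete performance model: it assumes one accelerated call costs the time to move data in, compute, and move data out, with no overlap between these phases or with CPU work, and it uses Amdahl's law (a standard result, not something taken from this text) to estimate the system-level speedup. All function and parameter names are hypothetical.

```c
#include <stdbool.h>

/* Time for one accelerated call, assuming transfer and compute
   phases do not overlap: data in + computation + data out. */
double accel_call_time(double t_in, double t_compute, double t_out) {
    return t_in + t_compute + t_out;
}

/* First check: does the accelerator beat the software version once
   data movement is counted? */
bool accelerator_pays_off(double t_sw, double t_in,
                          double t_compute, double t_out) {
    return accel_call_time(t_in, t_compute, t_out) < t_sw;
}

/* Second check (Amdahl's law): overall speedup when a fraction of
   total run time is sped up by kernel_speedup. */
double system_speedup(double fraction, double kernel_speedup) {
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup);
}
```

For example, if the kernel takes 100 time units in software and 10 + 20 + 10 = 40 units on the accelerator, the kernel speedup is 2.5; but if that kernel is only half of the total run time, the system speeds up by 1 / (0.5 + 0.5 / 2.5), about 1.43. If the kernel takes only 30 units in software, the same accelerator is a net loss.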