DRAM interfaces
The basic DRAM interface
The basic DRAM interface takes the processor-generated address, places
the high-order bits onto the memory address bus to form the row address and
asserts the RAS* signal. This partial address is latched internally by the
DRAM. The remaining low-order bits, forming the column address, are then driven
onto the bus and the CAS* signal is asserted. After the access time has expired,
the data appears on the Dout pin and is latched by the processor. The RAS* and
CAS* signals are then negated. This cycle is repeated for every access. The
majority of DRAM specifications define minimum pulse widths for the RAS* and
CAS* signals, and these often form the major part of the memory access time. To
remain compatible with the PC–AT standard, memory refresh is performed every 15
microseconds.
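The sequence can be pictured in C, although in a real system it is performed by the memory controller hardware. A minimal sketch, assuming a hypothetical 1 Mbit x1 device with ten row and ten column address bits; all the helper functions are invented names standing in for the controller's bus operations:

    #include <stdint.h>

    /* Hypothetical helpers standing in for the controller hardware. */
    extern void drive_addr(uint32_t half_addr);
    extern void assert_ras(void);
    extern void assert_cas(void);
    extern void negate_ras(void);
    extern void negate_cas(void);
    extern void wait_access_time(void);
    extern uint8_t latch_data(void);

    #define ROW_BITS 10                /* assumed 1 Mbit x1 geometry */
    #define COL_BITS 10

    uint8_t dram_read(uint32_t addr)
    {
        uint32_t row = (addr >> COL_BITS) & ((1u << ROW_BITS) - 1u);
        uint32_t col = addr & ((1u << COL_BITS) - 1u);
        uint8_t data;

        drive_addr(row);      /* high-order bits form the row address     */
        assert_ras();         /* row latched internally on the RAS* edge  */
        drive_addr(col);      /* low-order bits form the column address   */
        assert_cas();         /* column latched on the CAS* edge          */
        wait_access_time();   /* data valid on Dout after the access time */
        data = latch_data();
        negate_cas();         /* negate both strobes; minimum pulse-width */
        negate_ras();         /* and precharge times must still be met    */
        return data;
    }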
This direct access method limits wait state-free operation to the lower
processor speeds. DRAM with a 100 ns access time would only allow a 12.5 MHz
processor to run with zero wait states. To achieve 20 MHz operation needs 40 ns
DRAM, which is unavailable today, or fast static RAM, which comes at a price.
Fortunately, the embedded system designer has more tricks up his sleeve to
improve DRAM performance for systems with or without cache.
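The arithmetic behind these figures can be checked with a short model. The two-clock basic bus cycle and the 60 ns of controller overhead (address multiplexing, strobe delays and data setup) are illustrative assumptions chosen to match the quoted numbers, not specified figures:

    #include <math.h>
    #include <stdio.h>

    static int wait_states(double access_ns, double clock_mhz)
    {
        const double overhead_ns = 60.0;  /* assumed mux, strobe and setup time */
        double period_ns = 1000.0 / clock_mhz;
        int clocks = (int)ceil((access_ns + overhead_ns) / period_ns);
        return clocks > 2 ? clocks - 2 : 0;   /* assumed two-clock basic cycle */
    }

    int main(void)
    {
        printf("100 ns DRAM at 12.5 MHz: %d wait states\n", wait_states(100, 12.5)); /* 0 */
        printf("100 ns DRAM at 20 MHz:   %d wait states\n", wait_states(100, 20));   /* 2 */
        printf(" 40 ns DRAM at 20 MHz:   %d wait states\n", wait_states(40, 20));    /* 0 */
        return 0;
    }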
Page mode operation
One way of reducing the effective access time is to remove the need for
the RAS* pulse every time the DRAM is accessed. It must be pulsed on the
first access, but subsequent accesses to the same page (i.e. with the same row
address) do not require it and so complete faster. This is how the ‘page
mode’ versions of most 256 kb, 1 Mb and 4 Mb memories work. In page mode, the row
address is supplied as normal but the RAS* signal is left asserted. This selects
an internal page of memory within the DRAM, where any bit of data can be
accessed by placing the column address on the bus and asserting CAS*. With 256 kb
memory, this gives a page of 1 kbyte (512 column bits per DRAM row with 16
DRAMs in the array). A 2 kbyte page is available from 1 Mb DRAM and a 4 kbyte
page with 4 Mb DRAM.
This allows fast processors to work with slower memory and yet achieve
almost wait state-free operation. The first access is slower and causes wait
states but subsequent accesses within the selected page are quicker with no
wait states.
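In controller terms, page mode amounts to remembering the currently open row and comparing it against each new access. A minimal sketch for a single bank, with hypothetical helper names and the same assumed ten-bit column address as before:

    #include <stdbool.h>
    #include <stdint.h>

    extern void row_access(uint32_t row);   /* assert RAS* with a new row   */
    extern void col_access(uint32_t col);   /* pulse CAS* with a new column */
    extern void close_page(void);           /* negate RAS* and precharge    */

    #define COL_BITS 10    /* assumed geometry */

    static uint32_t open_row;
    static bool     page_open = false;

    void page_mode_read(uint32_t addr)
    {
        uint32_t row = addr >> COL_BITS;
        uint32_t col = addr & ((1u << COL_BITS) - 1u);

        if (!page_open || row != open_row) {   /* page miss: full RAS* cycle */
            if (page_open)
                close_page();
            row_access(row);                   /* slow path, causes wait states */
            open_row = row;
            page_open = true;
        }
        col_access(col);                       /* page hit: CAS-only access */
    }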
However, there is one restriction. The maximum time that the RAS* signal
can be asserted during page mode operation is often specified at about 10
microseconds. In non-PC designs, the refresh interval is frequently adjusted to
match this time, so a refresh cycle will always occur and prevent a
specification violation. With the PC standard of 15 microseconds, this is not
possible. Many chip sets neatly resolve the situation by using an internal
counter which times out page mode access after 10 microseconds.
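Such a counter can be sketched as below; microseconds_now() and close_page() are hypothetical stand-ins for what is really dedicated hardware:

    #include <stdint.h>

    #define RAS_LIMIT_US 10   /* typical maximum RAS* assertion time */

    extern uint32_t microseconds_now(void);
    extern void close_page(void);

    static uint32_t page_opened_at;

    void check_ras_timeout(void)
    {
        /* Force the page closed before the limit expires, so the
         * specification is never violated; the next access simply
         * reopens the page with a fresh RAS* cycle.
         */
        if (microseconds_now() - page_opened_at >= RAS_LIMIT_US - 1)
            close_page();
    }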
Page interleaving
Using a page mode design only provides greater performance when the
memory cycles exhibit some form of locality, i.e. stay within the page
boundary. Every access outside the boundary causes a page miss and two or three
wait states. The secret, as with caches, is to increase the hits and reduce the
misses. Fortunately, most accesses are sequential or localised, as in program
subroutines and some data structures. However, if a program is frequently
accessing data, the memory activity often follows a code–data–code–data access
pattern. If the code areas and data areas are in different pages, any benefit
that page mode could offer is lost. Each access changes the page selection,
incurring wait states. The solution is to increase the number of pages
available. If the memory is divided into several banks, each bank can offer a
selected page, increasing the number of pages and, ultimately, the number of
hits and performance. Again, extensive hardware support is needed and is
frequently provided by the PC chip set.
Page interleaving is usually implemented as a one, two or four way
system, depending on how much memory is installed. With a four way system,
there are four memory banks, each with its own RAS* and CAS* lines. With 4
Mbyte DRAM banks, this would offer 16 Mbytes of system RAM. The four way system
allows four pages to be selected within page mode at any one time. Page 0 is in
bank 1, page 1 in bank 2, and so on, with the sequence restarting after four
banks.
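The bank selection falls straight out of the address bits above the page offset. A sketch assuming 4 kbyte pages (as with 4 Mb DRAM) and four banks; the bit positions are illustrative:

    #include <stdint.h>

    #define PAGE_SHIFT 12    /* 4 kbyte page, as with 4 Mb DRAM */
    #define NUM_BANKS  4     /* four way interleave             */

    static uint32_t open_row[NUM_BANKS];
    static int      page_open[NUM_BANKS];

    /* Consecutive pages map to consecutive banks (page n goes to bank
     * n mod 4; the text numbers these bank 1 to bank 4), so a code
     * page and a data page can both stay open at the same time.
     */
    static unsigned bank_of(uint32_t addr)
    {
        return (addr >> PAGE_SHIFT) & (NUM_BANKS - 1u);
    }

    int is_page_hit(uint32_t addr)
    {
        unsigned b   = bank_of(addr);
        uint32_t row = addr >> (PAGE_SHIFT + 2);   /* bits above bank select */
        return page_open[b] && open_row[b] == row;
    }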
With interleaving and fast page mode devices, inexpensive 85 ns DRAM can
be used with a 16 MHz processor to achieve a 0.4 wait state system. Without
page interleaving, this system would insert two wait states on every
access. With the promise of faster DRAM, future systems will be able to offer
33–50 MHz operation with very good performance, without the need for cache memory
and its associated costs and complexity.
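The 0.4 figure is simply the page miss rate multiplied by the miss penalty: with the two wait state penalty quoted above, it corresponds to an 80% page hit rate, an assumed figure chosen to reproduce the result:

    /* Average wait states = miss rate x miss penalty per miss.
     * The 80% hit rate is an assumption that reproduces the quoted 0.4.
     */
    static double avg_wait_states(double hit_rate, int miss_penalty)
    {
        return (1.0 - hit_rate) * (double)miss_penalty;
    }
    /* avg_wait_states(0.80, 2) == 0.4 */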
Burst mode operation
Some versions of the DRAM chip, such as page mode, static column or
nibble mode devices, do not need to have the RAS*/CAS* cycle repeated and can
provide data much faster if only the new column address is given. This has
allowed the use of a burst fill memory interface, where the processor fetches
more data than it needs and keeps the extra data in an internal cache ready for
future use. The main advantage of this system is in reducing the need for fast
static RAMs to realise the processor’s performance. With 60 ns page mode DRAM,
a 4-1-1-1 (four clocks for the first access, a single cycle for each remaining
access in the burst) memory system can easily be built. Each 128 bits of data
fetched in this way takes only seven clock cycles, compared with five in the
fastest possible system. If bursting was not supported, the same access would
take 16 clocks. This translates to a very effective price/performance ratio: a
4-1-1-1 DRAM system gives about 90% of the performance of a more expensive
2-1-1-1 static RAM design. This interface is used on the higher performance
processors, where it is used in conjunction with on-chip caches. The burst fill
is used to load a complete line of data within the cache.
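The clock counts quoted can be verified directly: four 32-bit transfers make up the 128 bits, so a 4-1-1-1 burst takes 4 + 1 + 1 + 1 = 7 clocks, a 2-1-1-1 static RAM design takes 5, and four separate non-burst accesses at four clocks each take 16. A short check:

    #include <stdio.h>

    /* Clocks to fetch a four-beat burst (128 bits on a 32-bit bus),
     * given the lead-off count and the per-beat count.
     */
    static int burst_clocks(int first, int subsequent, int beats)
    {
        return first + subsequent * (beats - 1);
    }

    int main(void)
    {
        printf("4-1-1-1 page mode DRAM:  %d clocks\n", burst_clocks(4, 1, 4)); /*  7 */
        printf("2-1-1-1 static RAM:      %d clocks\n", burst_clocks(2, 1, 4)); /*  5 */
        printf("no burst, 4 clocks each: %d clocks\n", burst_clocks(4, 4, 4)); /* 16 */
        return 0;
    }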
EDO memory
EDO stands for extended data out memory and is a form of fast page mode
RAM with a quicker cycling process and thus faster page mode access. This
removes wait states and thus improves the overall performance of the system.
The improvement is achieved by fine tuning the CAS* operation.
With fast page mode, while the RAS* signal is still asserted, each time
the CAS* signal goes high the data outputs stop driving the data bus and go
into a high impedance state. This transition is often used to simplify the
design by meeting the bus timing requirements, and it is common with this type
of design to permanently ground the output enable pin. The problem is that this
requires the CAS* signal to be asserted until the data from the DRAM is latched
by the processor or bus master. This means that the next access cannot be
started until this has been completed, causing delays.
EDO memory does not cause the outputs to go to high impedance; it
will continue to drive data even after the CAS* signal is negated. By doing this,
the CAS* precharge can be started for the next access while the data from the
previous access is still being latched. This saves valuable nanoseconds and can
mean the removal of a wait state. With very high performance processors this is a
big advantage, and EDO-type DRAM is becoming the de facto standard for PCs,
workstations and any other application that needs high performance memory.
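The gain can be expressed in the same burst notation as before. Assuming, purely for illustration, that fast page mode needs two clocks per in-page access because CAS* must stay asserted until the data is latched, while EDO's overlapped precharge brings this down to one:

    /* Illustrative cycle counts only; the 4-2-2-2 and 4-1-1-1 patterns
     * are assumptions, not quoted figures.
     */
    static int burst_clocks(int first, int subsequent, int beats)
    {
        return first + subsequent * (beats - 1);
    }
    /* fast page mode 4-2-2-2: burst_clocks(4, 2, 4) == 10 clocks
     * EDO            4-1-1-1: burst_clocks(4, 1, 4) ==  7 clocks
     */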