Segmentation and paging
There are two methods of splitting the system memory into smaller blocks
for memory management. The size of these blocks is quite critical within the
design. Each block requires a translation descriptor and therefore the size of
the block is important. If the granularity is too small (i.e. the blocks are
1–2 kbytes), the number of descriptors needed for a 4 Gbyte system is extremely
large. If the blocks are too big, the number of descriptors reduces but
granular-ity increases. If a program just needs a few bytes, a complete block
will have to be allocated to it and this wastes the unused memory. Between
these two extremes lies the ideal trade-off.
A segmented memory management scheme has a small number of descriptors
but solves the granularity problem by allowing the segments to be of a variable
size in a block of contiguous memory. Each segment descriptor is fairly complex
and the hardware has to be able to cope with different address translation
widths. The memory usage is greatly improved, al-though the task of assigning
memory segments in the most effi-cient way is difficult.
This problem occurs when a system has been operating for some time and
the segment distribution is now right across the memory map. The free memory
has been fragmented into small chunks albeit in large numbers. In such cases,
the total system memory may be more than sufficient to allocate another
segment, but the memory is non-contiguous and therefore not available. There is
nothing more frustrating, when using such systems, as the combination of ‘2
Mbytes RAM free’ and ‘Insufficient memory to load’ messages when trying to
execute a simple utility. In such cases, the current tasks must be stopped,
saved and restarted to repack them and free up the memory. This problem can
also be found with file storage systems which need contiguous disk sectors and
tracks.
A paged memory system splits memory needs into multi-ple, same sized
blocks called pages. These are usually 1–2 kbytes in size, which allows them to
take easy advantage of fragmented memory. However, each page needs a
descriptor, which greatly increases the size of the look up tables. With a 4
Gbyte logical address space and 1 kbyte page size, the number of descriptors
needed is over 4 million. Each descriptor would be 32 bits (22 bits translation
address, 10 bits for protection and status) in size and the corresponding table
would occupy 16 Mbytes! This is a little impractical, to say the least. To
decrease the amount of storage needed for the page tables, multi-level tree
structures are used. Such mechanisms have been implemented in the MC68851 paged
memory management unit (PMMU), the MC68030 processor, PowerPC and ARM 920
processors, to name but a few.
Trees work by dividing the logical address into fields and using each of
the fields to successively reference into tables until the translation address
is located. This is then concatenated with the lower order page address bits to
complete a full physical address. The root pointer forms the start of the tree
and there may be separate pointers for user and supervisor use.
The root pointer points to separate pointer tables, which in turn point
to other tables and so on, until the descriptor is finally reached. Each
pointer table contains the address of the next location. Most systems differ in
the number of levels and the page sizes that can be implemented. Bits can often
be set to terminate the table walk when large memory areas do not need to be
uniquely defined at a lower or page level. The less levels, the more efficient
the table walking.
The next diagram shows the three-level tree used by the MC88200 CMMU.
The logical 32 bit address is extended by a user/ supervisor bit which is used
to select one of two possible segment table bases from the user and supervisor
area registers. The segment number is derived from bits 31 to 22 and is
concatenated with a 20 bit value from the segment table base and a binary ‘00’
to create a 32 bit address for the page number table. The page table base is
derived similarly until the complete translation is obtained. The remaining 12
bits of the descriptors are used to define the page, segment or area status in
terms of access and more recently, cache coherency mechanisms. If the attempted
access does not match with these bits (e.g. write to a write protected page),
an error will be sent back to the processor.
The next two diagrams show a practical implementation using a two-level
tree from an MC68851 PMMU. This is probably the most sophisticated of any of
the MMU schemes to have appeared in the last 20 years. It could be argued that
it was over-engineered as subsequent MMUs have effectively all been subsets of
its features. One being the number of table levels it supported. However as an
example, it is a good one to use as its basic principles are used even today.
The table walking mechanism compresses the amount of information needed
to store the translation information. Consider a task that occupies a 1 Mbyte
logical address map consisting of a code, data and stack memory blocks. Not all
the memory is used or allocated and so the page tables need to identify which
of the 2 kbyte pages are not valid and the address translation values for the
others. With a single-level table, this would need 512 descriptors, assuming a
2 kbyte page size. Each descriptor would need to be available in practice. Each
2 kbyte block has its own descriptor in the table and this explains why a
single level table will have 512 entries. Even if a page is not used, it has to
have an entry to indicate this. The problem with a single level table is the
number of descriptors that are needed. They can be reduced by increasing the
page size — a 4 kbyte page will have the number of required descriptors — but
this makes memory allocation extravagant. If a data structure needs 5 bytes
then a whole page has to be allocated to it. If it is just a few bytes bigger
than a page, a second page is needed. If a system has a lot of small data
structures and limited memory then small pages are probably the best choice to
get the best memory use. If the structures are large or memory in not a problem
then larger page sizes are more efficient from the memory management side of
the design.
With the two-level scheme shown, only the pages needed are mapped. All
other accesses cause an exception to allocate extra pages as necessary. This
type of scheme is often called demand paged memory management. It has the
advantage of immediately removing the need for descriptors for invalid or
unused pages which reduces the amount of data needed to store the descriptors.
This means that with the two-level table mechanism shown in the diagram,
only 44 entries are needed, occupying only 208 bytes, with additional pages
increasing this by 4 bytes. The exam-ple shows a relatively simple two-level
table but up to five levels are often used. This leads to a fairly complex and
time consuming table walk to perform the address translation.
To improve performance, a cache or buffer is used to contain the most
recently used translations, so that table walks only occur if the entry is not
in the address translation cache (ATC) or translation look-aside buffer (TLB).
This makes a lot of sense — due to the location of code and data, there will be
frequent accesses to the same page and by caching the descriptor, the penalty
of a table walk is only paid on the first access. However, there are still some
further trade-offs to consider.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.