Advanced Topics in Garbage
Collection
1 Parallel and Concurrent Garbage
Collection
2 Partial Object Relocation
3 Conservative Collection for
Unsafe Languages
4 Weak References
5 Exercises for Section 7.8
We close our investigation of garbage collection with brief treatments
of four additional topics:
Garbage collection in parallel
environments.
Partial relocations of objects.
Garbage collection for languages
that are not type-safe.
The interaction between
programmer-controlled and automatic garbage collection.
1. Parallel and Concurrent
Garbage Collection
Garbage collection becomes even more challenging when applied to
applications running in parallel on a multiprocessor machine. It is not
uncommon for server applications to have thousands of threads running at the
same time; each of these threads is a mutator. Typically, the heap will consist
of gigabytes of memory.
Scalable garbage-collection algorithms must take advantage of the
presence of multiple processors. We say a garbage collector is parallel if it uses multiple threads; it
is concurrent if it runs
simultaneously with the mutator.
We shall describe a parallel, and mostly concurrent, collector that uses
a concurrent and parallel phase that does most of the tracing work, and then a
stop-the-world phase that guarantees all the reachable objects are found and
re-claims the storage. This algorithm introduces no new basic concepts in
garbage collection per se; it shows how we can combine the ideas described so
far to create a full solution to the parallel-and-concurrent collection
problem. How-ever, there are some new implementation issues that arise due to
the nature of parallel execution. We shall discuss how this algorithm
coordinates multiple threads in a parallel computation using a rather common
work-queue model.
To understand the design of the algorithm we must keep in mind the scale
of the problem. Even the root set of a parallel application is much larger,
consisting of every thread's stack, register set and globally accessible
variables. The amount of heap storage can be very large, and so is the amount
of reachable data. The rate at which mutations take place is also much greater.
To reduce the pause time, we can adapt the basic ideas developed for
in-cremental analysis to overlap garbage collection with mutation. Recall that
an incremental analysis, as discussed in Section 7.7, performs the following
three steps:
Find the root set. This step is
normally performed atomically, that is, with the mutator(s) stopped.
Interleave the tracing of the
reachable objects with the execution of the mutator(s). In this period, every
time a mutator writes a reference that points from a Scanned object to an Unreached
object, we remember that reference. As discussed in Section 7.7.2, we have
options regarding the granularity with which these references are remembered.
In this section, we shall assume the card-based scheme, where we divide the
heap into sections called "cards" and maintain a bit map indicating
which cards are dirty (have had one
or more references within them rewritten).
Stop the mutator(s) again to rescan all the cards that may hold
references to unreached objects.
For a large multithreaded application, the set of objects reached by the
root set can be very large. It is infeasible to take the time and space to
visit all such objects while all mutations cease. Also, due to the large heap
and the large number of mutation threads, many cards may need to be rescanned
after all objects have been scanned once. It is thus advisable to scan some of
these cards in parallel, while the mutators are allowed to continue to execute
concurrently.
To implement the tracing of step (2) above, in parallel, we shall use multiple
garbage-collecting threads concurrently with the mutator threads to trace most of the reachable objects. Then, to
implement step (3), we stop the mutators and use parallel threads to ensure
that all reachable objects are found.
The tracing of step (2) is carried out by having each mutator thread
per-form part of the garbage collection, along with its own work. In addition,
we use threads that are dedicated purely to collecting garbage. Once garbage
col-lection has been initiated, whenever a mutator thread performs some
memory-allocation operation, it also performs some tracing computation. The
pure garbage-collecting threads are put to use only when a machine has idle
cycles. As in incremental analysis, whenever a mutator writes a reference that
points
from a Scanned object to
an Unreached
object, the card that holds this reference
is marked dirty and needs to be rescanned.
Here is an outline of the parallel, concurrent garbage-collection
algorithm.
Scan the root set for each
mutator thread, and put all objects directly reachable from that thread into
the Unscanned state. The simplest
incre-mental approach to this step is to wait until a mutator thread calls the
memory manager, and have it scan its own root set if that has not already been
done. If some mutator thread has not called a memory allocation function, but
all the rest of tracing is done, then this thread must be interrupted to have
its root set scanned.
Scan objects that are in the Unscanned state. To support parallel
com-putation, we use a work queue of fixed-size work packets, each of which holds a number of Unscanned objects. Unscanned
objects are placed in work packets as they are discovered. Threads looking for
work will de-queue these work packets and trace the Unscanned objects therein. This strategy allows the work to be
spread evenly among workers in the tracing process. If the system runs out of
space, and we cannot find the space to create these work packets, we simply
mark the cards holding the objects to force them to be scanned. The latter is
always possible because the bit array holding the marks for the cards has
already been allocated.
Scan the objects in dirty cards.
When there are no more Unscanned
ob-jects left in the work queue, and all threads' root sets have been scanned,
the cards are rescanned for reachable objects. As long as the mutators continue
to execute, dirty cards continue to be produced. Thus, we need to stop the
tracing process using some criterion, such as allowing cards to be rescanned
only once or a fixed number of times, or when the number of outstanding cards
is reduced to some threshold. As a result, this paral-lel and concurrent step
normally terminates before completing the trace, which is finished by the final
step, below.
4. The final step guarantees that all reachable objects are marked as
reached. With all the mutators stopped, the root sets for all the threads can
now be found quickly using all the processors in the system. Because the
reachability of most objects has been traced, only a small number of objects
are expected to be placed in the Unscanned
state. All the threads then participate in tracing the rest of the reachable
objects and rescanning all the cards.
It is important that we control the rate at which tracing takes place.
The tracing phase is like a race. The mutators create new objects and new
references that must be scanned, and the tracing tries to scan all the
reachable objects and rescan the dirty cards generated in the meanwhile. It is
not desirable to start the tracing too much before a garbage collection is
needed, because that will increase the amount of floating garbage. On the other
hand, we cannot wait until the memory is exhausted before the tracing starts,
because then mutators will not be able to make forward progress and the
situation degenerates to that of a stop-the-world collector. Thus, the
algorithm must choose the time to commence the collection and the rate of
tracing appropriately. An estimate of
the mutation rate from previous cycles of collection can be used to help in the
decision. The tracing rate is dynamically adjusted to account for the work
performed by the pure garbage-collecting threads.
2. Partial Object Relocation
As discussed starting in Section 7.6.4, copying or compacting collectors
are ad-vantageous because they eliminate fragmentation. However, these
collectors have nontrivial overheads. A compacting collector requires moving
all objects and updating all the references at the end of garbage collection. A
copying collector figures out where the reachable objects go as tracing
proceeds; if trac-ing is performed incrementally, we need either to translate a
mutator's every reference, or to move all the objects and update their
references at the end. Both options are very expensive, especially for a large
heap.
We can instead use a copying generational garbage collector. It is
effective in collecting immature objects and reducing fragmentation, but can be
expensive when collecting mature objects. We can use the train algorithm to
limit the amount of mature data analyzed each time. However, the overhead of
the train algorithm is sensitive to the size of the remembered set for each
partition.
There is a hybrid collection scheme that uses concurrent tracing to
reclaim all the unreachable objects and at the same time moves only a part of
the objects. This method reduces fragmentation without incurring the full cost
of relocation in each collection cycle.
1. Before tracing begins, choose a part of the heap that will be
evacuated.
As the reachable objects are marked, also remember all the references
pointing to objects in the designated area.
When tracing is complete, sweep
the storage in parallel to reclaim the space occupied by unreachable objects.
Finally, evacuate the reachable
objects occupying the designated area and fix up the references to the
evacuated objects.
3. Conservative Collection for
Unsafe Languages
As discussed in Section 7.5.1, it is impossible to build a garbage
collector that is guaranteed to work for all C and C + + programs. Since we can
always compute an address with arithmetic operations, no memory locations in C
and C + + can ever be shown to be unreachable. However, many C or C + +
programs never fabricate addresses in this way. It has been demonstrated that a
conservative garbage collector — one that does not necessarily discard all
garbage — can be built to work well in practice for this class of programs.
A conservative garbage collector assumes that we cannot fabricate an
ad-dress, or derive the address of an allocated chunk of memory without an
ad-dress pointing somewhere in the same chunk. We can find all the garbage in
programs satisfying such an assumption by treating as a valid address any bit
pattern found anywhere in reachable memory, as long as that bit pattern may be
construed as a memory location. This scheme may classify some data erro-neously
as addresses. It is correct, however, since it only causes the collector to be
conservative and keep more data than necessary.
Object relocation, requiring all references to the old locations be
updated to point to the new locations, is incompatible with conservative
garbage collection. Since a conservative garbage collector does not know if a
particular bit pattern refers to an actual address, it cannot change these patterns
to point to new addresses.
Here is how a conservative garbage collector works. First, the memory
manager is modified to keep a data map
of all the allocated chunks of memory. This map allows us to find easily the
starting and ending boundary of the chunk of memory that spans a certain
address. The tracing starts by scanning the program's root set to find any bit
pattern that looks like a memory location, without worrying about its type. By
looking up these potential addresses in the data map, we can find the starting
addresses of those chunks of memory that might be reached, and place them in
the Unscanned state. We then scan all
the unscanned chunks, find more (presumably) reachable chunks of memory, and
place them on the work list until the work list becomes empty. After tracing is
done, we sweep through the heap storage using the data map to locate and free
all the unreachable chunks of memory.
4. Weak References
Sometimes, programmers use a language with garbage collection, but also
wish to manage memory, or parts of memory, themselves. That is, a programmer
may know that certain objects are never going to be accessed again, even though
references to the objects remain. An example from compiling will suggest the
problem.
E x a m p l e 7 . 1 7 : We have seen that the lexical analyzer often
manages a sym-bol table by creating an object for each identifier it sees.
These objects may appear as lexical values attached to leaves of the parse tree
representing those identifiers, for instance. However, it is also useful to
create a hash table, keyed by the identifier's string, to locate these objects.
That table makes it easier for the lexical analyzer to find the object when it
encounters a lexeme that is an identifier.
When the compiler passes the scope of an identifier J, its symbol-table
object no longer has any references from the parse tree, or probably any other
intermediate structure used by the compiler. However, a reference to the object
is still sitting in the hash table. Since the hash table is part of the root
set of the compiler, the object cannot be garbage collected. If another
identifier with the same lexeme as I
is encountered, then it will be discovered that i" is out of scope, and
the reference to its object will be deleted. However, if no other identifier
with this lexeme is encountered, then J's object may remain as uncollectable,
yet useless, throughout compilation. •
If the problem suggested by Example 7.17 is important, then the compiler
writer could arrange to delete from the hash table all references to objects as
soon as their scope ends. However, a technique known as weak references allows the programmer to rely on automatic garbage
collection, and yet not have the heap burdened with reachable, yet truly
unused, objects. Such a system allows certain references to be declared
"weak." An example would be all the references in the hash table we
have been discussing. When the garbage collector scans an object, it does not
follow weak references within that object, and does not make the objects they
point to reachable. Of course, such an object may still be reachable if there
is another reference to it that is not weak.
5. Exercises for Section 7.8
Exercise 7 . 8 . 1 : In Section
7.8.3 we suggested that it was possible to garbage collect for C programs that
do not fabricate expressions that point to a place within a chunk unless there
is an address that points somewhere within that same chunk. Thus, we rule out code like
p = 12345;
x = *p;
because, while p might point
to some chunk accidentally, there could be no other pointer to that chunk. On
the other hand, with the code above, it is more likely that p points nowhere, and executing that
code will result in a segmentation fault. However, in C it is possible to write
code such that a variable like p is
guaranteed to point to some chunk, and yet there is no pointer to that chunk.
Write such a program.
Summary of Chapter 7
• Run-Time Organization. To implement the abstractions embodied in the source language, a compiler creates and manages a run-time
environment in concert with the operating system and the target machine. The
run-time environment has static data areas for the object code and the static
data objects created at compile time. It also has dynamic stack and heap areas
for managing objects created and destroyed as the target program executes.
• Control Stack. Procedure calls and returns are usually managed by a run-time stack
called the control stack. We can use
a stack because procedure calls or activations
nest in time; that is, if p calls q, then this activation of q is nested within this activation of p.
• Stack Allocation. Storage for local variables can allocated on a run-time stack for languages that allow or
require local variables to become inacces-sible when their procedures end. For
such languages, each live activation has an activation
record (or frame) on the control
stack, with the root of the activation tree at the bottom, and the entire
sequence of activation records on the stack corresponding to the path in the
activation tree to the activation where control currently resides. The latter
activation has its record at the top of the stack.
• Access to Nonlocal Data on the
Stack. For languages like C that do not allow nested procedure declarations,
the location for a variable is either global or found in the activation record
on top of the run-time stack. For languages with nested procedures, we can
access nonlocal data on the stack through access
links, which are pointers added to each activation record. The desired
nonlocal data is found by following a chain of access links to the appropriate
activation record. A display is an
auxiliary array, used in conjunction with access links, that provides an
efficient short-cut alternative to a chain of access links.
Heap Management. The heap is the portion of the store that is used for data that can live indefinitely, or
until the program deletes it explicitly. The memory manager allocates and deallocates space within the heap. Garbage collection finds spaces within
the heap that are no longer in use and
can therefore be reallocated to house other data items. For languages that
require it, the garbage collector is an important subsystem of the memory
manager.
Exploiting Locality. By making good use of the memory hierarchy, mem-ory managers can
influence the run time of a program. The time taken to access different parts
of memory can vary from nanoseconds to millisec-onds. Fortunately, most
programs spend most of their time executing a relatively small fraction of the
code and touching only a small fraction of the data. A program has temporal locality if it is likely to
access the same memory locations again soon; it has spatial locality if it is likely to access nearby memory locations
soon.
4 Reducing Fragmentation. As
the program allocates and deallocates mem-ory, the heap may get fragmented, or broken into large numbers
of small noncontiguous free spaces or holes.
The best fit strategy — allocate the
smallest available hole that satisfies a request — has been found empir-ically
to work well. While best fit tends to improve space utilization, it may not be
best for spatial locality. Fragmentation can be reduced by combining or coalescing adjacent holes.
4 Manual Deallocation. Manual
memory management has two common failings: not deleting data that can not be
referenced is a memory-leak error,
and referencing deleted data is a dangling-pointer-dereference
error.
• Reachability. Garbage is data that cannot be referenced or
reached. There are two basic ways
of finding unreachable objects: either catch the tran-sition as a reachable
object turns unreachable, or periodically locate all reachable objects and
infer that all remaining objects are unreachable.
• Reference-Counting Collectors maintain a count of the references to an
object; when the count transitions to zero, the object becomes
unreachable. Such collectors introduce
the overhead of maintaining references and can
fail to find "cyclic"
garbage, which consists of unreachable objects that reference each
other, perhaps through a chain of references.
• Trace-Based Garbage Collectors
iteratively examine or trace all references
to find reachable objects, starting with the root set consisting of
objects that can be accessed directly without having to dereference any
pointers. Mark-and-Sweep Collectors
visit and mark all reachable objects in
a first tracing step and then sweep the heap to free up unreachable objects.
Mark-and-Compact Collectors improve upon
mark-and-sweep; they relocate
reachable objects in the heap to eliminate memory fragmentation.
Copying Collectors break the
dependency between tracing
and finding free space. They
partition the memory into
two semispaces, A and
B.
Allocation requests are satisfied from one semispace, say A, until it
fills up, at which point the garbage collector takes over, copies the reachable
objects to the other space, say B, and reverses the roles of the semispaces.
Incremental Collectors. Simple
trace-based collectors stop the user pro-gram while garbage is collected.
Incremental collectors interleave the actions of the garbage collector and the
mutator or user program. The mutator can interfere with incremental
reachability analysis, since it can change the references within previously scanned
objects. Incremental col-lectors therefore play it safe by overestimating the
set of reachable objects; any "floating garbage" can be picked up in
the next round of collection.
• Partial Collectors also reduce pauses; they collect a
subset of the garbage at a time. The
best known of partial-collection algorithms,
generational garbage collection, partitions objects according to how
long they have been allocated and collects the newly created objects more often
because they tend to have shorter lifetimes. An alternative algorithm, the
train algorithm, uses fixed length partitions, called cars, that are collected
into trains. Each collection step is applied to the first remaining car of the
first remaining train. When a car is collected, reachable objects are moved out
to other cars, so this car is left with garbage and can be removed from the
train. These two algorithms can be used together to create a partial collector
that applies the generational algorithm to younger objects and the train
algorithm to more mature objects.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.