Advanced Topics in Garbage Collection
1 Parallel and Concurrent Garbage Collection
2 Partial Object Relocation
3 Conservative Collection for Unsafe Languages
4 Weak References
5 Exercises for Section 7.8
We close our investigation of garbage collection with brief treatments of four additional topics:
Garbage collection in parallel environments.
Partial relocations of objects.
Garbage collection for languages that are not type-safe.
The interaction between programmer-controlled and automatic garbage collection.
1. Parallel and Concurrent Garbage Collection
Garbage collection becomes even more challenging when applied to applications running in parallel on a multiprocessor machine. It is not uncommon for server applications to have thousands of threads running at the same time; each of these threads is a mutator. Typically, the heap will consist of gigabytes of memory.
Scalable garbage-collection algorithms must take advantage of the presence of multiple processors. We say a garbage collector is parallel if it uses multiple threads; it is concurrent if it runs simultaneously with the mutator.
We shall describe a parallel, and mostly concurrent, collector that uses a concurrent and parallel phase that does most of the tracing work, and then a stop-the-world phase that guarantees all the reachable objects are found and re-claims the storage. This algorithm introduces no new basic concepts in garbage collection per se; it shows how we can combine the ideas described so far to create a full solution to the parallel-and-concurrent collection problem. How-ever, there are some new implementation issues that arise due to the nature of parallel execution. We shall discuss how this algorithm coordinates multiple threads in a parallel computation using a rather common work-queue model.
To understand the design of the algorithm we must keep in mind the scale of the problem. Even the root set of a parallel application is much larger, consisting of every thread's stack, register set and globally accessible variables. The amount of heap storage can be very large, and so is the amount of reachable data. The rate at which mutations take place is also much greater.
To reduce the pause time, we can adapt the basic ideas developed for in-cremental analysis to overlap garbage collection with mutation. Recall that an incremental analysis, as discussed in Section 7.7, performs the following three steps:
Find the root set. This step is normally performed atomically, that is, with the mutator(s) stopped.
Interleave the tracing of the reachable objects with the execution of the mutator(s). In this period, every time a mutator writes a reference that points from a Scanned object to an Unreached object, we remember that reference. As discussed in Section 7.7.2, we have options regarding the granularity with which these references are remembered. In this section, we shall assume the card-based scheme, where we divide the heap into sections called "cards" and maintain a bit map indicating which cards are dirty (have had one or more references within them rewritten).
Stop the mutator(s) again to rescan all the cards that may hold references to unreached objects.
For a large multithreaded application, the set of objects reached by the root set can be very large. It is infeasible to take the time and space to visit all such objects while all mutations cease. Also, due to the large heap and the large number of mutation threads, many cards may need to be rescanned after all objects have been scanned once. It is thus advisable to scan some of these cards in parallel, while the mutators are allowed to continue to execute concurrently.
To implement the tracing of step (2) above, in parallel, we shall use multiple garbage-collecting threads concurrently with the mutator threads to trace most of the reachable objects. Then, to implement step (3), we stop the mutators and use parallel threads to ensure that all reachable objects are found.
The tracing of step (2) is carried out by having each mutator thread per-form part of the garbage collection, along with its own work. In addition, we use threads that are dedicated purely to collecting garbage. Once garbage col-lection has been initiated, whenever a mutator thread performs some memory-allocation operation, it also performs some tracing computation. The pure garbage-collecting threads are put to use only when a machine has idle cycles. As in incremental analysis, whenever a mutator writes a reference that points
from a Scanned object to an Unreached object, the card that holds this reference
is marked dirty and needs to be rescanned.
Here is an outline of the parallel, concurrent garbage-collection algorithm.
Scan the root set for each mutator thread, and put all objects directly reachable from that thread into the Unscanned state. The simplest incre-mental approach to this step is to wait until a mutator thread calls the memory manager, and have it scan its own root set if that has not already been done. If some mutator thread has not called a memory allocation function, but all the rest of tracing is done, then this thread must be interrupted to have its root set scanned.
Scan objects that are in the Unscanned state. To support parallel com-putation, we use a work queue of fixed-size work packets, each of which holds a number of Unscanned objects. Unscanned objects are placed in work packets as they are discovered. Threads looking for work will de-queue these work packets and trace the Unscanned objects therein. This strategy allows the work to be spread evenly among workers in the tracing process. If the system runs out of space, and we cannot find the space to create these work packets, we simply mark the cards holding the objects to force them to be scanned. The latter is always possible because the bit array holding the marks for the cards has already been allocated.
Scan the objects in dirty cards. When there are no more Unscanned ob-jects left in the work queue, and all threads' root sets have been scanned, the cards are rescanned for reachable objects. As long as the mutators continue to execute, dirty cards continue to be produced. Thus, we need to stop the tracing process using some criterion, such as allowing cards to be rescanned only once or a fixed number of times, or when the number of outstanding cards is reduced to some threshold. As a result, this paral-lel and concurrent step normally terminates before completing the trace, which is finished by the final step, below.
4. The final step guarantees that all reachable objects are marked as reached. With all the mutators stopped, the root sets for all the threads can now be found quickly using all the processors in the system. Because the reachability of most objects has been traced, only a small number of objects are expected to be placed in the Unscanned state. All the threads then participate in tracing the rest of the reachable objects and rescanning all the cards.
It is important that we control the rate at which tracing takes place. The tracing phase is like a race. The mutators create new objects and new references that must be scanned, and the tracing tries to scan all the reachable objects and rescan the dirty cards generated in the meanwhile. It is not desirable to start the tracing too much before a garbage collection is needed, because that will increase the amount of floating garbage. On the other hand, we cannot wait until the memory is exhausted before the tracing starts, because then mutators will not be able to make forward progress and the situation degenerates to that of a stop-the-world collector. Thus, the algorithm must choose the time to commence the collection and the rate of tracing appropriately. An estimate of the mutation rate from previous cycles of collection can be used to help in the decision. The tracing rate is dynamically adjusted to account for the work performed by the pure garbage-collecting threads.
2. Partial Object Relocation
As discussed starting in Section 7.6.4, copying or compacting collectors are ad-vantageous because they eliminate fragmentation. However, these collectors have nontrivial overheads. A compacting collector requires moving all objects and updating all the references at the end of garbage collection. A copying collector figures out where the reachable objects go as tracing proceeds; if trac-ing is performed incrementally, we need either to translate a mutator's every reference, or to move all the objects and update their references at the end. Both options are very expensive, especially for a large heap.
We can instead use a copying generational garbage collector. It is effective in collecting immature objects and reducing fragmentation, but can be expensive when collecting mature objects. We can use the train algorithm to limit the amount of mature data analyzed each time. However, the overhead of the train algorithm is sensitive to the size of the remembered set for each partition.
There is a hybrid collection scheme that uses concurrent tracing to reclaim all the unreachable objects and at the same time moves only a part of the objects. This method reduces fragmentation without incurring the full cost of relocation in each collection cycle.
1. Before tracing begins, choose a part of the heap that will be evacuated.
As the reachable objects are marked, also remember all the references pointing to objects in the designated area.
When tracing is complete, sweep the storage in parallel to reclaim the space occupied by unreachable objects.
Finally, evacuate the reachable objects occupying the designated area and fix up the references to the evacuated objects.
3. Conservative Collection for Unsafe Languages
As discussed in Section 7.5.1, it is impossible to build a garbage collector that is guaranteed to work for all C and C + + programs. Since we can always compute an address with arithmetic operations, no memory locations in C and C + + can ever be shown to be unreachable. However, many C or C + + programs never fabricate addresses in this way. It has been demonstrated that a conservative garbage collector — one that does not necessarily discard all garbage — can be built to work well in practice for this class of programs.
A conservative garbage collector assumes that we cannot fabricate an ad-dress, or derive the address of an allocated chunk of memory without an ad-dress pointing somewhere in the same chunk. We can find all the garbage in programs satisfying such an assumption by treating as a valid address any bit pattern found anywhere in reachable memory, as long as that bit pattern may be construed as a memory location. This scheme may classify some data erro-neously as addresses. It is correct, however, since it only causes the collector to be conservative and keep more data than necessary.
Object relocation, requiring all references to the old locations be updated to point to the new locations, is incompatible with conservative garbage collection. Since a conservative garbage collector does not know if a particular bit pattern refers to an actual address, it cannot change these patterns to point to new addresses.
Here is how a conservative garbage collector works. First, the memory manager is modified to keep a data map of all the allocated chunks of memory. This map allows us to find easily the starting and ending boundary of the chunk of memory that spans a certain address. The tracing starts by scanning the program's root set to find any bit pattern that looks like a memory location, without worrying about its type. By looking up these potential addresses in the data map, we can find the starting addresses of those chunks of memory that might be reached, and place them in the Unscanned state. We then scan all the unscanned chunks, find more (presumably) reachable chunks of memory, and place them on the work list until the work list becomes empty. After tracing is done, we sweep through the heap storage using the data map to locate and free all the unreachable chunks of memory.
4. Weak References
Sometimes, programmers use a language with garbage collection, but also wish to manage memory, or parts of memory, themselves. That is, a programmer may know that certain objects are never going to be accessed again, even though references to the objects remain. An example from compiling will suggest the problem.
E x a m p l e 7 . 1 7 : We have seen that the lexical analyzer often manages a sym-bol table by creating an object for each identifier it sees. These objects may appear as lexical values attached to leaves of the parse tree representing those identifiers, for instance. However, it is also useful to create a hash table, keyed by the identifier's string, to locate these objects. That table makes it easier for the lexical analyzer to find the object when it encounters a lexeme that is an identifier.
When the compiler passes the scope of an identifier J, its symbol-table object no longer has any references from the parse tree, or probably any other intermediate structure used by the compiler. However, a reference to the object is still sitting in the hash table. Since the hash table is part of the root set of the compiler, the object cannot be garbage collected. If another identifier with the same lexeme as I is encountered, then it will be discovered that i" is out of scope, and the reference to its object will be deleted. However, if no other identifier with this lexeme is encountered, then J's object may remain as uncollectable, yet useless, throughout compilation. •
If the problem suggested by Example 7.17 is important, then the compiler writer could arrange to delete from the hash table all references to objects as soon as their scope ends. However, a technique known as weak references allows the programmer to rely on automatic garbage collection, and yet not have the heap burdened with reachable, yet truly unused, objects. Such a system allows certain references to be declared "weak." An example would be all the references in the hash table we have been discussing. When the garbage collector scans an object, it does not follow weak references within that object, and does not make the objects they point to reachable. Of course, such an object may still be reachable if there is another reference to it that is not weak.
5. Exercises for Section 7.8
Exercise 7 . 8 . 1 : In Section 7.8.3 we suggested that it was possible to garbage collect for C programs that do not fabricate expressions that point to a place within a chunk unless there is an address that points somewhere within that same chunk. Thus, we rule out code like
p = 12345;
x = *p;
because, while p might point to some chunk accidentally, there could be no other pointer to that chunk. On the other hand, with the code above, it is more likely that p points nowhere, and executing that code will result in a segmentation fault. However, in C it is possible to write code such that a variable like p is guaranteed to point to some chunk, and yet there is no pointer to that chunk. Write such a program.
Summary of Chapter 7
• Run-Time Organization. To implement the abstractions embodied in the source language, a compiler creates and manages a run-time environment in concert with the operating system and the target machine. The run-time environment has static data areas for the object code and the static data objects created at compile time. It also has dynamic stack and heap areas for managing objects created and destroyed as the target program executes.
• Control Stack. Procedure calls and returns are usually managed by a run-time stack called the control stack. We can use a stack because procedure calls or activations nest in time; that is, if p calls q, then this activation of q is nested within this activation of p.
• Stack Allocation. Storage for local variables can allocated on a run-time stack for languages that allow or require local variables to become inacces-sible when their procedures end. For such languages, each live activation has an activation record (or frame) on the control stack, with the root of the activation tree at the bottom, and the entire sequence of activation records on the stack corresponding to the path in the activation tree to the activation where control currently resides. The latter activation has its record at the top of the stack.
• Access to Nonlocal Data on the Stack. For languages like C that do not allow nested procedure declarations, the location for a variable is either global or found in the activation record on top of the run-time stack. For languages with nested procedures, we can access nonlocal data on the stack through access links, which are pointers added to each activation record. The desired nonlocal data is found by following a chain of access links to the appropriate activation record. A display is an auxiliary array, used in conjunction with access links, that provides an efficient short-cut alternative to a chain of access links.
Heap Management. The heap is the portion of the store that is used for data that can live indefinitely, or until the program deletes it explicitly. The memory manager allocates and deallocates space within the heap. Garbage collection finds spaces within the heap that are no longer in use and can therefore be reallocated to house other data items. For languages that require it, the garbage collector is an important subsystem of the memory manager.
Exploiting Locality. By making good use of the memory hierarchy, mem-ory managers can influence the run time of a program. The time taken to access different parts of memory can vary from nanoseconds to millisec-onds. Fortunately, most programs spend most of their time executing a relatively small fraction of the code and touching only a small fraction of the data. A program has temporal locality if it is likely to access the same memory locations again soon; it has spatial locality if it is likely to access nearby memory locations soon.
4 Reducing Fragmentation. As the program allocates and deallocates mem-ory, the heap may get fragmented, or broken into large numbers of small noncontiguous free spaces or holes. The best fit strategy — allocate the smallest available hole that satisfies a request — has been found empir-ically to work well. While best fit tends to improve space utilization, it may not be best for spatial locality. Fragmentation can be reduced by combining or coalescing adjacent holes.
4 Manual Deallocation. Manual memory management has two common failings: not deleting data that can not be referenced is a memory-leak error, and referencing deleted data is a dangling-pointer-dereference error.
• Reachability. Garbage is data that cannot be referenced or reached. There are two basic ways of finding unreachable objects: either catch the tran-sition as a reachable object turns unreachable, or periodically locate all reachable objects and infer that all remaining objects are unreachable.
• Reference-Counting Collectors maintain a count of the references to an object; when the count transitions to zero, the object becomes unreachable. Such collectors introduce the overhead of maintaining references and can fail to find "cyclic" garbage, which consists of unreachable objects that reference each other, perhaps through a chain of references.
• Trace-Based Garbage Collectors iteratively examine or trace all references to find reachable objects, starting with the root set consisting of objects that can be accessed directly without having to dereference any pointers. Mark-and-Sweep Collectors visit and mark all reachable objects in a first tracing step and then sweep the heap to free up unreachable objects. Mark-and-Compact Collectors improve upon mark-and-sweep; they relocate reachable objects in the heap to eliminate memory fragmentation.
Copying Collectors break the dependency between tracing and finding free space. They partition the memory into two semispaces, A and B.
Allocation requests are satisfied from one semispace, say A, until it fills up, at which point the garbage collector takes over, copies the reachable objects to the other space, say B, and reverses the roles of the semispaces.
Incremental Collectors. Simple trace-based collectors stop the user pro-gram while garbage is collected. Incremental collectors interleave the actions of the garbage collector and the mutator or user program. The mutator can interfere with incremental reachability analysis, since it can change the references within previously scanned objects. Incremental col-lectors therefore play it safe by overestimating the set of reachable objects; any "floating garbage" can be picked up in the next round of collection.
• Partial Collectors also reduce pauses; they collect a subset of the garbage at a time. The best known of partial-collection algorithms, generational garbage collection, partitions objects according to how long they have been allocated and collects the newly created objects more often because they tend to have shorter lifetimes. An alternative algorithm, the train algorithm, uses fixed length partitions, called cars, that are collected into trains. Each collection step is applied to the first remaining car of the first remaining train. When a car is collected, reachable objects are moved out to other cars, so this car is left with garbage and can be removed from the train. These two algorithms can be used together to create a partial collector that applies the generational algorithm to younger objects and the train algorithm to more mature objects.