@@ -3,75 +3,95 @@ precise, generational GC for better performance and smaller
memory usage (no false-positives memory retentions with big
allocations).

-This is a large task, but it can be done in steps.
-
-1) use the GCJ support to mark reference fields in objects, so
-scanning the heap is faster. This is mostly done already, needs
-checking that it is always used correctly (big objects, arrays).
-There are also some data structures we use in the runtime that are
-currently untyped that are allocated in the Gc heap and used to
-keep references to GC objects. We need to make them typed as to
-precisely track GC references or make them non-GC memory,
-by using more the GC hnadle support code (MonoGHashTable, MonoDomain,
-etc).
-
-2) don't include in the static roots the .bss and .data segments
-to save in scanning time and limit false-positives. This is mostly
-done already.
-
-3) keep track precisely of stack locations and registers in native
-code generation. This basically requires the regalloc rewrite code
-first, if we don't want to duplicate much of it. This is the hardest
-task of all, since proving it's correctness is very hard. Some tricks,
-like having a build that injects GC.Collect() after every few simple
-operations may help. We also need to decide if we want to handle safe
-points at calls and back jumps only or at every instruction. The latter
-case is harder to implement and requires we keep around much more data
-(it potentially makes for faster stop-the-world phases).
-The first case requires us to be able to advance a thread until it
-reaches the next safe point: this can be done with the same techniques
-used by a debugger. We already need something like this to handle
-safely aborts happening in the middle of a prolog in managed code,
-for example, so this could be an additional sub-task that can be done
-separately from the GC work.
-Note that we can adapt the libgc code to use the info we collect
-when scanning the stack in managed methods and still use the conservative
-approach for the unmanaged stack, until we have our own collector,
-which requires we define a proper icall interface to switch from managed
-to unmanaged code (hwo to we handle object references in the icall
-implementations, for example).
-
-4) we could make use of the generational capabilities of the
-Boehm GC, but not with the current method involving signals which
-may create incompatibilities and is not supported on all platforms.
-We need to start using write barriers: they will be required anyway
-for the generational GC we'll use. When a field holding a reference
-is changed in an object (or an item in an array), we mark the card
-or page where the field is stored as dirty. Later, when a collection
-is run, only objects in pages marked as dirty are scanned for
-references instead of the whole heap. This could take a few days to
-implement and probably much more time to debug if all the cases were
-not catched:-)
-
-5) actually write the new generational and precise collector. There are
-several examples out there as open source projects, though the CLR
-needs some specific semantics so the code needs to be written from
-scratch anyway. Compared to item 3 this is relatively easer and it can
-be tested outside of mono, too, until mono is ready to use it.
-The important features needed:
-*) precise, so there is no false positive memory retention
-*) generational to reduce collection times
-*) pointer-hopping allocation to reduce alloc time
-*) possibly per-thread lock-free allocation
-*) handle weakrefs and finalizers with the CLR semantics
-
-Note: some GC engines use a single mmap area, because it makes
-handling generations and the implementation much easier, but this also
-limits the expension of the heap, so people may need to use a command-line
-option to set the max heap size etc. It would be better to have a design
-that allows mmapping a few megabytes chunks at a time.
-
-The different tasks can be done in parallel. 1, 2 and 4 can be done in time
-for the mono 1.2 release. Parts of 3 and 5 could be done as well.
-The complete switch is supposed to happen with the mono 2.0 release.
+As of May 2006, the first working implementation is in metadata/sgen-gc.c.
+It is a two-generation moving collector and it is
+currently used to shake out all the issues in the runtime that need to
+be fixed in order to support precise generational and moving collectors.
+
+The two main issues are:
+1) identify as precisely as possible all the pointers to managed objects
+2) insert write barriers to account for pointers in the old
+generation that point to objects in the newer generations
+
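Issue 2 exists because a minor collection traces only the young generation, so every old-to-young pointer has to be recorded at the moment it is created. The following is a minimal sketch in C, assuming a toy heap where each object carries a generation tag and the collector keeps a small remembered set of slot addresses; `ToyObj`, `toy_store_field` and `REMSET_MAX` are hypothetical names, not Mono APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object: a generation tag plus one reference field. */
enum { GEN_YOUNG = 0, GEN_OLD = 1 };

typedef struct ToyObj {
    int generation;
    struct ToyObj *field;
} ToyObj;

/* Remembered set: addresses of old-generation slots that may point to young
 * objects. A minor collection would treat each recorded slot as an extra root. */
#define REMSET_MAX 64
static ToyObj **remset[REMSET_MAX];
static size_t remset_len;

static void remset_add (ToyObj **slot)
{
    for (size_t i = 0; i < remset_len; i++)
        if (remset[i] == slot)
            return; /* already recorded */
    assert (remset_len < REMSET_MAX);
    remset[remset_len++] = slot;
}

/* Write barrier for holder->field = value: do the store, then record the slot
 * if it creates an old-to-young pointer. Young holders need no bookkeeping,
 * because live young objects are traced during a minor collection anyway. */
static void toy_store_field (ToyObj *holder, ToyObj *value)
{
    holder->field = value;
    if (value != NULL && holder->generation == GEN_OLD && value->generation == GEN_YOUNG)
        remset_add (&holder->field);
}
```

A card table, which marks fixed-size heap regions dirty instead of recording exact slots, is a common alternative that makes the barrier cheaper at the cost of scanning more memory at collection time.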
+Point 1 is mostly complete. The runtime can register additional roots
+with the GC as needed, and it provides the GC with precise information
+on object layout. In particular, with the new precise GC it is not possible
+to store GC-allocated memory in IntPtr or UIntPtr fields (in fact, the new GC
+can allocate only objects, not the GC-tracked untyped blobs of memory
+that the Boehm GC can allocate). Precise info is also tracked for static fields.
+What is currently missing here is:
+*) precise info for ThreadStatic and ContextStatic storage (this also requires
+better memory management for these sub-heaps)
+*) precise info for HANDLE_NORMAL GC handles
+*) precise info for thread stacks (this requires storing information about
+managed stack frames along with the jitted code for a method and doing a
+stack walk of the active threads, scanning the unmanaged stack frames
+conservatively and the managed ones precisely. mono_jit_info_table_find () must
+be made lock-free for this to work). Precise type info must be maintained
+for all the local variables, and should also be maintained
+for registers.
+Note that this is not a correctness issue, but a performance one. The more
+pointers to objects we can deal with precisely, the more effective the GC
+will be, since it will be able to move the objects. The first two todo items
+are mostly trivial, while handling the thread stacks precisely is complex to
+implement and test, and it has a runtime penalty in CPU and memory use.
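Conservative scanning is what limits moving: any word in an unmanaged frame that happens to hold a heap address, whether the start of an object or an interior address, must be treated as a possible reference, so its target cannot be moved. A minimal sketch, assuming a toy heap of fixed-size objects in one contiguous block; `ToyHeap`, `conservative_scan`, `NOBJS` and `OBJ_SIZE` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy heap: NOBJS fixed-size objects in one contiguous block. */
#define NOBJS 8
#define OBJ_SIZE 32

typedef struct {
    unsigned char mem[NOBJS * OBJ_SIZE];
    bool pinned[NOBJS];
} ToyHeap;

/* Treat every word of a conservatively scanned area as a potential pointer:
 * if it falls anywhere inside the heap, pin the object containing it.
 * Returns the number of objects pinned by this call. */
static size_t conservative_scan (ToyHeap *heap, const uintptr_t *words, size_t nwords)
{
    uintptr_t start = (uintptr_t) heap->mem;
    uintptr_t end = start + sizeof heap->mem;
    size_t newly_pinned = 0;

    for (size_t i = 0; i < nwords; i++) {
        uintptr_t w = words[i];
        if (w < start || w >= end)
            continue; /* cannot be a reference into this heap */
        size_t idx = (size_t) ((w - start) / OBJ_SIZE);
        if (!heap->pinned[idx]) {
            heap->pinned[idx] = true; /* ambiguous root: the object must not move */
            newly_pinned++;
        }
    }
    return newly_pinned;
}
```

Precisely described locations carry no such ambiguity, so the collector can move their targets and rewrite the location itself.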
+In practice we need to be able to describe to the GC _all_ the memory
+locations that can hold a pointer to a managed object, and we must also tell it
+whether that location can contain:
+*) a pointer to the start of an object or NULL (typically a field of an object)
+*) a pinning pointer to an object (typically the result of the fixed statement in C#)
+*) a pointer into the managed heap or to other locations (a typical stack location)
+Since the GC must be told about all these locations, it is no longer possible to
+store an object reference in unmanaged memory unless the object is explicitly pinned
+for the entire time the reference is stored there. With the Boehm GC this was possible
+as long as the object was kept alive in some other way, but with the new GC it is no
+longer valid, because objects can move: the object is kept alive by the other reference, but the
+pointer in unmanaged memory is not updated to the new location the object
+has been moved to.
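The three kinds of locations listed above, and the reason an unregistered copy in unmanaged memory goes stale, can be shown with a toy evacuation step. This is a minimal sketch, not sgen's code: it assumes a hypothetical root table (`RootKind`, `Root`, `evacuate`) and a collector that copies every unpinned object referenced by a precise root into a to-space, while pinning the targets of pinning and ambiguous roots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical root kinds mirroring the three cases in the text. */
typedef enum {
    ROOT_OBJ_REF,   /* start of an object or NULL: rewritten when the object moves */
    ROOT_PINNING,   /* pinning pointer: the target must stay in place */
    ROOT_AMBIGUOUS  /* may point into the heap or elsewhere: target pinned, slot never rewritten */
} RootKind;

typedef struct { int value; } ToyObj;

/* A registered location. For simplicity ambiguous slots are typed as ToyObj**
 * here; a real stack slot would just be a machine word. */
typedef struct {
    ToyObj **slot;
    RootKind kind;
} Root;

#define SPACE_OBJS 4
static ToyObj from_space[SPACE_OBJS], to_space[SPACE_OBJS];
static ToyObj *forward[SPACE_OBJS]; /* new address of each moved object, or NULL */
static int pinned[SPACE_OBJS];
static size_t to_used;

/* Index of the from-space object containing p, or -1 if p is outside it. */
static long from_index (const ToyObj *p)
{
    uintptr_t a = (uintptr_t) p, lo = (uintptr_t) from_space;
    if (a < lo || a >= lo + sizeof from_space)
        return -1;
    return (long) ((a - lo) / sizeof (ToyObj));
}

static void evacuate (const Root *roots, size_t nroots)
{
    /* Pass 1: pinning and ambiguous roots keep their targets where they are. */
    for (size_t i = 0; i < nroots; i++) {
        long idx = from_index (*roots[i].slot);
        if (idx >= 0 && roots[i].kind != ROOT_OBJ_REF)
            pinned[idx] = 1;
    }
    /* Pass 2: copy unpinned objects reachable from precise roots, fix the roots. */
    for (size_t i = 0; i < nroots; i++) {
        long idx = from_index (*roots[i].slot);
        if (idx < 0 || roots[i].kind != ROOT_OBJ_REF || pinned[idx])
            continue;
        if (forward[idx] == NULL) {
            assert (to_used < SPACE_OBJS);
            forward[idx] = &to_space[to_used++];
            *forward[idx] = from_space[idx]; /* copy the object */
        }
        *roots[i].slot = forward[idx]; /* only registered locations are updated */
    }
}
```

After `evacuate`, the moved object is still alive through its registered root, but a raw copy of the old address kept outside the root table is not updated; keeping the object pinned while such a pointer is stored avoids the problem.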
+
+Most of the work for inserting write barrier calls is already done as well,
+but there may still be bugs in this area. In particular, for it to work,
+the correct IL opcodes must be used when storing an object in a field or
+array element (most of the marshal.c code needs to be reviewed to use
+stind.ref instead of stind.i/stind.u when needed). When this is done, the
+JIT will take care of inserting the write barriers automatically.
+What the JIT does automatically for managed code must be done manually
+in the runtime C code that stores fields in objects and arrays,
+and in any other operation that could change a pointer in the old generation
+to point to an object in the new generation. Sample cases are as follows:
+
+*) when using C structs that map to managed objects, the following macro
+must be used to store an object in a field (the macro must not be used
+when storing non-objects and it should not be used when storing NULL values):
+
+ MONO_OBJECT_SETREF(obj,fieldname,value)
+where obj is the pointer to the object, fieldname is the name of the field in
+the C struct and value is a MonoObject*. Note that obj must be a correctly
+typed pointer to a struct that embeds MonoObject as the first field and
+has fieldname as a field.
+
+*) when setting an element of an array of references to an object, use the
+following macro:
+
+ mono_array_setref (array,index,value)
+
+*) when copying a number of references from one array to another:
+
+ mono_array_memcpy_refs (dest,destidx,src,srcidx,count)
+
+*) when copying a struct that may contain reference fields, use:
+
+ void mono_value_copy (gpointer dest, gpointer src, MonoClass *klass)
+
+*) when it is unknown whether a pointer points to the stack or to the heap and an
+object needs to be stored through it, use:
+
+ void mono_gc_wbarrier_generic_store (gpointer ptr, MonoObject* value)
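Each of the helpers listed above routes a reference store through a write barrier. The sketch below shows one way such helpers could be layered on a card table; it is an illustration under stated assumptions, not Mono's implementation. `TOY_OBJECT_SETREF`, `toy_array_setref`, `toy_array_memcpy_refs`, `toy_wbarrier_generic_store`, the 512-byte cards and the statically allocated toy old generation are all hypothetical, and for simplicity a card is marked for every store into the old generation without checking whether the stored value is young.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for MonoObject, embedded as the first field. */
typedef struct ToyObject { int dummy; } ToyObject;

typedef struct {
    ToyObject header;
    ToyObject *name;   /* reference fields */
    ToyObject *next;
} ToyNode;

typedef struct {
    ToyObject header;
    size_t length;
    ToyObject *items[16];
} ToyRefArray;

/* Toy old generation and its card table (one byte per 512-byte card). */
static struct {
    ToyNode nodes[64];
    ToyRefArray arrays[8];
} old_gen;

#define CARD_BITS 9
#define NCARDS ((sizeof old_gen >> CARD_BITS) + 1)
static unsigned char card_table[NCARDS];

/* Mark the cards covering [start, start + bytes) if the range is in the old generation. */
static void mark_cards (const void *start, size_t bytes)
{
    uintptr_t base = (uintptr_t) &old_gen;
    uintptr_t a = (uintptr_t) start;
    if (bytes == 0 || a < base || a >= base + sizeof old_gen)
        return; /* e.g. a stack slot: nothing to remember */
    for (uintptr_t c = (a - base) >> CARD_BITS; c <= (a + bytes - 1 - base) >> CARD_BITS; c++)
        card_table[c] = 1;
}

/* Store through a pointer that may refer to the stack or to the heap. */
static void toy_wbarrier_generic_store (void *ptr, ToyObject *value)
{
    *(ToyObject **) ptr = value;
    mark_cards (ptr, sizeof (ToyObject *));
}

/* Field store in a C struct that maps to a managed object. */
#define TOY_OBJECT_SETREF(obj, fieldname, value) \
    toy_wbarrier_generic_store (&(obj)->fieldname, (ToyObject *) (value))

static void toy_array_setref (ToyRefArray *arr, size_t index, ToyObject *value)
{
    assert (index < arr->length);
    toy_wbarrier_generic_store (&arr->items[index], value);
}

/* Bulk copy: move the references, then mark the covered cards once. */
static void toy_array_memcpy_refs (ToyRefArray *dest, size_t destidx,
                                   const ToyRefArray *src, size_t srcidx, size_t count)
{
    assert (destidx + count <= dest->length && srcidx + count <= src->length);
    memmove (&dest->items[destidx], &src->items[srcidx], count * sizeof (ToyObject *));
    mark_cards (&dest->items[destidx], count * sizeof (ToyObject *));
}
```

In this sketch the generic store does no extra work for a stack location, and the bulk copy marks the cards of the destination range once instead of invoking a barrier per element.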
+
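Copying a value type is the case where the runtime needs layout information: a plain `memcpy` moves the reference bits but skips the barrier. A minimal sketch, assuming a hypothetical type descriptor that lists the offsets of reference fields (`ToyClass`, `toy_value_copy`) and a stub barrier that only counts calls; the `MonoClass *` argument of `mono_value_copy` presumably supplies comparable layout information, but Mono's internals are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct ToyObject { int dummy; } ToyObject;

/* Hypothetical type descriptor: instance size plus offsets of reference fields. */
typedef struct {
    size_t size;
    size_t nrefs;
    const size_t *ref_offsets;
} ToyClass;

/* Stub barrier that only counts calls so the example can be checked; a real
 * barrier would mark a card or add the slot to a remembered set. */
static size_t barrier_calls;

static void toy_wbarrier_slot (ToyObject **slot)
{
    (void) slot;
    barrier_calls++;
}

/* Copy a value type, then run the barrier on each non-NULL reference field. */
static void toy_value_copy (void *dest, const void *src, const ToyClass *klass)
{
    memcpy (dest, src, klass->size);
    for (size_t i = 0; i < klass->nrefs; i++) {
        ToyObject **slot = (ToyObject **) ((char *) dest + klass->ref_offsets[i]);
        if (*slot != NULL) /* NULL never needs remembering */
            toy_wbarrier_slot (slot);
    }
}

/* A value type with one reference field between two scalars. */
typedef struct {
    int id;
    ToyObject *target;
    double weight;
} ToyPair;

static const size_t toy_pair_refs[] = { offsetof (ToyPair, target) };
static const ToyClass toy_pair_class = { sizeof (ToyPair), 1, toy_pair_refs };
```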
+Note that the write barrier support in the runtime could also be
+used to enable the generational features of the Boehm GC.
+
+More documentation on the new GC is available at:
+http://www.mono-project.com/Compacting_GC
+