- While the Boehm GC is quite good, we need to move to a
- precise, generational GC for better performance and smaller
- memory usage (no false-positive memory retention with big
- allocations).
- The first working implementation is committed in metadata/sgen-gc.c
- as of May 2006. This is a two-generation moving collector and it is
- currently used to shake out all the issues in the runtime that need to
- be fixed in order to support precise generational and moving collectors.
- The two main issues are:
- 1) identify as precisely as possible all the pointers to managed objects
- 2) insert write barriers to be able to account for pointers in the old
- generation pointing to objects in the newer generations
- Point 1 is mostly complete. The runtime can register additional roots
- with the GC as needed and it provides the GC with precise info on
- object layout. In particular, with the new precise GC it is not possible to
- store GC-allocated memory in IntPtr or UIntPtr fields (in fact, the new GC
- can allocate only objects and not GC-tracked untyped blobs of memory
- as the Boehm GC can do). Precise info is also tracked for static fields.
- What is currently missing here is:
- *) precise info for ThreadStatic and ContextStatic storage (this also requires
- better memory management for these sub-heaps)
- *) precise info for HANDLE_NORMAL gc handles
- *) precise info for thread stacks (this requires storing the info about
- managed stack frames along with the jitted code for a method and doing the
- stack walk for the active threads, considering conservatively the unmanaged
- stack frames and precisely the managed ones. mono_jit_info_table_find () must
- be made lock-free for this to work). Precise type info must be maintained
- for all the local variables. Precise type info should be maintained also
- for registers.
- Note that this is not a correctness issue, but a performance one. The more
- pointers to objects we can deal with precisely, the more effective the GC
- will be, since it will be able to move the objects. The first two todo items
- are mostly trivial, while handling the thread stacks precisely is complex to
- implement and to test, and it carries a runtime penalty in CPU and memory use.
- In practice we need to be able to describe to the GC _all_ the memory
- locations that can hold a pointer to a managed object, and we must also tell it
- whether that location can contain:
- *) a pointer to the start of an object or NULL (typically a field of an object)
- *) a pinning pointer to an object (typically the result of the fixed statement in C#)
- *) a pointer to the managed heap or to other locations (a typical stack location)
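The three kinds of location listed above can be modelled with a small descriptor. This is a minimal sketch for illustration only: the names LocationKind, RootLocation, can_move_target and needs_heap_check are hypothetical and are not taken from sgen-gc.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only: not the sgen-gc.c API. */
typedef enum {
    LOC_OBJECT_START_OR_NULL, /* e.g. a field of an object */
    LOC_PINNING,              /* e.g. the result of a C# fixed statement */
    LOC_AMBIGUOUS             /* e.g. a stack slot: heap pointer or anything else */
} LocationKind;

typedef struct {
    void       **slot; /* address of the memory location being described */
    LocationKind kind; /* what the location is allowed to contain */
} RootLocation;

/* Only a precisely described location lets the collector move the
 * target object and rewrite the slot with the new address. */
static int can_move_target (const RootLocation *loc)
{
    assert (loc != NULL);
    return loc->kind == LOC_OBJECT_START_OR_NULL;
}

/* An ambiguous location may hold a non-heap value, so its content
 * has to be checked against the heap bounds before being trusted. */
static int needs_heap_check (const RootLocation *loc)
{
    assert (loc != NULL);
    return loc->kind == LOC_AMBIGUOUS;
}
```

In this model only the first kind allows the object to be relocated, which is why more precise information makes the collector more effective.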
- Since we need to tell the GC about all such locations, it is no longer possible to
- store an object in unmanaged memory unless it is explicitly pinned for the entire
- time the object is stored there. With the Boehm GC this was possible if the object
- was kept alive in some way, but with the new GC it is no longer valid, because
- objects can move: the object will be kept alive because of the other reference, but the
- pointer in unmanaged memory won't be updated to the new location where the object
- has been moved.
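The failure mode described above can be reproduced with a toy moving collector. This is only a sketch under stated assumptions: ToyObject, toy_collect and the two one-object spaces are invented for the example and do not reflect how sgen-gc.c is organised.

```c
#include <assert.h>
#include <string.h>

/* Toy model, hypothetical names: one object, two spaces. */
typedef struct { int payload; } ToyObject;

static ToyObject from_space[1];
static ToyObject to_space[1];

/* Moves the object and updates only the slots the collector was told about. */
static void toy_collect (ToyObject **known_slots[], int n_slots)
{
    ToyObject *old_addr = &from_space[0];
    ToyObject *new_addr = &to_space[0];
    int i;

    assert (known_slots != NULL);
    memcpy (new_addr, old_addr, sizeof (ToyObject));
    for (i = 0; i < n_slots; i++)
        if (*known_slots[i] == old_addr)
            *known_slots[i] = new_addr;
    old_addr->payload = -1; /* poison the old copy to make stale reads visible */
}

/* Returns 1 when the registered reference followed the object
 * and the unregistered copy was left pointing at the old location. */
static int demo_stale_pointer (void)
{
    ToyObject *managed_ref, *unmanaged_copy;
    ToyObject **roots[1];

    from_space[0].payload = 42;
    managed_ref = &from_space[0];    /* location known to the collector */
    unmanaged_copy = &from_space[0]; /* copy kept in "unmanaged memory" */
    roots[0] = &managed_ref;

    toy_collect (roots, 1);
    return managed_ref->payload == 42 && unmanaged_copy->payload == -1;
}
```

The object stays alive through managed_ref, yet unmanaged_copy now reads the poisoned old location, which is the situation pinning is meant to prevent.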
- Most of the work for inserting write barrier calls is already done as well,
- but there may still be bugs in this area. In particular, for it to work,
- the correct IL opcodes must be used when storing an object in a field or
- array element (most of the marshal.c code needs to be reviewed to use
- stind.ref instead of stind.i/stind.u when needed). When this is done, the
- JIT will take care of automatically inserting the write barriers.
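To make the role of the barrier concrete, here is a minimal sketch of a remembered-set write barrier. It is an assumption-laden toy, not the code the JIT emits: nursery, remembered, toy_store_ref and toy_store_raw are hypothetical names, and a real collector may record old-to-new pointers differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NURSERY_SIZE   1024
#define MAX_REMEMBERED 16

/* Hypothetical toy state: a young generation and a remembered set. */
static char   nursery[NURSERY_SIZE];
static void **remembered[MAX_REMEMBERED];
static size_t remembered_count;

static int in_nursery (const void *p)
{
    uintptr_t a = (uintptr_t) p, lo = (uintptr_t) nursery;
    return a >= lo && a < lo + NURSERY_SIZE;
}

/* Store with barrier (what a reference store should turn into):
 * remember every slot outside the nursery that now points into it. */
static void toy_store_ref (void **slot, void *value)
{
    *slot = value;
    if (in_nursery (value) && !in_nursery (slot)) {
        assert (remembered_count < MAX_REMEMBERED);
        remembered[remembered_count++] = slot;
    }
}

/* Store without barrier (what a plain integer-sized store amounts to):
 * the collector never learns about the new old-to-young pointer. */
static void toy_store_raw (void **slot, void *value)
{
    *slot = value;
}
```

In this model a store that goes through toy_store_raw leaves the remembered set empty, so a nursery collection would miss the reference; this is the risk when the wrong opcode is used for an object store.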
- What the JIT does automatically for managed code must be done manually
- in the runtime C code that deals with storing fields in objects and arrays,
- or in any other operation that could change a pointer in the old generation
- to point to an object in the new generation. Sample cases are as follows:
- *) when using C structs that map to managed objects the following macro
- must be used to store an object in a field (the macro must not be used
- when storing non-objects and it should not be used when storing NULL values):
- MONO_OBJECT_SETREF(obj,fieldname,value)
- where obj is the pointer to the object, fieldname is the name of the field in
- the C struct and value is a MonoObject*. Note that obj must be a correctly
- typed pointer to a struct that embeds MonoObject as the first field and
- has fieldname as a field.
- *) when setting the element of an array of references to an object, use the
- following macro:
- mono_array_setref (array,index,value)
- *) when copying a number of references from one array to another:
- mono_array_memcpy_refs (dest,destidx,src,srcidx,count)
- *) when copying a struct that may contain reference fields, use:
- void mono_value_copy (gpointer dest, gpointer src, MonoClass *klass)
- *) when it is unknown if a pointer points to the stack or to the heap and an
- object needs to be stored through it, use:
- void mono_gc_wbarrier_generic_store (gpointer ptr, MonoObject* value)
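The calling pattern for the field macro and the generic store can be shown with stand-ins. Everything below is hypothetical and only mirrors the shape described above: ToyObject stands in for MonoObject, TOY_OBJECT_SETREF for MONO_OBJECT_SETREF and toy_wbarrier_generic_store for mono_gc_wbarrier_generic_store; the real definitions in the runtime may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins, not the Mono definitions. */
typedef struct { void *vtable; } ToyObject;      /* plays the role of MonoObject */
typedef struct {
    ToyObject  object;  /* object header embedded as the first field */
    ToyObject *next;    /* a reference field */
} ToyNode;

#define TOY_HEAP_SLOTS 8
static ToyNode toy_heap[TOY_HEAP_SLOTS];  /* pretend managed heap */
static int     barrier_hits;              /* counts recorded heap stores */

static int in_toy_heap (const void *p)
{
    uintptr_t a = (uintptr_t) p, lo = (uintptr_t) toy_heap;
    return a >= lo && a < lo + sizeof (toy_heap);
}

/* Generic store: the destination may be on the stack or in the heap,
 * so the barrier work is done only when it lies in the heap. */
static void toy_wbarrier_generic_store (void *ptr, ToyObject *value)
{
    assert (ptr != NULL);
    *(ToyObject **) ptr = value;
    if (in_toy_heap (ptr))
        barrier_hits++;
}

/* Field store: obj must be a correctly typed struct pointer so that
 * the expression (obj)->fieldname compiles. */
#define TOY_OBJECT_SETREF(obj,fieldname,value) do { \
        toy_wbarrier_generic_store (&(obj)->fieldname, (ToyObject *) (value)); \
    } while (0)
```

A call such as TOY_OBJECT_SETREF (node, next, other) records the store, while a generic store through the address of a local variable performs the assignment without recording anything.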
- Note that the support for write barriers in the runtime could also be
- used to enable the generational features of the Boehm GC.
- Some more documentation on the new GC is available at:
- http://www.mono-project.com/Compacting_GC