While the Boehm GC is quite good, we need to move to a precise,
generational GC for better performance and smaller memory usage
(no false-positive memory retention caused by big allocations).
This is a large task, but it can be done in steps.
1) Use the GCJ support to mark reference fields in objects, so that
scanning the heap is faster. This is mostly done already; it needs
checking that it is always used correctly (big objects, arrays).
There are also some data structures used by the runtime that are
allocated in the GC heap and hold references to GC objects, but are
currently untyped. We need to make them typed, so that GC references
are tracked precisely, or move them to non-GC memory, by making more
use of the GC handle support code (MonoGHashTable, MonoDomain, etc.).
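To illustrate what "typed" means here, below is a minimal sketch of
libgc's typed-allocation interface (gc_typed.h), which the GCJ support
builds on: a bitmap describes which words of an object may hold GC
references, so the collector scans only those. The GC_* calls are real
libgc APIs; the ExampleObject layout is invented for this example.

    #include <gc/gc.h>
    #include <gc/gc_typed.h>

    typedef struct {
        void *vtable;   /* runtime data, never a GC reference */
        long  hash;     /* plain data: not scanned */
        void *field_a;  /* GC reference: must be scanned */
        void *field_b;  /* GC reference: must be scanned */
    } ExampleObject;

    static GC_descr example_descr;

    static void
    init_example_descriptor (void)
    {
        GC_word bitmap [GC_BITMAP_SIZE (ExampleObject)] = { 0 };

        /* set a bit for each word that may hold a heap reference */
        GC_set_bit (bitmap, GC_WORD_OFFSET (ExampleObject, field_a));
        GC_set_bit (bitmap, GC_WORD_OFFSET (ExampleObject, field_b));
        example_descr = GC_make_descriptor (bitmap,
                                            GC_WORD_LEN (ExampleObject));
    }

    static ExampleObject*
    alloc_example (void)
    {
        /* only field_a and field_b will be traced by the collector */
        return GC_malloc_explicitly_typed (sizeof (ExampleObject),
                                           example_descr);
    }

The GCJ mode goes one step further: the descriptor is stored in the
object's vtable, so GC_gcj_malloc() can find it without an extra
argument.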
2) Don't include the .bss and .data segments in the static roots, to
save scanning time and limit false positives. This is mostly done
already.
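A hedged sketch of how this looks with libgc's public API: turn off
the default registration of data segments and register only the ranges
that really hold managed references. GC_no_dls and GC_add_roots() are
real libgc entry points; where the static-field area lives is a
placeholder here.

    #include <gc/gc.h>

    static void
    register_managed_statics (void *start, size_t size)
    {
        /* register just this block as a GC root range, instead of
           letting the collector scan all of .data/.bss */
        GC_add_roots (start, (char *) start + size);
    }

    static void
    gc_startup (void)
    {
        GC_no_dls = 1;  /* don't auto-register dynamic library data
                           segments; must be set before GC_INIT() */
        GC_INIT ();
    }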
3) Keep track precisely of stack locations and registers in native
code generation. This basically requires the regalloc rewrite code
first, if we don't want to duplicate much of it. This is the hardest
task of all, since proving its correctness is very hard. Some tricks,
like having a build that injects GC.Collect() after every few simple
operations, may help. We also need to decide whether we want to handle
safe points at calls and back jumps only, or at every instruction. The
latter case is harder to implement and requires keeping around much
more data (it potentially makes for faster stop-the-world phases).
The first case requires us to be able to advance a thread until it
reaches the next safe point: this can be done with the same techniques
used by a debugger. We already need something like this to safely
handle aborts that happen in the middle of a prolog in managed code,
for example, so this could be an additional sub-task that can be done
separately from the GC work.
Note that we can adapt the libgc code to use the info we collect
when scanning the stack in managed methods and still use the
conservative approach for the unmanaged stack, until we have our own
collector. That requires defining a proper icall interface to switch
from managed to unmanaged code (how do we handle object references in
the icall implementations, for example?).
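To make the idea concrete, here is one possible shape for the
per-safe-point data, purely illustrative (none of these types exist in
the runtime today): for each safe point the JIT would record which
registers and stack slots hold live references, and the stack walker
would visit only those.

    typedef struct {
        unsigned int native_offset;  /* code offset of the safe point */
        unsigned int reg_ref_mask;   /* bit N set: hw reg N holds a ref */
        unsigned int nslots;         /* number of tracked stack slots */
        unsigned int slot_bitmap []; /* bit N set: slot N holds a ref */
    } SafePointMap;

    /* called for each managed frame found during the stack walk */
    static void
    scan_frame (void **slots, SafePointMap *map,
                void (*visit_ref) (void **slot))
    {
        unsigned int i;

        for (i = 0; i < map->nslots; ++i)
            if (map->slot_bitmap [i / 32] & (1u << (i % 32)))
                visit_ref (&slots [i]);
    }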
4) We could make use of the generational capabilities of the Boehm
GC, but not with the current method involving signals, which may
create incompatibilities and is not supported on all platforms.
We need to start using write barriers: they will be required anyway
for the generational GC we'll use. When a field holding a reference
is changed in an object (or an item in an array), we mark the card
or page where the field is stored as dirty. Later, when a collection
is run, only objects in pages marked as dirty are scanned for
references, instead of the whole heap. This could take a few days to
implement, and probably much more time to debug if not all the cases
were caught :-)
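A minimal card-marking write barrier sketch; the card size and all
names here are assumptions for illustration (in practice the JIT would
emit this inline):

    #define CARD_BITS 9   /* one card covers 512 bytes of heap */

    static unsigned char *card_table;  /* one byte per card, 0 = clean */
    static char          *heap_start;

    /* every store of a reference into the heap goes through this */
    static inline void
    write_barrier (void **field_addr, void *new_value)
    {
        *field_addr = new_value;
        /* mark the card containing the updated field as dirty */
        card_table [((char *) field_addr - heap_start) >> CARD_BITS] = 1;
    }

At collection time only objects on dirty cards are re-scanned; clean
cards are skipped, which is what avoids scanning the whole heap.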
5) Actually write the new generational and precise collector. There
are several examples out there as open source projects, though the CLR
needs some specific semantics, so the code needs to be written from
scratch anyway. Compared to item 3 this is relatively easier, and it
can be tested outside of mono, too, until mono is ready to use it.
The important features needed:
*) precise, so there is no false-positive memory retention
*) generational, to reduce collection times
*) pointer-hopping allocation, to reduce alloc time (see the sketch
   after this item)
*) possibly per-thread, lock-free allocation
*) handle weakrefs and finalizers with the CLR semantics
Note: some GC engines use a single mmap area, because it makes
handling generations and the implementation much easier, but this
also limits the expansion of the heap, so people may need to use a
command-line option to set the max heap size, etc. It would be better
to have a design that allows mmapping chunks of a few megabytes at a
time.
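A sketch of the pointer-hopping, per-thread allocation fast path
mentioned in the feature list above; all names are illustrative and
the slow path (buffer refill, possibly triggering a collection) is
left out:

    #include <stddef.h>

    typedef struct {
        char *pos;    /* next free byte in this thread's buffer */
        char *limit;  /* end of the buffer */
    } AllocBuffer;

    static __thread AllocBuffer tlab;     /* one buffer per thread */

    void* alloc_slow_path (size_t size);  /* refill tlab or collect */

    static inline void*
    gc_alloc (size_t size)
    {
        char *p;

        size = (size + 7) & ~(size_t) 7;  /* keep 8-byte alignment */
        p = tlab.pos;
        if (p + size <= tlab.limit) {     /* fast path: no locks,
                                             just bump the pointer */
            tlab.pos = p + size;
            return p;
        }
        return alloc_slow_path (size);    /* rare case */
    }

Because each thread allocates from its own buffer, the common case
needs no synchronization at all; only refilling a buffer touches
shared state.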
The different tasks can be done in parallel. 1, 2 and 4 can be done
in time for the mono 1.2 release. Parts of 3 and 5 could be done as
well. The complete switch is supposed to happen with the mono 2.0
release.