precise-gc

While the Boehm GC is quite good, we need to move to a precise,
generational GC for better performance and smaller memory usage
(no false-positive memory retention with big allocations).
This is a large task, but it can be done in steps.
1) use the GCJ support to mark reference fields in objects, so
scanning the heap is faster. This is mostly done already; it needs
checking that it is always used correctly (big objects, arrays).
There are also some data structures we use in the runtime that are
currently untyped, are allocated in the GC heap and are used to
keep references to GC objects. We need to make them typed, so that
GC references are tracked precisely, or move them to non-GC memory,
by making more use of the GC handle support code (MonoGHashTable,
MonoDomain, etc.).
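As an illustration of what the GCJ-style support provides, here is a
minimal sketch (not the actual runtime code) of building a per-type
bitmap that tells the collector which words of an object hold
references; the FieldInfo layout is an assumption made for the example:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical, simplified field description: the real runtime
     * derives this information from the class metadata. */
    typedef struct {
            size_t offset;        /* field offset in bytes */
            int is_reference;     /* non-zero if the field holds a GC reference */
    } FieldInfo;

    /* Build a bitmap where bit i is set if the word at offset
     * i * sizeof (void*) contains a reference.  A precise scanner then
     * visits only the flagged words instead of treating every word in
     * the object as a potential pointer. */
    static uintptr_t
    build_ref_bitmap (const FieldInfo *fields, int nfields)
    {
            uintptr_t bitmap = 0;
            for (int i = 0; i < nfields; ++i) {
                    if (fields [i].is_reference)
                            bitmap |= (uintptr_t)1 << (fields [i].offset / sizeof (void*));
            }
            return bitmap;
    }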
2) don't include the .bss and .data segments in the static roots,
to save scanning time and limit false positives. This is mostly
done already.
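The flip side of dropping the data segments is registering the few
places that really do hold references as explicit roots. A hedged
sketch using the Boehm GC root-registration call (the table of
references is an invented example):

    #include <gc/gc.h>

    /* Hypothetical runtime-private table holding the only static
     * slots that may contain GC references. */
    static void *static_object_refs [256];

    static void
    register_precise_static_roots (void)
    {
            /* Register just this table as a root range instead of the
             * whole .data/.bss segments: the collector scans only
             * memory known to possibly contain references, which is
             * faster and avoids false positives from unrelated data. */
            GC_add_roots (static_object_refs,
                          (char *) static_object_refs + sizeof (static_object_refs));
    }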
3) keep track precisely of stack locations and registers in native
code generation. This basically requires the regalloc rewrite first,
if we don't want to duplicate much of it. This is the hardest task
of all, since proving its correctness is very hard. Some tricks,
like having a build that injects GC.Collect() after every few simple
operations, may help. We also need to decide if we want to handle
safe points at calls and back jumps only, or at every instruction.
The latter case is harder to implement and requires keeping much
more data around (though it potentially makes for faster
stop-the-world phases).
The first case requires us to be able to advance a thread until it
reaches the next safe point: this can be done with the same
techniques used by a debugger. We already need something like this
to safely handle aborts happening in the middle of a prolog in
managed code, for example, so this could be an additional sub-task
that can be done separately from the GC work.
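To make the bookkeeping concrete: for every safe point the JIT emits,
a precise stack scan needs a record of which stack slots and registers
hold references at that point. A minimal sketch of such a record
(names and field sizes are illustrative, not actual Mono structures):

    #include <stdint.h>

    /* One entry per safe point in a JITted method (per call site, or
     * per instruction if we pick safe points everywhere). */
    typedef struct {
            uint32_t native_offset;     /* offset of the safe point in the method code */
            uint32_t stack_ref_bitmap;  /* bit i set => stack slot i holds a reference */
            uint16_t reg_ref_mask;      /* bit r set => register r holds a reference */
    } GCMapEntry;

    /* During stop-the-world the collector walks each managed frame,
     * finds the entry matching the saved instruction pointer and marks
     * only the flagged slots and registers, instead of conservatively
     * treating every word on the stack as a pointer. */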
Note that we can adapt the libgc code to use the info we collect
when scanning the stack in managed methods and still use the
conservative approach for the unmanaged stack, until we have our own
collector. That in turn requires defining a proper icall interface
to switch from managed to unmanaged code (how do we handle object
references in the icall implementations, for example?).
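One possible answer to the parenthetical question is a handle scheme:
icalls receive and return indirect references living in slots the
collector scans precisely. This is only a sketch of the idea, not an
existing Mono API; all names here are invented:

    /* Opaque managed object; icalls never hold raw pointers to it. */
    typedef struct _SketchObject SketchObject;
    typedef SketchObject **ObjectHandle;   /* indirection the GC knows about */

    #define HANDLE_FRAME_SIZE 64

    /* Per-thread frame of handle slots, scanned as precise roots
     * (and updatable by a moving collector). */
    typedef struct {
            SketchObject *slots [HANDLE_FRAME_SIZE];
            int next;
    } HandleFrame;

    /* Store a reference in the current frame and hand out the slot
     * address (overflow handling omitted); the icall body only ever
     * dereferences the handle, so the collector can keep tracking or
     * relocating the object while unmanaged code runs. */
    static ObjectHandle
    handle_new (HandleFrame *frame, SketchObject *obj)
    {
            frame->slots [frame->next] = obj;
            return &frame->slots [frame->next++];
    }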
4) we could make use of the generational capabilities of the
Boehm GC, but not with the current method involving signals, which
may create incompatibilities and is not supported on all platforms.
We need to start using write barriers: they will be required anyway
for the generational GC we'll use. When a field holding a reference
is changed in an object (or an item in an array), we mark the card
or page where the field is stored as dirty. Later, when a collection
is run, only objects in pages marked as dirty are scanned for
references, instead of the whole heap. This could take a few days to
implement and probably much more time to debug if all the cases were
not caught :-)
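A minimal sketch of the card-marking barrier described above,
assuming one dirty byte per card and an arbitrary card size (the
constants and setup are illustrative):

    #include <stdint.h>

    #define CARD_BITS 9   /* 512-byte cards, an arbitrary choice */

    /* One byte per card; heap_base and card_table would be set up
     * when the heap is mapped. */
    static uint8_t *card_table;
    static uintptr_t heap_base;

    /* Emitted (or called) by the JIT on every store of a reference
     * into an object field or array element: mark the card containing
     * the field as dirty, so the next collection scans only dirty
     * cards instead of the whole heap. */
    static inline void
    write_barrier (void **field_addr, void *new_value)
    {
            *field_addr = new_value;
            card_table [((uintptr_t) field_addr - heap_base) >> CARD_BITS] = 1;
    }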
5) actually write the new generational and precise collector. There
are several examples out there as open source projects, though the
CLR needs some specific semantics, so the code needs to be written
from scratch anyway. Compared to item 3 this is relatively easier,
and it can be tested outside of mono, too, until mono is ready to
use it.
The important features needed:
*) precise, so there is no false-positive memory retention
*) generational, to reduce collection times
*) bump-pointer allocation, to reduce allocation time (see the
   sketch after this list)
*) possibly per-thread, lock-free allocation
*) handle weakrefs and finalizers with the CLR semantics
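To illustrate the two allocation bullets, here is a hedged sketch of
per-thread bump-pointer allocation from a thread-local buffer (names
are invented; a real collector also needs the slow path, object
initialization and GC triggering):

    #include <stddef.h>

    /* Thread-local allocation buffer: each thread allocates from its
     * own chunk of the nursery, so the fast path needs no lock. */
    typedef struct {
            char *pos;   /* next free byte */
            char *end;   /* end of this thread's buffer */
    } TLAB;

    static __thread TLAB tlab;

    /* Bump-pointer fast path: align, advance the pointer, return the
     * old value.  When the buffer is exhausted a slow path (omitted
     * here) would grab a new buffer or trigger a nursery collection. */
    static void *
    alloc_obj (size_t size)
    {
            size = (size + sizeof (void*) - 1) & ~(sizeof (void*) - 1);
            if (tlab.pos && tlab.pos + size <= tlab.end) {
                    void *obj = tlab.pos;
                    tlab.pos += size;
                    return obj;
            }
            return NULL;   /* slow path omitted in this sketch */
    }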
Note: some GC engines use a single mmap area, because it makes
handling generations and the implementation much easier, but this
also limits the expansion of the heap, so people may need to use a
command-line option to set the max heap size, etc. It would be
better to have a design that allows mmapping chunks of a few
megabytes at a time.
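A hedged sketch of growing the heap one chunk at a time on POSIX
systems (the chunk size and the lack of bookkeeping are
simplifications for the example):

    #include <stddef.h>
    #include <sys/mman.h>

    #define CHUNK_SIZE (4 * 1024 * 1024)   /* grow the heap 4 MB at a time */

    /* Map one more heap chunk on demand instead of reserving a single
     * large area up front; the collector would keep a table of chunks,
     * so the heap can keep growing without a fixed maximum size. */
    static void *
    map_heap_chunk (void)
    {
            void *chunk = mmap (NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANON, -1, 0);
            return chunk == MAP_FAILED ? NULL : chunk;
    }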
The different tasks can be done in parallel. 1, 2 and 4 can be done
in time for the mono 1.2 release. Parts of 3 and 5 could be done as
well. The complete switch is supposed to happen with the mono 2.0
release.