new-regalloc 3.7 KB

1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768
  1. We need to switch to a new register allocator.
  2. The current one is split in a global and a local register allocator.
  3. The global one can assign only callee-saves registers and happens
  4. on the tree-based internal representation: it assigns local variables
  5. to hardware registers.
  6. The local one happens on the linear representation on a per basic
  7. block basis and assigns hard registers to virtual registers (which
  8. hold temporary values during expression executions) and it deals also
  9. with the platform-specific issues (fixed registers, call conventions).
  10. Moving to a different register will help solve some of the performance
  11. issues introduced by the above split, make the register more easily
  12. portable and solve some of the issues generated by dealing with trees.
  13. The general design ideas are below.
  14. The new allocator should have a global view of all the method, so it can be
  15. able to assign variables also to some of the volatile registers if possible,
  16. even across basic blocks (this would improve performance).
  17. The allocator would be driven by per-arch declarative data, so porting
  18. should be easier: an architecture needs to specify register classes,
  19. call convention and instructions requirements (similar to the gcc code).
  20. The allocator should operate on the linear representation, this way it's
  21. easier and faster to track usages more correctly. We need to assign virtual
  22. registers on a per-method basis instead of per basic block. We can assign
  23. virtual registers to variables, too. Note that since we fix the stack offset
  24. of local vars only after this step (which happens after the burg rules are run),
  25. some of the burg rules that try to optimize the code won't apply anymore:
  26. the peephole code may need to be enhanced to do the optimizations instead.
  27. We need to handle floating point registers in the global allocator, too.
  28. The new allocator also needs to keep track precisely of which registers
  29. contain references or managed pointers to allow us to move to a precise GC.
  30. It may be worth to use a single increasing set of integers for the virtual
  31. registers, with the class of the register stored separately (unless the
  32. current local allocator which keeps interger and fp registers separate).
  33. Since this is a large task, we need to do it in steps as much as possible.
  34. The first is to run the register allocator _after_ the burg rules: this
  35. requires a rewrite of the liveness code, too, to use linear indexes instead
  36. of basic-block/tree number combinations. This can be done by:
  37. *) allocating virtual regs to all the locals that can be register allocated
  38. *) running the burg rules (some may require adjustments): the local virtual
  39. registers are assigned starting from global-virt-regs+1, instead of the current
  40. hardware-regs+1, so we can tell apart global and local virt regs.
  41. *) running the liveness/whatever code is needed to allocate the global registers
  42. *) allocate the rest of the local variables to stack slots
  43. *) continue with the current local allocator
  44. This work could take 2-3 weeks.
  45. The next step is to define the kind of declarative data an architecture needs
  46. and assigning virtual regs to all the registers and making the allocator
  47. assign from the volatile registers, too.
  48. Note that some of the code that is currently emitted in the arch-specific
  49. code, will need to be emitted as instructions that the reg allocator
  50. can inspect: think of a method that returns the first argument which is
  51. received in a register: the current code copies it to either a local slot or
  52. to a global reg in the prolog an copies it back to the return register
  53. int he basic block, but since neither the regallocator nor the peephole code
  54. knows about the prolog code, the first store cannot be optimized away.
  55. The gcc code has some example of how to specify register classes in a
  56. declarative way.