- We need to switch to a new register allocator.
- The current one is split into a global and a local register allocator.
- The global one can assign only callee-saved registers and runs
- on the tree-based internal representation: it assigns local variables
- to hardware registers.
- The local one runs on the linear representation on a per-basic-block
- basis and assigns hard registers to virtual registers (which
- hold temporary values during expression evaluation); it also deals
- with the platform-specific issues (fixed registers, call conventions).
- Moving to a different register allocator will help solve some of the
- performance issues introduced by the above split, make the register
- allocator more easily portable and solve some of the issues generated
- by dealing with trees.
- The general design ideas are below.
- The new allocator should have a global view of the whole method, so it can
- assign variables to some of the volatile registers as well, if possible,
- even across basic blocks (this would improve performance).
- The allocator would be driven by per-arch declarative data, so porting
- should be easier: an architecture needs to specify register classes,
- calling convention and instruction requirements (similar to the gcc code).
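
As a rough illustration of what such per-arch declarative data might look like, here is a minimal sketch in C. Every name in it (RegClassDesc, InsnDesc, the example tables, the helper functions) is hypothetical, and the x86-like register numbers and masks are only illustrative; none of this is taken from the existing code or from gcc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register classes; a real port may need more. */
typedef enum { RC_INT, RC_FP, RC_COUNT } RegClass;

/* One entry per register class: which hard regs exist and how calls treat them. */
typedef struct {
    const char *name;
    uint32_t    allocatable;   /* bitmask of hard regs the allocator may use */
    uint32_t    callee_saved;  /* subset that is preserved across calls */
    uint32_t    fixed;         /* reserved regs (e.g. stack pointer), never allocated */
} RegClassDesc;

/* Per-instruction requirements: result class, forced hard reg, clobbers. */
typedef struct {
    const char *opcode;
    RegClass    dest_class;
    int         dest_fixed;    /* hard reg required for the result, or -1 for any */
    uint32_t    clobbers;      /* hard regs destroyed as a side effect */
} InsnDesc;

/* Illustrative numbering: eax=0 ecx=1 edx=2 ebx=3 esp=4 ebp=5 esi=6 edi=7 */
static const RegClassDesc example_classes[RC_COUNT] = {
    [RC_INT] = { "int", 0xCF, 0xC8, 0x30 },
    [RC_FP]  = { "fp",  0x3F, 0x00, 0x00 },
};

static const InsnDesc example_insns[] = {
    { "add", RC_INT, -1, 0x00 },  /* result may live in any int reg */
    { "div", RC_INT,  0, 0x04 },  /* result forced into reg 0, reg 2 clobbered */
};

/* Volatile (caller-saved) regs: allocatable but not preserved across calls. */
static uint32_t volatile_regs(const RegClassDesc *rc) {
    assert(rc != NULL);
    return rc->allocatable & ~rc->callee_saved;
}

/* Hard regs the allocator may pick for the result of an instruction. */
static uint32_t dest_regs(const InsnDesc *ins, const RegClassDesc *classes) {
    assert(ins != NULL && classes != NULL);
    if (ins->dest_fixed >= 0)
        return (uint32_t)1 << ins->dest_fixed;
    return classes[ins->dest_class].allocatable;
}
```

With tables of this kind the allocator itself would contain no per-arch code: a port would only supply the data, and the generic code would query it through helpers such as volatile_regs() and dest_regs().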
- The allocator should operate on the linear representation: this way it's
- easier and faster to track usages accurately. We need to assign virtual
- registers on a per-method basis instead of per basic block. We can assign
- virtual registers to variables, too. Note that since we fix the stack offset
- of local vars only after this step (which happens after the burg rules are run),
- some of the burg rules that try to optimize the code won't apply anymore:
- the peephole code may need to be enhanced to do the optimizations instead.
- We need to handle floating point registers in the global allocator, too.
- The new allocator also needs to keep track precisely of which registers
- contain references or managed pointers to allow us to move to a precise GC.
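
One possible way to record this information, sketched in C under the assumption that the allocator knows the content kind of each hard register at a given program point (for example a call site). The GcKind enum, the RegGcState structure and gc_mask() are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a register holds, from the GC's point of view. */
typedef enum { GC_NONE, GC_REF, GC_MANAGED_PTR } GcKind;

#define MAX_HREGS 32

/* Content kind of every hard register at one program point. */
typedef struct {
    GcKind kind[MAX_HREGS];
} RegGcState;

/* Build a bitmask of the hard registers that hold values of the wanted kind. */
static uint32_t gc_mask(const RegGcState *s, GcKind wanted) {
    assert(s != NULL);
    uint32_t mask = 0;
    for (int r = 0; r < MAX_HREGS; r++) {
        if (s->kind[r] == wanted)
            mask |= (uint32_t)1 << r;
    }
    return mask;
}
```

A precise GC could then be handed one mask for references and one for managed pointers per safe point, instead of scanning every register conservatively.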
- It may be worth using a single increasing set of integers for the virtual
- registers, with the class of the register stored separately (unlike the
- current local allocator, which keeps integer and fp registers separate).
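
A minimal sketch of this numbering scheme in C: virtual registers are plain increasing integers and the class lives in a side table indexed by the register number. VRegTable and vreg_new() are hypothetical names, not part of the current code.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { VREG_INT, VREG_FP } VRegClass;

/* Side table: class_of[v] is the class of virtual register v. */
typedef struct {
    VRegClass *class_of;
    int        count;     /* number of vregs handed out so far */
    int        capacity;  /* allocated size of class_of */
} VRegTable;

/* Hand out the next virtual register number and record its class. */
static int vreg_new(VRegTable *t, VRegClass cls) {
    if (t->count == t->capacity) {
        int ncap = t->capacity ? t->capacity * 2 : 16;
        VRegClass *p = realloc(t->class_of, (size_t)ncap * sizeof *p);
        assert(p != NULL);  /* sketch only: real code would report the failure */
        t->class_of = p;
        t->capacity = ncap;
    }
    t->class_of[t->count] = cls;
    return t->count++;
}
```

Integer and fp registers share one number space here, so per-vreg data such as liveness can be kept in a single array, while anything class-specific looks up class_of[v] first.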
- Since this is a large task, we need to do it in steps as much as possible.
- The first step is to run the register allocator _after_ the burg rules: this
- requires a rewrite of the liveness code, too, to use linear indexes instead
- of basic-block/tree number combinations. This can be done by:
- *) allocating virtual regs to all the locals that can be register allocated
- *) running the burg rules (some may require adjustments): the local virtual
- registers are assigned starting from global-virt-regs+1, instead of the current
- hardware-regs+1, so we can tell apart global and local virt regs.
- *) running the liveness/whatever code is needed to allocate the global registers
- *) allocating the rest of the local variables to stack slots
- *) continuing with the current local allocator
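
The numbering used in the steps above, together with liveness ranges expressed as linear indexes, could be sketched in C as follows. This assumes that global virtual registers are numbered right after the hardware registers and that all of them are created before the burg rules run; VRegNumbering, LiveRange and the functions below are hypothetical names.

```c
#include <assert.h>

typedef struct {
    int first_global;  /* first vreg number given to a local variable */
    int last_global;   /* last one; first_global - 1 while there are none */
    int next_local;    /* next temporary vreg; -1 until globals_done() */
} VRegNumbering;

static void numbering_init(VRegNumbering *n, int hardware_regs) {
    n->first_global = hardware_regs + 1;
    n->last_global  = hardware_regs;
    n->next_local   = -1;
}

/* Step 1: give a virtual register to a local that can be register allocated. */
static int alloc_global_vreg(VRegNumbering *n) {
    assert(n->next_local < 0);  /* all globals are numbered before the burg rules */
    return ++n->last_global;
}

/* Called once, just before running the burg rules. */
static void globals_done(VRegNumbering *n) {
    n->next_local = n->last_global + 1;  /* i.e. global-virt-regs+1 */
}

/* Step 2: the burg rules take temporaries from above the global range. */
static int alloc_local_vreg(VRegNumbering *n) {
    assert(n->next_local >= 0);
    return n->next_local++;
}

static int is_global_vreg(const VRegNumbering *n, int v) {
    return v >= n->first_global && v <= n->last_global;
}

/* Live range in linear instruction indexes (inclusive on both ends),
 * replacing basic-block/tree number combinations. */
typedef struct { int start, end; } LiveRange;

/* Two vregs can share a hard register only if their ranges do not overlap. */
static int ranges_overlap(LiveRange a, LiveRange b) {
    return a.start <= b.end && b.start <= a.end;
}
```

Because the two ranges of numbers never intersect, a single comparison is enough to tell a global virtual register from a local one, which is what lets the current local allocator keep working unchanged on the local ones.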
- This work could take 2-3 weeks.
- The next step is to define the kind of declarative data an architecture needs,
- assign virtual regs to all the registers and make the allocator
- assign from the volatile registers, too.
- Note that some of the code that is currently emitted in the arch-specific
- code will need to be emitted as instructions that the reg allocator
- can inspect. Think of a method that returns its first argument, which is
- received in a register: the current code copies it to either a local slot or
- to a global reg in the prolog and copies it back to the return register
- in the basic block, but since neither the reg allocator nor the peephole code
- knows about the prolog code, the first store cannot be optimized away.
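
To make the example above concrete, here is a small sketch in C of what becomes possible once the prolog copy is an ordinary instruction. It assumes a target where the first argument register is also the return register, and that the allocator happened to pick that same register for the variable; the Ins structure and remove_self_moves() are hypothetical and much simpler than the real instruction representation.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_MOVE, OP_RET } Opcode;

/* Simplified linear instruction: dreg <- sreg for OP_MOVE. */
typedef struct {
    Opcode op;
    int    dreg;
    int    sreg;
} Ins;

/* Peephole pass run after register allocation: drop moves whose source and
 * destination ended up in the same hard register.
 * Compacts the array in place and returns the new instruction count. */
static int remove_self_moves(Ins *code, int n) {
    assert(code != NULL || n == 0);
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (code[i].op == OP_MOVE && code[i].dreg == code[i].sreg)
            continue;  /* nothing to copy: skip the instruction */
        code[out++] = code[i];
    }
    return out;
}
```

For the method in question the instruction stream would be "move var <- argreg; move retreg <- var; ret". If all three end up in the same hard register, both moves become self-moves and only the ret is left, which is not possible while the first copy is hidden in the prolog code.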
- The gcc code has some examples of how to specify register classes in a
- declarative way.