| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166 |
- Just some thoughts for the JITer:
- General issues:
- ===============
- We are designing a JIT compiler, so we have to consider two things:
- - the quality of the generated code
- - the time needed to generate that code
- The current approach is to keep the JITer as simple as possible, and thus as
- fast as possible. The generated code quality will suffer from that.
- We do not map local variables to registers at the moment, and this makes the
- whole JIT much easier, for example we do not need to identify basic block
- boundaries or the lifetime of local variables, or select the variables which
- are worth to put into a register.
- Register allocation is thus done only inside the trees of the forest, and each
- tree can use the full set of registers. We simply split a tree if we get out of
- registers, for example the following tree:
- add(R0)
- / \
- / \
- a(R0) add(R1)
- / \
- / \
- b(R1) add(R2)
- / \
- / \
- c(R2) b(R3)
- can be transformed to:
- stloc(t1) add(R0)
- | / \
- | / \
- add(R0) a(R0) add(R1)
- / \ / \
- / \ / \
- c(R0) b(R1) b(R1) t1(R2)
- Please notice that the split trees use less registers than the original
- tree.
- Triggering JIT compilation:
- ===========================
- The current approach is to call functions indirectly. The address to call is
- stored in the MonoMethod structure. For each method we create a trampoline
- function. When called, this function does the JIT compilation and replaces the
- trampoline with the compiled method address.
- Register Allocation:
- ====================
- With lcc you can assign a fixed register to a tree before register
- allocation. For example this is needed by call, which return the value always
- in EAX on x86. The current implementation works without such system, due to
- special forest generation.
- X86 Register Allocation:
- ========================
- We can use 8bit or 16bit registers on the x86. If we use that feature we have
- more registers to allocate, which maybe prevents some register spills. We
- currently ignore that ability and always allocate 32 bit registers, because I
- think we would gain very little from that optimisation and it would complicate
- the code.
- Different Register Sets:
- ========================
- Most processors have more that one register set, at least one for floating
- point values, and one for integers. Should we support architectures with more
- that two sets? Does someone knows such an architecture?
- 64bit Integer Values:
- =====================
- I can imagine two different implementation. On possibility would be to treat
- long (64bit) values simply like any other value type. This implies that we
- call class methods for ALU operations like add or sub. Sure, this method will
- be be a bit inefficient.
- The more performant solution is to allocate two 32bit registers for each 64bit
- value. We add a new non terminal to the monoburg grammar called long_reg. The
- register allocation routines takes care of this non terminal and allocates two
- registers for them.
- Forest generation:
- ==================
- It seems that trees generated from the CIL language have some special
- properties, i.e. the trees already represents basic blocks, so there can be no
- branches to the inside of such a tree. All results of those trees are stored to
- memory.
- One idea was to drive the code generation directly from the CIL code, without
- generating an intermediate forest of trees. I think this is not possible,
- because you always have to gather some attributes and attach it to the
- instruction (for example the register allocation info). So I thing generating a
- tree is the right thing and that also works perfectly with monoburg. IMO we
- would not get any benefit from trying to feed monoburg directly with CIL
- instructions.
- DAG handling:
- =============
- Monoburg can't handle DAGs, instead we need real trees as input for
- the code generator. So we have two problems:
- 1.) DUP instruction: This one is obvious - we need to store the value
- into a temporary variable to solve the problem.
- 2.) function calls: Chapter 12.8, page 343 of "A retargetable C compiler"
- explains that: "because listing a call node will give it a hidden reference
- from the code list". I don't understand that (can someone explain that?), but
- there is another reason to save return values to temporaries: Consider the
- following code:
- x = f(y) + g(z); // all functions return integers
- We could generate such a tree for this expression: STLOC(ADD(CALL,CALL))
- The problem is that both calls returns the value in the same register,
- so it is non trivial to generate code for that tree. We must copy one
- register into another one, which make register allocation more complex.
- The easier solution is store the result of function calls to
- temporaries. This leads to the following forest:
- STLOC(CALL)
- STLOC(CALL)
- STLOC(ADD (LDLOC, LDLOC))
- This is what lcc is doing, if I understood 12.8, page 342, 343?
- Value Types:
- ============
- The only CLI instructions which can handle value types are loads and stores,
- either to local variable, to the stack or to array elements. Value types with a
- size smaller than sizeof(int) are handled like any other basic type. For other
- value types we load the base address and emit block copies to store them.
- Possible Optimisations:
- =======================
- Miguel said ORP does some optimisation on IL level, for example moving array
- bounds checking out of loops:
- for (i = 0; i < N; i++) { check_range (a, i); a [i] = X; }
- id transformed to:
- if (in_range (a, 0, N)) { for (i = 0; i < N; i++) a[i] = X; }
- else for (i = 0; i < N; i++) { check_range (a, i); a [i] = X; }
- The else is only to keep original semantics (exception handling).
|