| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125 |
- Just some thoughts for the JITer:
- General issues:
- ===============
- We are designing a JIT compiler, so we have to consider two things:
- - the quality of the generated code
- - the time needed to generate that code
- The current approach is to keep the JITer as simple as possible, and thus as
- fast as possible. The generated code quality will suffer from that.
- Register Allocation:
- ====================
- With lcc you can assign a fixed register to a tree before register
- allocation. For example this is needed by call, which return the value always
- in EAX on x86. The current implementation works without such system, due to
- special forest generation.
- Different Register Sets:
- ========================
- Most processors have more that one register set, at least one for floating
- point values, and one for integers. Should we support architectures with more
- that two sets? Does someone knows such an architecture?
- 64bit Integer Values:
- =====================
- I can imagine two different implementation. On possibility would be to treat
- long (64bit) values simply like any other value type. This implies that we
- call class methods for ALU operations like add or sub. Sure, this method will
- be be a bit inefficient.
- The more performant solution is to allocate two 32bit registers for each 64bit
- value. We add a new non terminal to the monoburg grammar called "lreg". The
- register allocation routines takes care of this non terminal and allocates two
- 32 bit registers for them.
- Forest generation:
- ==================
- Consider the following code:
- OPCODE: STACK LOCALS
- LDLOC.0 (5) [5,0]
- LDC.1 (5,1) [5,0]
- STLOC.0 (5) [1,0]
- STLOC.1 () [1,5]
- A simple forest generation generates:
- STLOC.0(LDC.1)
- STLOC.1(LDLOC.0)
- Which is wrong, since it stores the wrong value (1 instead of 5). Instead we
- must generate something like:
- STLOC.TMP(LDLOC.0)
- STLOC.0(LDC.1)
- STLOC.1(LDLOC.TMP)
- Where STLOC.TMP saves the value into a new temporary variable.
- We also need a similar solution for basic block boundaries when the stack depth
- is not zero. We can simply save those values to new temporary values. Consider
- the following basic block with one instruction:
- LDLOC.1
- This should generate a tree like:
- STLOC.TMP(LDLOC.1) Please notice that an intelligent register allocator can
- still allocate registers for those new variables.
- DAG handling:
- =============
- Monoburg can't handle DAGs, instead we need real trees as input for
- the code generator. So we have two problems:
- 1.) DUP instruction: This one is obvious - we need to store the value
- into a temporary variable to solve the problem.
- 2.) function calls: Chapter 12.8, page 343 of "A retargetable C compiler"
- explains that: "because listing a call node will give it a hidden reference
- from the code list". I don't understand that (can someone explain that?), but
- there is another reason to save return values to temporaries: Consider the
- following code:
- x = f(y) + g(z); // all functions return integers
- We could generate such a tree for this expression: STLOC(ADD(CALL,CALL))
- The problem is that both calls returns the value in the same register,
- so it is non trivial to generate code for that tree. We must copy one
- register into another one, which make register allocation more complex.
- The easier solution is store the result of function calls to
- temporaries. This leads to the following forest:
- STLOC(CALL)
- STLOC(CALL)
- STLOC(ADD (LDLOC, LDLOC))
- This is what lcc is doing, if I understood 12.8, page 342, 343?
- Possible Optimisations:
- =======================
- Miguel said ORP does some optimisation on IL level, for example moving array
- bounds checking out of loops:
- for (i = 0; i < N; i++) { check_range (a, i); a [i] = X; }
- id transformed to:
- if (in_range (a, 0, N)) { for (i = 0; i < N; i++) a[i] = X; }
- else for (i = 0; i < N; i++) { check_range (a, i); a [i] = X; }
- The "else" is only to keep original semantics (exception handling).
- We need loop detection logic in order to implement this (dominator tree).
- AFAIK CACAO also implements this.
|