@@ -11,7 +11,53 @@ We are designing a JIT compiler, so we have to consider two things:
The current approach is to keep the JITer as simple as possible, and thus as
fast as possible. The quality of the generated code will suffer from that.

-X86 register allocation:
+We do not map local variables to registers at the moment. This makes the
+whole JIT much simpler: for example, we do not need to identify basic block
+boundaries or the lifetime of local variables, or to select the variables
+that are worth putting into a register.
+
+Register allocation is thus done only inside the trees of the forest, and each
+tree can use the full set of registers. We simply split a tree when we run out
+of registers. For example, the following tree:
+
+
+           add(R0)
+           /     \
+          /       \
+       a(R0)     add(R1)
+                 /     \
+                /       \
+             b(R1)     add(R2)
+                       /     \
+                      /       \
+                   c(R2)     b(R3)
+
+can be transformed to:
+
+
+      stloc(t1)              add(R0)
+          |                  /     \
+          |                 /       \
+       add(R0)           a(R0)     add(R1)
+       /     \                     /     \
+      /       \                   /       \
+   c(R0)     b(R1)             b(R1)     t1(R2)
+
+
+Note that each of the split trees uses fewer registers (two and three) than the
+original tree (four).
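A minimal C sketch of such splitting, assuming the naive assignment used in the example (a node reuses its left child's register and its right child starts one register higher) and operands without side effects, since a hoisted tree is evaluated before the left operand of its parent. The names Node, Split, regs_needed and split are hypothetical. After both children of a node fit, the sketch hoists the node's right operand into a separate "stloc(temp)" tree if the node still does not fit, so it may cut the tree at a different place than the example (with a limit of three registers it moves b + (c + b) into t1 instead of c + b), but every resulting tree stays within the limit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tree node: leaves name a local or temp, inner nodes are "add". */
typedef struct Node {
    const char *name;
    struct Node *left, *right; /* both NULL for a leaf */
} Node;

/* A hoisted tree, read as "stloc(temp) <- tree". */
typedef struct {
    const char *temp;
    Node *tree;
} Split;

#define MAX_NODES 64
#define MAX_TEMPS 8

static Node pool[MAX_NODES];
static int npool;
static char temp_names[MAX_TEMPS][8];
static int ntemps;

static Node *mk(const char *name, Node *left, Node *right) {
    assert(npool < MAX_NODES);
    Node *n = &pool[npool++];
    n->name = name;
    n->left = left;
    n->right = right;
    return n;
}

/* Registers used by the scheme of the example: a node takes its left
   child's register, and the right child starts one register higher. */
static int regs_needed(const Node *n) {
    if (n->left == NULL)
        return 1;
    int l = regs_needed(n->left);
    int r = 1 + regs_needed(n->right);
    return l > r ? l : r;
}

/* Post-order split. Assumes limit >= 2 and that `out` has room for
   MAX_TEMPS entries. Hoisted trees are appended in an order in which each
   one runs before any tree that reads its temp. */
static void split(Node *n, int limit, Split *out, int *nout) {
    if (n->left == NULL)
        return;
    split(n->left, limit, out, nout);
    split(n->right, limit, out, nout);
    if (regs_needed(n) > limit) {
        assert(ntemps < MAX_TEMPS);
        char *name = temp_names[ntemps++];
        snprintf(name, sizeof temp_names[0], "t%d", ntemps);
        out[(*nout)++] = (Split){ name, n->right }; /* stloc(tN) <- right */
        n->right = mk(name, NULL, NULL);            /* read tN instead */
    }
}
```

For the example tree, regs_needed returns 4; after split(root, 3, ...) the hoisted tree needs three registers and the remaining tree add(a, t1) needs two.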
+
+
+Register Allocation:
+====================
+
+With lcc you can assign a fixed register to a tree before register
+allocation. For example, this is needed for calls, which always return their
+value in EAX on x86. The current implementation works without such a system,
+because the forest is generated in a special way.
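To make the idea of fixing a register before allocation concrete, here is a minimal C sketch. It is not lcc's interface: Tree, precolor, assign_reg, release_reg and the four-register set are hypothetical. A call result is precolored to EAX, which reserves that register, so an unconstrained tree allocated in between is steered away from it.

```c
#include <assert.h>

/* Hypothetical subset of the x86 integer registers. */
enum Reg { NO_REG = -1, EAX, ECX, EDX, EBX, NREGS };

typedef struct {
    enum Reg fixed; /* register fixed before allocation, or NO_REG */
    enum Reg reg;   /* register chosen by the allocator */
} Tree;

static unsigned busy;     /* bit r set: register r holds a live value */
static unsigned reserved; /* bit r set: register r is promised to a fixed tree */

/* Fix the register of a tree before allocation, e.g. EAX for a call result. */
static void precolor(Tree *t, enum Reg r) {
    t->fixed = r;
    reserved |= 1u << r;
}

/* Returns 1 on success, 0 if no suitable register is free. A real
   allocator would spill or move a value instead of failing. */
static int assign_reg(Tree *t) {
    if (t->fixed != NO_REG) {
        if (busy & (1u << t->fixed))
            return 0;
        t->reg = t->fixed;
        reserved &= ~(1u << t->fixed);
    } else {
        int r = 0;
        /* skip registers that are in use or reserved for fixed trees */
        while (r < NREGS && ((busy | reserved) & (1u << r)))
            r++;
        if (r == NREGS)
            return 0;
        t->reg = (enum Reg)r;
    }
    busy |= 1u << t->reg;
    return 1;
}

static void release_reg(const Tree *t) {
    busy &= ~(1u << t->reg);
}
```

Without the precolor call, the unconstrained tree would take EAX first, and the call result would then need a move or a spill.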
+
+
+X86 Register Allocation:
========================

We can use 8bit or 16bit registers on the x86. If we use that feature we have
@@ -27,17 +73,28 @@ Most processors have more than one register set, at least one for floating
point values, and one for integers. Should we support architectures with more
than two sets? Does anyone know of such an architecture?

-Register Allocation:
-====================
+64bit Integer Values:
+=====================
+
+I can imagine two different implementations. One possibility would be to treat
+long (64bit) values simply like any other value type. This implies that we
+call class methods for ALU operations like add or sub. Of course, this method
+will be a bit inefficient.
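A minimal C sketch of the first option, assuming the 64bit operands live in memory like any other value type and every ALU operation becomes a call. The helpers helper_add, helper_sub and helper_mul, the LongOp table and long_binop are hypothetical stand-ins for the class methods; the point is that each add or sub costs a call instead of a couple of inline instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for 64bit ALU operations. Unsigned arithmetic avoids
   signed-overflow undefined behaviour; converting back to int64_t is
   implementation-defined (two's complement wrap on common compilers). */
static int64_t helper_add(int64_t a, int64_t b) { return (int64_t)((uint64_t)a + (uint64_t)b); }
static int64_t helper_sub(int64_t a, int64_t b) { return (int64_t)((uint64_t)a - (uint64_t)b); }
static int64_t helper_mul(int64_t a, int64_t b) { return (int64_t)((uint64_t)a * (uint64_t)b); }

enum LongOp { LONG_ADD, LONG_SUB, LONG_MUL, LONG_NOPS };

typedef int64_t (*LongHelper)(int64_t, int64_t);

static const LongHelper long_helpers[LONG_NOPS] = { helper_add, helper_sub, helper_mul };

/* Stands in for the code the JIT would emit for "dst = a op b": load both
   operands from memory, call the helper, store the result back to memory. */
static void long_binop(enum LongOp op, const int64_t *a, const int64_t *b, int64_t *dst) {
    *dst = long_helpers[op](*a, *b);
}
```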
+
+The more efficient solution is to allocate two 32bit registers for each 64bit
+value. We add a new non-terminal called long_reg to the monoburg grammar. The
+register allocation routines take care of this non-terminal and allocate two
+registers for it.
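A minimal C sketch of the second option, assuming six allocatable 32bit registers and a simulated register file (regfile) in place of real x86 code. alloc_long_reg stands for what the allocator might do for a long_reg non-terminal: take two free registers, one for the low and one for the high word. long_reg_add shows the add/adc pair such a rule would typically emit, with the carry out of the low-word add propagated into the high word. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical allocatable x86 registers (ESP and EBP excluded). */
enum { EAX, ECX, EDX, EBX, ESI, EDI, NREGS };

static unsigned free_mask = (1u << NREGS) - 1; /* bit r set: register r is free */
static uint32_t regfile[NREGS];                /* simulated register contents */

static int alloc_reg(void) {
    for (int r = 0; r < NREGS; r++) {
        if (free_mask & (1u << r)) {
            free_mask &= ~(1u << r);
            return r;
        }
    }
    return -1; /* out of registers: the tree would have to be split */
}

static void free_reg(int r) { free_mask |= 1u << r; }

/* A long_reg value occupies two 32bit registers. */
typedef struct { int lo, hi; } LongReg;

static int alloc_long_reg(LongReg *lr) {
    lr->lo = alloc_reg();
    if (lr->lo < 0)
        return 0;
    lr->hi = alloc_reg();
    if (lr->hi < 0) {
        free_reg(lr->lo); /* give back the half we already took */
        return 0;
    }
    return 1;
}

static void free_long_reg(LongReg lr) {
    free_reg(lr.lo);
    free_reg(lr.hi);
}

static void set_long(LongReg lr, uint64_t v) {
    regfile[lr.lo] = (uint32_t)v;
    regfile[lr.hi] = (uint32_t)(v >> 32);
}

static uint64_t get_long(LongReg lr) {
    return ((uint64_t)regfile[lr.hi] << 32) | regfile[lr.lo];
}

/* dst += src, computed like "add dst.lo, src.lo; adc dst.hi, src.hi". */
static void long_reg_add(LongReg dst, LongReg src) {
    uint32_t old_lo = regfile[dst.lo];
    regfile[dst.lo] = old_lo + regfile[src.lo];
    uint32_t carry = regfile[dst.lo] < old_lo; /* carry flag after the add */
    regfile[dst.hi] = regfile[dst.hi] + regfile[src.hi] + carry;
}
```

With six registers, three long_reg values already use up the whole set, so under this assumption trees with 64bit values need to be split more often.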

-With lcc you can assign a fixed register to a tree before register
-allocation. For example this is needed by call, which return the value always
-in EAX on x86. The current implementation works without such system (due to
-special forest generation), and I wonder if we really need this feature?

Forest generation:
==================

+It seems that trees generated from CIL have some special properties: the
+trees already represent basic blocks, so there can be no branches into the
+middle of such a tree, and all results of those trees are stored to memory.
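A minimal C sketch of how such trees can fall out of the CIL evaluation stack, assuming a tiny hypothetical subset of opcodes (LDLOC, ADD, STLOC): loads push leaves, ADD pops two trees and pushes a new one, and every STLOC pops the finished tree and appends it to the forest as a store to memory. It assumes only the stored value is on the stack at each store, as in simple statement-level code; CIL itself also allows values to stay on the stack across a store or a branch, which a real forest builder has to handle. Insn, Tree and build_forest are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of CIL. */
enum Op { LDLOC, ADD, STLOC };

typedef struct {
    enum Op op;
    int arg; /* local variable index for LDLOC/STLOC */
} Insn;

typedef struct Tree {
    enum Op op;
    int arg;
    struct Tree *left, *right;
} Tree;

#define MAX_NODES 64
#define MAX_STACK 16

static Tree pool[MAX_NODES];
static int npool;

static Tree *new_tree(enum Op op, int arg, Tree *left, Tree *right) {
    assert(npool < MAX_NODES);
    Tree *t = &pool[npool++];
    t->op = op;
    t->arg = arg;
    t->left = left;
    t->right = right;
    return t;
}

/* Returns the number of trees written to forest (capacity max_trees). */
static int build_forest(const Insn *code, int n, Tree **forest, int max_trees) {
    Tree *stack[MAX_STACK];
    int sp = 0, ntrees = 0;

    for (int i = 0; i < n; i++) {
        switch (code[i].op) {
        case LDLOC:
            assert(sp < MAX_STACK);
            stack[sp++] = new_tree(LDLOC, code[i].arg, NULL, NULL);
            break;
        case ADD: {
            assert(sp >= 2);
            Tree *right = stack[--sp];
            Tree *left = stack[--sp];
            stack[sp++] = new_tree(ADD, 0, left, right);
            break;
        }
        case STLOC:
            /* assumption: only the stored value is left on the stack */
            assert(sp == 1 && ntrees < max_trees);
            forest[ntrees++] = new_tree(STLOC, code[i].arg, stack[--sp], NULL);
            break;
        }
    }
    assert(sp == 0);
    return ntrees;
}
```

For "l2 = l0 + l1; l3 = l2 + l0" this yields two trees, each rooted at a store to a local.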
+
One idea was to drive the code generation directly from the CIL code, without
generating an intermediate forest of trees. I do not think this is possible,
because you always have to gather some attributes and attach them to the
@@ -46,8 +103,6 @@ tree is the right thing and that also works perfectly with monoburg. IMO we
would not get any benefit from trying to feed monoburg directly with CIL
instructions.

-We can also speedup the tree generation by using alloca instead of malloc.
-
DAG handling:
=============