- * How to handle complex IL opcodes in an arch-independent way
- Many IL opcodes are very simple: add, ldind etc.
- Such opcodes can be implemented with a single cpu instruction
- in most architectures (on some, a group of IL instructions
- can be converted to a single cpu op).
- There are many IL opcodes, though, that are more complex, but
- can be expressed as a series of trees or a single tree of
- simple operations. Such simple operations are architecture-independent.
- It makes sense to decompose such complex IL instructions into their
- simpler equivalents, so that we gain in several ways:
- *) porting effort is easier, because only the simple instructions
- need to be implemented in arch-specific code
- *) we could apply BURG rules to the trees and do pattern matching
- on them to optimize the expressions according to the host cpu
-
- The issue is: where do we do such conversion from coarse opcodes to
- simple expressions?
- * Doing the conversion in method_to_ir ()
- Some of these conversions can certainly be done in method_to_ir (),
- but it's not always easy to decide which are better done there and
- which in a different pass.
- For example, let's take ldlen: in the mono implementation, ldlen
- can be simply implemented with a load from a fixed position in the
- array object:
- len = [reg + maxlen_offset]
-
- However, ldlen also carries semantic information: the result is the
- length of the array, and since in the CLR arrays are of fixed size,
- this information can be useful to later do bounds check removal.
- If we convert this opcode in method_to_ir () we lose some useful
- information for further optimizations.
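- As a minimal sketch of this trade-off (toy C types, not the real mono IR;
- Inst, lower_ldlen, is_length_of and the MAXLEN_OFFSET value are all
- hypothetical), early lowering hides the fact that a value is an array length:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR, not mono's real instruction type. */
typedef enum { OP_LDLEN, OP_LOAD_MEMBASE } Opcode;

typedef struct {
    Opcode op;
    int    base_reg;  /* register holding the array reference */
    size_t offset;    /* only meaningful for OP_LOAD_MEMBASE */
} Inst;

/* Assumed offset of the length field in the array object (illustrative). */
enum { MAXLEN_OFFSET = 12 };

/* Early decomposition: len = [reg + maxlen_offset]. */
static Inst lower_ldlen(Inst ins)
{
    assert(ins.base_reg >= 0);
    if (ins.op == OP_LDLEN) {
        ins.op = OP_LOAD_MEMBASE;
        ins.offset = MAXLEN_OFFSET;
    }
    return ins;
}

/* A later pass can only prove "limit is the length of the array in
 * array_reg" while the opcode still says so. */
static bool is_length_of(Inst limit, int array_reg)
{
    return limit.op == OP_LDLEN && limit.base_reg == array_reg;
}
```

- In this sketch is_length_of () succeeds on the original LDLEN and fails on
- the lowered load, so a bounds check removal pass built on it would have
- to run before the decomposition.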
- In other cases, decomposing an opcode in method_to_ir () may
- allow for better optimizations later on (need to come up with an
- example here ...).
- * Doing the conversion in inssel.brg
- Some conversions may be done inside the burg rules: this has the
- disadvantage that the instruction selector is not run again on
- the resulting expression tree and we could miss some optimization
- (this is what effectively happens with the coarse opcodes in the old
- jit). This may also interfere with an efficient local register allocator.
- It may be possible to add an extension in monoburg that allows a rule
- such as:
- recheck: LDLEN (reg) {
- create an expression tree representing LDLEN
- and return it
- }
-
- When the monoburg label process gets back a recheck, it will run
- the labeling again on the resulting expression tree.
- Whether this is possible at all (and in an efficient way) is a
- question for dietmar:-)
- It should be noted, though, that this may not always work, since
- some complex IL opcodes may require a series of expression trees
- and handling such cases in monoburg could become quite hairy.
- For example, think of opcodes that need to do multiple actions on the
- same object: this basically means a DUP...
- On the other hand, if a complex opcode needs a DUP, monoburg doesn't
- actually need to create trees if it emits the instructions in
- the correct sequence and maintains the right values in the registers
- (usually the values that need a DUP are not changed...). How
- this integrates with the current register allocator is not clear, since
- that assigns registers based on the rule, but the instructions emitted
- by the rules may be different (this already happens with the current JIT
- where a MULT is replaced with lea etc...).
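- The recheck idea described earlier in this section can be sketched in
- plain C. This is only an illustration under assumptions: monoburg has no
- such extension, and Node, label_once, label_tree and MAXLEN_OFFSET are
- hypothetical names. A rule that rewrites a coarse node reports RECHECK and
- the driver labels the resulting tree again:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression tree node. */
typedef enum { N_REG, N_LDLEN, N_LOAD_MEMBASE } NodeKind;

typedef struct Node {
    NodeKind     kind;
    struct Node *left;   /* operand, NULL for leaves */
    int          value;  /* register number or offset */
} Node;

typedef enum { LABEL_DONE, LABEL_RECHECK } LabelResult;

/* Assumed offset of the length field (illustrative). */
enum { MAXLEN_OFFSET = 12 };

/* One labeling step: a coarse node is rewritten in place into its
 * simple equivalent and reported as RECHECK. */
static LabelResult label_once(Node *n)
{
    if (n->kind == N_LDLEN) {
        n->kind = N_LOAD_MEMBASE;
        n->value = MAXLEN_OFFSET;
        return LABEL_RECHECK;
    }
    return LABEL_DONE;
}

/* Driver: label again until no rule asks for a recheck.
 * Returns the number of labeling runs that were needed. */
static int label_tree(Node *n)
{
    int runs = 1;
    while (label_once(n) == LABEL_RECHECK)
        runs++;
    assert(n->kind != N_LDLEN);
    return runs;
}
```

- The extra runs are the cost of this approach: every recheck means another
- labeling pass over the rewritten tree.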
- * Doing it in a separate pass.
- Doing the conversion in a separate pass over the instructions
- is another alternative. This can be done right after method_to_ir ()
- or after the SSA pass (since the IR after the SSA pass should look
- almost like the IR we get back from method_to_ir ()).
- This has the following advantages:
- *) monoburg will handle only the simple opcodes (makes porting easier)
- *) the instruction selection will be run on all the additional trees
- *) it's easier to support coarse opcodes that produce multiple expression
- trees (and apply the monoburg selector on all of them)
- *) the SSA optimizer will see the original opcodes and will be able to use
- the semantic info associated with them
-
- The disadvantage is that this is a separate pass on the code and
- it takes time (how much has not been measured yet, though).
- With this approach, we may also be able to have C implementations
- of some of the opcodes: this pass would insert a function call to
- the C implementation (for example in the cases when first porting
- to a new arch and implementing some stuff may be too hard in asm).
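- A separate decomposition pass along these lines could look like the
- following sketch. All names are hypothetical (Inst, emit, decompose_pass,
- the OP_* values and HELPER_NEWARR are not mono's), and the choice of
- which opcode falls back to a C helper is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    /* coarse opcodes, as they would come out of method_to_ir () */
    OP_ADD, OP_LDLEN, OP_LDELEM, OP_NEWARR,
    /* simple opcodes the arch-specific code has to implement */
    OP_LOAD_MEMBASE, OP_BOUNDS_CHECK, OP_LOAD_INDEXED, OP_CALL_C
} Opcode;

typedef struct { Opcode op; int arg; } Inst;

/* Illustrative constants: field offset and id of a C helper. */
enum { MAXLEN_OFFSET = 12, HELPER_NEWARR = 1 };

/* Append one instruction to out and return the new count. */
static size_t emit(Inst *out, size_t count, size_t cap, Opcode op, int arg)
{
    assert(count < cap);
    out[count].op = op;
    out[count].arg = arg;
    return count + 1;
}

/* One linear pass: simple opcodes are copied, coarse ones are expanded
 * into one or more simple instructions or into a call to a C helper. */
static size_t decompose_pass(const Inst *in, size_t n, Inst *out, size_t cap)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        switch (in[i].op) {
        case OP_LDLEN:
            count = emit(out, count, cap, OP_LOAD_MEMBASE, MAXLEN_OFFSET);
            break;
        case OP_LDELEM:  /* expands to two instructions */
            count = emit(out, count, cap, OP_BOUNDS_CHECK, in[i].arg);
            count = emit(out, count, cap, OP_LOAD_INDEXED, in[i].arg);
            break;
        case OP_NEWARR:  /* no native version yet: call the C helper */
            count = emit(out, count, cap, OP_CALL_C, HELPER_NEWARR);
            break;
        default:
            count = emit(out, count, cap, in[i].op, in[i].arg);
            break;
        }
    }
    return count;
}
```

- Since the pass only ever emits simple opcodes, the instruction selector
- that runs afterwards sees every tree, including the ones added here.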
- * Extended basic blocks
- IL code needs a lot of checks: bounds checks, overflow checks,
- type checks and so on. This can greatly increase the number
- of basic blocks in a control flow graph. However,
- all such blocks end with a throw opcode that gives control to the
- exception handling mechanism.
- After method_to_ir () a MonoBasicBlock can be considered a sort
- of extended basic block where the additional exits don't point
- to basic blocks in the same procedure (at least when the method
- doesn't have exception tables).
- We need to make sure the passes following method_to_ir () can cope
- with such kinds of extended basic blocks (especially the passes
- that we need to apply to all the methods: as a start, we could
- skip SSA optimizations for methods with exception clauses...)
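- The distinction can be made concrete with a small sketch. The types and
- functions below (BasicBlock, Method, can_run_ssa, count_cfg_edges) are
- hypothetical and much simpler than MonoBasicBlock; they only assume that
- throw exits leave the method when it has no exception clauses:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extended basic block: one in-method successor plus a
 * number of implicit exits taken when a check fails and throws. */
typedef struct {
    int num_throw_exits;
    int next_block;       /* index of the successor, -1 if none */
} BasicBlock;

typedef struct {
    const BasicBlock *blocks;
    size_t            num_blocks;
    int               num_exception_clauses;
} Method;

/* Without exception clauses no throw can land inside the method. */
static bool throw_exits_leave_method(const Method *m)
{
    return m->num_exception_clauses == 0;
}

/* Conservative start: only optimize methods without exception clauses. */
static bool can_run_ssa(const Method *m)
{
    return throw_exits_leave_method(m);
}

/* Edges a pass has to model: throw exits count only when they may
 * target a handler in the same method. */
static size_t count_cfg_edges(const Method *m)
{
    size_t edges = 0;
    for (size_t i = 0; i < m->num_blocks; i++) {
        if (m->blocks[i].next_block >= 0)
            edges++;
        if (!throw_exits_leave_method(m))
            edges += (size_t) m->blocks[i].num_throw_exits;
    }
    return edges;
}
```

- For the same blocks, a method without exception clauses keeps a small
- control flow graph, while one with clauses must account for every throw exit.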