
* How to handle complex IL opcodes in an arch-independent way

Many IL opcodes are very simple: add, ldind etc.
Such opcodes can be implemented with a single cpu instruction
in most architectures (on some, a group of IL instructions
can be converted to a single cpu op).

There are many IL opcodes, though, that are more complex, but
can be expressed as a single tree or a series of trees of
simple operations. Such simple operations are architecture-independent.
It makes sense to decompose such complex IL instructions into their
simpler equivalents so that we gain in several ways:

*) porting effort is easier, because only the simple instructions
   need to be implemented in arch-specific code
*) we could apply BURG rules to the trees and do pattern matching
   on them to optimize the expressions according to the host cpu

The issue is: where do we do such conversion from coarse opcodes to
simple expressions?
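
To make the idea of decomposition concrete, here is a minimal sketch in C
of a coarse "element address" operation expressed as a tree of simple
operations. The node type, the opcode names and decompose_elem_addr ()
are all hypothetical and only illustrate the shape of the result; they are
not the JIT's actual IR.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical simple, architecture-independent operations. */
typedef enum { OP_CONST, OP_REG, OP_ADD, OP_MUL } SimpleOp;

typedef struct Node {
    SimpleOp op;
    long value;                 /* constant, or register number for OP_REG */
    struct Node *left, *right;
} Node;

static Node *node_new(SimpleOp op, long value, Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->op = op;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

static void node_free(Node *n) {
    if (n == NULL)
        return;
    node_free(n->left);
    node_free(n->right);
    free(n);
}

/* Decompose a coarse "address of array element" operation into
 * ADD (array, ADD (header_size, MUL (index, elem_size))). */
static Node *decompose_elem_addr(int array_reg, int index_reg,
                                 long header_size, long elem_size) {
    Node *scaled = node_new(OP_MUL, 0,
                            node_new(OP_REG, index_reg, NULL, NULL),
                            node_new(OP_CONST, elem_size, NULL, NULL));
    Node *offset = node_new(OP_ADD, 0,
                            node_new(OP_CONST, header_size, NULL, NULL),
                            scaled);
    return node_new(OP_ADD, 0,
                    node_new(OP_REG, array_reg, NULL, NULL), offset);
}

/* Reference evaluator, used here only to check the decomposition. */
static long eval(const Node *n, const long *regs) {
    switch (n->op) {
    case OP_CONST: return n->value;
    case OP_REG:   return regs[n->value];
    case OP_ADD:   return eval(n->left, regs) + eval(n->right, regs);
    case OP_MUL:   return eval(n->left, regs) * eval(n->right, regs);
    }
    return 0;
}
```

Only the four simple operations would need arch-specific support in this
sketch; the coarse operation itself never reaches the backend.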

* Doing the conversion in method_to_ir ()

Some of these conversions can certainly be done in method_to_ir (),
but it's not always easy to decide which are better done there and
which in a different pass.

For example, let's take ldlen: in the mono implementation, ldlen
can be simply implemented with a load from a fixed position in the
array object:

    len = [reg + maxlen_offset]

However, ldlen also carries semantic information: the result is the
length of the array, and since in the CLR arrays are of fixed size,
this information can be useful to later do bounds check removal.
If we convert this opcode in method_to_ir () we lose some useful
information for further optimizations.
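
The trade-off can be sketched in C as follows. SketchArray is a made-up
layout (the real array object differs), and bounds_check_is_redundant ()
is a hypothetical, deliberately naive test that assumes a loop of the form
"for (i = 0; i < limit; i++)" over an array register that is not
reassigned. The point is only that the test can succeed while the limit is
still tagged as an ldlen, and cannot once it has become an anonymous load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical array object layout, for illustration only. */
typedef struct {
    void    *vtable;
    uint32_t max_length;
    int32_t  data[4];
} SketchArray;

/* The decomposed form: a plain load from [reg + maxlen_offset]. */
static uint32_t load_u32(const void *base, size_t offset) {
    uint32_t v;
    memcpy(&v, (const char *)base + offset, sizeof v);
    return v;
}

static uint32_t lowered_ldlen(const SketchArray *arr) {
    return load_u32(arr, offsetof(SketchArray, max_length));
}

/* How the loop limit was produced, as seen by a later optimization. */
typedef enum { LIMIT_LDLEN, LIMIT_LOAD } LimitOp;

typedef struct {
    LimitOp op;
    int     array_reg;   /* register holding the object that was read */
} LoopLimit;

/* The check on arr[i] is redundant only if the limit is known to be
 * the length of the very array being indexed. A generic load gives
 * no such guarantee, even if it reads the same memory location. */
static int bounds_check_is_redundant(LoopLimit limit, int indexed_array_reg) {
    return limit.op == LIMIT_LDLEN && limit.array_reg == indexed_array_reg;
}
```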

In other cases, decomposing an opcode in method_to_ir () may
allow for better optimizations later on (need to come up with an
example here ...).

* Doing the conversion in inssel.brg

Some conversions may be done inside the burg rules: this has the
disadvantage that the instruction selector is not run again on
the resulting expression tree and we could miss some optimization
(this is what effectively happens with the coarse opcodes in the old
jit). This may also interfere with an efficient local register allocator.
It may be possible to add an extension in monoburg that allows a rule
such as:

    recheck: LDLEN (reg) {
        create an expression tree representing LDLEN
        and return it
    }

When the monoburg label process gets back a recheck, it will run
the labeling again on the resulting expression tree.
Whether this is possible at all (and in an efficient way) is a
question for dietmar:-)
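
As a rough illustration of the control flow such an extension would need,
the following C sketch labels a tree bottom-up and, when a hypothetical
recheck rule fires, runs the labeling again on the tree the rule returns.
Nothing here is monoburg's actual API: Tree, recheck_rule () and
label_tree () are invented for the example, "labeling" is reduced to
setting a flag, and termination relies on the assumption that a recheck
rule only produces simple opcodes.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { T_REG, T_CONST, T_ADD, T_LOAD, T_LDLEN } TreeOp;

typedef struct Tree {
    TreeOp op;
    long value;                 /* register number or constant */
    struct Tree *left, *right;
    int labeled;                /* stands in for the real label state */
} Tree;

static Tree *tree_new(TreeOp op, long value, Tree *left, Tree *right) {
    Tree *t = malloc(sizeof *t);
    assert(t != NULL);
    t->op = op;
    t->value = value;
    t->left = left;
    t->right = right;
    t->labeled = 0;
    return t;
}

static void tree_free(Tree *t) {
    if (t == NULL)
        return;
    tree_free(t->left);
    tree_free(t->right);
    free(t);
}

/* Hypothetical recheck rule: LDLEN (reg) -> LOAD (ADD (reg, offset)).
 * Returns NULL when the rule does not apply. */
static Tree *recheck_rule(Tree *t, long maxlen_offset) {
    if (t->op != T_LDLEN)
        return NULL;
    Tree *offset = tree_new(T_CONST, maxlen_offset, NULL, NULL);
    Tree *addr = tree_new(T_ADD, 0, t->left, offset);
    Tree *load = tree_new(T_LOAD, 0, addr, NULL);
    free(t);    /* only the LDLEN node; its operand now belongs to addr */
    return load;
}

/* Label children first, then the node. If a recheck rule fires, the
 * labeling is run again on the tree it returned. */
static Tree *label_tree(Tree *t, long maxlen_offset, int *rechecks) {
    if (t == NULL)
        return NULL;
    t->left = label_tree(t->left, maxlen_offset, rechecks);
    t->right = label_tree(t->right, maxlen_offset, rechecks);
    Tree *replacement = recheck_rule(t, maxlen_offset);
    if (replacement != NULL) {
        (*rechecks)++;
        return label_tree(replacement, maxlen_offset, rechecks);
    }
    t->labeled = 1;
    return t;
}
```

Note that the operand of the LDLEN gets labeled twice in this sketch, which
hints at the efficiency question raised above.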

It should be noted, though, that this may not always work, since
some complex IL opcodes may require a series of expression trees
and handling such cases in monoburg could become quite hairy.
For example, think of opcodes that need to do multiple actions on the
same object: this basically means a DUP...

On the other hand, if a complex opcode needs a DUP, monoburg doesn't
actually need to create trees if it emits the instructions in
the correct sequence and maintains the right values in the registers
(usually the values that need a DUP are not changed...). How
this integrates with the current register allocator is not clear, since
that assigns registers based on the rule, but the instructions emitted
by the rules may be different (this already happens with the current JIT
where a MULT is replaced with lea etc...).

* Doing it in a separate pass

Doing the conversion in a separate pass over the instructions
is another alternative. This can be done right after method_to_ir ()
or after the SSA pass (since the IR after the SSA pass should look
almost like the IR we get back from method_to_ir ()).

This has the following advantages:

*) monoburg will handle only the simple opcodes (makes porting easier)
*) the instruction selection will be run on all the additional trees
*) it's easier to support coarse opcodes that produce multiple expression
   trees (and apply the monoburg selector on all of them)
*) the SSA optimizer will see the original opcodes and will be able to use
   the semantic info associated with them

The disadvantage is that this is a separate pass on the code and
it takes time (how much has not been measured yet, though).

With this approach, we may also be able to have C implementations
of some of the opcodes: this pass would insert a function call to
the C implementation (for example when first porting to a new arch,
where implementing some stuff in asm may be too hard).
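
A separate pass of this kind could look roughly like the following C
sketch. It works on a flat instruction array rather than on trees to keep
the example short. The opcode names, the Ins type, I_HARD_OP and the helper
name "sketch_hard_op_helper" are all hypothetical; the only point is that
coarse opcodes are rewritten before instruction selection, and that a port
lacking a native implementation can fall back to a call into C.

```c
#include <assert.h>
#include <stddef.h>

/* I_LDLEN and I_HARD_OP are coarse opcodes; the others are simple. */
typedef enum { I_ADD, I_LOAD, I_CALL, I_LDLEN, I_HARD_OP } Opcode;

typedef struct {
    Opcode op;
    const char *helper;   /* C helper name for I_CALL, otherwise NULL */
} Ins;

static int emit(Ins *out, size_t cap, size_t *n, Opcode op, const char *helper) {
    if (*n >= cap)
        return 0;
    out[*n].op = op;
    out[*n].helper = helper;
    (*n)++;
    return 1;
}

/* Rewrites coarse opcodes into simple ones. Returns the number of
 * instructions written, or (size_t)-1 if out is too small. */
static size_t decompose_pass(const Ins *in, size_t n_in,
                             Ins *out, size_t cap, int has_native_hard_op) {
    size_t n = 0;
    for (size_t i = 0; i < n_in; i++) {
        int ok;
        switch (in[i].op) {
        case I_LDLEN:
            /* address computation followed by a load */
            ok = emit(out, cap, &n, I_ADD, NULL) &&
                 emit(out, cap, &n, I_LOAD, NULL);
            break;
        case I_HARD_OP:
            /* keep it if the backend supports it, else call into C */
            ok = has_native_hard_op
                 ? emit(out, cap, &n, I_HARD_OP, NULL)
                 : emit(out, cap, &n, I_CALL, "sketch_hard_op_helper");
            break;
        default:
            ok = emit(out, cap, &n, in[i].op, in[i].helper);
            break;
        }
        if (!ok)
            return (size_t)-1;
    }
    return n;
}
```

Because the pass runs after the SSA optimizations in this scheme, those
optimizations still see I_LDLEN and similar opcodes with their semantic
information intact.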

* Extended basic blocks

IL code needs a lot of checks: bounds checks, overflow checks,
type checks and so on. This potentially greatly increases
the number of basic blocks in a control flow graph. However,
all such blocks end up with a throw opcode that gives control to the
exception handling mechanism.

After method_to_ir () a MonoBasicBlock can be considered a sort
of extended basic block where the additional exits don't point
to basic blocks in the same procedure (at least when the method
doesn't have exception tables).
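
To give a feel for the difference, the C sketch below counts blocks for a
straight-line run of instructions under two views. It assumes, purely for
illustration, that in a strict control flow graph every check ends a block
and gets its own throw block, and that some code (at least the block
terminator) follows the last check. InsKind and both functions are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* K_CHECK stands for any bounds, overflow or type check that may throw. */
typedef enum { K_SIMPLE, K_CHECK } InsKind;

static size_t count_checks(const InsKind *code, size_t n) {
    size_t checks = 0;
    for (size_t i = 0; i < n; i++)
        if (code[i] == K_CHECK)
            checks++;
    return checks;
}

/* Strict view: k checks split the run into k + 1 segments and add
 * k throw blocks, for 2k + 1 basic blocks in total. */
static size_t strict_block_count(const InsKind *code, size_t n) {
    return 2 * count_checks(code, n) + 1;
}

/* Extended view: the run stays a single block and each check is
 * merely a side exit that leaves the procedure. */
static size_t extended_side_exits(const InsKind *code, size_t n) {
    return count_checks(code, n);
}
```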

We need to make sure the passes following method_to_ir () can cope
with such kinds of extended basic blocks (especially the passes
that we need to apply to all the methods: as a start, we could
skip SSA optimizations for methods with exception clauses...)