jit-regalloc 9.6 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283
  1. Register Allocation
  2. ===================
  3. The current JIT implementation uses a tree matcher to generate code. We use a
  4. simple algorithm to allocate registers in trees, and limit the number of used
  5. temporary register to 4 when evaluating trees. So we can use 2 registers for
  6. global register allocation.
  7. Register Allocation for Trees
  8. =============================
  9. We always evaluate trees from left to right. When there are no more registers
  10. available we need to spill values to memory. Here is the simplified algorithm.
  11. gboolean
  12. tree_allocate_regs (tree, exclude_reg)
  13. {
  14. if (!tree_allocate_regs (tree->left, -1))
  15. return FALSE;
  16. if (!tree_allocate_regs (tree->right, -1)) {
  17. tree->left->spilled == TRUE;
  18. free_used_regs (tree->left);
  19. if (!tree_allocate_regs (tree->right, tree->left->reg))
  20. return FALSE;
  21. }
  22. free_used_regs (tree->left);
  23. free_used_regs (tree->right);
  24. /* try to allocate a register (reg != exclude_reg) */
  25. if ((tree->reg = next_free_reg (exclude_reg)) != -1)
  26. return TRUE;
  27. return FALSE;
  28. }
  29. The emit routing actually spills the registers:
  30. tree_emit (tree)
  31. {
  32. tree_emit (tree->left);
  33. if (tree->left->spilled)
  34. save_reg (tree->left->reg);
  35. tree_emit (tree->right);
  36. if (tree->left->spilled)
  37. restore_reg (tree->left->reg);
  38. emit_code (tree);
  39. }
  40. Global Register Allocation
  41. ==========================
  42. TODO.
  43. Local Register Allocation
  44. =========================
  45. This section describes the cross-platform local register allocator which is
  46. in the file mini-codegen.c.
  47. The input to the allocator is a basic block which contains linear IL, ie.
  48. instructions of the form:
  49. DEST <- SRC1 OP SRC2
  50. where DEST, SRC1, and SRC2 are virtual registers (vregs). The job of the
  51. allocator is to assign hard or physical registers (hregs) to each virtual
  52. registers so the vreg references in the instructions can be replaced with their
  53. assigned hreg, allowing machine code to be generated later.
  54. The allocator needs information about the number and types of arguments of
  55. instructions. It takes this information from the machine description files. It
  56. also needs arch specific information, like the number and type of the hard
  57. registers. It gets this information from arch-specific macros.
  58. Currently, the vregs and hregs are partitioned into two classes: integer and
  59. floating point.
  60. The allocator consists of two phases: In the first phase, a forward pass is
  61. made over the instructions, collecting liveness information for vregs. In the
  62. second phase, a backward pass is made over the instructions, assigning
  63. registers. This backward mode of operation makes understanding the allocator
  64. somewhat difficult to understand, but leads to better code in most cases.
  65. Allocator state
  66. ===============
  67. The state of the allocator is stored in two arrays: iassign and isymbolic.
  68. iassign maps vregs to hregs, while isymbolic is the opposite.
  69. For a vreg, iassign [vreg] can contain the following values:
  70. * -1 vreg has no assigned hreg
  71. * hreg index (>= 0) vreg is assigned to the given hreg. This means
  72. later instructions (which we have already
  73. processed due to the backward direction) expect
  74. the value of vreg to be found in hreg.
  75. * spill slot index (< -1) vreg is spilled to the given spill slot. This
  76. means later instructions expect the value of
  77. vreg to be found on the stack in the given
  78. spill slot.
  79. Also, the allocator keeps track of which hregs are free and which are used.
  80. This information is stored in a bitmask called ifree_mask.
  81. There is a similar set of data structures for floating point registers.
  82. Spilling
  83. ========
  84. When an allocator needs a free hreg, but all of them are assigned, it needs to
  85. free up one of them. It does this by spilling the contents of the vreg which
  86. is currently assigned to the selected hreg. Since later instructions expect
  87. the vreg to be found in the selected hreg, the allocator emits a spill-load
  88. instruction to load the value from the spill slot into the hreg after the
  89. currently processed instruction. When the vreg which is spilled is a
  90. destination in an instruction, the allocator will emit a spill-store to store
  91. the value into the spill slot.
  92. Fixed registers
  93. ===============
  94. Some architectures, notably x86/amd64 require that the arguments/results of
  95. some instructions be assigned to specific hregs. An example is the shift
  96. opcodes on x86, where the second argument must be in ECX. The allocator
  97. has support for this. It tries to allocate the vreg to the required hreg. If
  98. thats not possible, then it will emit compensation code which moves values to
  99. the correct registers before/after the instruction.
  100. Fixed registers are mainly used on x86, but they are useful on more regular
  101. architectures on well, for example to model that after a call instruction, the
  102. return of the call is in a specific register.
  103. A special case of fixed registers is two address architectures, like the x86,
  104. where the instructions place their results into their first argument. This is
  105. modelled in the allocator by allocating SRC1 and DEST to the same hreg.
  106. Global registers
  107. ================
  108. Some variables might already be allocated to hardware registers during the
  109. global allocation phase. In this case, SRC1, SRC2 and DEST might already be
  110. a hardware register. The allocator needs to do nothing in this case, except
  111. when the architecture uses fixed registers, in which case it needs to emit
  112. compensation code.
  113. Register pairs
  114. ==============
  115. 64 bit arithmetic on 32 bit machines requires instructions whose arguments are
  116. not registers, but register pairs. The allocator has support for this, both
  117. for freely allocatable register pairs, and for register pairs which are
  118. constrained to specific hregs (EDX:EAX on x86).
  119. Floating point stack
  120. ====================
  121. The x86 architecture uses a floating point register stack instead of a set of
  122. fp registers. The allocator supports this by keeping track of the height of the
  123. fp stack, and spilling/loading values from the stack as neccesary.
  124. Calls
  125. =====
  126. Calls need special handling for two reasons: first, they will clobber all
  127. caller-save registers, meaning their contents will need to be spilled. Also,
  128. some architectures pass arguments in registers. The registers used for
  129. passing arguments are usually the same as the ones used for local allocation,
  130. so the allocator needs to handle them specially. This is done as follows:
  131. the MonoInst for the call instruction contains a map mapping vregs which
  132. contain the argument values to hregs where the argument needs to be placed,
  133. like this (on amd64):
  134. R33 -> RDI
  135. R34 -> RSI
  136. ...
  137. When the allocator processes the call instruction, it allocates the vregs
  138. in the map to their associated hregs. So the call instruction is processed as
  139. if having a variable number of arguments which fixed register assignments.
  140. An example:
  141. R33 <- 1
  142. R34 <- 2
  143. call
  144. When the call instruction is processed, R33 is assigned to RDI, and R34 is
  145. assigned to RSI. Later, when the two assignment instructions are processed,
  146. R33 and R34 are already assigned to a hreg, so they are replaced with the
  147. associated hreg leading to the following final code:
  148. RDI <- 1
  149. RSI <- 1
  150. call
  151. Machine description files
  152. =========================
  153. A typical entry in the machine description files looks like this:
  154. shl: dest:i src1:i src2:s clob:1 len:2
  155. The allocator is only interested in the dest,src1,src2 and clob fields.
  156. It understands the following values for the dest, src1, src2 fields:
  157. i - integer register
  158. f - fp register
  159. b - base register (same as i, but the instruction does not modify the reg)
  160. m - fp register, even if an fp stack is used (no fp stack tracking)
  161. It understands the following values for the clob field:
  162. 1 - sreg1 needs to be the same as dreg
  163. c - instruction clobbers the caller-save registers
  164. Beside these values, an architecture can define additional values (like the 's'
  165. in the example). The allocator depends on a set of arch-specific macros to
  166. convert these values to information it needs during allocation.
  167. Arch specific macros
  168. ====================
  169. These macros usually receive a value from the machine description file (like
  170. the 's' in the example). The examples below are for x86.
  171. /*
  172. * A bitmask selecting the caller-save registers (these are used for local
  173. * allocation).
  174. */
  175. #define MONO_ARCH_CALLEE_REGS X86_CALLEE_REGS
  176. /*
  177. * A bitmask selecting the callee-saved registers (these are usually used for
  178. * global allocation).
  179. */
  180. #define MONO_ARCH_CALLEE_SAVED_REGS X86_CALLER_REGS
  181. /* Same for the floating point registers */
  182. #define MONO_ARCH_CALLEE_FREGS 0
  183. #define MONO_ARCH_CALLEE_SAVED_FREGS 0
  184. /* Whenever the target uses a floating point stack */
  185. #define MONO_ARCH_USE_FPSTACK TRUE
  186. /* The size of the floating point stack */
  187. #define MONO_ARCH_FPSTACK_SIZE 6
  188. /*
  189. * Given a descriptor value from the machine description file, return the fixed
  190. * hard reg corresponding to that value.
  191. */
  192. #define MONO_ARCH_INST_FIXED_REG(desc) ((desc == 's') ? X86_ECX : ((desc == 'a') ? X86_EAX : ((desc == 'd') ? X86_EDX : ((desc == 'y') ? X86_EAX : ((desc == 'l') ? X86_EAX : -1)))))
  193. /*
  194. * A bitmask selecting the hregs which can be used for allocating sreg2 for
  195. * a given instruction.
  196. */
  197. #define MONO_ARCH_INST_SREG2_MASK(ins) (((ins [MONO_INST_CLOB] == 'a') || (ins [MONO_INST_CLOB] == 'd')) ? (1 << X86_EDX) : 0)
  198. /*
  199. * Given a descriptor value, return whenever it denotes a register pair.
  200. */
  201. #define MONO_ARCH_INST_IS_REGPAIR(desc) (desc == 'l' || desc == 'L')
  202. /*
  203. * Given a descriptor value, and the first register of a regpair, return a
  204. * bitmask selecting the hregs which can be used for allocating the second
  205. * register of the regpair.
  206. */
  207. #define MONO_ARCH_INST_REGPAIR_REG2(desc,hreg1) (desc == 'l' ? X86_EDX : -1)