tree-mover.txt 13 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261
  1. Purpose
  2. Especially when inlining is active, it can happen that temporary
  3. variables add pressure to the register allocator, producing bad
  4. code.
  5. The idea is that some of these temporaries can be totally eliminated
  6. my moving the MonoInst tree that defines them directly to the use
  7. point in the code (so the name "tree mover").
  8. Please note that this is *not* an optimization: it is mostly a
  9. workaround to issues we have in the regalloc.
  10. Actually, with the new linear IR this will not be possible at all
  11. (there will be no more trees in the code!).
  12. Anyway, this workaround turns out to be useful in the current state
  13. of things...
  14. -----------------------------------------------------------------------
  15. Base logic
  16. If a local is defined by a value which is a proper expression (a tree
  17. of MonoInst, not just another local or a constant), and this definition
  18. is used only once, the tree can be moved directly to the use location,
  19. and the definition eliminated.
  20. Of course, none of the variables used in the tree must be defined in
  21. the code path between the definition and the use, and the tree must be
  22. free of side effects.
  23. We do not handle the cases when the tree is just a local or a constant
  24. because they are handled by copyprop and consprop, respectively.
  25. To make things simpler, we restrict the tree move to the case when:
  26. - the definition and the use are in the same BB, and
  27. - the use is followed by another definition in the same BB (it is not
  28. possible that the 1st value is used again), or alternatively there
  29. is no BB in the whole CFG that contains a use of this local before a
  30. definition (so, again, there is no code path that can lead to a
  31. subsequent use).
  32. To handle this, we maintain an ACT array (Available Copy Tree, similar
  33. to the ACP), where we store the "state" of every local.
  34. Ideally, every local can be in the following state:
  35. [E] Undefined (by a tree, it could be in the ACP but we don't care).
  36. [D] Defined (by a tree), and waiting for a use.
  37. [U] Used, with a tree definition available in the same BB, but still
  38. without a definition following the use (always in the same BB).
  39. Of course state [E] (empty) is the initial one.
  40. Besides, there are two sort of "meta states", or flags:
  41. [W] Still waiting for a use or definition in this BB (we have seen no
  42. occurrence of the local yet).
  43. [X] Used without being previously defined in the same BB (note that if
  44. there is a definition that precedes the use in the same BB, even if
  45. the definition is not a tree or is not available because of side
  46. effects or because the tree value has changed the local is not in
  47. state [X]).
  48. Also note that state [X] is a sort of "global" condition, which if set
  49. in one BB will stay valid for the whole CFG, even if the local will
  50. otherwise change state. The idea of flagging a local as [X] is that if
  51. there is a definition/use pair that reaches the end of a BB, it could
  52. be that there is a CFG path that then leads to the BB flagging it as
  53. [X] (which contains a use), so the tree cannot be moved.
  54. So state [X] will always be set, and never examined in all the state
  55. transitions we will describe.
  56. In practice, we use flag [W] to set state [X]: if, when traversing a
  57. BB, we find a use for a local in state [W], then that local is flagged
  58. [X].
  59. For each BB, we initialize all states to [E] and [W], and then we
  60. traverse the code one inst at a time, and update the variable states
  61. in the ACT in the following ways:
  62. [Definition]
  63. - Flag [W] is cleared.
  64. - All "affected trees" are killed (go from state [D] to [E]).
  65. The "affected trees" are the trees which contain (use) the defined
  66. local, and the rationale is that the tree value changed, so the
  67. tree is no longer available.
  68. - If the local was in state [U], *that* tree move is marked "safe"
  69. (because *this* definition makes us sure that the previous tree
  70. cannot be used again in any way).
  71. The idea is that "safe" moves can happen even if the local is
  72. flagged [X], because the second definition "covers" the use.
  73. The tree move is then saved in the "todo" list (and the affecting
  74. nodes are cleared).
  75. - If the local was defined by a tree, it goes to state [D], the tree
  76. is recorded, and all the locals used in it are marked as "affecting
  77. this tree" (of course these markers are lists, because each local
  78. could affect more than one tree).
  79. [IndirectDefinition]
  80. - All potentially affected trees (in state [D]) are killed.
  81. [Use]
  82. - If the local is still [W], it is flagged [X] (the [W] goes away).
  83. - If the local is in state [D], it goes to state [U].
  84. The tree move must not yet be recorded in the "todo" list, it still
  85. stays in the ACT slot belonging to this local.
  86. Anyway, the "affecting" nodes are updated, because now a definition
  87. of a local used in this tree will affect only "indirect" (or also
  88. "propagated") moves, but not *this* move (see below).
  89. - If the local is in state [U], then the tree cannot be moved (it is
  90. used two times): the move is canceled, and the state goes [E].
  91. - If the local is in state [E], the use is ignored.
  92. [IndirectUse]
  93. - All potentially affected trees (in state [D] or [U]) are killed.
  94. [SideEffect]
  95. - Tree is marked as "unmovable".
  96. Then, at the end of the BB, for each ACT slot:
  97. - If state is [U], the tree move is recorded in the "todo" list, but
  98. flagged "unsafe".
  99. - Anyway, state goes to [E], the [W] flag is set, and all "affecting"
  100. lists are cleared (we get ready to traverse the next BB).
  101. Finally, when all BBs has been scanned, we traverse the "todo" list,
  102. moving all "safe" entries, and moving "unsafe" ones only if their ACT
  103. slot is not flagged [X].
  104. So far, so good.
  105. But there are two issues that make things harder :-(
  106. The first is the concept of "indirect tree move".
  107. It can happen that a tree is scheduled for moving, and its destination
  108. is a use that is located in a second tree, which could also be moved.
  109. The main issue is that a definition of a variable of the 1st tree on
  110. the path between the definition and the use of the 2nd one must prevent
  111. the move.
  112. But which move? The 1st or the 2nd?
  113. Well, any of the two!
  114. The point is, the 2nd move must be prevented *only* if the 1st one
  115. happens: if it is aborted (for an [X] flag or any other reason), the
  116. 2nd move is OK, and vice versa...
  117. We must handle this in the following way:
  118. - The ACT must still remember if a slot is scheduled for moving in
  119. this BB, and if it is, all the locals used in the tree.
  120. We say that the slot is in state [M].
  121. Note that [M] is (like [X] and [W]) a sort of "meta state": a local
  122. is flagged [M] when it goes to state [U], and the flag is cleared
  123. when the tree move is cancelled
  124. - A tree that uses a local whose slot is in state [M] is also using all
  125. the locals used by the tree in state [M], but the use is "indirect".
  126. These use nodes are also included in the "affecting" lists.
  127. - The definition of a variable used in an "indirect" way has the
  128. effect of "linking" the two involved tree moves, saying that only one
  129. of the two can happen in practice, but not both.
  130. - When the 2nd tree is scheduled for moving, the 1st one is *still* in
  131. state [M], because a third move could "carry it forward", and all the
  132. *three* moves should be mutually exclusive (to be safe!).
  133. The second tricky complication is the "tree forwarding" that can happen
  134. when copyprop is involved.
  135. It is conceptually similar to the "indirect tree move".
  136. Only, the 2nd tree is not really a tree, it is just the local defined
  137. in the 1st tree move.
  138. It can happen that copyprop will propagate the definition.
  139. We cannot make treeprop do the same job of copyprop, because copyprop
  140. has less constraints, and is therefore more powerful in its scope.
  141. The main issue is that treeprop cannot propagate a tree to *two* uses,
  142. while copyprop is perfectly capable of propagating one definition to
  143. two (or more) different places.
  144. So we must let copyprop do its job otherwise we'll miss optimizations,
  145. but we must also make it play safe with treeprop.
  146. Let's clarify with an example:
  147. a = v1 + v2; //a is defined by a tree, state [D], uses v2 and v2
  148. b = a; //a is used, state [U] with move scheduled, and
  149. //b is defined by a, ACP[b] is a, and b is in state [DC]
  150. c = b + v3; // b is used, goes to state [U]
  151. The real trouble is that copyprop happens *immediately*, while treeprop
  152. is deferred to the end of the CFG traversal.
  153. So, in the 3rd statement, the "b" is immediately turned into an "a" by
  154. copyprop, regardless of what treeprop will do.
  155. Anyway, if we are careful, this is not so bad.
  156. First of all, we must "accept" the fact that in the 3rd statement the
  157. "b" is in fact an "a", as treeprop must happen *after* copyprop.
  158. The real problem is that "a" is used twice: in the 2nd and 3rd lines.
  159. In our usual setup, the 2nd line would set it to [U], and the 3rd line
  160. would kill the move (and set "a" to [E]).
  161. I have tried to play tricks, and reason as of copyprop didn't happen,
  162. but everything becomes really messy.
  163. Instead, we should note that the 2nd line is very likely to be dead.
  164. At least in this BB, copyprop will turn all "b"s into "a"s as long as
  165. it can, and when it cannot, it will be because either "a" or "b" have
  166. been redefined, which would be after the tree move anyway.
  167. So, the reasoning gets different: let's pretend that "b" will be dead.
  168. This will make the "a" use in the 2nd statement useless, so there we
  169. can "reset" "a" to [D], but also take note that if "b" will end up
  170. not being dead, the tree move associated to this [D] must be aborted.
  171. We can detect this in the following way:
  172. - Either "b" is used before being defined in this BB, or
  173. - It will be flagged "unsafe".
  174. Both things are very easy to check.
  175. The only quirk is that the "affecting" lists must not be cleared when
  176. a slot goes to state [U], because a "propagation" could put it back
  177. to state [D] (where those lists are needed, because it can be killed
  178. by a definition to a used slot).
  179. -----------------------------------------------------------------------
  180. Implementation notes
  181. All the implementation runs inside the existing mono_local_cprop
  182. function, and a separate memory pool is used to hold the temporary
  183. data.
  184. A struct, MonoTreeMover, contains the pointers to the pool, the ACT,
  185. the list of scheduled moves and auxiliary things.
  186. This struct is allocated if the tree move pass is requested, and is
  187. then passed along to all the involved functions, which are therefore
  188. aware of the tree mover state.
  189. The ACT is an array of slots, obviously one per local.
  190. Each slot is of type MonoTreeMoverActSlot, and contains the used and
  191. affected locals, a pointer to the pending tree move and the "waiting"
  192. and "unsafe" flags.
  193. The "affecting" lists a built from "dependency nodes", of type
  194. MonoTreeMoverDependencyNode.
  195. Each of the nodes contains the used and affected local, and is in
  196. two lists: the locals used by a slot, and the locals affected by a
  197. slot (obviously a different one).
  198. So, each node means: "variable x is used in tree t, so a definition
  199. of x affects tree t".
  200. The "affecting" lists are doubly linked, to allow for O(1) deletion.
  201. The "used" lists are simply linked, but when they are mantained there
  202. is always a pointer to the last element to allow for O(1) list moving.
  203. When a used list is dismissed (which happens often, any time a node is
  204. killed), its nodes are unlinked from their respective affecting lists
  205. and are then put in a "free" list in the MonoTreeMover to be reused.
  206. Each tree move is represented by a struct (MonoTreeMoverTreeMove),
  207. which contains:
  208. - the definition and use points,
  209. - the "affected" moves (recall the concept of "indirect tree move"),
  210. - the "must be dead" slots (recall "tree forwarding"). and
  211. - a few utility flags.
  212. The tree moves stays in the relevant ACT slot until it is ready to be
  213. scheduled for moving, at which point it is put in a list in the
  214. MonoTreeMover.
  215. The tree moves structs are reused when they are killed, so there is
  216. also a "free" list for them in the MonoTreeMover.
  217. The tree mover code has been added to all the relevant functions that
  218. participate in consprop and copyprop, particularly:
  219. - mono_cprop_copy_values takes care of variable uses (transitions from
  220. states [D] to [U] and [U] to [E] because of killing),
  221. - mono_cprop_invalidate_values takes care of side effects (indirect
  222. accesses, calls...),
  223. - mono_local_cprop_bb sets up and cleans the traversals for each BB,
  224. and for each MonoInst it takes care of variable definitions.
  225. To each of them has been added a MonoTreeMover parameter, which is not
  226. NULL if the tree mover is running.
  227. After mono_local_cprop_bb has run for all BBs, the MonoTreeMover has
  228. the list of all the pending moves, which must be walked to actually
  229. perform the moves (when possible, because "unsafe" flags, "affected"
  230. moves and "must be dead" slots can still have their effects, which
  231. must be handled now because they are fully known only at the end of
  232. the CFG traversal).