| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261 |
- Purpose
- Especially when inlining is active, it can happen that temporary
- variables add pressure to the register allocator, producing bad
- code.
- The idea is that some of these temporaries can be totally eliminated
- my moving the MonoInst tree that defines them directly to the use
- point in the code (so the name "tree mover").
- Please note that this is *not* an optimization: it is mostly a
- workaround to issues we have in the regalloc.
- Actually, with the new linear IR this will not be possible at all
- (there will be no more trees in the code!).
- Anyway, this workaround turns out to be useful in the current state
- of things...
- -----------------------------------------------------------------------
- Base logic
- If a local is defined by a value which is a proper expression (a tree
- of MonoInst, not just another local or a constant), and this definition
- is used only once, the tree can be moved directly to the use location,
- and the definition eliminated.
- Of course, none of the variables used in the tree must be defined in
- the code path between the definition and the use, and the tree must be
- free of side effects.
- We do not handle the cases when the tree is just a local or a constant
- because they are handled by copyprop and consprop, respectively.
- To make things simpler, we restrict the tree move to the case when:
- - the definition and the use are in the same BB, and
- - the use is followed by another definition in the same BB (it is not
- possible that the 1st value is used again), or alternatively there
- is no BB in the whole CFG that contains a use of this local before a
- definition (so, again, there is no code path that can lead to a
- subsequent use).
- To handle this, we maintain an ACT array (Available Copy Tree, similar
- to the ACP), where we store the "state" of every local.
- Ideally, every local can be in the following state:
- [E] Undefined (by a tree, it could be in the ACP but we don't care).
- [D] Defined (by a tree), and waiting for a use.
- [U] Used, with a tree definition available in the same BB, but still
- without a definition following the use (always in the same BB).
- Of course state [E] (empty) is the initial one.
- Besides, there are two sort of "meta states", or flags:
- [W] Still waiting for a use or definition in this BB (we have seen no
- occurrence of the local yet).
- [X] Used without being previously defined in the same BB (note that if
- there is a definition that precedes the use in the same BB, even if
- the definition is not a tree or is not available because of side
- effects or because the tree value has changed the local is not in
- state [X]).
- Also note that state [X] is a sort of "global" condition, which if set
- in one BB will stay valid for the whole CFG, even if the local will
- otherwise change state. The idea of flagging a local as [X] is that if
- there is a definition/use pair that reaches the end of a BB, it could
- be that there is a CFG path that then leads to the BB flagging it as
- [X] (which contains a use), so the tree cannot be moved.
- So state [X] will always be set, and never examined in all the state
- transitions we will describe.
- In practice, we use flag [W] to set state [X]: if, when traversing a
- BB, we find a use for a local in state [W], then that local is flagged
- [X].
- For each BB, we initialize all states to [E] and [W], and then we
- traverse the code one inst at a time, and update the variable states
- in the ACT in the following ways:
- [Definition]
- - Flag [W] is cleared.
- - All "affected trees" are killed (go from state [D] to [E]).
- The "affected trees" are the trees which contain (use) the defined
- local, and the rationale is that the tree value changed, so the
- tree is no longer available.
- - If the local was in state [U], *that* tree move is marked "safe"
- (because *this* definition makes us sure that the previous tree
- cannot be used again in any way).
- The idea is that "safe" moves can happen even if the local is
- flagged [X], because the second definition "covers" the use.
- The tree move is then saved in the "todo" list (and the affecting
- nodes are cleared).
- - If the local was defined by a tree, it goes to state [D], the tree
- is recorded, and all the locals used in it are marked as "affecting
- this tree" (of course these markers are lists, because each local
- could affect more than one tree).
- [IndirectDefinition]
- - All potentially affected trees (in state [D]) are killed.
- [Use]
- - If the local is still [W], it is flagged [X] (the [W] goes away).
- - If the local is in state [D], it goes to state [U].
- The tree move must not yet be recorded in the "todo" list, it still
- stays in the ACT slot belonging to this local.
- Anyway, the "affecting" nodes are updated, because now a definition
- of a local used in this tree will affect only "indirect" (or also
- "propagated") moves, but not *this* move (see below).
- - If the local is in state [U], then the tree cannot be moved (it is
- used two times): the move is canceled, and the state goes [E].
- - If the local is in state [E], the use is ignored.
- [IndirectUse]
- - All potentially affected trees (in state [D] or [U]) are killed.
- [SideEffect]
- - Tree is marked as "unmovable".
- Then, at the end of the BB, for each ACT slot:
- - If state is [U], the tree move is recorded in the "todo" list, but
- flagged "unsafe".
- - Anyway, state goes to [E], the [W] flag is set, and all "affecting"
- lists are cleared (we get ready to traverse the next BB).
- Finally, when all BBs has been scanned, we traverse the "todo" list,
- moving all "safe" entries, and moving "unsafe" ones only if their ACT
- slot is not flagged [X].
- So far, so good.
- But there are two issues that make things harder :-(
- The first is the concept of "indirect tree move".
- It can happen that a tree is scheduled for moving, and its destination
- is a use that is located in a second tree, which could also be moved.
- The main issue is that a definition of a variable of the 1st tree on
- the path between the definition and the use of the 2nd one must prevent
- the move.
- But which move? The 1st or the 2nd?
- Well, any of the two!
- The point is, the 2nd move must be prevented *only* if the 1st one
- happens: if it is aborted (for an [X] flag or any other reason), the
- 2nd move is OK, and vice versa...
- We must handle this in the following way:
- - The ACT must still remember if a slot is scheduled for moving in
- this BB, and if it is, all the locals used in the tree.
- We say that the slot is in state [M].
- Note that [M] is (like [X] and [W]) a sort of "meta state": a local
- is flagged [M] when it goes to state [U], and the flag is cleared
- when the tree move is cancelled
- - A tree that uses a local whose slot is in state [M] is also using all
- the locals used by the tree in state [M], but the use is "indirect".
- These use nodes are also included in the "affecting" lists.
- - The definition of a variable used in an "indirect" way has the
- effect of "linking" the two involved tree moves, saying that only one
- of the two can happen in practice, but not both.
- - When the 2nd tree is scheduled for moving, the 1st one is *still* in
- state [M], because a third move could "carry it forward", and all the
- *three* moves should be mutually exclusive (to be safe!).
- The second tricky complication is the "tree forwarding" that can happen
- when copyprop is involved.
- It is conceptually similar to the "indirect tree move".
- Only, the 2nd tree is not really a tree, it is just the local defined
- in the 1st tree move.
- It can happen that copyprop will propagate the definition.
- We cannot make treeprop do the same job of copyprop, because copyprop
- has less constraints, and is therefore more powerful in its scope.
- The main issue is that treeprop cannot propagate a tree to *two* uses,
- while copyprop is perfectly capable of propagating one definition to
- two (or more) different places.
- So we must let copyprop do its job otherwise we'll miss optimizations,
- but we must also make it play safe with treeprop.
- Let's clarify with an example:
- a = v1 + v2; //a is defined by a tree, state [D], uses v2 and v2
- b = a; //a is used, state [U] with move scheduled, and
- //b is defined by a, ACP[b] is a, and b is in state [DC]
- c = b + v3; // b is used, goes to state [U]
- The real trouble is that copyprop happens *immediately*, while treeprop
- is deferred to the end of the CFG traversal.
- So, in the 3rd statement, the "b" is immediately turned into an "a" by
- copyprop, regardless of what treeprop will do.
- Anyway, if we are careful, this is not so bad.
- First of all, we must "accept" the fact that in the 3rd statement the
- "b" is in fact an "a", as treeprop must happen *after* copyprop.
- The real problem is that "a" is used twice: in the 2nd and 3rd lines.
- In our usual setup, the 2nd line would set it to [U], and the 3rd line
- would kill the move (and set "a" to [E]).
- I have tried to play tricks, and reason as of copyprop didn't happen,
- but everything becomes really messy.
- Instead, we should note that the 2nd line is very likely to be dead.
- At least in this BB, copyprop will turn all "b"s into "a"s as long as
- it can, and when it cannot, it will be because either "a" or "b" have
- been redefined, which would be after the tree move anyway.
- So, the reasoning gets different: let's pretend that "b" will be dead.
- This will make the "a" use in the 2nd statement useless, so there we
- can "reset" "a" to [D], but also take note that if "b" will end up
- not being dead, the tree move associated to this [D] must be aborted.
- We can detect this in the following way:
- - Either "b" is used before being defined in this BB, or
- - It will be flagged "unsafe".
- Both things are very easy to check.
- The only quirk is that the "affecting" lists must not be cleared when
- a slot goes to state [U], because a "propagation" could put it back
- to state [D] (where those lists are needed, because it can be killed
- by a definition to a used slot).
- -----------------------------------------------------------------------
- Implementation notes
- All the implementation runs inside the existing mono_local_cprop
- function, and a separate memory pool is used to hold the temporary
- data.
- A struct, MonoTreeMover, contains the pointers to the pool, the ACT,
- the list of scheduled moves and auxiliary things.
- This struct is allocated if the tree move pass is requested, and is
- then passed along to all the involved functions, which are therefore
- aware of the tree mover state.
- The ACT is an array of slots, obviously one per local.
- Each slot is of type MonoTreeMoverActSlot, and contains the used and
- affected locals, a pointer to the pending tree move and the "waiting"
- and "unsafe" flags.
- The "affecting" lists a built from "dependency nodes", of type
- MonoTreeMoverDependencyNode.
- Each of the nodes contains the used and affected local, and is in
- two lists: the locals used by a slot, and the locals affected by a
- slot (obviously a different one).
- So, each node means: "variable x is used in tree t, so a definition
- of x affects tree t".
- The "affecting" lists are doubly linked, to allow for O(1) deletion.
- The "used" lists are simply linked, but when they are mantained there
- is always a pointer to the last element to allow for O(1) list moving.
- When a used list is dismissed (which happens often, any time a node is
- killed), its nodes are unlinked from their respective affecting lists
- and are then put in a "free" list in the MonoTreeMover to be reused.
- Each tree move is represented by a struct (MonoTreeMoverTreeMove),
- which contains:
- - the definition and use points,
- - the "affected" moves (recall the concept of "indirect tree move"),
- - the "must be dead" slots (recall "tree forwarding"). and
- - a few utility flags.
- The tree moves stays in the relevant ACT slot until it is ready to be
- scheduled for moving, at which point it is put in a list in the
- MonoTreeMover.
- The tree moves structs are reused when they are killed, so there is
- also a "free" list for them in the MonoTreeMover.
- The tree mover code has been added to all the relevant functions that
- participate in consprop and copyprop, particularly:
- - mono_cprop_copy_values takes care of variable uses (transitions from
- states [D] to [U] and [U] to [E] because of killing),
- - mono_cprop_invalidate_values takes care of side effects (indirect
- accesses, calls...),
- - mono_local_cprop_bb sets up and cleans the traversals for each BB,
- and for each MonoInst it takes care of variable definitions.
- To each of them has been added a MonoTreeMover parameter, which is not
- NULL if the tree mover is running.
- After mono_local_cprop_bb has run for all BBs, the MonoTreeMover has
- the list of all the pending moves, which must be walked to actually
- perform the moves (when possible, because "unsafe" flags, "affected"
- moves and "must be dead" slots can still have their effects, which
- must be handled now because they are fully known only at the end of
- the CFG traversal).
|