==========================================
Design and Usage of the InAlloca Attribute
==========================================

Introduction
============

The :ref:`inalloca <attr_inalloca>` attribute is designed to allow
taking the address of an aggregate argument that is being passed by
value through memory. Primarily, this feature is required for
compatibility with the Microsoft C++ ABI. Under that ABI, class
instances that are passed by value are constructed directly into
argument stack memory. Prior to the addition of inalloca, calls in LLVM
were indivisible instructions. There was no way to perform intermediate
work, such as object construction, between the first stack adjustment
and the final control transfer. With inalloca, all arguments passed in
memory are modelled as a single alloca, which can be stored to prior to
the call. Unfortunately, this complicated feature comes with a large
set of restrictions designed to bound the lifetime of the argument
memory around the call.

For now, it is recommended that frontends and optimizers avoid producing
this construct, primarily because it forces the use of a base pointer.
This feature may grow in the future to allow general mid-level
optimization, but for now, it should be regarded as less efficient than
passing by value with a copy.

Intended Usage
==============

The example below is the intended LLVM IR lowering for some C++ code
that passes two default-constructed ``Foo`` objects to ``g`` in the
32-bit Microsoft C++ ABI.

.. code-block:: c++

    // Foo is non-trivial.
    struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
    void g(Foo a, Foo b);

    void f() {
      g(Foo(), Foo());
    }

.. code-block:: llvm

    %struct.Foo = type { i32, i32 }
    declare void @Foo_ctor(%struct.Foo* %this)
    declare void @Foo_dtor(%struct.Foo* %this)
    declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)

    define void @f() {
    entry:
      %base = call i8* @llvm.stacksave()
      %memargs = alloca <{ %struct.Foo, %struct.Foo }>
      %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 1
      call void @Foo_ctor(%struct.Foo* %b)

      ; If a's ctor throws, we must destruct b.
      %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 0
      invoke void @Foo_ctor(%struct.Foo* %a)
          to label %invoke.cont unwind label %invoke.unwind

    invoke.cont:
      call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
      call void @llvm.stackrestore(i8* %base)
      ...

    invoke.unwind:
      call void @Foo_dtor(%struct.Foo* %b)
      call void @llvm.stackrestore(i8* %base)
      ...
    }

To avoid stack leaks, the frontend first saves the current stack pointer
with a call to :ref:`llvm.stacksave <int_stacksave>`. It then allocates
the argument stack space with a single alloca and runs the default
constructor for each argument, ``b`` first and then ``a``. Because
``a``'s constructor could throw an exception, the frontend has to create
a landing pad, which destroys the already constructed argument ``b``
before restoring the stack pointer. If the constructor does not unwind,
``g`` is called. In the Microsoft C++ ABI, ``g`` destroys its own
arguments, so ``f`` only has to restore the stack pointer afterwards.

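
The control flow of ``@f`` can be mirrored in C++ to check these cleanup rules. This is a sketch that rests on several assumptions:

* ``ArgStack`` is a hypothetical model of the stack pointer. Its ``save``, ``restore`` and ``allocate`` members play the roles of ``llvm.stacksave``, ``llvm.stackrestore`` and the ``alloca``.
* ``Foo::fail_after`` is hypothetical fault injection that lets ``a``'s constructor be made to throw.
* ``try``/``catch`` stands in for the ``invoke`` and its landing pad.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>

// Hypothetical model of a downward-growing stack. save()/restore() stand in
// for llvm.stacksave/llvm.stackrestore and allocate() for the alloca.
class ArgStack {
 public:
  std::size_t sp() const { return sp_; }
  std::size_t save() const { return sp_; }
  void restore(std::size_t mark) { sp_ = mark; }
  unsigned char *allocate(std::size_t bytes) {
    bytes = (bytes + kAlign - 1) / kAlign * kAlign;  // keep slots aligned
    assert(bytes <= sp_);
    sp_ -= bytes;
    return memory_ + sp_;
  }

 private:
  static constexpr std::size_t kAlign = alignof(std::max_align_t);
  alignas(std::max_align_t) unsigned char memory_[1024];
  std::size_t sp_ = sizeof(memory_);
};

// Non-trivial Foo. `fail_after` (hypothetical) lets that many constructions
// succeed and makes the next one throw; -1 means never throw.
struct Foo {
  static int live;
  static int fail_after;
  int a = 0, b = 0;
  Foo() {
    if (fail_after == 0) {
      fail_after = -1;
      throw std::runtime_error("Foo() failed");
    }
    if (fail_after > 0) --fail_after;
    ++live;
  }
  Foo(const Foo &other) : a(other.a), b(other.b) { ++live; }
  ~Foo() { --live; }
};
int Foo::live = 0;
int Foo::fail_after = -1;

int g_calls = 0;

// g(Foo a, Foo b): receives the argument memory and destroys both
// arguments itself, as in the Microsoft C++ ABI.
void g(unsigned char *memargs) {
  ++g_calls;
  std::launder(reinterpret_cast<Foo *>(memargs + sizeof(Foo)))->~Foo();  // b
  std::launder(reinterpret_cast<Foo *>(memargs))->~Foo();                // a
}

// Mirrors @f: save, one allocation for all memory arguments, construct b
// then a, call g, restore.
void f(ArgStack &stack) {
  std::size_t base = stack.save();                           // llvm.stacksave
  unsigned char *memargs = stack.allocate(2 * sizeof(Foo));  // the alloca
  // In IR, a throw from b's ctor unwinds out of f and discards its whole
  // frame; this toy model does not simulate that case.
  Foo *b = new (memargs + sizeof(Foo)) Foo();
  try {
    new (memargs) Foo();   // invoke @Foo_ctor(%a)
  } catch (...) {          // invoke.unwind
    b->~Foo();
    stack.restore(base);   // llvm.stackrestore
    throw;
  }
  g(memargs);              // invoke.cont
  stack.restore(base);     // llvm.stackrestore
}
```

On the normal path ``g`` destroys both arguments. On the exceptional path the handler destroys ``b`` and rethrows. In both cases the stack pointer ends where it started.
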
Design Considerations
=====================

Lifetime
--------

The biggest design consideration for this feature is object lifetime.
We cannot model the arguments as static allocas in the entry block,
because every call must pass its memory arguments in the memory at the
top of the stack. If we vended pointers to that memory at function entry,
the argument areas of different calls would alias after code generation.

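
A toy model makes the aliasing concrete. The hypothetical ``ToyStack`` below assumes that each call pushes its memory arguments at the current top of the stack and pops them afterwards. Under that assumption, two calls made one after the other use the same bytes for their arguments. Two distinct static allocas handed out at entry could therefore not describe both areas.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical toy model: each call's memory arguments are pushed at the
// current top of a downward-growing stack and popped after the call.
struct ToyStack {
  std::size_t sp = 1024;
  std::size_t push_args(std::size_t bytes) { sp -= bytes; return sp; }
  void pop_args(std::size_t bytes) { sp += bytes; }
};

// Returns the offset of the argument area used by each of two calls made
// one after the other from the same function.
std::pair<std::size_t, std::size_t> two_sequential_calls() {
  ToyStack stack;
  std::size_t first = stack.push_args(8);   // arguments for call 1
  stack.pop_args(8);                        // call 1 returns
  std::size_t second = stack.push_args(8);  // arguments for call 2
  stack.pop_args(8);                        // call 2 returns
  return {first, second};
}
```
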
The rule against allocas between the argument allocations and the call
site avoids this problem, but it creates a cleanup problem. Cleanup and
lifetime are handled explicitly with stack save and restore calls. In
the future, we may want to introduce a new construct such as ``freea``
or ``afree`` to make it clear that this stack-adjusting cleanup is less
powerful than a full stack save and restore.

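
The difference in power between a full stack save/restore and the possible ``afree`` construct can be sketched with a toy stack. Everything here is hypothetical. ``CleanupStack`` is not an LLVM API. Its ``afree`` member only models the idea, mentioned above, of releasing the most recent allocation, whereas ``restore`` can jump back past any number of allocations at once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical downward-growing stack contrasting two cleanup primitives.
class CleanupStack {
 public:
  std::size_t sp() const { return sp_; }
  std::size_t live_allocations() const { return starts_.size(); }

  std::size_t allocate(std::size_t bytes) {
    assert(bytes > 0 && bytes <= sp_);
    starts_.push_back(sp_);  // remember where this allocation began
    sp_ -= bytes;
    return sp_;
  }

  // Analogue of llvm.stacksave.
  std::size_t save() const { return sp_; }

  // Analogue of llvm.stackrestore: jump back to any earlier mark, dropping
  // every allocation made since it, however many there were.
  void restore(std::size_t mark) {
    while (!starts_.empty() && starts_.back() <= mark) starts_.pop_back();
    sp_ = mark;
  }

  // Hypothetical "afree": can only release the most recent allocation.
  void afree() {
    assert(!starts_.empty());
    sp_ = starts_.back();
    starts_.pop_back();
  }

 private:
  std::size_t sp_ = 1024;
  std::vector<std::size_t> starts_;
};
```
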
Nested Calls and Copy Elision
-----------------------------

We also want to be able to support copy elision into these argument
slots. This means we have to support multiple live argument
allocations.

Consider the evaluation of:

.. code-block:: c++

    // Foo is non-trivial.
    struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); };
    Foo bar(Foo b);
    int main() {
      bar(bar(Foo()));
    }

In this case, we want to be able to elide copies into ``bar``'s argument
slots. That means we need more than one set of argument frames active
at the same time. First, we allocate the frame for the outer call so
that we can pass it as the hidden struct return pointer to the middle
call. Then we do the same for the middle call, allocating a frame and
passing its address to ``Foo``'s default constructor. By wrapping the
evaluation of the inner ``bar`` call with stack save and restore, we can
have multiple overlapping active call frames.

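
The sequence for ``bar(bar(Foo()))`` can be traced with a toy stack. This sketch makes several assumptions:

* ``ToyStack``, ``Trace`` and ``eval_bar_bar_foo`` are hypothetical.
* ``bar`` takes its by-value argument as a pointer into argument memory, plus a hidden ``sret`` pointer.
* ``bar``'s body, which returns ``b.a + 1``, is arbitrary. It was chosen only to make the result observable.

The point is that the outer argument slot stays allocated while the inner call's frame is pushed and popped.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Non-trivial Foo; `live` counts instances.
struct Foo {
  static int live;
  int a;
  Foo() : a(0) { ++live; }
  Foo(const Foo &other) : a(other.a) { ++live; }
  ~Foo() { --live; }
};
int Foo::live = 0;

// Hypothetical downward-growing stack with save/restore marks.
class ToyStack {
 public:
  std::size_t save() const { return sp_; }
  void restore(std::size_t mark) { sp_ = mark; }
  std::size_t allocate(std::size_t bytes) {
    bytes = (bytes + kAlign - 1) / kAlign * kAlign;  // keep slots aligned
    assert(bytes <= sp_);
    sp_ -= bytes;
    return sp_;
  }
  void *at(std::size_t offset) { return memory_ + offset; }

 private:
  static constexpr std::size_t kAlign = alignof(std::max_align_t);
  alignas(std::max_align_t) unsigned char memory_[1024];
  std::size_t sp_ = sizeof(memory_);
};

// Foo bar(Foo b), lowered with a hidden struct-return pointer: the result is
// constructed wherever the caller points `sret`, and bar destroys `b` itself.
Foo *bar(Foo *b, void *sret) {
  Foo *result = new (sret) Foo();
  result->a = b->a + 1;
  b->~Foo();
  return result;
}

struct Trace {
  std::size_t outer_slot;  // argument slot of the outer bar call
  std::size_t inner_slot;  // argument slot of the inner bar call
  int result;              // value produced by bar(bar(Foo()))
};

Trace eval_bar_bar_foo(ToyStack &stack) {
  Trace t{};
  std::size_t base = stack.save();
  // Outer call's argument slot; it is also the inner call's sret target.
  t.outer_slot = stack.allocate(sizeof(Foo));
  // The inner call gets its own frame, bracketed by save/restore.
  std::size_t inner_base = stack.save();
  t.inner_slot = stack.allocate(sizeof(Foo));
  Foo *tmp = new (stack.at(t.inner_slot)) Foo();      // Foo() built in place
  Foo *outer_arg = bar(tmp, stack.at(t.outer_slot));  // no copy into slot
  stack.restore(inner_base);  // inner frame released; outer slot still live
  // Outer call; its result lands in ordinary local storage.
  alignas(Foo) unsigned char result_mem[sizeof(Foo)];
  Foo *result = bar(outer_arg, result_mem);
  t.result = result->a;
  result->~Foo();
  stack.restore(base);
  return t;
}
```

While both frames are live, the inner slot lies below the outer one. The final result is 2, and the copy constructor is never used to fill an argument slot.
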
Callee-cleanup Calling Conventions
----------------------------------

Another wrinkle is the existence of callee-cleanup conventions. On
32-bit x86 Windows, non-variadic methods (``__thiscall``) and many other
functions (for example ``__stdcall``) adjust the stack themselves to
clear the memory used to pass their arguments. In some sense, this means
that the allocas are automatically popped by the call. However, LLVM
models this as a write of undef to all of the inalloca values passed to
the call rather than as a stack adjustment. Frontends should still
restore the stack pointer to avoid a stack leak.

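
A minimal model of the stack pointer shows why an unconditional restore is safe. The sketch assumes a hypothetical ``Machine`` that tracks only a stack pointer. A callee-cleanup function, such as an x86 ``__stdcall`` function that returns with ``ret N``, pops its own arguments; a caller-cleanup function does not. Restoring the saved value is harmless in the first case and required in the second.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy machine: only the stack pointer is modelled, and the
// stack grows downward from offset 1024.
struct Machine {
  std::size_t sp = 1024;
};

enum class Cleanup { Caller, Callee };

// Performs one call that passes `bytes` of arguments in memory.
void call_with_memory_args(Machine &m, std::size_t bytes, Cleanup conv,
                           bool restore_after_call) {
  std::size_t saved = m.sp;  // llvm.stacksave
  m.sp -= bytes;             // argument memory for the call
  // ... the callee runs here ...
  if (conv == Cleanup::Callee) {
    m.sp += bytes;           // callee pops its own arguments on return
  }
  if (restore_after_call) {
    m.sp = saved;            // llvm.stackrestore
  }
}
```

Without the restore, only the caller-cleanup case leaks; with it, both cases end at the saved stack pointer.
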
Exceptions
----------

There is also the possibility of an exception. If argument evaluation
or copy construction throws an exception, the landing pad must do
cleanup, which includes adjusting the stack pointer to avoid a stack
leak. This means the cleanup of the stack memory cannot be tied to the
call itself. There needs to be a separate IR-level instruction that can
perform independent cleanup of arguments.

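
In C++ terms, the required cleanup behaves like a scope guard rather than like part of the call. In the sketch below, the hypothetical ``StackRestoreGuard`` restores the saved stack pointer when its scope is left. Argument evaluation can throw before the call is ever reached, and the stack must be restored in that case too. This is why the cleanup cannot belong to the call. ``Machine`` and ``call_after_evaluating_args`` are likewise hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

struct Machine {
  std::size_t sp = 1024;  // toy stack pointer, grows downward
};

// Hypothetical scope guard: restores the saved stack pointer whenever the
// scope is left, normally or by an exception, like a landing-pad cleanup.
class StackRestoreGuard {
 public:
  explicit StackRestoreGuard(Machine &m) : m_(m), saved_(m.sp) {}
  ~StackRestoreGuard() { m_.sp = saved_; }
  StackRestoreGuard(const StackRestoreGuard &) = delete;
  StackRestoreGuard &operator=(const StackRestoreGuard &) = delete;

 private:
  Machine &m_;
  std::size_t saved_;
};

int calls_made = 0;

// Reserves argument memory, evaluates an argument that may throw, then calls.
void call_after_evaluating_args(Machine &m, bool arg_eval_throws) {
  StackRestoreGuard guard(m);
  m.sp -= 16;  // argument memory is live from here on
  if (arg_eval_throws) {
    throw std::runtime_error("argument evaluation failed");
  }
  ++calls_made;  // the call itself; reached only if evaluation succeeded
}
```
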
Efficiency
----------

Eventually, it should be possible to generate efficient code for this
construct. In particular, using inalloca should not require a base
pointer. If the backend can prove that every point in the CFG has only
one possible stack level, then it can address the stack directly from
the stack pointer. While this is not yet implemented, the plan is that
the inalloca attribute itself should not change much, although the
recommendations for frontend IR generation may.

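
The "one possible stack level" condition can be phrased as a simple forward dataflow problem. The sketch below is hypothetical and is not LLVM's implementation. Each block carries a net stack adjustment, and levels are propagated from the entry block. The check fails as soon as some block can be entered at two different levels. In that case the frame could not be addressed from the stack pointer alone.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical CFG block: `delta` is the net number of bytes the block
// leaves allocated (e.g. +16 for an argument alloca, -16 for the matching
// stackrestore); `succs` lists successor block indices. Block 0 is entry.
struct Block {
  int delta;
  std::vector<std::size_t> succs;
};

// Returns true if every reachable block is entered with exactly one stack
// level, so the stack pointer alone could address the frame everywhere.
bool has_single_stack_level(const std::vector<Block> &cfg) {
  std::map<std::size_t, int> level_at_entry;
  std::vector<std::size_t> worklist{0};
  level_at_entry[0] = 0;
  while (!worklist.empty()) {
    std::size_t b = worklist.back();
    worklist.pop_back();
    int out = level_at_entry[b] + cfg[b].delta;
    for (std::size_t s : cfg[b].succs) {
      auto it = level_at_entry.find(s);
      if (it == level_at_entry.end()) {
        level_at_entry[s] = out;  // first time s is reached
        worklist.push_back(s);
      } else if (it->second != out) {
        return false;  // two different stack levels reach block s
      }
    }
  }
  return true;
}
```

For a diamond whose two arms both release the 16 bytes allocated in the entry block, the check succeeds. If one arm forgets to release them before the join, the check fails.
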