==========================================
Design and Usage of the InAlloca Attribute
==========================================

Introduction
============

The :ref:`inalloca <attr_inalloca>` attribute is designed to allow
taking the address of an aggregate argument that is being passed by
value through memory. Primarily, this feature is required for
compatibility with the Microsoft C++ ABI. Under that ABI, class
instances that are passed by value are constructed directly into
argument stack memory. Prior to the addition of inalloca, calls in LLVM
were indivisible instructions. There was no way to perform intermediate
work, such as object construction, between the first stack adjustment
and the final control transfer. With inalloca, all arguments passed in
memory are modelled as a single alloca, which can be stored to prior to
the call. Unfortunately, this complicated feature comes with a large
set of restrictions designed to bound the lifetime of the argument
memory around the call.

For now, it is recommended that frontends and optimizers avoid producing
this construct, primarily because it forces the use of a base pointer.
This feature may grow in the future to allow general mid-level
optimization, but for now, it should be regarded as less efficient than
passing by value with a copy.

Intended Usage
==============

The example below is the intended LLVM IR lowering for some C++ code
that passes two default-constructed ``Foo`` objects to ``g`` in the
32-bit Microsoft C++ ABI.

.. code-block:: c++

   // Foo is non-trivial.
   struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); };
   void g(Foo a, Foo b);
   void f() {
     g(Foo(), Foo());
   }

.. code-block:: llvm

   %struct.Foo = type { i32, i32 }
   declare void @Foo_ctor(%struct.Foo* %this)
   declare void @Foo_dtor(%struct.Foo* %this)
   declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
   declare i8* @llvm.stacksave()
   declare void @llvm.stackrestore(i8*)

   define void @f() {
   entry:
     %base = call i8* @llvm.stacksave()
     ; The argument memory must be allocated with the inalloca keyword.
     %memargs = alloca inalloca <{ %struct.Foo, %struct.Foo }>
     ; Field 1 of the packed argument struct holds b.
     %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 1
     call void @Foo_ctor(%struct.Foo* %b)

     ; If a's ctor throws, we must destruct b.
     %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 0
     invoke void @Foo_ctor(%struct.Foo* %a)
         to label %invoke.cont unwind label %invoke.unwind

   invoke.cont:
     call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs)
     call void @llvm.stackrestore(i8* %base)
     ...

   invoke.unwind:
     ; The landingpad instruction is omitted here for brevity.
     call void @Foo_dtor(%struct.Foo* %b)
     call void @llvm.stackrestore(i8* %base)
     ...
   }

To avoid stack leaks, the frontend saves the current stack pointer with
a call to :ref:`llvm.stacksave <int_stacksave>`. Then, it allocates the
argument stack space with alloca and calls the default constructor for
each argument, ``b`` first and then ``a``. The constructor of ``a``
could throw an exception, so the frontend has to create a landing pad.
On that path, the frontend has to destroy the already constructed
argument ``b`` before restoring the stack pointer. If the constructor
does not unwind, ``g`` is called. In the Microsoft C++ ABI, ``g`` will
destroy its arguments, and then the stack is restored in ``f``.

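The control flow of this lowering can be mimicked in plain C++ with
placement new. The sketch below is only an analogy under stated
assumptions: ``MemArgs``, ``events``, ``throw_on_ctor``, and ``run`` are
hypothetical names, the copy constructor is deleted for brevity, and
ordinary C++ storage stands in for the inalloca stack memory.

.. code-block:: c++

   #include <cassert>
   #include <new>
   #include <stdexcept>
   #include <string>
   #include <vector>

   // Hypothetical event log that makes the ordering observable.
   static std::vector<std::string> events;
   // Hypothetical switch: the Nth Foo constructor call throws (0 = never).
   static int throw_on_ctor = 0;
   static int ctor_calls = 0;

   struct Foo {
     int a, b;
     Foo() : a(0), b(0) {
       if (++ctor_calls == throw_on_ctor)
         throw std::runtime_error("ctor failed");
       events.push_back("ctor");
     }
     ~Foo() { events.push_back("dtor"); }
     Foo(const Foo &) = delete;
   };

   // Stand-in for the <{ %struct.Foo, %struct.Foo }> argument block.
   // The anonymous unions leave both members unconstructed.
   struct MemArgs {
     union { Foo a; };
     union { Foo b; };
     MemArgs() {}
     ~MemArgs() {}
   };

   // Callee-destroys convention: g destroys both of its arguments.
   void g(MemArgs *args) {
     events.push_back("g");
     args->a.~Foo();
     args->b.~Foo();
   }

   void f() {
     MemArgs args;            // plays the role of the alloca
     new (&args.b) Foo();     // b is constructed first, as in the IR
     try {
       new (&args.a) Foo();   // if this throws, b must be destroyed
     } catch (...) {
       args.b.~Foo();         // corresponds to the invoke.unwind block
       events.push_back("unwind");
       throw;
     }
     g(&args);                // corresponds to the invoke.cont block
   }

   // Runs f with the chosen constructor call throwing; returns the log.
   std::vector<std::string> run(int throw_on) {
     events.clear();
     ctor_calls = 0;
     throw_on_ctor = throw_on;
     try {
       f();
     } catch (const std::runtime_error &) {
       // The exception propagated out of f after cleanup.
     }
     return events;
   }

Calling ``run(0)`` exercises the normal path, where ``g`` destroys both
arguments. Calling ``run(2)`` makes the constructor of ``a`` throw, so
only ``b`` is destroyed, by the caller.
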
Design Considerations
=====================

Lifetime
--------

The biggest design consideration for this feature is object lifetime.
We cannot model the arguments as static allocas in the entry block,
because all calls need to use the memory at the top of the stack to pass
arguments. We cannot vend pointers to that memory at function entry
because after code generation they will alias.

The rule against allocas between argument allocations and the call site
avoids this problem, but it creates a cleanup problem. Cleanup and
lifetime are handled explicitly with stack save and restore calls. In
the future, we may want to introduce a new construct such as ``freea``
or ``afree`` to make it clear that this stack-adjusting cleanup is less
powerful than a full stack save and restore.

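As a rough illustration of why save and restore are sufficient for
cleanup, the following C++ sketch models the stack pointer as a plain
integer. ``ToyStack``, its members, and ``leaked_after_restore`` are
hypothetical names invented for this example. They only mimic the
effect of ``llvm.stacksave``, ``llvm.stackrestore``, and a dynamic
alloca, and say nothing about how LLVM implements them.

.. code-block:: c++

   #include <cassert>
   #include <cstddef>

   // Toy model of a downward-growing stack; not LLVM's implementation.
   struct ToyStack {
     std::size_t sp = 1024;  // hypothetical initial stack pointer

     // Models llvm.stacksave: remember the current stack pointer.
     std::size_t save() const { return sp; }

     // Models llvm.stackrestore: pop everything allocated since save().
     void restore(std::size_t saved) { sp = saved; }

     // Models a dynamic alloca: move the stack pointer down by n bytes.
     std::size_t alloca_bytes(std::size_t n) {
       sp -= n;
       return sp;
     }
   };

   // Returns the number of bytes still allocated after the restore.
   std::size_t leaked_after_restore() {
     ToyStack s;
     std::size_t base = s.save();
     s.alloca_bytes(16);  // argument memory for one call
     s.alloca_bytes(8);   // argument memory for a second, nested call
     s.restore(base);
     return base - s.sp;
   }

Restoring to the saved value releases both allocations at once, which
is more than a construct that frees a single argument block would need
to do.
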
Nested Calls and Copy Elision
-----------------------------

We also want to be able to support copy elision into these argument
slots. This means we have to support multiple live argument
allocations.

Consider the evaluation of:

.. code-block:: c++

   // Foo is non-trivial.
   struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); };
   Foo bar(Foo b);
   int main() {
     bar(bar(Foo()));
   }

In this case, we want to be able to elide copies into ``bar``'s argument
slots. That means we need to have more than one set of argument frames
active at the same time. First, we need to allocate the frame for the
outer call so we can pass it in as the hidden struct return pointer to
the middle call, that is, the inner call to ``bar``. Then we do the same
for the middle call, allocating a frame and passing its address to
``Foo``'s default constructor. By wrapping the evaluation of the inner
``bar`` with stack save and restore, we can have multiple overlapping
active call frames.

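The source-level effect of this elision can be observed by counting copy
constructor calls. The sketch below is an assumption-laden illustration,
not part of the lowering: the ``copies`` counter, the ``Foo(int)``
constructor, the body of ``bar``, and ``nested_result`` are all
hypothetical. C++17 guarantees this elision; earlier standards only
permit it.

.. code-block:: c++

   #include <cassert>

   // Hypothetical counter of copy constructor calls.
   static int copies = 0;

   struct Foo {
     int a;
     Foo() : a(0) {}
     explicit Foo(int v) : a(v) {}             // hypothetical helper constructor
     Foo(const Foo &o) : a(o.a) { ++copies; }  // counts every copy
     ~Foo() {}
   };

   // Hypothetical body: returns a fresh temporary, so the result can be
   // constructed directly in the caller's destination.
   Foo bar(Foo b) { return Foo(b.a + 1); }

   // Evaluates the nested call from the example and returns the result.
   int nested_result() {
     copies = 0;
     Foo r = bar(bar(Foo()));
     return r.a;
   }

When every temporary is constructed directly in its destination, as the
argument frames described above allow, ``copies`` stays at zero after
``nested_result()`` returns.
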
Callee-cleanup Calling Conventions
----------------------------------

Another wrinkle is the existence of callee-cleanup conventions. On
Windows, all methods and many other functions adjust the stack to clear
the memory used to pass their arguments. In some sense, this means that
the allocas are automatically cleared by the call. However, LLVM
models this as a write of undef to all of the inalloca values passed to
the call rather than as a stack adjustment. Frontends should still
restore the stack pointer to avoid a stack leak.

Exceptions
----------

There is also the possibility of an exception. If argument evaluation
or copy construction throws an exception, the landing pad must do
cleanup, which includes adjusting the stack pointer to avoid a stack
leak. This means the cleanup of the stack memory cannot be tied to the
call itself. There needs to be a separate IR-level instruction that can
perform independent cleanup of arguments.

Efficiency
----------

Eventually, it should be possible to generate efficient code for this
construct. In particular, using inalloca should not require a base
pointer. If the backend can prove that all points in the CFG only have
one possible stack level, then it can address the stack directly from
the stack pointer. While this is not yet implemented, the plan is that
the inalloca attribute should not change much, but the frontend IR
generation recommendations may change.