Mono Ahead Of Time Compiler
===========================

The Ahead of Time (AOT) compilation feature in Mono allows Mono to
precompile assemblies to minimize JIT time, reduce memory
usage at runtime and increase code sharing across multiple
running Mono applications.

To precompile an assembly use the following command:

    mono --aot -O=all assembly.exe

The `--aot' flag instructs Mono to ahead-of-time compile your
assembly, while the `-O=all' flag instructs Mono to use all the
available optimizations.

* Caching metadata
------------------

Besides code, the AOT file also contains cached metadata which allows
the runtime to avoid certain computations at runtime, such as the computation of
generic vtables. This reduces both startup time and memory usage. It is possible
to create an AOT image which contains only this cached information and no code by
using the 'metadata-only' option during compilation:

    mono --aot=metadata-only assembly.exe

This works even on platforms where AOT is not normally supported.
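
The two invocations shown so far can be wrapped in a small helper for
build scripts. The Python sketch below only assembles the argument list
from the flags documented above; the function name build_aot_command and
its metadata_only parameter are hypothetical and not part of Mono.

```python
def build_aot_command(assembly, metadata_only=False):
    """Return the argv list for precompiling `assembly` with Mono.

    Hypothetical helper: it only reproduces the two command lines
    shown in this document.
    """
    if metadata_only:
        # Cached metadata only, no native code in the AOT image.
        return ["mono", "--aot=metadata-only", assembly]
    # Full AOT compilation with all optimizations enabled.
    return ["mono", "--aot", "-O=all", assembly]
```

The returned list can be handed to subprocess.run(..., check=True) on a
machine where the mono executable is on the PATH.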

* Position Independent Code
---------------------------

On x86 and x86-64 the code generated for Ahead-of-Time compiled
images is position-independent code. This allows the same
precompiled image to be reused across multiple applications
without having different copies. This is the same way in which
ELF shared libraries work: the code produced can be relocated
to any address.

The implementation of Position Independent Code had a
performance impact on Ahead-of-Time compiled images, but
compiler bootstraps are still faster than with JIT-compiled code,
especially with all the new optimizations provided by the Mono
engine.

* How to support Position Independent Code in new Mono Ports
------------------------------------------------------------

Generated native code needs to reference various runtime
structures/functions whose address is only known at run
time. JITted code can simply embed the address into the native
code, but AOT code needs to do an indirection. This
indirection is done through a table called the Global Offset
Table (GOT), which is similar to the GOT in the ELF
specification. When the runtime saves the AOT image, it saves some
information for each method describing the GOT entries
used by that method. When loading a method from an AOT image,
the runtime fills in the GOT entries needed by the
method.
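
The load-time behaviour described above can be pictured with a toy
model. The Python sketch below is a conceptual illustration only, not
Mono's implementation: the class AotImage, the symbol names and the
addresses are all made up, and it assumes a slot is filled once and
then shared by every method that uses it.

```python
class AotImage:
    """Toy model of an AOT image: a GOT plus per-method slot descriptions."""

    def __init__(self, got_size, method_got_info):
        self.got = [None] * got_size            # unresolved slots start empty
        self.method_got_info = method_got_info  # method -> {slot: symbol}

    def load_method(self, name, resolve):
        """Fill the GOT slots used by `name`; return the slot indexes."""
        slots = self.method_got_info[name]
        for slot, symbol in slots.items():
            if self.got[slot] is None:          # fill each slot only once
                self.got[slot] = resolve(symbol)
        return sorted(slots)


# Hypothetical run-time addresses, known only once the runtime is up.
runtime_addresses = {"runtime_helper": 0x7F001000, "vtable_Foo": 0x7F002000}

image = AotImage(4, {"Foo:.ctor": {0: "runtime_helper", 2: "vtable_Foo"}})
used = image.load_method("Foo:.ctor", runtime_addresses.__getitem__)
```

Slots 1 and 3 stay empty because no loaded method has asked for them yet.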

* Computing the address of the GOT

Methods which need to access the GOT first need to compute its
address. On the x86 it is done by code like this:

    call <IP + 5>
    pop ebx
    add <OFFSET TO GOT>, ebx
    <save got addr to a register>

The variable representing the GOT is stored in
cfg->got_var. It is always allocated to a global register to
prevent some problems with branches + basic blocks.
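
The arithmetic behind this sequence can be checked with a few lines of
Python. This is a sketch under stated assumptions: the image layout
(call_offset, got_offset) is hypothetical, and <OFFSET TO GOT> is
modelled as the distance from the popped return address to the GOT.

```python
CALL_INSN_SIZE = 5  # size of an x86 `call rel32` instruction


def got_address(call_insn_addr, offset_to_got):
    """Model `call <IP + 5>; pop ebx; add <OFFSET TO GOT>, ebx`."""
    # `call` pushes the address of the next instruction; `pop` retrieves it.
    ebx = call_insn_addr + CALL_INSN_SIZE
    # Adding the link-time constant yields the GOT address (32-bit wrap).
    return (ebx + offset_to_got) & 0xFFFFFFFF


def got_for_image(load_base):
    """Compute the GOT address for an image loaded at `load_base`."""
    call_offset, got_offset = 0x120, 0x4000   # hypothetical image layout
    # The constant depends only on offsets inside the image.
    offset_to_got = got_offset - (call_offset + CALL_INSN_SIZE)
    return got_address(load_base + call_offset, offset_to_got)
```

Because the constant depends only on offsets inside the image, the same
code finds the GOT wherever the image is loaded.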

* Referencing GOT entries

Any time the native code needs to access some other runtime
structure/function (i.e. any time the backend calls
mono_add_patch_info ()), the code pointed to by the patch needs
to load the value from the GOT. For example, instead of:

    call <ABSOLUTE ADDR>

it needs to do:

    call *<OFFSET>(<GOT REG>)

Here, the <OFFSET> can be 0; it will be fixed up by the AOT compiler.

For more examples of the changes required, see

    svn diff -r 37739:38213 mini-x86.c
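
The "emit 0 now, fix up later" idea can be sketched as follows. This is
an illustration, not the code in mini-x86.c: emit_got_call and fix_up
are hypothetical names, the GOT register is assumed to be ebx, and the
policy of one 4-byte slot per distinct target is an assumption.

```python
import struct

SLOT_SIZE = 4  # bytes per GOT entry on 32-bit x86


def emit_got_call(code, patches, target):
    """Emit `call *0(%ebx)` and record where the displacement lives."""
    code.extend(b"\xff\x93")                # call *disp32(%ebx)
    patches.append((len(code), target))     # displacement starts here
    code.extend(struct.pack("<i", 0))       # placeholder <OFFSET>


def fix_up(code, patches):
    """Assign one GOT slot per distinct target and patch each displacement."""
    slots = {}
    for pos, target in patches:
        slot = slots.setdefault(target, len(slots))
        struct.pack_into("<i", code, pos, slot * SLOT_SIZE)
    return slots


code, patches = bytearray(), []
emit_got_call(code, patches, "helper_a")
emit_got_call(code, patches, "helper_b")
emit_got_call(code, patches, "helper_a")    # reuses helper_a's slot
slots = fix_up(code, patches)
```

After fix_up, the second call carries displacement 4 while both calls to
helper_a keep displacement 0, since they share the first slot.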

* The Program Linkage Table

As in ELF, calls made from AOT code do not go through the GOT. Instead, a direct call is
made to an entry in the Program Linkage Table (PLT). This is based on the fact that on
most architectures, call instructions use a displacement instead of an absolute address, so
they are already position independent. A PLT entry is usually a jump instruction which
initially points to some trampoline code. The trampoline transfers control to the AOT
loader, which will compile the called method and patch the PLT entry so that further
calls are made directly to the called method.

If the called method is in the same assembly and does not need initialization (i.e. it
doesn't have GOT slots etc.), then the call is made directly, bypassing the PLT.
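
The patch-on-first-call behaviour of a PLT entry can be modelled in a
few lines. The Python sketch below is a conceptual illustration only:
the class Plt and its loads counter are hypothetical, and a dictionary
lookup stands in for the work done by the AOT loader.

```python
class Plt:
    """Toy PLT: each entry is a jump target that starts at a trampoline."""

    def __init__(self, methods):
        self.methods = methods      # name -> callable, "loaded" on demand
        self.loads = 0              # how many times the loader ran
        self.entries = {name: self._trampoline(name) for name in methods}

    def _trampoline(self, name):
        def trampoline(*args):
            self.loads += 1                  # control reaches the AOT loader
            target = self.methods[name]      # stand-in for loading the method
            self.entries[name] = target      # patch the PLT entry
            return target(*args)
        return trampoline

    def call(self, name, *args):
        return self.entries[name](*args)     # always jump through the entry


plt = Plt({"add": lambda a, b: a + b})
first = plt.call("add", 2, 3)    # goes through the trampoline
second = plt.call("add", 4, 5)   # jumps straight to the method
```

Only the first call pays for the lookup; the counter stays at 1 afterwards.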

* The Precompiled File Format
-----------------------------

We use the native object format of the platform. That way it
is possible to reuse existing tools like objdump and the
dynamic loader. All we need is a working assembler, i.e. we
write out a text file which is then passed to gas (the GNU
assembler) to generate the object file.

The precompiled image is stored in a file next to the original
assembly, named after the assembly with the native extension for a
shared library appended (on Linux, ".so").

For example: basic.exe -> basic.exe.so; corlib.dll -> corlib.dll.so

The following things are saved in the object file and can be
looked up using the equivalent of dlsym:

    mono_assembly_guid
        A copy of the assembly GUID.

    mono_aot_version
        The version of the AOT file format.

    mono_aot_opt_flags
        The optimization flags used to build this
        precompiled image.

    method_infos
        Contains additional information needed by the runtime for using the
        precompiled method, like the GOT entries it uses.

    method_info_offsets
        Maps method indexes to offsets in the method_infos array.

    mono_icall_table
        A table that lists all the internal calls
        referenced by the precompiled image.

    mono_image_table
        A list of assemblies referenced by this AOT
        module.

    methods
        The precompiled code itself.

    method_offsets
        Maps method indexes to offsets in the methods array.

    ex_info
        Contains information about methods which is rarely used during normal
        execution, like exception and debug info.

    ex_info_offsets
        Maps method indexes to offsets in the ex_info array.

    class_info
        Contains precomputed metadata used to speed up various runtime functions.

    class_info_offsets
        Maps class indexes to offsets in the class_info array.

    class_name_table
        A hash table mapping class names to class indexes. Used to speed up
        mono_class_from_name ().

    plt
        The Program Linkage Table.

    plt_info
        Contains information needed to find the method belonging to a given
        PLT entry.
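
Several of the symbols above come in pairs: a data array and an offsets
table indexed by method or class number. The Python sketch below shows
how such a pair can be used for lookup. The encoding is an assumption
made for illustration (little-endian 32-bit offsets, each blob ending
where the next one starts); this document does not specify the actual
layout, and build_table and lookup are hypothetical names.

```python
import struct


def build_table(blobs):
    """Concatenate variable-sized blobs and record each blob's start offset."""
    data, offsets = bytearray(), []
    for blob in blobs:
        offsets.append(len(data))
        data.extend(blob)
    # Store offsets as little-endian 32-bit integers, one per index.
    return bytes(data), struct.pack("<%dI" % len(offsets), *offsets)


def lookup(data, offsets, index, count):
    """Return the blob for `index` by consulting the offsets table."""
    start = struct.unpack_from("<I", offsets, index * 4)[0]
    # The blob ends where the next one starts, or at the end of the data.
    if index + 1 < count:
        end = struct.unpack_from("<I", offsets, (index + 1) * 4)[0]
    else:
        end = len(data)
    return data[start:end]


# Three tiny stand-ins for precompiled method bodies.
methods, method_offsets = build_table([b"\x90\xc3", b"\x31\xc0\xc3", b"\xc3"])
```

With this layout, finding method N costs one table read and no scan of
the preceding methods.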

* Performance considerations
----------------------------

Using AOT code is a trade-off which might lead to higher or
lower performance, depending on a lot of circumstances. Some
of these are:

- AOT code needs to be loaded from disk before being used, so
  cold startup of an application using AOT code MIGHT be
  slower than using JITted code. Warm startup (when the code is
  already in the machine's cache) should be faster. Also,
  JITting code takes time, and the JIT compiler also needs to
  load additional metadata for the method from the disk, so
  startup can be faster even in the cold startup case.

- AOT code is usually compiled with all optimizations turned
  on, while JITted code is usually compiled with default
  optimizations, so the generated code in the AOT case should
  be faster.

- JITted code can directly access runtime data structures and
  helper functions, while AOT code needs to go through an
  indirection (the GOT) to access them, so it will be slower
  and somewhat bigger as well.

- When JITting code, the JIT compiler needs to load a lot of
  metadata about methods and types into memory.

- JITted code has better locality, meaning that if method A
  calls method B, then the native code for A and B is usually quite
  close in memory, leading to better cache behaviour and thus
  improved performance. In contrast, the native code of
  methods inside the AOT file is in a somewhat random order.

* Future Work
-------------

- Currently, when an AOT module is loaded, all of its
  dependent assemblies are also loaded eagerly, and these
  assemblies need to be exactly the same as the ones loaded
  when the AOT module was created ('hard binding'). Non-hard
  binding should be allowed.

- On x86, the generated code uses call 0, pop REG, add
  GOTOFFSET, REG to materialize the GOT address. Newer
  versions of gcc use a separate function to do this; maybe we
  need to do the same.

- Currently, we get vtable addresses from the GOT. Another
  solution would be to store the data from the vtables in the
  .bss section, so accessing them would involve less
  indirection.
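
The 'hard binding' constraint from the first item can be pictured as a
load-time check. The Python sketch below is an assumption-laden
illustration: it supposes that the module records a GUID for each
referenced assembly and compares it with the GUID of the assembly
actually loaded. The GUID strings, check_hard_binding and
HardBindingError are all hypothetical.

```python
class HardBindingError(Exception):
    """Raised when a dependency differs from the one seen at AOT time."""


def check_hard_binding(image_table, loaded_assemblies):
    """Verify every referenced assembly matches the GUID recorded at AOT time.

    image_table:        list of (assembly name, GUID recorded in the module)
    loaded_assemblies:  dict of assembly name -> GUID of the loaded assembly
    """
    for name, expected_guid in image_table:
        actual = loaded_assemblies.get(name)
        if actual != expected_guid:
            raise HardBindingError(
                "%s: expected %s, got %s" % (name, expected_guid, actual))
    return True


def can_use_aot_module(image_table, loaded_assemblies):
    """Report whether the AOT module may be used instead of the JIT."""
    try:
        return check_hard_binding(image_table, loaded_assemblies)
    except HardBindingError:
        return False
```

Non-hard binding would relax this check, for instance by tolerating a
dependency that changed in ways the precompiled code does not rely on.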