|
- Mono Ahead Of Time Compiler
- ===========================
- The Ahead of Time compilation feature in Mono allows Mono to
- precompile assemblies to minimize JIT time, reduce memory
- usage at runtime and increase code sharing across multiple
- running Mono applications.
- To precompile an assembly use the following command:
-
- mono --aot -O=all assembly.exe
- The `--aot' flag instructs Mono to ahead-of-time compile your
- assembly, while the -O=all flag instructs Mono to use all the
- available optimizations.
- * Caching metadata
- ------------------
- Besides code, the AOT file also contains cached metadata information which allows
- the runtime to avoid certain computations at runtime, like the computation of
- generic vtables. This reduces both startup time and memory usage. It is possible
- to create an AOT image which contains only this cached information and no code by
- using the 'metadata-only' option during compilation:
- mono --aot=metadata-only assembly.exe
- This works even on platforms where AOT is not normally supported.
- * Position Independent Code
- ---------------------------
- On x86 and x86-64 the code generated for Ahead-of-Time compiled
- images is position-independent code. This allows the same
- precompiled image to be reused across multiple applications
- without each needing its own copy. ELF shared libraries work
- the same way: the code produced can be relocated to any
- address.
- The implementation of Position Independent Code had a
- performance impact on Ahead-of-Time compiled images, but
- compiler bootstraps with AOT images are still faster than with
- JIT-compiled code, especially with all the new optimizations
- provided by the Mono engine.
- * How to support Position Independent Code in new Mono Ports
- ------------------------------------------------------------
- Generated native code needs to reference various runtime
- structures/functions whose addresses are only known at run
- time. JITted code can simply embed the address into the native
- code, but AOT code needs to go through an indirection. This
- indirection is done through a table called the Global Offset
- Table (GOT), which is similar to the GOT in the ELF
- spec. When the runtime saves the AOT image, it saves some
- information for each method describing the GOT entries
- used by that method. When loading a method from an AOT image,
- the runtime will fill out the GOT entries needed by the
- method.
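- The save/load scheme described above can be modelled in a few lines.
- This is only an illustrative Python simulation, not Mono's implementation:
- the class and function names, the symbolic slot targets and the addresses
- are all hypothetical.

```python
# Toy model of per-method GOT filling; every name here is hypothetical.

class AotImage:
    def __init__(self, got_size, method_got_slots, slot_targets):
        # One GOT shared by all methods in the image, initially empty.
        self.got = [None] * got_size
        # Saved at AOT time: for each method, the GOT slots it uses.
        self.method_got_slots = method_got_slots
        # Saved at AOT time: what each slot symbolically refers to.
        self.slot_targets = slot_targets

    def load_method(self, name, resolve):
        """Fill only the GOT entries that `name` needs, then return them."""
        for slot in self.method_got_slots[name]:
            if self.got[slot] is None:
                self.got[slot] = resolve(self.slot_targets[slot])
        return [self.got[slot] for slot in self.method_got_slots[name]]


# Stand-in for addresses that are only known at run time.
runtime_addresses = {"vtable:Foo": 0x1000, "icall:Bar": 0x2000, "class:Baz": 0x3000}

image = AotImage(
    got_size=3,
    method_got_slots={"Main": [0, 1], "Helper": [1, 2]},
    slot_targets={0: "vtable:Foo", 1: "icall:Bar", 2: "class:Baz"},
)

# Loading "Main" fills slots 0 and 1 only; slot 2 stays empty until
# a method that needs it ("Helper") is loaded.
main_entries = image.load_method("Main", runtime_addresses.__getitem__)
```

- In this model the native code never embeds an address: it only knows slot
- numbers, and the loader decides what each slot contains.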
- * Computing the address of the GOT
- Methods which need to access the GOT first need to compute its
- address. On the x86 it is done by code like this:
-     call <IP + 5>
-     pop ebx
-     add <OFFSET TO GOT>, ebx
-     <save got addr to a register>
- The variable representing the GOT is stored in
- cfg->got_var. It is always allocated to a global register to
- prevent some problems with branches + basic blocks.
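- The arithmetic behind the call/pop/add sequence can be checked with a short
- calculation. The sketch below is in Python and uses made-up addresses; the
- function names are hypothetical. It relies on the x86 `call rel32`
- instruction being 5 bytes long, so a call to <IP + 5> pushes the address of
- the following `pop`.

```python
CALL_INSN_SIZE = 5  # x86 `call rel32`: 1 opcode byte + 4 displacement bytes
MASK32 = 0xFFFFFFFF


def compute_got_offset(call_insn_addr, got_addr):
    """Constant to emit in the `add`: distance from the `pop` to the GOT."""
    return (got_addr - (call_insn_addr + CALL_INSN_SIZE)) & MASK32


def got_address(call_insn_addr, got_offset):
    """What ends up in ebx after call <IP + 5>; pop ebx; add <offset>, ebx."""
    return_addr = call_insn_addr + CALL_INSN_SIZE  # value popped into ebx
    return (return_addr + got_offset) & MASK32


# Hypothetical layout: the call sits 0x120 bytes into the image and the
# GOT sits 0x8000 bytes into the image.
offset = compute_got_offset(0x120, 0x8000)

# The same constant works wherever the image is mapped.
got_at_base_a = got_address(0x40000000 + 0x120, offset)
got_at_base_b = got_address(0x70000000 + 0x120, offset)
```

- Because the offset depends only on the distance between the code and the
- GOT inside the image, it stays valid when the image is relocated.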
- * Referencing GOT entries
- Any time the native code needs to access some other runtime
- structure/function (i.e. any time the backend calls
- mono_add_patch_info ()), the code pointed to by the patch needs
- to load the value from the GOT. For example, instead of:
-     call <ABSOLUTE ADDR>
- it needs to do:
-     call *<OFFSET>(<GOT REG>)
- Here, the <OFFSET> can be 0; it will be fixed up by the AOT compiler.
-
- For more examples on the changes required, see
-
- svn diff -r 37739:38213 mini-x86.c
- * The Program Linkage Table
- As in ELF, calls made from AOT code do not go through the GOT. Instead, a direct call is
- made to an entry in the Program Linkage Table (PLT). This is based on the fact that on
- most architectures, call instructions use a displacement instead of an absolute address, so
- they are already position independent. A PLT entry is usually a jump instruction which
- initially points to some trampoline code. The trampoline transfers control to the AOT
- loader, which will compile the called method and patch the PLT entry so that further calls
- are made directly to the called method.
- If the called method is in the same assembly, and does not need initialization (i.e. it
- doesn't have GOT slots etc), then the call is made directly, bypassing the PLT.
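- The patch-on-first-call behaviour of a PLT entry can be modelled as follows.
- This is a hedged Python sketch of the idea only: the PltEntry class, the
- loader function and the "Add" method are hypothetical stand-ins, and real
- PLT entries are native jump instructions rather than objects.

```python
class PltEntry:
    """One PLT slot: a jump target that initially points at the trampoline."""

    def __init__(self, method_name, loader):
        self.method_name = method_name
        self.loader = loader
        self.target = self._trampoline  # initial jump target

    def _trampoline(self, *args):
        # Hand control to the loader, which produces the real code ...
        code = self.loader(self.method_name)
        # ... then patch the entry so later calls skip the trampoline.
        self.target = code
        return code(*args)

    def __call__(self, *args):
        # A call through the PLT simply jumps to the current target.
        return self.target(*args)


loads = []  # records every time the loader is entered


def loader(name):
    loads.append(name)
    methods = {"Add": lambda a, b: a + b}
    return methods[name]


add = PltEntry("Add", loader)
first = add(2, 3)   # goes through the trampoline and patches the entry
second = add(4, 5)  # jumps straight to the loaded method
```

- Only the first call pays for the trip through the loader; afterwards the
- entry behaves like a plain jump to the method.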
- * The Precompiled File Format
- -----------------------------
-
- We use the native object format of the platform. That way it
- is possible to reuse existing tools like objdump and the
- dynamic loader. All we need is a working assembler, i.e. we
- write out a text file which is then passed to gas (the GNU
- assembler) to generate the object file.
-
- The precompiled image is stored in a file next to the original
- assembly. Its name is the assembly's file name with the native
- shared library extension of the platform appended (on Linux,
- ".so").
- For example: basic.exe -> basic.exe.so; corlib.dll -> corlib.dll.so
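- The naming rule can be written as a small helper. This Python sketch is
- illustrative only: the function name is hypothetical, and the extensions
- for platforms other than Linux are assumptions, since the text above only
- states the Linux case.

```python
import sys

# Assumed mapping; only the Linux entry (".so") comes from the text above.
_SHARED_LIB_EXT = {"linux": ".so", "darwin": ".dylib", "win32": ".dll"}


def aot_image_path(assembly_path, platform=sys.platform):
    """Return the path of the AOT image stored next to `assembly_path`."""
    for prefix, ext in _SHARED_LIB_EXT.items():
        if platform.startswith(prefix):
            # The extension is appended; the original one is kept.
            return assembly_path + ext
    return assembly_path + ".so"  # assumed fallback for other systems
```

- For instance, aot_image_path("basic.exe", "linux") gives "basic.exe.so",
- matching the example above.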
-
- The following things are saved in the object file and can be
- looked up using the equivalent to dlsym:
-
- mono_assembly_guid
-
- A copy of the assembly GUID.
-
- mono_aot_version
-
- The version of the AOT file format.
-
- mono_aot_opt_flags
-
- The optimization flags used to build this
- precompiled image.
-
- method_infos
- Contains additional information needed by the runtime for using the
- precompiled method, like the GOT entries it uses.
- method_info_offsets
- Maps method indexes to offsets in the method_infos array.
-
- mono_icall_table
-
- A table that lists all the internal calls
- referenced by the precompiled image.
-
- mono_image_table
-
- A list of assemblies referenced by this AOT
- module.
- methods
-
- The precompiled code itself.
-
- method_offsets
-
- Maps method indexes to offsets in the methods array.
- ex_info
- Contains information about methods which is rarely used during normal execution,
- like exception and debug info.
- ex_info_offsets
- Maps method indexes to offsets in the ex_info array.
- class_info
- Contains precomputed metadata used to speed up various runtime functions.
- class_info_offsets
- Maps class indexes to offsets in the class_info array.
- class_name_table
- A hash table mapping class names to class indexes. Used to speed up
- mono_class_from_name ().
- plt
- The Program Linkage Table
- plt_info
- Contains information needed to find the method belonging to a given PLT entry.
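- Several of the symbols above come in pairs: a data array (methods,
- method_infos, ex_info, class_info) and an offsets table indexed by method
- or class index. The Python sketch below shows how such a pair can be used
- for lookup. The encoding (one little-endian 32-bit offset per index, with
- an entry ending where the next one starts) and the function names are
- assumptions made for the example; the real on-disk layout is not described
- here.

```python
import struct


def build_tables(blobs):
    """Pack variable-length blobs back to back and record each start offset."""
    data = b""
    offsets = []
    for blob in blobs:
        offsets.append(len(data))
        data += blob
    # Assumed encoding: one little-endian 32-bit offset per index.
    offset_table = struct.pack("<%dI" % len(offsets), *offsets)
    return data, offset_table


def lookup(data, offset_table, index):
    """Return the blob for `index`, using the next offset as its end."""
    count = len(offset_table) // 4
    (start,) = struct.unpack_from("<I", offset_table, index * 4)
    if index + 1 < count:
        (end,) = struct.unpack_from("<I", offset_table, (index + 1) * 4)
    else:
        end = len(data)
    return data[start:end]


# Three made-up "method bodies" standing in for precompiled code.
methods, method_offsets = build_tables(
    [b"\x90\xc3", b"\x55\x89\xe5\xc3", b"\xc3"]
)
second_method = lookup(methods, method_offsets, 1)
```

- Keeping the offsets separate from the data lets the runtime find one entry
- by index without scanning the entries before it.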
-
- * Performance considerations
- ----------------------------
- Using AOT code is a trade-off which might lead to higher or
- lower performance, depending on a lot of circumstances. Some
- of these are:
-
- - AOT code needs to be loaded from disk before being used, so
- cold startup of an application using AOT code MIGHT be
- slower than using JITed code. Warm startup (when the code is
- already in the machine's cache) should be faster. Also,
- JITing code takes time, and the JIT compiler also needs to
- load additional metadata for the method from the disk, so
- startup can be faster even in the cold startup case.
- - AOT code is usually compiled with all optimizations turned
- on, while JITted code is usually compiled with default
- optimizations, so the generated code in the AOT case should
- be faster.
- - JITted code can directly access runtime data structures and
- helper functions, while AOT code needs to go through an
- indirection (the GOT) to access them, so it will be slower
- and somewhat bigger as well.
- - When JITting code, the JIT compiler needs to load a lot of
- metadata about methods and types into memory.
- - JITted code has better locality, meaning that if method A
- calls method B, then the native code for A and B is usually
- quite close in memory, leading to better cache behaviour and
- thus improved performance. In contrast, the native code of
- methods inside the AOT file is in a somewhat random order.
-
- * Future Work
- -------------
- - Currently, when an AOT module is loaded, all of its
- dependent assemblies are also loaded eagerly, and these
- assemblies need to be exactly the same as the ones loaded
- when the AOT module was created ('hard binding'). Non-hard
- binding should be allowed.
- - On x86, the generated code uses call 0, pop REG, add
- GOTOFFSET, REG to materialize the GOT address. Newer
- versions of gcc use a separate function to do this; maybe we
- need to do the same.
- - Currently, we get vtable addresses from the GOT. Another
- solution would be to store the data from the vtables in the
- .bss section, so accessing them would involve less
- indirection.
-
|