=========================
Driver Design & Internals
=========================

.. contents::
   :local:

Introduction
============

NOTE: this document applies to the original Clang project, not the DirectX
Compiler. It's made available for informational purposes only.

This document describes the Clang driver. The purpose of this document
is to describe both the motivation and design goals for the driver, as
well as details of the internal implementation.

Features and Goals
==================

The Clang driver is intended to be a production-quality compiler driver
providing access to the Clang compiler and tools, with a command line
interface that is compatible with the gcc driver.

Although the driver is part of and driven by the Clang project, it is
logically a separate tool which shares many of the same goals as Clang:

.. contents:: Features
   :local:

GCC Compatibility
-----------------

The number one goal of the driver is to ease the adoption of Clang by
allowing users to drop Clang into a build system which was designed to
call GCC. Although this makes the driver much more complicated than
might otherwise be necessary, we decided that close compatibility with
the gcc command line interface was worth it in order to allow users to
quickly test clang on their projects.

Flexible
--------

The driver was designed to be flexible and easily accommodate new uses
as we grow the clang and LLVM infrastructure. As one example, the driver
can easily support the introduction of tools which have an integrated
assembler, something we hope to add to LLVM in the future.

Similarly, most of the driver functionality is kept in a library which
can be used to build other tools which want to implement or accept a
gcc-like interface.

Low Overhead
------------

The driver should have as little overhead as possible. In practice, we
found that the gcc driver by itself incurred a small but meaningful
overhead when compiling many small files. The driver doesn't do much
work compared to a compilation, but we have tried to keep it as
efficient as possible by following a few simple principles:

- Avoid memory allocation and string copying when possible.
- Don't parse arguments more than once.
- Provide a few simple interfaces for efficiently searching arguments.

Simple
------

Finally, the driver was designed to be "as simple as possible", given
the other goals. Notably, trying to be completely compatible with the
gcc driver adds a significant amount of complexity. However, the design
of the driver attempts to mitigate this complexity by dividing the
process into a number of independent stages instead of a single
monolithic task.

Internal Design and Implementation
==================================

.. contents::
   :local:
   :depth: 1

Internals Introduction
----------------------

In order to satisfy the stated goals, the driver was designed to
completely subsume the functionality of the gcc executable; that is, the
driver should not need to delegate to gcc to perform subtasks. On
Darwin, this implies that the Clang driver also subsumes the gcc
driver-driver, which is used to implement support for building universal
images (binaries and object files). This also implies that the driver
should be able to call the language-specific compilers (e.g. cc1)
directly, which means that it must have enough information to forward
command line arguments to child processes correctly.

Design Overview
---------------

The diagram below shows the significant components of the driver
architecture and how they relate to one another. The orange components
represent concrete data structures built by the driver, the green
components indicate conceptually distinct stages which manipulate these
data structures, and the blue components are important helper classes.

.. image:: DriverArchitecture.png
   :align: center
   :alt: Driver Architecture Diagram

Driver Stages
-------------

The driver functionality is conceptually divided into five stages:

#. **Parse: Option Parsing**

   The command line argument strings are decomposed into arguments
   (``Arg`` instances). The driver expects to understand all available
   options, although there is some facility for just passing certain
   classes of options through (like ``-Wl,``).

   Each argument corresponds to exactly one abstract ``Option``
   definition, which describes how the option is parsed along with some
   additional metadata. The Arg instances themselves are lightweight and
   merely contain enough information for clients to determine which
   option they correspond to and their values (if they have additional
   parameters).

   For example, a command line like "-Ifoo -I foo" would parse to two
   Arg instances (a JoinedArg and a SeparateArg instance), but each
   would refer to the same Option.

   Options are lazily created in order to avoid populating all Option
   classes when the driver is loaded. Most of the driver code only needs
   to deal with options by their unique ID (e.g., ``options::OPT_I``).

   Arg instances themselves do not generally store the values of
   parameters. In many cases, this would simply result in creating
   unnecessary string copies. Instead, Arg instances are always embedded
   inside an ArgList structure, which contains the original vector of
   argument strings. Each Arg itself only needs to contain an index into
   this vector instead of storing its values directly.

   The clang driver can dump the results of this stage using the
   ``-###`` flag (which must precede any actual command
   line arguments). For example:

   .. code-block:: console

      $ clang -### -Xarch_i386 -fomit-frame-pointer -Wa,-fast -Ifoo -I foo t.c
      Option 0 - Name: "-Xarch_", Values: {"i386", "-fomit-frame-pointer"}
      Option 1 - Name: "-Wa,", Values: {"-fast"}
      Option 2 - Name: "-I", Values: {"foo"}
      Option 3 - Name: "-I", Values: {"foo"}
      Option 4 - Name: "<input>", Values: {"t.c"}

   After this stage is complete the command line should be broken down
   into well-defined option objects with their appropriate parameters.
   Subsequent stages should rarely, if ever, need to do any string
   processing.

#. **Pipeline: Compilation Action Construction**

   Once the arguments are parsed, the tree of subprocess jobs needed for
   the desired compilation sequence is constructed. This involves
   determining the input files and their types, what work is to be done
   on them (preprocess, compile, assemble, link, etc.), and constructing
   a list of Action instances for each task. The result is a list of one
   or more top-level actions, each of which generally corresponds to a
   single output (for example, an object or linked executable).

   The majority of Actions correspond to actual tasks; however, there are
   two special Actions. The first is InputAction, which simply serves to
   adapt an input argument for use as an input to other Actions. The
   second is BindArchAction, which conceptually alters the architecture
   to be used for all of its input Actions.

   The clang driver can dump the results of this stage using the
   ``-ccc-print-phases`` flag. For example:

   .. code-block:: console

      $ clang -ccc-print-phases -x c t.c -x assembler t.s
      0: input, "t.c", c
      1: preprocessor, {0}, cpp-output
      2: compiler, {1}, assembler
      3: assembler, {2}, object
      4: input, "t.s", assembler
      5: assembler, {4}, object
      6: linker, {3, 5}, image

   Here the driver is constructing seven distinct actions, four to
   compile the "t.c" input into an object file, two to assemble the
   "t.s" input, and one to link them together.

   A rather different compilation pipeline is shown here; in this
   example there are two top-level actions to compile the input files
   into two separate object files, where each object file is built using
   ``lipo`` to merge results built for two separate architectures.

   .. code-block:: console

      $ clang -ccc-print-phases -c -arch i386 -arch x86_64 t0.c t1.c
      0: input, "t0.c", c
      1: preprocessor, {0}, cpp-output
      2: compiler, {1}, assembler
      3: assembler, {2}, object
      4: bind-arch, "i386", {3}, object
      5: bind-arch, "x86_64", {3}, object
      6: lipo, {4, 5}, object
      7: input, "t1.c", c
      8: preprocessor, {7}, cpp-output
      9: compiler, {8}, assembler
      10: assembler, {9}, object
      11: bind-arch, "i386", {10}, object
      12: bind-arch, "x86_64", {10}, object
      13: lipo, {11, 12}, object

   After this stage is complete the compilation process is divided into
   a simple set of actions which need to be performed to produce
   intermediate or final outputs (in some cases, like ``-fsyntax-only``,
   there is no "real" final output). Phases are well-known compilation
   steps, such as "preprocess", "compile", "assemble", "link", etc.

#. **Bind: Tool & Filename Selection**

   This stage (in conjunction with the Translate stage) turns the tree
   of Actions into a list of actual subprocesses to run. Conceptually, the
   driver performs a top-down matching to assign Action(s) to Tools. The
   ToolChain is responsible for selecting the tool to perform a
   particular action; once selected the driver interacts with the tool
   to see if it can match additional actions (for example, by having an
   integrated preprocessor).

   Once Tools have been selected for all actions, the driver determines
   how the tools should be connected (for example, using an in-process
   module, pipes, temporary files, or user-provided filenames). If an
   output file is required, the driver also computes the appropriate
   file name (the suffix and file location depend on the input types and
   options such as ``-save-temps``).

   The driver interacts with a ToolChain to perform the Tool bindings.
   Each ToolChain contains information about all the tools needed for
   compilation for a particular architecture, platform, and operating
   system. A single driver invocation may query multiple ToolChains
   during one compilation in order to interact with tools for separate
   architectures.

   The results of this stage are not computed directly, but the driver
   can print the results via the ``-ccc-print-bindings`` option. For
   example:

   .. code-block:: console

      $ clang -ccc-print-bindings -arch i386 -arch ppc t0.c
      # "i386-apple-darwin9" - "clang", inputs: ["t0.c"], output: "/tmp/cc-Sn4RKF.s"
      # "i386-apple-darwin9" - "darwin::Assemble", inputs: ["/tmp/cc-Sn4RKF.s"], output: "/tmp/cc-gvSnbS.o"
      # "i386-apple-darwin9" - "darwin::Link", inputs: ["/tmp/cc-gvSnbS.o"], output: "/tmp/cc-jgHQxi.out"
      # "ppc-apple-darwin9" - "gcc::Compile", inputs: ["t0.c"], output: "/tmp/cc-Q0bTox.s"
      # "ppc-apple-darwin9" - "gcc::Assemble", inputs: ["/tmp/cc-Q0bTox.s"], output: "/tmp/cc-WCdicw.o"
      # "ppc-apple-darwin9" - "gcc::Link", inputs: ["/tmp/cc-WCdicw.o"], output: "/tmp/cc-HHBEBh.out"
      # "i386-apple-darwin9" - "darwin::Lipo", inputs: ["/tmp/cc-jgHQxi.out", "/tmp/cc-HHBEBh.out"], output: "a.out"

   This shows the tool chain, tool, inputs and outputs which have been
   bound for this compilation sequence. Here clang is being used to
   compile t0.c on the i386 architecture and Darwin-specific versions of
   the tools are being used to assemble and link the result, but generic
   gcc versions of the tools are being used on PowerPC.

#. **Translate: Tool Specific Argument Translation**

   Once a Tool has been selected to perform a particular Action, the
   Tool must construct concrete Commands which will be executed during
   compilation. The main work is in translating from the gcc-style
   command line options to whatever options the subprocess expects.

   Some tools, such as the assembler, only interact with a handful of
   arguments and just determine the path of the executable to call and
   pass on their input and output arguments. Others, like the compiler
   or the linker, may translate a large number of arguments in addition.

   The ArgList class provides a number of simple helper methods to
   assist with translating arguments; for example, to pass on only the
   last of the arguments corresponding to some option, or all arguments
   for an option.

   The result of this stage is a list of Commands (executable paths and
   argument strings) to execute.

#. **Execute**

   Finally, the compilation pipeline is executed. This is mostly
   straightforward, although there is some interaction with options like
   ``-pipe``, ``-pass-exit-codes`` and ``-time``.

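
As a rough illustration of the Parse stage described above, the following
Python sketch models how an ``Arg`` can refer to its ``Option`` and to a
position in the original argument vector instead of copying strings. It is
not the driver's C++ implementation: the class layouts, the ``parse``
helper, and the ``OPT_I`` and ``OPT_INPUT`` constants are simplified
assumptions, and the toy parser only understands ``-I`` and input files.

.. code-block:: python

   from dataclasses import dataclass

   OPT_I = "I"              # stand-in for an option ID such as options::OPT_I
   OPT_INPUT = "<input>"    # stand-in for the pseudo-option used for inputs


   @dataclass(frozen=True)
   class Arg:
       option: str    # which Option this argument corresponds to
       index: int     # position in the ArgList's original string vector
       joined: bool   # True for "-Ifoo"; False for "-I foo" and for inputs


   class ArgList:
       def __init__(self, argv):
           self.argv = list(argv)   # the original argument strings
           self.args = []           # lightweight Arg records

       def value(self, arg):
           """Recover an argument's value from the original strings."""
           if arg.option == OPT_INPUT:
               return self.argv[arg.index]
           if arg.joined:
               return self.argv[arg.index][len("-I"):]
           return self.argv[arg.index + 1]


   def parse(argv):
       """Toy parser: handles only -I (joined or separate) and inputs."""
       arg_list = ArgList(argv)
       i = 0
       while i < len(argv):
           s = argv[i]
           if s == "-I":                       # separate form: "-I foo"
               arg_list.args.append(Arg(OPT_I, i, joined=False))
               i += 2
           elif s.startswith("-I"):            # joined form: "-Ifoo"
               arg_list.args.append(Arg(OPT_I, i, joined=True))
               i += 1
           else:                               # anything else is an input
               arg_list.args.append(Arg(OPT_INPUT, i, joined=False))
               i += 1
       return arg_list


   args = parse(["-Ifoo", "-I", "foo", "t.c"])
   values = [(a.option, args.value(a)) for a in args.args]

In this model both ``-Ifoo`` and ``-I foo`` produce an ``Arg`` with the
same option ID and the same value, and no value string is stored in the
``Arg`` itself.
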
Additional Notes
----------------

The Compilation Object
^^^^^^^^^^^^^^^^^^^^^^

The driver constructs a Compilation object for each set of command line
arguments. The Driver itself is intended to be invariant during
construction of a Compilation; an IDE should be able to construct a
single long-lived driver instance to use for an entire build, for
example.

The Compilation object holds information that is particular to each
compilation sequence. For example, it holds the list of used temporary
files (which must be removed once compilation is finished) and result
files (which should be removed if compilation fails).

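
The cleanup rule described above can be sketched in a few lines of Python.
This is only a model under stated assumptions: the ``temp_files`` and
``result_files`` attributes, the ``cleanup`` method, and the ``run`` helper
are hypothetical names, not the driver's actual C++ interface.

.. code-block:: python

   import os
   import tempfile


   class Compilation:
       """Toy per-invocation state; attribute and method names are assumed."""

       def __init__(self):
           self.temp_files = []     # always removed when compilation ends
           self.result_files = []   # removed only if compilation fails

       def cleanup(self, succeeded):
           doomed = list(self.temp_files)
           if not succeeded:
               doomed += self.result_files
           for path in doomed:
               if os.path.exists(path):
                   os.remove(path)


   def run(succeeded):
       """Create one temporary and one result file, clean up, report survivors."""
       with tempfile.TemporaryDirectory() as workdir:
           temp = os.path.join(workdir, "cc-temp.s")
           result = os.path.join(workdir, "t.o")
           for path in (temp, result):
               open(path, "w").close()
           compilation = Compilation()
           compilation.temp_files.append(temp)
           compilation.result_files.append(result)
           compilation.cleanup(succeeded)
           return os.path.exists(temp), os.path.exists(result)

Calling ``run(True)`` leaves only the result file in place, while
``run(False)`` removes both files.
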
Unified Parsing & Pipelining
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Parsing and pipelining both occur without reference to a Compilation
instance. This is by design; the driver expects that both of these
phases are platform neutral, with a few very well-defined exceptions
such as whether the platform uses a driver driver.

ToolChain Argument Translation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In order to match gcc very closely, the clang driver currently allows
tool chains to perform their own translation of the argument list (into
a new ArgList data structure). Although this allows the clang driver to
match gcc easily, it also makes the driver operation much harder to
understand (since the Tools stop seeing some arguments the user
provided, and see new ones instead).

For example, on Darwin ``-gfull`` gets translated into two separate
arguments, ``-g`` and ``-fno-eliminate-unused-debug-symbols``. Trying to
write Tool logic to do something with ``-gfull`` will not work, because
Tool argument translation is done after the arguments have been
translated.

A long-term goal is to remove this tool chain specific translation, and
instead force each tool to change its own logic to do the right thing on
the untranslated original arguments.

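
The ``-gfull`` example above can be modeled as follows. This Python sketch
is a simplified assumption of how the two steps are ordered; the function
names ``darwin_translate_args`` and ``tool_handles_gfull`` are hypothetical
and plain strings stand in for ``Arg`` instances.

.. code-block:: python

   def darwin_translate_args(args):
       """Tool chain step: rewrite the user's arguments into a new list."""
       translated = []
       for arg in args:
           if arg == "-gfull":
               translated += ["-g", "-fno-eliminate-unused-debug-symbols"]
           else:
               translated.append(arg)
       return translated


   def tool_handles_gfull(tool_args):
       """Tool step: logic keyed on -gfull, which runs on the translated list."""
       return "-gfull" in tool_args


   user_args = ["-gfull", "-O2", "t.c"]
   tool_args = darwin_translate_args(user_args)
   seen = tool_handles_gfull(tool_args)

Because the Tool only receives ``tool_args``, ``seen`` is ``False`` even
though the user passed ``-gfull``. This is the source of the confusion the
section describes.
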
Unused Argument Warnings
^^^^^^^^^^^^^^^^^^^^^^^^

The driver operates by parsing all arguments but giving Tools the
opportunity to choose which arguments to pass on. One downside of this
infrastructure is that if the user misspells some option, or is confused
about which options to use, some command line arguments the user really
cared about may go unused. This problem is particularly important when
using clang as a compiler, since the clang compiler does not support
anywhere near all the options that gcc does, and we want to make sure
users know which ones are being used.

To support this, the driver maintains a bit for each argument that
records whether it has been used (at all) during the compilation.
This bit usually doesn't need to be set by hand, as the key ArgList
accessors will set it automatically.

When a compilation is successful (there are no errors), the driver
checks the bit and emits an "unused argument" warning for any arguments
which were never accessed. This is conservative (the argument may not
have been used to do what the user wanted) but still catches the most
obvious cases.

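
A minimal Python sketch of this bookkeeping is shown below. It also
illustrates the two kinds of ArgList helper mentioned in the Translate
stage (fetching the last argument for an option, or all of them). The
names ``claimed``, ``get_last_arg``, ``get_all_args`` and ``unused`` are
assumptions made for this model, as is the choice to mark every matching
argument as used when only the last one is requested.

.. code-block:: python

   class Arg:
       def __init__(self, option, value=None):
           self.option = option
           self.value = value
           self.claimed = False   # the per-argument "used" bit


   class ArgList:
       def __init__(self, args):
           self.args = list(args)

       def get_all_args(self, option):
           """Return every argument for an option, marking each as used."""
           matches = [a for a in self.args if a.option == option]
           for a in matches:
               a.claimed = True
           return matches

       def get_last_arg(self, option):
           """Return only the last argument for an option (or None)."""
           matches = self.get_all_args(option)
           return matches[-1] if matches else None

       def unused(self):
           """Options that no accessor ever touched."""
           return [a.option for a in self.args if not a.claimed]


   args = ArgList([
       Arg("-O", "1"),
       Arg("-O", "2"),
       Arg("-I", "foo"),
       Arg("-fno-such-flag"),   # made-up option that no tool asks about
   ])
   opt_level = args.get_last_arg("-O").value
   include_dirs = [a.value for a in args.get_all_args("-I")]
   unused = args.unused()

After the accessors run, ``unused`` contains only the option that nothing
queried, which is the set a driver could warn about once the compilation
succeeds.
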
Relation to GCC Driver Concepts
-------------------------------

For those familiar with the gcc driver, this section provides a brief
overview of how things from the gcc driver map to the clang driver.

- **Driver Driver**

  The driver driver is fully integrated into the clang driver. The
  driver simply constructs additional Actions to bind the architecture
  during the *Pipeline* phase. The tool chain specific argument
  translation is responsible for handling ``-Xarch_``.

  The one caveat is that this approach requires that ``-Xarch_`` not be
  used to alter the compilation itself (for example, one cannot provide
  ``-S`` as an ``-Xarch_`` argument). The driver attempts to reject
  such invocations, and overall there isn't a good reason to abuse
  ``-Xarch_`` to that end in practice.

  The upside is that the clang driver is more efficient and does little
  extra work to support universal builds. It also provides better error
  reporting and UI consistency.

- **Specs**

  The clang driver has no direct counterpart to "specs". The
  majority of the functionality that is embedded in specs is in the
  Tool specific argument translation routines. The parts of specs which
  control the compilation pipeline are generally part of the *Pipeline*
  stage.

- **Toolchains**

  The gcc driver has no direct understanding of tool chains. Each gcc
  binary roughly corresponds to the information which is embedded
  inside a single ToolChain.

  The clang driver is intended to be portable and support complex
  compilation environments. All platform and tool chain specific code
  should be protected behind either abstract or well-defined interfaces
  (such as whether the platform supports use as a driver driver).
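
The way the *Driver Driver* support adds architecture-binding Actions can
be pictured with a small Python model that prints action lists in the
style of ``-ccc-print-phases``. The ``build_actions`` function is a
hypothetical helper, not part of clang. It assumes C inputs compiled with
``-c``, and it only adds ``bind-arch`` and ``lipo`` entries when more than
one architecture is requested, which is a simplification of this sketch.

.. code-block:: python

   def build_actions(inputs, archs):
       """Return -ccc-print-phases style lines for a -c compile of C inputs."""
       lines = []

       def add(text):
           # Number each action by its position and return that number.
           lines.append(f"{len(lines)}: {text}")
           return len(lines) - 1

       for name in inputs:
           action = add(f'input, "{name}", c')
           action = add(f"preprocessor, {{{action}}}, cpp-output")
           action = add(f"compiler, {{{action}}}, assembler")
           obj = add(f"assembler, {{{action}}}, object")
           if len(archs) > 1:
               # One bind-arch action per architecture, all sharing the same
               # input action, merged by a single lipo action.
               bound = [add(f'bind-arch, "{arch}", {{{obj}}}, object')
                        for arch in archs]
               add("lipo, {" + ", ".join(str(b) for b in bound) + "}, object")
       return lines


   universal = build_actions(["t0.c", "t1.c"], ["i386", "x86_64"])
   single = build_actions(["t.c"], [])

For the two-file, two-architecture case this yields fourteen numbered
entries, two of which are ``lipo`` actions, one per input file.
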