- =========================
- Driver Design & Internals
- =========================
- .. contents::
- :local:
- Introduction
- ============
- NOTE: this document applies to the original Clang project, not the DirectX
- Compiler. It's made available for informational purposes only.
- This document describes the Clang driver. The purpose of this document
- is to describe both the motivation and design goals for the driver, as
- well as details of the internal implementation.
- Features and Goals
- ==================
- The Clang driver is intended to be a production-quality compiler driver
- providing access to the Clang compiler and tools, with a command line
- interface which is compatible with the gcc driver.
- Although the driver is part of and driven by the Clang project, it is
- logically a separate tool which shares many of the same goals as Clang:
- .. contents:: Features
- :local:
- GCC Compatibility
- -----------------
- The number one goal of the driver is to ease the adoption of Clang by
- allowing users to drop Clang into a build system which was designed to
- call GCC. Although this makes the driver much more complicated than
- might otherwise be necessary, we decided that being very compatible with
- the gcc command line interface was worth it in order to allow users to
- quickly test clang on their projects.
- Flexible
- --------
- The driver was designed to be flexible and easily accommodate new uses
- as we grow the clang and LLVM infrastructure. As one example, the driver
- can easily support the introduction of tools which have an integrated
- assembler, something we hope to add to LLVM in the future.
- Similarly, most of the driver functionality is kept in a library which
- can be used to build other tools which want to implement or accept a
- gcc-like interface.
- Low Overhead
- ------------
- The driver should have as little overhead as possible. In practice, we
- found that the gcc driver by itself incurred a small but meaningful
- overhead when compiling many small files. The driver doesn't do much
- work compared to a compilation, but we have tried to keep it as
- efficient as possible by following a few simple principles:
- - Avoid memory allocation and string copying when possible.
- - Don't parse arguments more than once.
- - Provide a few simple interfaces for efficiently searching arguments.
- Simple
- ------
- Finally, the driver was designed to be "as simple as possible", given
- the other goals. Notably, trying to be completely compatible with the
- gcc driver adds a significant amount of complexity. However, the design
- of the driver attempts to mitigate this complexity by dividing the
- process into a number of independent stages instead of a single
- monolithic task.
- Internal Design and Implementation
- ==================================
- .. contents::
- :local:
- :depth: 1
- Internals Introduction
- ----------------------
- In order to satisfy the stated goals, the driver was designed to
- completely subsume the functionality of the gcc executable; that is, the
- driver should not need to delegate to gcc to perform subtasks. On
- Darwin, this implies that the Clang driver also subsumes the gcc
- driver-driver, which is used to implement support for building universal
- images (binaries and object files). This also implies that the driver
- should be able to call the language specific compilers (e.g. cc1)
- directly, which means that it must have enough information to forward
- command line arguments to child processes correctly.
- Design Overview
- ---------------
- The diagram below shows the significant components of the driver
- architecture and how they relate to one another. The orange components
- represent concrete data structures built by the driver, the green
- components indicate conceptually distinct stages which manipulate these
- data structures, and the blue components are important helper classes.
- .. image:: DriverArchitecture.png
- :align: center
- :alt: Driver Architecture Diagram
- Driver Stages
- -------------
- The driver functionality is conceptually divided into five stages:
- #. **Parse: Option Parsing**
- The command line argument strings are decomposed into arguments
- (``Arg`` instances). The driver expects to understand all available
- options, although there is some facility for just passing certain
- classes of options through (like ``-Wl,``).
- Each argument corresponds to exactly one abstract ``Option``
- definition, which describes how the option is parsed along with some
- additional metadata. The Arg instances themselves are lightweight and
- merely contain enough information for clients to determine which
- option they correspond to and their values (if they have additional
- parameters).
- For example, a command line like "-Ifoo -I foo" would parse to two
- Arg instances (a JoinedArg and a SeparateArg instance), but each
- would refer to the same Option.
- Options are lazily created in order to avoid populating all Option
- classes when the driver is loaded. Most of the driver code only needs
- to deal with options by their unique ID (e.g., ``options::OPT_I``).
- Arg instances themselves do not generally store the values of
- parameters. In many cases, this would simply result in creating
- unnecessary string copies. Instead, Arg instances are always embedded
- inside an ArgList structure, which contains the original vector of
- argument strings. Each Arg itself only needs to contain an index into
- this vector instead of storing its values directly.
- The clang driver can dump the results of this stage using the
- ``-###`` flag (which must precede any actual command
- line arguments). For example:
- .. code-block:: console
- $ clang -### -Xarch_i386 -fomit-frame-pointer -Wa,-fast -Ifoo -I foo t.c
- Option 0 - Name: "-Xarch_", Values: {"i386", "-fomit-frame-pointer"}
- Option 1 - Name: "-Wa,", Values: {"-fast"}
- Option 2 - Name: "-I", Values: {"foo"}
- Option 3 - Name: "-I", Values: {"foo"}
- Option 4 - Name: "<input>", Values: {"t.c"}
- After this stage is complete the command line should be broken down
- into well defined option objects with their appropriate parameters.
- Subsequent stages should rarely, if ever, need to do any string
- processing.
- #. **Pipeline: Compilation Action Construction**
- Once the arguments are parsed, the tree of subprocess jobs needed for
- the desired compilation sequence is constructed. This involves
- determining the input files and their types, what work is to be done
- on them (preprocess, compile, assemble, link, etc.), and constructing
- a list of Action instances for each task. The result is a list of one
- or more top-level actions, each of which generally corresponds to a
- single output (for example, an object or linked executable).
- The majority of Actions correspond to actual tasks; however, there are
- two special Actions. The first is InputAction, which simply serves to
- adapt an input argument for use as an input to other Actions. The
- second is BindArchAction, which conceptually alters the architecture
- to be used for all of its input Actions.
- The clang driver can dump the results of this stage using the
- ``-ccc-print-phases`` flag. For example:
- .. code-block:: console
- $ clang -ccc-print-phases -x c t.c -x assembler t.s
- 0: input, "t.c", c
- 1: preprocessor, {0}, cpp-output
- 2: compiler, {1}, assembler
- 3: assembler, {2}, object
- 4: input, "t.s", assembler
- 5: assembler, {4}, object
- 6: linker, {3, 5}, image
- Here the driver is constructing seven distinct actions, four to
- compile the "t.c" input into an object file, two to assemble the
- "t.s" input, and one to link them together.
- A rather different compilation pipeline is shown here; in this
- example there are two top level actions to compile the input files
- into two separate object files, where each object file is built using
- ``lipo`` to merge results built for two separate architectures.
- .. code-block:: console
- $ clang -ccc-print-phases -c -arch i386 -arch x86_64 t0.c t1.c
- 0: input, "t0.c", c
- 1: preprocessor, {0}, cpp-output
- 2: compiler, {1}, assembler
- 3: assembler, {2}, object
- 4: bind-arch, "i386", {3}, object
- 5: bind-arch, "x86_64", {3}, object
- 6: lipo, {4, 5}, object
- 7: input, "t1.c", c
- 8: preprocessor, {7}, cpp-output
- 9: compiler, {8}, assembler
- 10: assembler, {9}, object
- 11: bind-arch, "i386", {10}, object
- 12: bind-arch, "x86_64", {10}, object
- 13: lipo, {11, 12}, object
- After this stage is complete the compilation process is divided into
- a simple set of actions which need to be performed to produce
- intermediate or final outputs (in some cases, like ``-fsyntax-only``,
- there is no "real" final output). Phases are well known compilation
- steps, such as "preprocess", "compile", "assemble", "link", etc.
- #. **Bind: Tool & Filename Selection**
- This stage (in conjunction with the Translate stage) turns the tree
- of Actions into a list of actual subprocesses to run. Conceptually, the
- driver performs a top-down matching to assign Action(s) to Tools. The
- ToolChain is responsible for selecting the tool to perform a
- particular action; once selected the driver interacts with the tool
- to see if it can match additional actions (for example, by having an
- integrated preprocessor).
- Once Tools have been selected for all actions, the driver determines
- how the tools should be connected (for example, using an in-process
- module, pipes, temporary files, or user-provided filenames). If an
- output file is required, the driver also computes the appropriate
- file name (the suffix and file location depend on the input types and
- options such as ``-save-temps``).
- The driver interacts with a ToolChain to perform the Tool bindings.
- Each ToolChain contains information about all the tools needed for
- compilation for a particular architecture, platform, and operating
- system. A single driver invocation may query multiple ToolChains
- during one compilation in order to interact with tools for separate
- architectures.
- The results of this stage are not computed directly, but the driver
- can print the results via the ``-ccc-print-bindings`` option. For
- example:
- .. code-block:: console
- $ clang -ccc-print-bindings -arch i386 -arch ppc t0.c
- # "i386-apple-darwin9" - "clang", inputs: ["t0.c"], output: "/tmp/cc-Sn4RKF.s"
- # "i386-apple-darwin9" - "darwin::Assemble", inputs: ["/tmp/cc-Sn4RKF.s"], output: "/tmp/cc-gvSnbS.o"
- # "i386-apple-darwin9" - "darwin::Link", inputs: ["/tmp/cc-gvSnbS.o"], output: "/tmp/cc-jgHQxi.out"
- # "ppc-apple-darwin9" - "gcc::Compile", inputs: ["t0.c"], output: "/tmp/cc-Q0bTox.s"
- # "ppc-apple-darwin9" - "gcc::Assemble", inputs: ["/tmp/cc-Q0bTox.s"], output: "/tmp/cc-WCdicw.o"
- # "ppc-apple-darwin9" - "gcc::Link", inputs: ["/tmp/cc-WCdicw.o"], output: "/tmp/cc-HHBEBh.out"
- # "i386-apple-darwin9" - "darwin::Lipo", inputs: ["/tmp/cc-jgHQxi.out", "/tmp/cc-HHBEBh.out"], output: "a.out"
- This shows the tool chain, tool, inputs and outputs which have been
- bound for this compilation sequence. Here clang is being used to
- compile t0.c on the i386 architecture and darwin specific versions of
- the tools are being used to assemble and link the result, but generic
- gcc versions of the tools are being used on PowerPC.
- #. **Translate: Tool Specific Argument Translation**
- Once a Tool has been selected to perform a particular Action, the
- Tool must construct concrete Commands which will be executed during
- compilation. The main work is in translating from the gcc style
- command line options to whatever options the subprocess expects.
- Some tools, such as the assembler, only interact with a handful of
- arguments and just determine the path of the executable to call and
- pass on their input and output arguments. Others, like the compiler
- or the linker, may translate a large number of arguments in addition.
- The ArgList class provides a number of simple helper methods to
- assist with translating arguments; for example, to pass on only the
- last of the arguments corresponding to some option, or all arguments
- for an option.
- The result of this stage is a list of Commands (executable paths and
- argument strings) to execute.
- #. **Execute**
- Finally, the compilation pipeline is executed. This is mostly
- straightforward, although there is some interaction with options like
- ``-pipe``, ``-pass-exit-codes`` and ``-time``.
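As a rough illustration of the Pipeline stage described above, the following
Python sketch builds a flat list of actions for a set of typed inputs and
prints it in the style of the ``-ccc-print-phases`` listing. It is a
simplified model and not the driver's implementation: the ``Action`` class,
the ``PHASES_BY_TYPE`` table, and ``build_actions`` are hypothetical names
introduced only for this example.

.. code-block:: python

   from dataclasses import dataclass, field

   # Hypothetical table: for each input type, the (phase, output type) steps
   # needed to reach an object file. It mirrors the t.c / t.s listing above.
   PHASES_BY_TYPE = {
       "c": [("preprocessor", "cpp-output"),
             ("compiler", "assembler"),
             ("assembler", "object")],
       "assembler": [("assembler", "object")],
   }


   @dataclass
   class Action:
       index: int
       kind: str                                   # "input", "compiler", ...
       output_type: str
       inputs: list = field(default_factory=list)  # indices of input actions
       filename: str = ""                          # only set for "input"

       def describe(self):
           if self.kind == "input":
               source = f'"{self.filename}"'
           else:
               source = "{" + ", ".join(str(i) for i in self.inputs) + "}"
           return f"{self.index}: {self.kind}, {source}, {self.output_type}"


   def build_actions(inputs, link=True):
       """Build actions for (filename, type) pairs, optionally linking them."""
       actions = []
       objects = []

       def add(kind, output_type, inputs=(), filename=""):
           action = Action(len(actions), kind, output_type, list(inputs), filename)
           actions.append(action)
           return action.index

       for filename, input_type in inputs:
           # Each input starts its own chain of phases.
           current = add("input", input_type, filename=filename)
           for phase, output_type in PHASES_BY_TYPE[input_type]:
               current = add(phase, output_type, [current])
           objects.append(current)
       if link:
           # One top-level action consumes every object produced above.
           add("linker", "image", objects)
       return actions


   if __name__ == "__main__":
       for action in build_actions([("t.c", "c"), ("t.s", "assembler")]):
           print(action.describe())

Run on the two inputs from the first example, the sketch yields seven
actions: four for ``t.c``, two for ``t.s``, and a final linker action that
takes actions 3 and 5 as inputs.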
- Additional Notes
- ----------------
- The Compilation Object
- ^^^^^^^^^^^^^^^^^^^^^^
- The driver constructs a Compilation object for each set of command line
- arguments. The Driver itself is intended to be invariant during
- construction of a Compilation; an IDE should be able to construct a
- single long lived driver instance to use for an entire build, for
- example.
- The Compilation object holds information that is particular to each
- compilation sequence. For example, the list of used temporary files
- (which must be removed once compilation is finished) and result files
- (which should be removed if compilation fails).
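The cleanup rule described above can be sketched in a few lines of Python.
This is only a toy model under stated assumptions: the ``Compilation`` class
here, its method names, and the ``make_file`` helper are hypothetical and do
not reflect the actual Clang interfaces.

.. code-block:: python

   import os
   import tempfile


   class Compilation:
       """Toy model of per-compilation state (hypothetical, not the Clang class)."""

       def __init__(self):
           self.temp_files = []     # always removed when the compilation ends
           self.result_files = []   # removed only if the compilation fails

       def add_temp_file(self, path):
           self.temp_files.append(path)
           return path

       def add_result_file(self, path):
           self.result_files.append(path)
           return path

       def cleanup(self, succeeded):
           doomed = list(self.temp_files)
           if not succeeded:
               # A failed compilation must not leave partial outputs behind.
               doomed += self.result_files
           for path in doomed:
               if os.path.exists(path):
                   os.remove(path)


   def make_file(directory, name):
       """Create a small placeholder file and return its path."""
       path = os.path.join(directory, name)
       with open(path, "w") as handle:
           handle.write("placeholder\n")
       return path


   if __name__ == "__main__":
       with tempfile.TemporaryDirectory() as directory:
           compilation = Compilation()
           temp = compilation.add_temp_file(make_file(directory, "t.s"))
           result = compilation.add_result_file(make_file(directory, "t.o"))
           compilation.cleanup(succeeded=True)
           # The temporary is gone, the result survives a successful run.
           print(os.path.exists(temp), os.path.exists(result))

Keeping the two lists on the per-compilation object, rather than on a
long-lived driver, is what lets one driver instance serve many compilations
without mixing up their files.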
- Unified Parsing & Pipelining
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Parsing and pipelining both occur without reference to a Compilation
- instance. This is by design; the driver expects that both of these
- phases are platform neutral, with a few very well defined exceptions
- such as whether the platform uses a driver driver.
- ToolChain Argument Translation
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- In order to match gcc very closely, the clang driver currently allows
- tool chains to perform their own translation of the argument list (into
- a new ArgList data structure). Although this allows the clang driver to
- match gcc easily, it also makes the driver operation much harder to
- understand (since the Tools stop seeing some arguments the user
- provided, and see new ones instead).
- For example, on Darwin ``-gfull`` gets translated into two separate
- arguments, ``-g`` and ``-fno-eliminate-unused-debug-symbols``. Trying to
- write Tool logic to do something with ``-gfull`` will not work, because
- Tool argument translation is done after the arguments have been
- translated.
- A long term goal is to remove this tool chain specific translation, and
- instead force each tool to change its own logic to do the right thing on
- the untranslated original arguments.
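The ``-gfull`` pitfall described above can be modelled with a short Python
sketch. The translation table and the function names are hypothetical and
exist only to show the ordering problem; the real driver works on ``ArgList``
structures rather than plain string lists.

.. code-block:: python

   # Hypothetical translation table modelled on the Darwin -gfull example.
   DARWIN_TRANSLATIONS = {
       "-gfull": ["-g", "-fno-eliminate-unused-debug-symbols"],
   }


   def translate_args(args, table=DARWIN_TRANSLATIONS):
       """Return a new argument list with tool chain rewrites applied."""
       translated = []
       for arg in args:
           # Unknown arguments pass through unchanged.
           translated.extend(table.get(arg, [arg]))
       return translated


   def tool_sees_gfull(tool_args):
       """Tool logic that looks for -gfull; it only runs after translation."""
       return "-gfull" in tool_args


   if __name__ == "__main__":
       user_args = ["-gfull", "-c", "t.c"]
       tool_args = translate_args(user_args)
       print(tool_args)
       print(tool_sees_gfull(tool_args))

In this model the check for ``-gfull`` never succeeds, because the Tool only
ever receives the translated list.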
- Unused Argument Warnings
- ^^^^^^^^^^^^^^^^^^^^^^^^
- The driver operates by parsing all arguments but giving Tools the
- opportunity to choose which arguments to pass on. One downside of this
- infrastructure is that if the user misspells some option, or is confused
- about which options to use, some command line arguments the user really
- cared about may go unused. This problem is particularly important when
- using clang as a compiler, since the clang compiler does not support
- anywhere near all the options that gcc does, and we want to make sure
- users know which ones are being used.
- To support this, the driver maintains a bit associated with each
- argument of whether it has been used (at all) during the compilation.
- This bit usually doesn't need to be set by hand, as the key ArgList
- accessors will set it automatically.
- When a compilation is successful (there are no errors), the driver
- checks the bit and emits an "unused argument" warning for any arguments
- which were never accessed. This is conservative (the argument may not
- have been used to do what the user wanted) but still catches the most
- obvious cases.
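A minimal Python sketch of this bookkeeping follows. It assumes a simplified
model in which each argument records only an option ID and an index into the
original argument strings; the class and method names are hypothetical and
the warning text is illustrative, not the driver's actual diagnostic.

.. code-block:: python

   class Arg:
       """An argument keeps an option ID and an index, not a copied string."""

       def __init__(self, option_id, index):
           self.option_id = option_id
           self.index = index       # position in ArgList.strings
           self.claimed = False     # the "used" bit


   class ArgList:
       def __init__(self, strings, args):
           self.strings = strings   # original argument strings
           self.args = args

       def has_arg(self, option_id):
           """Accessor that marks every matching argument as used."""
           found = False
           for arg in self.args:
               if arg.option_id == option_id:
                   arg.claimed = True
                   found = True
           return found

       def unused_warnings(self):
           """Report arguments that no accessor ever touched."""
           return [f"unused argument: {self.strings[arg.index]}"
                   for arg in self.args if not arg.claimed]


   if __name__ == "__main__":
       strings = ["-O2", "-fbogus-flag", "-c"]
       arg_list = ArgList(strings, [Arg("O", 0), Arg("fbogus_flag", 1), Arg("c", 2)])
       arg_list.has_arg("O")   # a Tool asks about optimization
       arg_list.has_arg("c")   # the driver asks about -c
       print(arg_list.unused_warnings())

Because the accessor sets the bit as a side effect, Tool code does not need
to track usage itself; anything it never asks about is reported.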
- Relation to GCC Driver Concepts
- -------------------------------
- For those familiar with the gcc driver, this section provides a brief
- overview of how things from the gcc driver map to the clang driver.
- - **Driver Driver**
- The driver driver is fully integrated into the clang driver. The
- driver simply constructs additional Actions to bind the architecture
- during the *Pipeline* phase. The tool chain specific argument
- translation is responsible for handling ``-Xarch_``.
- The one caveat is that this approach requires ``-Xarch_`` not be used
- to alter the compilation itself (for example, one cannot provide
- ``-S`` as an ``-Xarch_`` argument). The driver attempts to reject
- such invocations, and overall there isn't a good reason to abuse
- ``-Xarch_`` to that end in practice.
- The upside is that the clang driver is more efficient and does little
- extra work to support universal builds. It also provides better error
- reporting and UI consistency.
- - **Specs**
- The clang driver has no direct equivalent of "specs". The
- majority of the functionality that is embedded in specs is in the
- Tool specific argument translation routines. The parts of specs which
- control the compilation pipeline are generally part of the *Pipeline*
- stage.
- - **Toolchains**
- The gcc driver has no direct understanding of tool chains. Each gcc
- binary roughly corresponds to the information which is embedded
- inside a single ToolChain.
- The clang driver is intended to be portable and support complex
- compilation environments. All platform and tool chain specific code
- should be protected behind either abstract or well defined interfaces
- (such as whether the platform supports use as a driver driver).