========================================
Precompiled Header and Modules Internals
========================================

.. contents::
   :local:

NOTE: this document applies to the original Clang project, not the DirectX
Compiler. It's made available for informational purposes only.

This document describes the design and implementation of Clang's precompiled
headers (PCH) and modules. If you are interested in the end-user view, please
see the :ref:`User's Manual <usersmanual-precompiled-headers>`.

Using Precompiled Headers with ``clang``
----------------------------------------

The Clang compiler frontend, ``clang -cc1``, supports two command line options
for generating and using PCH files.

To generate PCH files using ``clang -cc1``, use the option :option:`-emit-pch`:

.. code-block:: bash

  $ clang -cc1 test.h -emit-pch -o test.h.pch

This option is transparently used by ``clang`` when generating PCH files. The
resulting PCH file contains the serialized form of the compiler's internal
representation after it has completed parsing and semantic analysis. The PCH
file can then be used as a prefix header with the :option:`-include-pch`
option:

.. code-block:: bash

  $ clang -cc1 -include-pch test.h.pch test.c -o test.s

Design Philosophy
-----------------

Precompiled headers are meant to improve overall compile times for projects, so
the design of precompiled headers is entirely driven by performance concerns.
The use case for precompiled headers is relatively simple: when there is a
common set of headers that is included in nearly every source file in the
project, we *precompile* that bundle of headers into a single precompiled
header (PCH file). Then, when compiling the source files in the project, we
load the PCH file first (as a prefix header), which acts as a stand-in for that
bundle of headers.

A precompiled header implementation improves performance when:

* Loading the PCH file is significantly faster than re-parsing the bundle of
  headers stored within the PCH file. Thus, a precompiled header design
  attempts to minimize the cost of reading the PCH file. Ideally, this cost
  should not vary with the size of the precompiled header file.

* The cost of generating the PCH file initially is not so large that it
  counters the per-source-file performance improvement due to eliminating the
  need to parse the bundled headers in the first place. This is particularly
  important on multi-core systems, because PCH file generation serializes the
  build when all compilations require the PCH file to be up-to-date.

Modules, as implemented in Clang, use the same mechanisms as precompiled
headers to save a serialized AST file (one per module) and to load those AST
files. From an implementation standpoint, modules are a generalization of
precompiled headers that lifts a number of restrictions placed on precompiled
headers. In particular, a translation unit can use only one precompiled header,
and it must be included at the beginning of the translation unit. The
extensions to the AST file format required for modules are discussed in the
section on :ref:`modules <pchinternals-modules>`.

Clang's AST files are designed with a compact on-disk representation, which
minimizes both creation time and the time required to initially load the AST
file. The AST file itself contains a serialized representation of Clang's
abstract syntax trees and supporting data structures, stored using the same
compressed bitstream as `LLVM's bitcode file format
<http://llvm.org/docs/BitCodeFormat.html>`_.

Clang's AST files are loaded "lazily" from disk. When an AST file is initially
loaded, Clang reads only a small amount of data from the AST file to establish
where certain important data structures are stored. The amount of data read in
this initial load is independent of the size of the AST file, such that a
larger AST file does not lead to longer AST load times. The actual header data
in the AST file --- macros, functions, variables, types, etc. --- is loaded
only when it is referenced from the user's code, at which point only that
entity and the entities it depends on are deserialized from the AST file.
With this approach, the cost of using an AST file for a translation unit is
proportional to the amount of code actually used from the AST file, rather than
being proportional to the size of the AST file itself.
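
To make the lazy-loading idea concrete, the following C++ sketch models it
under simplifying assumptions: an index of record offsets is available as soon
as the file is opened, and each entity is decoded and cached the first time it
is requested. The ``LazyEntityTable`` class and its vector of string records
are hypothetical stand-ins, not the data structures that ``ASTReader`` uses.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <optional>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical model of lazy deserialization; not Clang's ASTReader API.
  // `blob_` stands in for the serialized records in the file, and `offsets_`
  // maps each entity ID to the position of its record in `blob_`. Only the
  // offset index would be read when the file is opened.
  class LazyEntityTable {
  public:
    LazyEntityTable(std::vector<std::string> blob,
                    std::vector<std::size_t> offsets)
        : blob_(std::move(blob)), offsets_(std::move(offsets)),
          cache_(offsets_.size()) {}

    // Returns the entity with the given ID, decoding it on first access only.
    const std::string &get(std::size_t id) {
      std::optional<std::string> &slot = cache_.at(id);
      if (!slot) {
        slot = blob_.at(offsets_.at(id)); // stands in for reading from disk
        ++numDeserialized_;
      }
      return *slot;
    }

    std::size_t numDeserialized() const { return numDeserialized_; }
    std::size_t numEntities() const { return offsets_.size(); }

  private:
    std::vector<std::string> blob_;
    std::vector<std::size_t> offsets_;
    std::vector<std::optional<std::string>> cache_;
    std::size_t numDeserialized_ = 0;
  };

The ``numDeserialized()`` counter grows with the number of distinct entities
the program actually touches, not with the number of entities in the table,
which is the property the ``-print-stats`` figures measure.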

When given the :option:`-print-stats` option, Clang produces statistics
describing how much of the AST file was actually loaded from disk. For a
simple "Hello, World!" program that includes the Apple ``Cocoa.h`` header
(which is built as a precompiled header), this option illustrates how little of
the actual precompiled header is required:

.. code-block:: none

  *** AST File Statistics:
    895/39981 source location entries read (2.238563%)
    19/15315 types read (0.124061%)
    20/82685 declarations read (0.024188%)
    154/58070 identifiers read (0.265197%)
    0/7260 selectors read (0.000000%)
    0/30842 statements read (0.000000%)
    4/8400 macros read (0.047619%)
    1/4995 lexical declcontexts read (0.020020%)
    0/4413 visible declcontexts read (0.000000%)
    0/7230 method pool entries read (0.000000%)
    0 method pool misses

For this small program, only a tiny fraction of the source locations, types,
declarations, identifiers, and macros were actually deserialized from the
precompiled header. These statistics can be useful to determine whether the
AST file implementation can be improved by making more of the implementation
lazy.

Precompiled headers can be chained. When you create a PCH while including an
existing PCH, Clang can create the new PCH by referencing the original file and
only writing the new data to the new file. For example, you could create a PCH
out of all the headers that are very commonly used throughout your project, and
then create a PCH for every single source file in the project that includes the
code that is specific to that file, so that recompiling the file itself is very
fast, without duplicating the data from the common headers for every file. The
mechanisms behind chained precompiled headers are discussed in a :ref:`later
section <pchinternals-chained>`.

AST File Contents
-----------------

An AST file produced by clang is an object file container with a ``clangast``
(COFF) or ``__clangast`` (ELF and Mach-O) section containing the serialized AST.
Other target-specific sections in the object file container are used to hold
debug information for the data types defined in the AST. Tools built on top of
libclang that do not need debug information may also produce raw AST files that
only contain the serialized AST.

The ``clangast`` section is organized into several different blocks, each of
which contains the serialized representation of a part of Clang's internal
representation. Each of the blocks corresponds to either a block or a record
within `LLVM's bitstream format <http://llvm.org/docs/BitCodeFormat.html>`_.
The contents of each of these logical blocks are described below.

.. image:: PCHLayout.png

The ``llvm-objdump`` utility provides a ``-raw-clang-ast`` option to extract the
binary contents of the AST section from an object file container.

The `llvm-bcanalyzer <http://llvm.org/docs/CommandGuide/llvm-bcanalyzer.html>`_
utility can be used to examine the actual structure of the bitstream for the AST
section. This information can be used both to help understand the structure of
the AST section and to isolate areas where the AST representation can still be
optimized, e.g., through the introduction of abbreviations.

Metadata Block
^^^^^^^^^^^^^^

The metadata block contains several records that provide information about how
the AST file was built. This metadata is primarily used to validate the use of
an AST file. For example, a precompiled header built for a 32-bit x86 target
cannot be used when compiling for a 64-bit x86 target. The metadata block
contains information about:

Language options
  Describes the particular language dialect used to compile the AST file,
  including major options (e.g., Objective-C support) and more minor options
  (e.g., support for "``//``" comments). The contents of this record correspond
  to the ``LangOptions`` class.

Target architecture
  The target triple that describes the architecture, platform, and ABI for
  which the AST file was generated, e.g., ``i386-apple-darwin9``.

AST version
  The major and minor version numbers of the AST file format. Changes in the
  minor version number should not affect backward compatibility, while changes
  in the major version number imply that a newer compiler cannot read an older
  precompiled header (and vice-versa).

Original file name
  The full path of the header that was used to generate the AST file.

Predefines buffer
  Although not explicitly stored as part of the metadata, the predefines buffer
  is used in the validation of the AST file. The predefines buffer itself
  contains code generated by the compiler to initialize the preprocessor state
  according to the current target, platform, and command-line options. For
  example, the predefines buffer will contain "``#define __STDC__ 1``" when we
  are compiling C without Microsoft extensions. The predefines buffer itself
  is stored within the :ref:`pchinternals-sourcemgr`, but its contents are
  verified along with the rest of the metadata.

A chained PCH file (that is, one that references another PCH) and a module
(which may import other modules) have additional metadata containing the list
of all AST files that this AST file depends on. Each of those files will be
loaded along with this AST file.

For chained precompiled headers, the language options, target architecture and
predefines buffer data is taken from the end of the chain, since they have to
match anyway.
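
The validation step can be pictured as a field-by-field comparison between the
metadata stored in the AST file and the settings of the current compilation.
The sketch below is a minimal illustration, not Clang's implementation: the
``ASTFileMetadata`` struct and ``isCompatible`` function are hypothetical, a
single Objective-C flag stands in for the whole language-options record, and
requiring an identical predefines buffer is a simplifying assumption.

.. code-block:: c++

  #include <cassert>
  #include <string>

  // Hypothetical, simplified metadata record; field names are illustrative
  // and do not correspond to Clang's serialized record layout.
  struct ASTFileMetadata {
    unsigned majorVersion;
    unsigned minorVersion;
    std::string targetTriple; // e.g. "i386-apple-darwin9"
    bool objCEnabled;         // a single stand-in for the LangOptions record
    std::string predefines;   // contents of the predefines buffer
  };

  // Returns true if an AST file described by `file` may be used by a
  // compilation whose own settings are described by `current`.
  bool isCompatible(const ASTFileMetadata &file,
                    const ASTFileMetadata &current) {
    // A different major version means an incompatible file format.
    if (file.majorVersion != current.majorVersion)
      return false;
    // Minor-version differences are ignored, since they should not affect
    // backward compatibility.
    if (file.targetTriple != current.targetTriple)
      return false;
    if (file.objCEnabled != current.objCEnabled)
      return false;
    // Requiring identical predefines is a simplification for this sketch.
    return file.predefines == current.predefines;
  }

With this sketch, an AST file built for ``i386-apple-darwin9`` is rejected by a
compilation targeting ``x86_64-apple-darwin9``, while a difference in the minor
version alone is accepted.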

.. _pchinternals-sourcemgr:

Source Manager Block
^^^^^^^^^^^^^^^^^^^^

The source manager block contains the serialized representation of Clang's
:ref:`SourceManager <SourceManager>` class, which handles the mapping from
source locations (as represented in Clang's abstract syntax tree) into actual
column/line positions within a source file or macro instantiation. The AST
file's representation of the source manager also includes information about all
of the headers that were (transitively) included when building the AST file.

The bulk of the source manager block is dedicated to information about the
various files, buffers, and macro instantiations to which a source location
can refer. Each of these is referenced by a numeric "file ID", which is a
unique number (allocated starting at 1) stored in the source location. Clang
serializes the information for each kind of file ID, along with an index that
maps file IDs to the position within the AST file where the information about
that file ID is stored. The data associated with a file ID is loaded only when
required by the front end, e.g., to emit a diagnostic that includes a macro
instantiation history inside the header itself.

The source manager block also contains information about all of the headers
that were included when building the AST file. This includes information about
the controlling macro for the header (e.g., when the preprocessor identified
that the contents of the header depend on a macro like
``LLVM_CLANG_SOURCEMANAGER_H``).

.. _pchinternals-preprocessor:

Preprocessor Block
^^^^^^^^^^^^^^^^^^

The preprocessor block contains the serialized representation of the
preprocessor. Specifically, it contains all of the macros that have been
defined by the end of the header used to build the AST file, along with the
token sequences that comprise each macro. The macro definitions are only read
from the AST file when the name of the macro first occurs in the program. This
lazy loading of macro definitions is triggered by lookups into the
:ref:`identifier table <pchinternals-ident-table>`.

.. _pchinternals-types:

Types Block
^^^^^^^^^^^

The types block contains the serialized representation of all of the types
referenced in the translation unit. Each Clang type node (``PointerType``,
``FunctionProtoType``, etc.) has a corresponding record type in the AST file.
When types are deserialized from the AST file, the data within the record is
used to reconstruct the appropriate type node using the AST context.

Each type has a unique type ID, which is an integer that uniquely identifies
that type. Type ID 0 represents the NULL type, type IDs less than
``NUM_PREDEF_TYPE_IDS`` represent predefined types (``void``, ``float``, etc.),
while other "user-defined" type IDs are assigned consecutively from
``NUM_PREDEF_TYPE_IDS`` upward as the types are encountered. The AST file has
an associated mapping from the user-defined type IDs to the location within
the types block where the serialized representation of that type resides,
enabling lazy deserialization of types. When a type is referenced from within
the AST file, that reference is encoded using the type ID shifted left by 3
bits. The lower three bits are used to represent the ``const``, ``volatile``,
and ``restrict`` qualifiers, as in Clang's :ref:`QualType <QualType>` class.
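
The type-reference encoding is plain bit packing. The C++ sketch below
illustrates it under two stated assumptions: the qualifier bits are taken to be
``const`` = 1, ``restrict`` = 2 and ``volatile`` = 4, and ``kNumPredefTypeIDs``
is a made-up stand-in for ``NUM_PREDEF_TYPE_IDS``. The helper names are
hypothetical.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Assumed qualifier bits for illustration (const = 1, restrict = 2,
  // volatile = 4); they occupy the low three bits of an encoded reference.
  constexpr std::uint64_t kConst = 0x1;
  constexpr std::uint64_t kRestrict = 0x2;
  constexpr std::uint64_t kVolatile = 0x4;
  constexpr unsigned kQualBits = 3;
  constexpr std::uint64_t kQualMask = (std::uint64_t{1} << kQualBits) - 1;

  // Hypothetical stand-in for NUM_PREDEF_TYPE_IDS; Clang's real value differs.
  constexpr std::uint32_t kNumPredefTypeIDs = 100;

  struct TypeRef {
    std::uint32_t typeID; // 0 = NULL type, < kNumPredefTypeIDs = predefined
    std::uint64_t quals;  // any combination of kConst, kRestrict, kVolatile
  };

  // Encodes a reference as (type ID << 3) | qualifier bits.
  std::uint64_t encodeTypeRef(TypeRef ref) {
    return (std::uint64_t{ref.typeID} << kQualBits) | (ref.quals & kQualMask);
  }

  TypeRef decodeTypeRef(std::uint64_t raw) {
    return {static_cast<std::uint32_t>(raw >> kQualBits), raw & kQualMask};
  }

  bool isPredefined(std::uint32_t typeID) { return typeID < kNumPredefTypeIDs; }

  // User-defined IDs are numbered from kNumPredefTypeIDs upward, so this index
  // selects the entry in the offset table that locates the type's record.
  std::uint32_t offsetTableIndex(std::uint32_t typeID) {
    assert(!isPredefined(typeID));
    return typeID - kNumPredefTypeIDs;
  }

For example, a ``const volatile`` reference to the sixth user-defined type
(type ID 105 when ``kNumPredefTypeIDs`` is 100) is encoded as
``(105 << 3) | 5``, that is, 845.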

.. _pchinternals-decls:

Declarations Block
^^^^^^^^^^^^^^^^^^

The declarations block contains the serialized representation of all of the
declarations referenced in the translation unit. Each Clang declaration node
(``VarDecl``, ``FunctionDecl``, etc.) has a corresponding record type in the
AST file. When declarations are deserialized from the AST file, the data
within the record is used to build and populate a new instance of the
corresponding ``Decl`` node. As with types, each declaration node has a
numeric ID that is used to refer to that declaration within the AST file. In
addition, a lookup table provides a mapping from that numeric ID to the offset
within the precompiled header where that declaration is described.

Declarations in Clang's abstract syntax trees are stored hierarchically. At
the top of the hierarchy is the translation unit (``TranslationUnitDecl``),
which contains all of the declarations in the translation unit but is not
actually written as a specific declaration node. Its child declarations (such
as functions or struct types) may also contain other declarations inside them,
and so on. Within Clang, each declaration is stored within a :ref:`declaration
context <DeclContext>`, as represented by the ``DeclContext`` class.
Declaration contexts provide the mechanism to perform name lookup within a
given declaration (e.g., find the member named ``x`` in a structure) and
iterate over the declarations stored within a context (e.g., iterate over all
of the fields of a structure for structure layout).

In Clang's AST file format, deserializing a declaration that is a
``DeclContext`` is a separate operation from deserializing all of the
declarations stored within that declaration context. Therefore, Clang will
deserialize the translation unit declaration without deserializing the
declarations within that translation unit. When required, the declarations
stored within a declaration context will be deserialized. There are two
representations of the declarations within a declaration context, which
correspond to the name-lookup and iteration behavior described above:

* When the front end performs name lookup to find a name ``x`` within a given
  declaration context (for example, during semantic analysis of the expression
  ``p->x``, where ``p``'s type is defined in the precompiled header), Clang
  refers to an on-disk hash table that maps from the names within that
  declaration context to the declaration IDs that represent each visible
  declaration with that name. The actual declarations will then be
  deserialized to provide the results of name lookup.

* When the front end performs iteration over all of the declarations within a
  declaration context, all of those declarations are immediately
  deserialized. For large declaration contexts (e.g., the translation unit),
  this operation is expensive; however, large declaration contexts are not
  traversed in normal compilation, since such a traversal is unnecessary. On
  the other hand, the code generator and semantic analysis commonly traverse
  the declaration contexts of structs, classes, unions, and enumerations, but
  those contexts contain relatively few declarations in the common case.
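
The difference between the two representations can be modeled in a few lines
of C++. In this hypothetical sketch, ``LazyDeclContext`` keeps a table from
names to declaration IDs (standing in for the on-disk hash table) and
deserializes a declaration only when ``lookup`` or ``decls`` returns it; the
names are illustrative and do not follow Clang's ``DeclContext`` interface.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <map>
  #include <string>
  #include <unordered_map>
  #include <utility>
  #include <vector>

  // Hypothetical model of a lazily loaded declaration context; the names are
  // illustrative and do not follow Clang's DeclContext interface.
  struct ToyDecl {
    std::string name;
  };

  class LazyDeclContext {
  public:
    // `records` stands in for serialized declarations, indexed by decl ID.
    explicit LazyDeclContext(std::vector<std::string> records)
        : records_(std::move(records)) {
      // Stands in for the on-disk hash table written with the AST file:
      // name -> IDs of the visible declarations with that name.
      for (std::size_t id = 0; id < records_.size(); ++id)
        lookupTable_[records_[id]].push_back(id);
    }

    // Name lookup deserializes only the declarations with the requested name.
    std::vector<const ToyDecl *> lookup(const std::string &name) {
      std::vector<const ToyDecl *> result;
      auto it = lookupTable_.find(name);
      if (it != lookupTable_.end())
        for (std::size_t id : it->second)
          result.push_back(&load(id));
      return result;
    }

    // Iteration deserializes every declaration in the context.
    std::vector<const ToyDecl *> decls() {
      std::vector<const ToyDecl *> result;
      for (std::size_t id = 0; id < records_.size(); ++id)
        result.push_back(&load(id));
      return result;
    }

    std::size_t numLoaded() const { return loaded_.size(); }

  private:
    // Deserializes a declaration once; std::map keeps its address stable.
    const ToyDecl &load(std::size_t id) {
      return loaded_.try_emplace(id, ToyDecl{records_[id]}).first->second;
    }

    std::vector<std::string> records_;
    std::unordered_map<std::string, std::vector<std::size_t>> lookupTable_;
    std::map<std::size_t, ToyDecl> loaded_;
  };

Looking up one name in a context of four declarations loads only the
declarations with that name, whereas a single call to ``decls()`` loads all
four.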

Statements and Expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^

Statements and expressions are stored in the AST file in both the :ref:`types
<pchinternals-types>` and the :ref:`declarations <pchinternals-decls>` blocks,
because every statement or expression will be associated with either a type or
declaration. The actual statement and expression records are stored
immediately following the declaration or type that owns the statement or
expression. For example, the statement representing the body of a function
will be stored directly following the declaration of the function.

As with types and declarations, each statement and expression kind in Clang's
abstract syntax tree (``ForStmt``, ``CallExpr``, etc.) has a corresponding
record type in the AST file, which contains the serialized representation of
that statement or expression. Each substatement or subexpression within an
expression is stored as a separate record (which keeps most records to a fixed
size). Within the AST file, the subexpressions of an expression are stored, in
reverse order, prior to the expression that owns those expressions, using a
form of `Reverse Polish Notation
<http://en.wikipedia.org/wiki/Reverse_Polish_notation>`_. For example, the
expression ``3 - 4 + 5``, which parses as ``(3 - 4) + 5``, would be
represented as follows:

+-----------------------+
| ``IntegerLiteral(5)`` |
+-----------------------+
| ``IntegerLiteral(4)`` |
+-----------------------+
| ``IntegerLiteral(3)`` |
+-----------------------+
| ``BinaryOperator(-)`` |
+-----------------------+
| ``BinaryOperator(+)`` |
+-----------------------+
| ``STOP``              |
+-----------------------+

When reading this representation, Clang reads each expression record it
encounters, builds the appropriate abstract syntax tree node, and then pushes
that expression on to a stack. When a record contains *N* subexpressions
--- ``BinaryOperator`` has two of them --- those expressions are popped from
the top of the stack. The special STOP code indicates that we have reached the
end of a serialized expression or statement; other expression or statement
records may follow, but they are part of a different expression.
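
The stack discipline can be demonstrated with a toy reader. This sketch
assumes only two record kinds plus ``STOP`` and stores the operator character
in the record's payload; ``Record``, ``ExprNode`` and ``readExpr`` are
hypothetical, not Clang's ``ASTStmtReader``. Because the reader pops the
left-hand side first, the writer must emit the subexpressions in reverse order.

.. code-block:: c++

  #include <cassert>
  #include <memory>
  #include <stdexcept>
  #include <utility>
  #include <vector>

  // Hypothetical record kinds and tree nodes: a toy model of the stack
  // discipline described above, not Clang's ASTStmtReader.
  enum class RecordKind { IntegerLiteral, BinaryOperator, Stop };

  struct Record {
    RecordKind kind;
    long payload; // literal value, or the operator character ('+' or '-')
  };

  struct ExprNode {
    char op = 0;    // 0 for an integer literal
    long value = 0; // used only when op == 0
    std::unique_ptr<ExprNode> lhs, rhs;

    long evaluate() const {
      if (op == 0)
        return value;
      long l = lhs->evaluate(), r = rhs->evaluate();
      return op == '+' ? l + r : l - r;
    }
  };

  // Reads records up to STOP, rebuilding the expression tree with a stack.
  std::unique_ptr<ExprNode> readExpr(const std::vector<Record> &records) {
    std::vector<std::unique_ptr<ExprNode>> stack;
    auto pop = [&stack] {
      if (stack.empty())
        throw std::runtime_error("stack underflow");
      std::unique_ptr<ExprNode> top = std::move(stack.back());
      stack.pop_back();
      return top;
    };
    for (const Record &rec : records) {
      auto node = std::make_unique<ExprNode>();
      switch (rec.kind) {
      case RecordKind::IntegerLiteral:
        node->value = rec.payload;
        break;
      case RecordKind::BinaryOperator:
        node->op = static_cast<char>(rec.payload);
        // Subexpressions were written in reverse order, so the LHS is on top.
        node->lhs = pop();
        node->rhs = pop();
        break;
      case RecordKind::Stop:
        if (stack.size() != 1)
          throw std::runtime_error("malformed expression");
        return pop();
      }
      stack.push_back(std::move(node));
    }
    throw std::runtime_error("missing STOP record");
  }

Feeding ``readExpr`` the six records from the table above rebuilds the tree
for ``(3 - 4) + 5``, whose ``evaluate()`` result is 4.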

.. _pchinternals-ident-table:

Identifier Table Block
^^^^^^^^^^^^^^^^^^^^^^

The identifier table block contains an on-disk hash table that maps each
identifier mentioned within the AST file to the serialized representation of
the identifier's information (e.g., the ``IdentifierInfo`` structure). The
serialized representation contains:

* The actual identifier string.

* Flags that describe whether this identifier is the name of a built-in, a
  poisoned identifier, an extension token, or a macro.

* If the identifier names a macro, the offset of the macro definition within
  the :ref:`pchinternals-preprocessor`.

* If the identifier names one or more declarations visible from translation
  unit scope, the :ref:`declaration IDs <pchinternals-decls>` of these
  declarations.

When an AST file is loaded, the AST file reader mechanism introduces itself
into the identifier table as an external lookup source. Thus, when the user
program refers to an identifier that has not yet been seen, Clang will perform
a lookup into the AST file's identifier table. If an identifier is found, its
contents (macro definitions, flags, top-level declarations, etc.) will be
deserialized, at which point the corresponding ``IdentifierInfo`` structure
will have the same contents it would have after parsing the headers in the AST
file.
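
The external-lookup arrangement can be sketched in C++ as follows. The types
are hypothetical simplifications: ``ExternalIdentLookup`` and ``IdentTable``
only mirror the roles of Clang's ``IdentifierInfoLookup`` and
``IdentifierTable``, not their actual interfaces, and ``ToyASTReader`` answers
from an in-memory map instead of an on-disk hash table.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <memory>
  #include <string>
  #include <utility>

  // Hypothetical, simplified types; Clang's IdentifierInfoLookup and
  // IdentifierTable have different interfaces.
  struct IdentInfo {
    std::string name;
    bool isMacro = false;
  };

  // The hook an AST reader registers with the identifier table.
  class ExternalIdentLookup {
  public:
    virtual ~ExternalIdentLookup() = default;
    // Returns nullptr if the AST file does not contain the identifier.
    virtual std::unique_ptr<IdentInfo> find(const std::string &name) = 0;
  };

  class IdentTable {
  public:
    void setExternalLookup(ExternalIdentLookup *ext) { external_ = ext; }

    IdentInfo &get(const std::string &name) {
      auto it = table_.find(name);
      if (it != table_.end())
        return *it->second; // already seen: no external lookup
      std::unique_ptr<IdentInfo> info;
      if (external_)
        info = external_->find(name); // first sighting: ask the AST file
      if (!info) {
        info = std::make_unique<IdentInfo>();
        info->name = name;
      }
      return *table_.emplace(name, std::move(info)).first->second;
    }

  private:
    std::map<std::string, std::unique_ptr<IdentInfo>> table_;
    ExternalIdentLookup *external_ = nullptr;
  };

  // A toy reader backed by an in-memory map instead of an on-disk hash table.
  class ToyASTReader : public ExternalIdentLookup {
  public:
    explicit ToyASTReader(std::map<std::string, bool> macros)
        : macros_(std::move(macros)) {}

    std::unique_ptr<IdentInfo> find(const std::string &name) override {
      ++numLookups;
      auto it = macros_.find(name);
      if (it == macros_.end())
        return nullptr;
      auto info = std::make_unique<IdentInfo>();
      info->name = name;
      info->isMacro = it->second;
      return info;
    }

    int numLookups = 0;

  private:
    std::map<std::string, bool> macros_; // identifier -> "is a macro" flag
  };

This design keeps the identifier table unaware of AST files: it only knows
that an external source may be able to answer for identifiers it has not seen,
and it asks at most once per identifier.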

Within the AST file, the identifiers used to name declarations are represented
with an integral value. A separate table provides a mapping from this integral
value (the identifier ID) to the location within the on-disk hash table where
that identifier is stored. This mapping is used when deserializing the name of
a declaration, the identifier of a token, or any other construct in the AST
file that refers to a name.

.. _pchinternals-method-pool:

Method Pool Block
^^^^^^^^^^^^^^^^^

The method pool block is represented as an on-disk hash table that serves two
purposes: it provides a mapping from the names of Objective-C selectors to the
set of Objective-C instance and class methods that have that particular
selector (which is required for semantic analysis in Objective-C) and also
stores all of the selectors used by entities within the AST file. The design
of the method pool is similar to that of the :ref:`identifier table
<pchinternals-ident-table>`: the first time a particular selector is formed
during the compilation of the program, Clang will search in the on-disk hash
table of selectors; if found, Clang will read the Objective-C methods
associated with that selector into the appropriate front-end data structure
(``Sema::InstanceMethodPool`` and ``Sema::FactoryMethodPool`` for instance and
class methods, respectively).

As with identifiers, selectors are represented by numeric values within the AST
file. A separate index maps these numeric selector values to the offset of the
selector within the on-disk hash table, and is used when deserializing an
Objective-C method declaration (or other Objective-C construct) that refers to
the selector.

AST Reader Integration Points
-----------------------------

The "lazy" deserialization behavior of AST files requires their integration
into several completely different submodules of Clang. For example, lazily
deserializing the declarations during name lookup requires that the name-lookup
routines be able to query the AST file to find entities stored there.

For each Clang data structure that requires direct interaction with the AST
reader logic, there is an abstract class that provides the interface between
the two modules. The ``ASTReader`` class, which handles the loading of an AST
file, inherits from all of these abstract classes to provide lazy
deserialization of Clang's data structures. ``ASTReader`` implements the
following abstract classes:

``ExternalSLocEntrySource``
  This abstract interface is associated with the ``SourceManager`` class, and
  is used whenever the :ref:`source manager <pchinternals-sourcemgr>` needs to
  load the details of a file, buffer, or macro instantiation.

``IdentifierInfoLookup``
  This abstract interface is associated with the ``IdentifierTable`` class, and
  is used whenever the program source refers to an identifier that has not yet
  been seen. In this case, the AST reader searches for this identifier within
  its :ref:`identifier table <pchinternals-ident-table>` to load any top-level
  declarations or macros associated with that identifier.

``ExternalASTSource``
  This abstract interface is associated with the ``ASTContext`` class, and is
  used whenever the abstract syntax tree nodes need to be loaded from the AST
  file. It provides the ability to deserialize declarations and types
  identified by their numeric values, read the bodies of functions when
  required, and read the declarations stored within a declaration context
  (either for iteration or for name lookup).

``ExternalSemaSource``
  This abstract interface is associated with the ``Sema`` class, and is used
  whenever semantic analysis needs to read information from the :ref:`global
  method pool <pchinternals-method-pool>`.

.. _pchinternals-chained:

Chained precompiled headers
---------------------------

Chained precompiled headers were initially intended to improve the performance
of IDE-centric operations such as syntax highlighting and code completion while
a particular source file is being edited by the user. To minimize the amount
of reparsing required after a change to the file, a form of precompiled
header --- called a precompiled *preamble* --- is automatically generated by
parsing all of the headers in the source file, up to and including the last
``#include``. When only the source file changes (and none of the headers it
depends on), reparsing of that source file can use the precompiled preamble and
start parsing after the ``#include``\ s, so parsing time is proportional to the
size of the source file (rather than all of its includes). However, the
compilation of that translation unit may already use a precompiled header: in
this case, Clang will create the precompiled preamble as a chained precompiled
header that refers to the original precompiled header. This drastically
reduces the time needed to serialize the precompiled preamble for use in
reparsing.

Chained precompiled headers get their name because each precompiled header can
depend on one other precompiled header, forming a chain of dependencies. A
translation unit will then include the precompiled header that starts the chain
(i.e., nothing depends on it). This linearity of dependencies is important for
the semantic model of chained precompiled headers, because the most-recent
precompiled header can provide information that overrides the information
provided by the precompiled headers it depends on, just like a header file
``B.h`` that includes another header ``A.h`` can modify the state produced by
parsing ``A.h``, e.g., by ``#undef``'ing a macro defined in ``A.h``.

There are several ways in which chained precompiled headers generalize the AST
file model:

Numbering of IDs
  Many different kinds of entities --- identifiers, declarations, types,
  etc. --- have ID numbers that start at 1 or some other predefined constant
  and grow upward. Each precompiled header records the maximum ID number it
  has assigned in each category. Then, when a new precompiled header is
  generated that depends on (chains to) another precompiled header, it will
  start counting at the next available ID number. This way, one can determine,
  given an ID number, which AST file actually contains the entity.

Name lookup
  When writing a chained precompiled header, Clang attempts to write only
  information that has changed from the precompiled header on which it is
  based. This changes the lookup algorithm for the various tables, such as the
  :ref:`identifier table <pchinternals-ident-table>`: the search starts at the
  most-recent precompiled header. If no entry is found, lookup then proceeds
  to the identifier table in the precompiled header it depends on, and so on.
  Once a lookup succeeds, that result is considered definitive, overriding any
  results from earlier precompiled headers.

Update records
  There are various ways in which a later precompiled header can modify the
  entities described in an earlier precompiled header. For example, later
  precompiled headers can add entries into the various name-lookup tables for
  the translation unit or namespaces, or add new categories to an Objective-C
  class. Each of these updates is captured in an "update record" that is
  stored in the chained precompiled header file and will be loaded along with
  the original entity.
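
The first two points can be illustrated with a small model of a chain. In this
hypothetical C++ sketch, each ``ChainedPCH`` entry records the range of IDs it
assigned and a table holding only the entries it added or changed;
``owningFile`` maps an ID back to the file that assigned it, and
``lookupName`` searches from the most recent file backward, stopping at the
first hit. None of these names come from Clang.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <map>
  #include <optional>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical model of a PCH chain; the names are illustrative only.
  struct ChainedPCH {
    std::string fileName;
    std::uint32_t firstID; // first ID assigned by this file
    std::uint32_t maxID;   // maximum ID assigned by this file
    // Only the entries this file added or changed, e.g. macro name -> body.
    std::map<std::string, std::string> table;
  };

  // Appends a PCH that chains to the current last one; it starts counting at
  // the next available ID.
  void appendToChain(std::vector<ChainedPCH> &chain, std::string fileName,
                     std::uint32_t numNewIDs,
                     std::map<std::string, std::string> table) {
    std::uint32_t first = chain.empty() ? 1 : chain.back().maxID + 1;
    chain.push_back({std::move(fileName), first, first + numNewIDs - 1,
                     std::move(table)});
  }

  // Finds the file that assigned `id`. The ranges are sorted, so a binary
  // search would also work; a linear scan keeps the sketch short.
  const ChainedPCH *owningFile(const std::vector<ChainedPCH> &chain,
                               std::uint32_t id) {
    for (const ChainedPCH &pch : chain)
      if (id >= pch.firstID && id <= pch.maxID)
        return &pch;
    return nullptr;
  }

  // Searches from the most recent PCH backward; the first hit is definitive.
  std::optional<std::string> lookupName(const std::vector<ChainedPCH> &chain,
                                        const std::string &name) {
    for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
      auto found = it->table.find(name);
      if (found != it->table.end())
        return found->second;
    }
    return std::nullopt;
  }

For example, if ``common.pch`` assigns IDs 1 to 100 and defines ``DEBUG`` as
``0``, and a chained ``file.pch`` assigns IDs 101 to 120 and redefines
``DEBUG`` as ``1``, then ID 42 belongs to ``common.pch`` and a lookup of
``DEBUG`` returns ``1`` without consulting ``common.pch``.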

.. _pchinternals-modules:

Modules
-------

Modules generalize the chained precompiled header model yet further, from a
linear chain of precompiled headers to an arbitrary directed acyclic graph
(DAG) of AST files. All of the same techniques used to make chained
precompiled headers work --- ID numbering, name lookup, update records --- are
shared with modules. However, the DAG nature of modules introduces a number of
additional complications to the model:

Numbering of IDs
  The simple, linear numbering scheme used in chained precompiled headers falls
  apart with the module DAG, because different modules may end up with
  different numbering schemes for entities they imported from common shared
  modules. To account for this, each module file provides information about
  which modules it depends on and which ID numbers it assigned to the entities
  in those modules, as well as which ID numbers it took for its own new
  entities. The AST reader then maps these "local" ID numbers into a "global"
  ID number space for the current translation unit, providing a 1-1 mapping
  between entities (in whatever AST file they inhabit) and global ID numbers.
  If that translation unit is then serialized into an AST file, this mapping
  will be stored for use when the AST file is imported.

Declaration merging
  It is possible for a given entity (from the language's perspective) to be
  declared multiple times in different places. For example, two different
  headers can have the declaration of ``printf`` or could forward-declare
  ``struct stat``. If each of those headers is included in a module, and some
  third party imports both of those modules, there is a potentially serious
  problem: name lookup for ``printf`` or ``struct stat`` will find both
  declarations, but the AST nodes are unrelated. This would result in a
  compilation error, due to an ambiguity in name lookup. Therefore, the AST
  reader performs declaration merging according to the appropriate language
  semantics, ensuring that the two disjoint declarations are merged into a
  single redeclaration chain (with a common canonical declaration), so that it
  is as if one of the headers had been included before the other.

Name visibility
  Modules allow certain names that occur during module creation to be "hidden",
  so that they are not part of the public interface of the module and are not
  visible to its clients. The AST reader maintains a "visible" bit on various
  AST nodes (declarations, macros, etc.) to indicate whether that particular
  AST node is currently visible; the various name lookup mechanisms in Clang
  inspect the visible bit to determine whether that entity, which is still in
  the AST (because other, visible AST nodes may depend on it), can actually be
  found by name lookup. When a new (sub)module is imported, it may make
  existing, non-visible, already-deserialized AST nodes visible; it is the
  responsibility of the AST reader to find and update these AST nodes when it
  is notified of the import.
  477. existing, non-visible, already-deserialized AST nodes visible; it is the
  478. responsibility of the AST reader to find and update these AST nodes when it
  479. is notified of the import.