=================
SanitizerCoverage
=================

.. contents::
   :local:

Introduction
============

NOTE: this document applies to the original Clang project, not the DirectX
Compiler. It's made available for informational purposes only.

Sanitizer tools have a very simple code coverage tool built in. It provides
function-level, basic-block-level, and edge-level coverage at a very low
cost.

How to build and run
====================

SanitizerCoverage can be used with :doc:`AddressSanitizer`,
:doc:`LeakSanitizer`, :doc:`MemorySanitizer`, and UndefinedBehaviorSanitizer.
In addition to ``-fsanitize=``, pass one of the following compile-time flags:

* ``-fsanitize-coverage=func`` for function-level coverage (very fast).
* ``-fsanitize-coverage=bb`` for basic-block-level coverage (may add up to 30%
  **extra** slowdown).
* ``-fsanitize-coverage=edge`` for edge-level coverage (up to 40% slowdown).

You may also specify ``-fsanitize-coverage=indirect-calls`` for
additional `caller-callee coverage`_.

At run time, pass ``coverage=1`` in ``ASAN_OPTIONS``, ``LSAN_OPTIONS``,
``MSAN_OPTIONS`` or ``UBSAN_OPTIONS``, as appropriate.

To get `Coverage counters`_, add ``-fsanitize-coverage=8bit-counters``
to one of the above compile-time flags. At run time, use
``*SAN_OPTIONS=coverage=1:coverage_counters=1``.

Example:

.. code-block:: console

    % cat -n cov.cc
         1  #include <stdio.h>
         2  __attribute__((noinline))
         3  void foo() { printf("foo\n"); }
         4
         5  int main(int argc, char **argv) {
         6    if (argc == 2)
         7      foo();
         8    printf("main\n");
         9  }
    % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=func
    % ASAN_OPTIONS=coverage=1 ./a.out; ls -l *sancov
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    % ASAN_OPTIONS=coverage=1 ./a.out foo ; ls -l *sancov
    foo
    main
    -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
    -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov

Every time you run an executable instrumented with SanitizerCoverage,
one ``*.sancov`` file is created during process shutdown.
If the executable is dynamically linked against instrumented DSOs,
one ``*.sancov`` file will also be created for every DSO.

Postprocessing
==============

The format of ``*.sancov`` files is very simple: the first 8 bytes are the
magic number, either ``0xC0BFFFFFFFFFFF64`` or ``0xC0BFFFFFFFFFFF32``. The last
byte of the magic defines the size of the following offsets (``0x64`` for
64-bit offsets, ``0x32`` for 32-bit offsets). The rest of the data consists of
the offsets in the corresponding binary/DSO that were executed during the run.

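To make the layout concrete, here is a minimal C++ sketch that decodes a ``*.sancov`` file already loaded into memory. ``ParseSancov`` is a hypothetical helper and is not part of any LLVM library. The sketch also assumes that the magic and offsets are stored in the reader's native byte order, i.e. the file is read on the same kind of machine that wrote it.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <optional>
    #include <vector>

    // Hypothetical helper: decodes a *.sancov buffer into the offsets it holds.
    // Returns std::nullopt for an unknown magic or a truncated payload.
    // Assumes the data is in the reader's native byte order.
    std::optional<std::vector<uint64_t>> ParseSancov(const uint8_t *data,
                                                     std::size_t size) {
      constexpr uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
      constexpr uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;
      if (size < sizeof(uint64_t)) return std::nullopt;
      uint64_t magic;
      std::memcpy(&magic, data, sizeof(magic));
      // The last byte of the magic selects the width of each offset.
      std::size_t width;
      if (magic == kMagic64)
        width = 8;
      else if (magic == kMagic32)
        width = 4;
      else
        return std::nullopt;
      std::size_t payload = size - sizeof(magic);
      if (payload % width != 0) return std::nullopt;
      std::vector<uint64_t> offsets;
      offsets.reserve(payload / width);
      for (std::size_t pos = sizeof(magic); pos < size; pos += width) {
        if (width == 8) {
          uint64_t value;
          std::memcpy(&value, data + pos, sizeof(value));
          offsets.push_back(value);
        } else {
          uint32_t value;
          std::memcpy(&value, data + pos, sizeof(value));
          offsets.push_back(value);
        }
      }
      return offsets;
    }

If the native-byte-order assumption holds, a file written on little-endian x86_64 starts with the bytes ``64 ff ff ff ff ff bf c0``. That makes it easy to spot with ``xxd``.
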
A simple script
``$LLVM/projects/compiler-rt/lib/sanitizer_common/scripts/sancov.py`` is
provided to dump these offsets.

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov
    sancov.py: read 2 PCs from a.out.22679.sancov
    sancov.py: read 1 PCs from a.out.22673.sancov
    sancov.py: 2 files merged; 2 PCs total
    0x465250
    0x4652a0

You can then filter the output of ``sancov.py`` through ``addr2line --exe
ObjectFile`` or ``llvm-symbolizer --obj ObjectFile`` to get file names and line
numbers:

.. code-block:: console

    % sancov.py print a.out.22679.sancov a.out.22673.sancov 2> /dev/null | llvm-symbolizer --obj a.out
    cov.cc:3
    cov.cc:5

How good is the coverage?
=========================

It is possible to find out which PCs are not covered by subtracting the covered
set from the set of all instrumented PCs. The latter can be obtained by listing
all call sites of ``__sanitizer_cov()`` in the binary. On Linux, ``sancov.py``
can do this for you. Just supply the path to the binary and a list of covered
PCs:

.. code-block:: console

    % sancov.py print a.out.12345.sancov > covered.txt
    sancov.py: read 2 64-bit PCs from a.out.12345.sancov
    sancov.py: 1 file merged; 2 PCs total
    % sancov.py missing a.out < covered.txt
    sancov.py: found 3 instrumented PCs in a.out
    sancov.py: read 2 PCs from stdin
    sancov.py: 1 PCs missing from coverage
    0x4cc61c

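The ``missing`` mode boils down to a set difference. The C++ sketch below shows only that step. It assumes both PC lists have already been obtained; finding the ``__sanitizer_cov()`` call sites in the binary is not shown. ``MissingPCs`` is a hypothetical name.

.. code-block:: c++

    #include <algorithm>
    #include <cstdint>
    #include <iterator>
    #include <set>
    #include <vector>

    // Returns the instrumented PCs that never appear among the covered PCs,
    // sorted in ascending order. Duplicates in either input are ignored.
    std::vector<uint64_t> MissingPCs(const std::vector<uint64_t> &instrumented,
                                     const std::vector<uint64_t> &covered) {
      std::set<uint64_t> all(instrumented.begin(), instrumented.end());
      std::set<uint64_t> seen(covered.begin(), covered.end());
      std::vector<uint64_t> missing;
      std::set_difference(all.begin(), all.end(), seen.begin(), seen.end(),
                          std::back_inserter(missing));
      return missing;
    }

Going through ``std::set`` also removes duplicate PCs, which can appear when several ``*.sancov`` files are merged.
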
Edge coverage
=============

Consider this code:

.. code-block:: c++

    void foo(int *a) {
      if (a)
        *a = 0;
    }

It contains 3 basic blocks; let's name them A, B, and C:

.. code-block:: none

     A
     |\
     | \
     |  B
     | /
     |/
     C

If blocks A, B, and C are all covered, we know for certain that the edges A=>B
and B=>C were executed, but we still don't know whether the edge A=>C was
executed. Such edges of the control flow graph, which leave a block with several
successors and enter a block with several predecessors, are called
`critical <http://en.wikipedia.org/wiki/Control_flow_graph#Special_edges>`_. The
edge-level coverage (``-fsanitize-coverage=edge``) simply splits all critical
edges by introducing new dummy blocks and then instruments those blocks:

.. code-block:: none

     A
     |\
     | \
     D  B
     | /
     |/
     C

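The critical-edge test and the splitting step can be modeled on a toy control flow graph. The C++ sketch below is illustrative only. ``Cfg``, ``CriticalEdges`` and ``SplitCriticalEdges`` are hypothetical names, and the real instrumentation performs this transformation on LLVM IR rather than on a map of block names.

.. code-block:: c++

    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    // Toy CFG: each block name maps to its list of successors.
    using Cfg = std::map<std::string, std::vector<std::string>>;

    // An edge From=>To is critical if From has more than one successor and
    // To has more than one predecessor.
    std::vector<std::pair<std::string, std::string>> CriticalEdges(const Cfg &g) {
      std::map<std::string, int> preds;
      for (const auto &[from, succs] : g)
        for (const auto &to : succs) ++preds[to];
      std::vector<std::pair<std::string, std::string>> result;
      for (const auto &[from, succs] : g)
        if (succs.size() > 1)
          for (const auto &to : succs)
            if (preds[to] > 1) result.emplace_back(from, to);
      return result;
    }

    // Replaces each critical edge From=>To with From=>New=>To, where New is a
    // fresh empty block that can then be instrumented like any other block.
    // Assumes no existing block is already named "splitN".
    Cfg SplitCriticalEdges(Cfg g) {
      int counter = 0;
      for (const auto &[from, to] : CriticalEdges(g)) {
        std::string fresh = "split" + std::to_string(counter++);
        for (auto &succ : g[from]) {
          if (succ == to) {  // Redirect one occurrence of the edge.
            succ = fresh;
            break;
          }
        }
        g[fresh] = {to};
      }
      return g;
    }

For the ``foo()`` example above, the graph ``A -> {B, C}``, ``B -> {C}`` has exactly one critical edge, ``A=>C``. Splitting it inserts a new block between ``A`` and ``C`` that plays the role of ``D``.
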
Bitset
======

When the ``coverage_bitset=1`` run-time flag is given, the coverage will also be
dumped as a bitset (a text file with 1 for blocks that have been executed and 0
for blocks that were not).

.. code-block:: console

    % clang++ -fsanitize=address -fsanitize-coverage=edge cov.cc
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out
    main
    % ASAN_OPTIONS="coverage=1:coverage_bitset=1" ./a.out 1
    foo
    main
    % head *bitset*
    ==> a.out.38214.bitset-sancov <==
    01101
    ==> a.out.6128.bitset-sancov <==
    11011%

For a given executable, the length of the bitset is always the same (unless
``dlopen``/``dlclose`` come into play), so the bitset coverage can easily be
used for bitset-based corpus distillation.

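One simple way to distill a corpus with these bitsets is a greedy pass. It keeps an input only if its bitset has a 1 in a position where no previously kept input had one. The C++ sketch below assumes the bitset files have already been read into strings of ``'0'``/``'1'`` characters, with any trailing newline stripped. ``DistillCorpus`` is a hypothetical helper, and the greedy strategy is just one possible choice.

.. code-block:: c++

    #include <cstddef>
    #include <string>
    #include <vector>

    // Returns the indices of the inputs to keep. An input is kept if its
    // bitset covers at least one block not covered by earlier kept inputs.
    std::vector<std::size_t> DistillCorpus(const std::vector<std::string> &bitsets) {
      std::vector<std::size_t> kept;
      std::string seen;  // Union of the bitsets of all kept inputs.
      for (std::size_t i = 0; i < bitsets.size(); ++i) {
        const std::string &bits = bitsets[i];
        if (seen.size() < bits.size()) seen.resize(bits.size(), '0');
        bool adds_new = false;
        for (std::size_t j = 0; j < bits.size(); ++j) {
          if (bits[j] == '1' && seen[j] == '0') {
            seen[j] = '1';
            adds_new = true;
          }
        }
        if (adds_new) kept.push_back(i);
      }
      return kept;
    }

Because the pass is greedy, the result depends on the order of the inputs. For the two bitsets shown above, ``01101`` and ``11011``, both inputs are kept, since each covers a block the other does not.
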
Caller-callee coverage
======================

(Experimental!)

Every indirect function call is instrumented with a run-time function call that
captures the caller and callee. At shutdown, the process dumps a separate
file called ``caller-callee.PID.sancov``, which contains caller/callee pairs as
pairs of lines (odd lines are callers, even lines are callees):

.. code-block:: console

    a.out 0x4a2e0c
    a.out 0x4a6510
    a.out 0x4a2e0c
    a.out 0x4a87f0

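Because the file simply alternates caller and callee lines, it is easy to turn back into pairs. Below is a C++ sketch that assumes every line has the ``module 0xADDRESS`` form shown above. ``CallEdge`` and ``ParseCallerCallee`` are hypothetical names.

.. code-block:: c++

    #include <cstdint>
    #include <cstdlib>
    #include <istream>
    #include <sstream>
    #include <string>
    #include <vector>

    // One observed indirect call: where it came from and where it landed.
    struct CallEdge {
      std::string caller_module;
      uint64_t caller_pc = 0;
      std::string callee_module;
      uint64_t callee_pc = 0;
    };

    // Parses one "module 0xADDR" line; returns false if it is malformed.
    static bool ParseEntry(const std::string &line, std::string &module,
                           uint64_t &pc) {
      std::istringstream fields(line);
      std::string addr;
      if (!(fields >> module >> addr)) return false;
      char *end = nullptr;
      pc = std::strtoull(addr.c_str(), &end, 16);  // Base 16 accepts "0x".
      return end != addr.c_str() && *end == '\0';
    }

    // Groups lines into pairs: odd lines are callers, even lines are callees.
    // Returns false on a malformed line or a caller without a callee.
    bool ParseCallerCallee(std::istream &in, std::vector<CallEdge> &edges) {
      std::string line;
      while (std::getline(in, line)) {
        if (line.empty()) continue;
        CallEdge edge;
        if (!ParseEntry(line, edge.caller_module, edge.caller_pc)) return false;
        if (!std::getline(in, line) ||
            !ParseEntry(line, edge.callee_module, edge.callee_pc))
          return false;
        edges.push_back(edge);
      }
      return true;
    }

For the four lines above, this yields two edges, both with caller ``a.out 0x4a2e0c``. In other words, one indirect call site reached two different callees.
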
Current limitations:

* Only the first 14 callees for every caller are recorded; the rest are silently
  ignored.
* The output format is not very compact since caller and callee may reside in
  different modules and we need to spell out the module names.
* The routine that dumps the output is not optimized for speed.
* Only Linux x86_64 is tested so far.
* Sandboxes are not supported.

Coverage counters
=================

This experimental feature is inspired by
`AFL <http://lcamtuf.coredump.cx/afl/technical_details.txt>`_'s coverage
instrumentation. With additional compile-time and run-time flags you can get
more sensitive coverage information. In addition to the boolean value assigned to
every basic block (edge), the instrumentation collects imprecise counters.
On exit, every counter is mapped to an 8-bit bitset representing counter
ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+``, and those 8-bit bitsets are
dumped to disk.

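The range mapping can be written as a small C++ function. This is a sketch, not the runtime's code. ``CounterToBits`` is a hypothetical name, and the sketch assumes that "bit *i*" counts from the least significant bit. Under that assumption, a counter of 1 maps to ``0x01`` and a counter of 128 or more maps to ``0x80``.

.. code-block:: c++

    #include <cstdint>

    // Maps a raw execution counter to a byte with exactly one bit set.
    // Assumption: bit 0 is the least significant bit and stands for range 1.
    // Ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+. Zero maps to 0.
    uint8_t CounterToBits(uint32_t count) {
      int range;
      if (count == 0) return 0;                              // Never executed.
      if (count <= 3) range = static_cast<int>(count) - 1;  // 1, 2, 3
      else if (count <= 7) range = 3;                        // 4-7
      else if (count <= 15) range = 4;                       // 8-15
      else if (count <= 31) range = 5;                       // 16-31
      else if (count <= 127) range = 6;                      // 32-127
      else range = 7;                                        // 128+
      return static_cast<uint8_t>(1u << range);
    }
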
  170. On exit, every counter will be mapped to a 8-bit bitset representing counter
  171. ranges: ``1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+`` and those 8-bit bitsets will
  172. be dumped to disk.
  173. .. code-block:: console
  174. % clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=edge,8bit-counters
  175. % ASAN_OPTIONS="coverage=1:coverage_counters=1" ./a.out
  176. % ls -l *counters-sancov
  177. ... a.out.17110.counters-sancov
  178. % xxd *counters-sancov
  179. 0000000: 0001 0100 01
  180. These counters may also be used for in-process coverage-guided fuzzers. See
  181. ``include/sanitizer/coverage_interface.h``:
  182. .. code-block:: c++
  183. // The coverage instrumentation may optionally provide imprecise counters.
  184. // Rather than exposing the counter values to the user we instead map
  185. // the counters to a bitset.
  186. // Every counter is associated with 8 bits in the bitset.
  187. // We define 8 value ranges: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+
  188. // The i-th bit is set to 1 if the counter value is in the i-th range.
  189. // This counter-based coverage implementation is *not* thread-safe.
  190. // Returns the number of registered coverage counters.
  191. uintptr_t __sanitizer_get_number_of_counters();
  192. // Updates the counter 'bitset', clears the counters and returns the number of
  193. // new bits in 'bitset'.
  194. // If 'bitset' is nullptr, only clears the counters.
  195. // Otherwise 'bitset' should be at least
  196. // __sanitizer_get_number_of_counters bytes long and 8-aligned.
  197. uintptr_t
  198. __sanitizer_update_counter_bitset_and_clear_counters(uint8_t *bitset);
  199. Output directory
  200. ================
  201. By default, .sancov files are created in the current working directory.
  202. This can be changed with ``ASAN_OPTIONS=coverage_dir=/path``:
  203. .. code-block:: console
  204. % ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
  205. % ls -l /tmp/cov/*sancov
  206. -rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
  207. -rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov
  208. Sudden death
  209. ============
  210. Normally, coverage data is collected in memory and saved to disk when the
  211. program exits (with an ``atexit()`` handler), when a SIGSEGV is caught, or when
  212. ``__sanitizer_cov_dump()`` is called.
  213. If the program ends with a signal that ASan does not handle (or can not handle
  214. at all, like SIGKILL), coverage data will be lost. This is a big problem on
  215. Android, where SIGKILL is a normal way of evicting applications from memory.
  216. With ``ASAN_OPTIONS=coverage=1:coverage_direct=1`` coverage data is written to a
  217. memory-mapped file as soon as it collected.
  218. .. code-block:: console
  219. % ASAN_OPTIONS="coverage=1:coverage_direct=1" ./a.out
  220. main
  221. % ls
  222. 7036.sancov.map 7036.sancov.raw a.out
  223. % sancov.py rawunpack 7036.sancov.raw
  224. sancov.py: reading map 7036.sancov.map
  225. sancov.py: unpacking 7036.sancov.raw
  226. writing 1 PCs to a.out.7036.sancov
  227. % sancov.py print a.out.7036.sancov
  228. sancov.py: read 1 PCs from a.out.7036.sancov
  229. sancov.py: 1 files merged; 1 PCs total
  230. 0x4b2bae
  231. Note that on 64-bit platforms, this method writes 2x more data than the default,
  232. because it stores full PC values instead of 32-bit offsets.
In-process fuzzing
==================

Coverage data can be useful for fuzzers, and sometimes it is preferable to run
a fuzzer in the same process as the code being fuzzed (an in-process fuzzer).

You can use ``__sanitizer_get_total_unique_coverage()`` from
``<sanitizer/coverage_interface.h>``, which returns the number of currently
covered entities in the program. This tells the fuzzer whether the coverage has
increased after testing each new input.

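The feedback loop described above can be sketched in C++ as follows. The target and the coverage query are passed in as callbacks, so the sketch stays self-contained and runs without a sanitizer runtime. In an instrumented build, ``total_coverage`` would wrap ``__sanitizer_get_total_unique_coverage()``. ``FuzzLoop`` and its one-byte random mutation are hypothetical placeholders and do not describe how any particular fuzzer works.

.. code-block:: c++

    #include <cstddef>
    #include <cstdint>
    #include <functional>
    #include <random>
    #include <string>
    #include <vector>

    // Hypothetical in-process loop: mutate a corpus element, run it, and keep
    // it only if the total coverage reported by 'total_coverage' increased.
    std::vector<std::string> FuzzLoop(
        const std::function<void(const std::string &)> &run_target,
        const std::function<std::size_t()> &total_coverage, std::string seed,
        int iterations, unsigned rng_seed) {
      std::mt19937 rng(rng_seed);
      std::vector<std::string> corpus = {seed};
      run_target(seed);
      std::size_t best = total_coverage();
      for (int i = 0; i < iterations; ++i) {
        // Placeholder mutation: overwrite one random byte of a corpus entry.
        std::string input = corpus[rng() % corpus.size()];
        if (input.empty()) input.push_back('\0');
        input[rng() % input.size()] = static_cast<char>(rng() % 256);
        run_target(input);
        std::size_t now = total_coverage();
        if (now > best) {  // Coverage grew, so this input is interesting.
          best = now;
          corpus.push_back(input);
        }
      }
      return corpus;
    }

Only inputs that raise the coverage count enter the corpus, so later mutations start from inputs that already reach new code.
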
If a fuzzer finds a bug in the ASan run, you will need to save the reproducer
before exiting the process. Use ``__asan_set_death_callback`` from
``<sanitizer/asan_interface.h>`` to do that.

An example of such a fuzzer can be found in `the LLVM tree
<http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Fuzzer/README.txt?view=markup>`_.

Performance
===========

This coverage implementation is **fast**. With function-level coverage
(``-fsanitize-coverage=func``) the overhead is not measurable. With
basic-block-level coverage (``-fsanitize-coverage=bb``) the overhead varies
between 0 and 25%.

==============  =========  =========  =========  =========  =========  =========
benchmark       cov0       cov1       diff 0-1   cov2       diff 0-2   diff 1-2
==============  =========  =========  =========  =========  =========  =========
400.perlbench   1296.00    1307.00    1.01       1465.00    1.13       1.12
401.bzip2       858.00     854.00     1.00       1010.00    1.18       1.18
403.gcc         613.00     617.00     1.01       683.00     1.11       1.11
429.mcf         605.00     582.00     0.96       610.00     1.01       1.05
445.gobmk       896.00     880.00     0.98       1050.00    1.17       1.19
456.hmmer       892.00     892.00     1.00       918.00     1.03       1.03
458.sjeng       995.00     1009.00    1.01       1217.00    1.22       1.21
462.libquantum  497.00     492.00     0.99       534.00     1.07       1.09
464.h264ref     1461.00    1467.00    1.00       1543.00    1.06       1.05
471.omnetpp     575.00     590.00     1.03       660.00     1.15       1.12
473.astar       658.00     652.00     0.99       715.00     1.09       1.10
483.xalancbmk   471.00     491.00     1.04       582.00     1.24       1.19
433.milc        616.00     627.00     1.02       627.00     1.02       1.00
444.namd        602.00     601.00     1.00       654.00     1.09       1.09
447.dealII      630.00     634.00     1.01       653.00     1.04       1.03
450.soplex      365.00     368.00     1.01       395.00     1.08       1.07
453.povray      427.00     434.00     1.02       495.00     1.16       1.14
470.lbm         357.00     375.00     1.05       370.00     1.04       0.99
482.sphinx3     927.00     928.00     1.00       1000.00    1.08       1.08
==============  =========  =========  =========  =========  =========  =========

Judging by the overheads quoted above, ``cov0``, ``cov1`` and ``cov2`` presumably
correspond to builds without coverage, with function-level coverage, and with
basic-block-level coverage. The ``diff`` columns are ratios of the corresponding
columns; for example, ``diff 0-2`` is ``cov2 / cov0`` (1465.00 / 1296.00, about
1.13, for ``400.perlbench``).

Why another coverage?
=====================

Why did we implement yet another code coverage?

* We needed something that is lightning fast, plays well with
  AddressSanitizer, and does not significantly increase the binary size.
* Traditional coverage implementations based on global counters
  `suffer from contention on counters
  <https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY>`_.