  1. <!DOCTYPE html>
  2. <html>
  3. <head>
  4. <title>Profiler</title>
  5. <meta charset="utf-8">
  6. <meta name="Copyright" content="Copyright (C) 2005-2022">
  7. <meta name="Language" content="en">
  8. <link rel="stylesheet" type="text/css" href="bluequad.css" media="screen">
  9. <link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print">
  10. </head>
  11. <body>
  12. <div id="site">
  13. <a href="https://luajit.org"><span>Lua<span id="logo">JIT</span></span></a>
  14. </div>
  15. <div id="head">
  16. <h1>Profiler</h1>
  17. </div>
  18. <div id="nav">
  19. <ul><li>
  20. <a href="luajit.html">LuaJIT</a>
  21. <ul><li>
  22. <a href="https://luajit.org/download.html">Download <span class="ext">&raquo;</span></a>
  23. </li><li>
  24. <a href="install.html">Installation</a>
  25. </li><li>
  26. <a href="running.html">Running</a>
  27. </li></ul>
  28. </li><li>
  29. <a href="extensions.html">Extensions</a>
  30. <ul><li>
  31. <a href="ext_ffi.html">FFI Library</a>
  32. <ul><li>
  33. <a href="ext_ffi_tutorial.html">FFI Tutorial</a>
  34. </li><li>
  35. <a href="ext_ffi_api.html">ffi.* API</a>
  36. </li><li>
  37. <a href="ext_ffi_semantics.html">FFI Semantics</a>
  38. </li></ul>
  39. </li><li>
  40. <a href="ext_buffer.html">String Buffers</a>
  41. </li><li>
  42. <a href="ext_jit.html">jit.* Library</a>
  43. </li><li>
  44. <a href="ext_c_api.html">Lua/C API</a>
  45. </li><li>
  46. <a class="current" href="ext_profiler.html">Profiler</a>
  47. </li></ul>
  48. </li><li>
  49. <a href="status.html">Status</a>
  50. </li><li>
  51. <a href="faq.html">FAQ</a>
  52. </li><li>
  53. <a href="http://wiki.luajit.org/">Wiki <span class="ext">&raquo;</span></a>
  54. </li><li>
  55. <a href="https://luajit.org/list.html">Mailing List <span class="ext">&raquo;</span></a>
  56. </li></ul>
  57. </div>
  58. <div id="main">
  59. <p>
  60. LuaJIT has an integrated statistical profiler with very low overhead. It
  61. allows sampling the currently executing stack and other parameters in
  62. regular intervals.
  63. </p>
  64. <p>
  65. The integrated profiler can be accessed from three levels:
  66. </p>
  67. <ul>
  68. <li>The <a href="#hl_profiler">bundled high-level profiler</a>, invoked by the
  69. <a href="#j_p"><tt>-jp</tt></a> command line option.</li>
  70. <li>A <a href="#ll_lua_api">low-level Lua API</a> to control the profiler.</li>
  71. <li>A <a href="#ll_c_api">low-level C API</a> to control the profiler.</li>
  72. </ul>
  73. <h2 id="hl_profiler">High-Level Profiler</h2>
  74. <p>
  75. The bundled high-level profiler offers basic profiling functionality. It
  76. generates simple textual summaries or source code annotations. It can be
  77. accessed with the <a href="#j_p"><tt>-jp</tt></a> command line option
  78. or from Lua code by loading the underlying <tt>jit.p</tt> module.
  79. </p>
  80. <p>
  81. To cut to the chase &mdash; run this to get a CPU usage profile by
  82. function name:
  83. </p>
  84. <pre class="code">
  85. luajit -jp myapp.lua
  86. </pre>
  87. <p>
  88. It's <em>not</em> a stated goal of the bundled profiler to add every
  89. possible option or to cater for special profiling needs. The low-level
  90. profiler APIs are documented below. They may be used by third-party
  91. authors to implement advanced functionality, e.g. IDE integration or
  92. graphical profilers.
  93. </p>
  94. <p>
  95. Note: Sampling works for both interpreted and JIT-compiled code. The
  96. results for JIT-compiled code may sometimes be surprising. LuaJIT
  97. heavily optimizes and inlines Lua code &mdash; there's no simple
  98. one-to-one correspondence between source code lines and the sampled
  99. machine code.
  100. </p>
  101. <h3 id="j_p"><tt>-jp=[options[,output]]</tt></h3>
  102. <p>
  103. The <tt>-jp</tt> command line option starts the high-level profiler.
  104. When the application run by the command line terminates, the profiler
  105. stops and writes the results to <tt>stdout</tt> or to the specified
  106. <tt>output</tt> file.
  107. </p>
  108. <p>
  109. The <tt>options</tt> argument specifies how the profiling is to be
  110. performed:
  111. </p>
  112. <ul>
  113. <li><tt>f</tt> &mdash; Stack dump: function name, otherwise module:line.
  114. This is the default mode.</li>
  115. <li><tt>F</tt> &mdash; Stack dump: ditto, but dump module:name.</li>
  116. <li><tt>l</tt> &mdash; Stack dump: module:line.</li>
  117. <li><tt>&lt;number&gt;</tt> &mdash; Stack dump depth (callee &larr;
  118. caller). Default: 1.</li>
  119. <li><tt>-&lt;number&gt;</tt> &mdash; Inverse stack dump depth (caller
  120. &rarr; callee).</li>
  121. <li><tt>s</tt> &mdash; Split stack dump after first stack level. Implies
  122. depth&nbsp;&ge;&nbsp;2 or depth&nbsp;&le;&nbsp;-2.</li>
  123. <li><tt>p</tt> &mdash; Show full path for module names.</li>
  124. <li><tt>v</tt> &mdash; Show VM states.</li>
  125. <li><tt>z</tt> &mdash; Show <a href="#jit_zone">zones</a>.</li>
  126. <li><tt>r</tt> &mdash; Show raw sample counts. Default: show percentages.</li>
  127. <li><tt>a</tt> &mdash; Annotate excerpts from source code files.</li>
  128. <li><tt>A</tt> &mdash; Annotate complete source code files.</li>
  129. <li><tt>G</tt> &mdash; Produce raw output suitable for graphical tools.</li>
  130. <li><tt>m&lt;number&gt;</tt> &mdash; Minimum sample percentage to be shown.
  131. Default: 3%.</li>
  132. <li><tt>i&lt;number&gt;</tt> &mdash; Sampling interval in milliseconds.
  133. Default: 10ms.<br>
  134. Note: The actual sampling precision is OS-dependent.</li>
  135. </ul>
  136. <p>
  137. The default output for <tt>-jp</tt> is a list of the most CPU-consuming
  138. spots in the application. Increasing the stack dump depth with (say)
  139. <tt>-jp=2</tt> may help to point out the main callers or callees of
  140. hotspots. But sample aggregation is still flat per unique stack dump.
  141. </p>
  142. <p>
  143. To get a two-level view (split view) of callers/callees, use
  144. <tt>-jp=s</tt> or <tt>-jp=-s</tt>. The percentages shown for the second
  145. level are relative to the first level.
  146. </p>
  147. <p>
  148. To see how much time is spent in each line relative to a function, use
  149. <tt>-jp=fl</tt>.
  150. </p>
  151. <p>
  152. To see how much time is spent in different VM states or
  153. <a href="#jit_zone">zones</a>, use <tt>-jp=v</tt> or <tt>-jp=z</tt>.
  154. </p>
  155. <p>
  156. Combinations of <tt>v/z</tt> with <tt>f/F/l</tt> produce two-level
  157. views, e.g. <tt>-jp=vf</tt> or <tt>-jp=fv</tt>. This shows the time
  158. spent in a VM state or zone vs. hotspots. This can be used to answer
  159. questions like "Which time-consuming functions are only interpreted?" or
  160. "What's the garbage collector overhead for a specific function?".
  161. </p>
  162. <p>
  163. Multiple options can be combined &mdash; but not all combinations make
  164. sense, see above. E.g. <tt>-jp=3si4m1</tt> samples three stack levels
  165. deep in 4ms intervals and shows a split view of the CPU-consuming
  166. functions and their callers with a 1% threshold.
  167. </p>
  168. <p>
  169. Source code annotations produced by <tt>-jp=a</tt> or <tt>-jp=A</tt> are
  170. always flat and at the line level. Obviously, the source code files need
  171. to be readable by the profiler script.
  172. </p>
  173. <p>
  174. The high-level profiler can also be started and stopped from Lua code with:
  175. </p>
  176. <pre class="code">
  177. require("jit.p").start(options, output)
  178. ...
  179. require("jit.p").stop()
  180. </pre>
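<p>
As a rough sketch of the calls above, the following profiles a single
region of a script. The option string <tt>3si4m1</tt> is the combination
discussed earlier; the output file name <tt>profile.txt</tt> and the
<tt>workload</tt> function are placeholders made up for this example.
</p>
<pre class="code">
local p = require("jit.p")

-- Hypothetical workload; replace with the code you want to profile.
local function workload()
  local t = {}
  for i = 1, 2e6 do t[i % 1000 + 1] = tostring(i) end
  return #t
end

p.start("3si4m1", "profile.txt")
local ok, err = pcall(workload)
p.stop()  -- always stop, so the report is written even if the workload fails
if not ok then error(err, 0) end
</pre>
<p>
Wrapping the workload in <tt>pcall</tt> is a design choice of this sketch,
not a requirement of the profiler: it only ensures that <tt>stop()</tt> runs
before an error is propagated.
</p>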
  181. <h3 id="jit_zone"><tt>jit.zone</tt> &mdash; Zones</h3>
  182. <p>
  183. Zones can be used to provide information about different parts of an
  184. application to the high-level profiler. E.g. a game could make use of an
  185. <tt>"AI"</tt> zone, a <tt>"PHYS"</tt> zone, etc. Zones are hierarchical,
  186. organized as a stack.
  187. </p>
  188. <p>
  189. The <tt>jit.zone</tt> module needs to be loaded explicitly:
  190. </p>
  191. <pre class="code">
  192. local zone = require("jit.zone")
  193. </pre>
  194. <ul>
  195. <li><tt>zone("name")</tt> pushes a named zone to the zone stack.</li>
  196. <li><tt>zone()</tt> pops the current zone from the zone stack and
  197. returns its name.</li>
  198. <li><tt>zone:get()</tt> returns the current zone name or <tt>nil</tt>.</li>
  199. <li><tt>zone:flush()</tt> flushes the zone stack.</li>
  200. </ul>
  201. <p>
  202. To show the time spent in each zone use <tt>-jp=z</tt>. To show the time
  203. spent relative to hotspots use e.g. <tt>-jp=zf</tt> or <tt>-jp=fz</tt>.
  204. </p>
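<p>
A minimal sketch of how zones might be wrapped around application code.
The helper <tt>in_zone</tt> and the zone names <tt>"AI"</tt> and
<tt>"PATHFIND"</tt> are hypothetical and only serve as illustration; the
helper pops the zone even when the wrapped function raises an error, so the
zone stack stays balanced.
</p>
<pre class="code">
local zone = require("jit.zone")

-- Hypothetical helper: run fn inside a named zone and always pop the zone.
local function in_zone(name, fn, ...)
  zone(name)                      -- push
  local ok, err = pcall(fn, ...)
  zone()                          -- pop, even if fn raised an error
  if not ok then error(err, 0) end
end

in_zone("AI", function()
  assert(zone:get() == "AI")
  in_zone("PATHFIND", function()
    assert(zone:get() == "PATHFIND")  -- nested zones form a stack
  end)
  assert(zone:get() == "AI")
end)

assert(zone:get() == nil)         -- stack is empty again
</pre>
<p>
Running such a script with <tt>-jp=z</tt> should then attribute samples to
the zones that were active when each sample was taken.
</p>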
  205. <h2 id="ll_lua_api">Low-level Lua API</h2>
  206. <p>
  207. The <tt>jit.profile</tt> module gives access to the low-level API of the
  208. profiler from Lua code. This module needs to be loaded explicitly:</p>
  209. <pre class="code">
  210. local profile = require("jit.profile")
  211. </pre>
  212. <p>
  213. This module can be used to implement your own higher-level profiler.
  214. A typical profiling run starts the profiler, captures stack dumps in
  215. the profiler callback, adds them to a hash table to aggregate the number
  216. of samples, stops the profiler and then analyzes all captured
  217. stack dumps. Other parameters can be sampled in the profiler callback,
  218. too. But it's important not to spend too much time in the callback,
  219. since this may skew the statistics.
  220. </p>
  221. <h3 id="profile_start"><tt>profile.start(mode, cb)</tt>
  222. &mdash; Start profiler</h3>
  223. <p>
  224. This function starts the profiler. The <tt>mode</tt> argument is a
  225. string holding options:
  226. </p>
  227. <ul>
  228. <li><tt>f</tt> &mdash; Profile with precision down to the function level.</li>
  229. <li><tt>l</tt> &mdash; Profile with precision down to the line level.</li>
  230. <li><tt>i&lt;number&gt;</tt> &mdash; Sampling interval in milliseconds (default
  231. 10ms).<br>
  232. Note: The actual sampling precision is OS-dependent.
  233. </li>
  234. </ul>
  235. <p>
  236. The <tt>cb</tt> argument is a callback function which is called with
  237. three arguments: <tt>(thread, samples, vmstate)</tt>. The callback is
  238. called on a separate coroutine; the <tt>thread</tt> argument is the
  239. state that holds the stack to sample for profiling. Note: do
  240. <em>not</em> modify the stack of that state or call functions on it.
  241. </p>
  242. <p>
  243. <tt>samples</tt> gives the number of accumulated samples since the last
  244. callback (usually 1).
  245. </p>
  246. <p>
  247. <tt>vmstate</tt> holds the VM state at the time the profiling timer
  248. triggered. This may or may not correspond to the state of the VM when
  249. the profiling callback is called. The state is one of: <tt>'N'</tt>
  250. native (compiled) code, <tt>'I'</tt> interpreted code, <tt>'C'</tt>
  251. C&nbsp;code, <tt>'G'</tt> the garbage collector, or <tt>'J'</tt> the JIT
  252. compiler.
  253. </p>
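<p>
To illustrate the callback arguments, here is a small sketch that only
counts samples per VM state. The busy loop and the 5ms interval are
arbitrary choices for this example, and the resulting counts depend on the
machine and on OS timer precision. The profiler is stopped with
<tt>profile.stop()</tt>, described in the next section.
</p>
<pre class="code">
local profile = require("jit.profile")

local counts = {}  -- VM state letter ('N', 'I', 'C', 'G', 'J') to samples

local function cb(thread, samples, vmstate)
  -- Keep the callback cheap: a single table update per invocation.
  counts[vmstate] = (counts[vmstate] or 0) + samples
end

profile.start("fi5", cb)

-- Arbitrary busy work, so that the profiler has something to sample.
local x = 0
for i = 1, 5e7 do x = x + i % 7 end

profile.stop()

for state, n in pairs(counts) do
  print(state, n)
end
</pre>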
  254. <h3 id="profile_stop"><tt>profile.stop()</tt>
  255. &mdash; Stop profiler</h3>
  256. <p>
  257. This function stops the profiler.
  258. </p>
  259. <h3 id="profile_dump"><tt>dump = profile.dumpstack([thread,] fmt, depth)</tt>
  260. &mdash; Dump stack </h3>
  261. <p>
  262. This function allows taking stack dumps in an efficient manner. It
  263. returns a string with a stack dump for the <tt>thread</tt> (coroutine),
  264. formatted according to the <tt>fmt</tt> argument:
  265. </p>
  266. <ul>
  267. <li><tt>p</tt> &mdash; Preserve the full path for module names. Otherwise,
  268. only the file name is used.</li>
  269. <li><tt>f</tt> &mdash; Dump the function name if it can be derived. Otherwise,
  270. use module:line.</li>
  271. <li><tt>F</tt> &mdash; Ditto, but dump module:name.</li>
  272. <li><tt>l</tt> &mdash; Dump module:line.</li>
  273. <li><tt>Z</tt> &mdash; Zap the following characters for the last dumped
  274. frame.</li>
  275. <li>All other characters are added verbatim to the output string.</li>
  276. </ul>
  277. <p>
  278. The <tt>depth</tt> argument gives the number of frames to dump, starting
  279. at the topmost frame of the thread. A negative number dumps the frames in
  280. inverse order.
  281. </p>
  282. <p>
  283. The first example prints a list of the current module names and line
  284. numbers of up to 10 frames in separate lines. The second example prints
  285. semicolon-separated function names for all frames (up to 100) in inverse
  286. order:
  287. </p>
  288. <pre class="code">
  289. print(profile.dumpstack(thread, "l\n", 10))
  290. print(profile.dumpstack(thread, "fZ;", -100))
  291. </pre>
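<p>
The typical profiling run outlined at the start of this section (start,
capture stack dumps, aggregate in a hash table, stop, analyze) could be
sketched roughly as follows. The <tt>workload</tt> function, the depth of
20 frames, the 5ms interval and the top-5 report are assumptions made for
this example only; this is not how the bundled <tt>jit.p</tt> module is
implemented.
</p>
<pre class="code">
local profile = require("jit.profile")

local samples_by_stack = {}  -- stack dump string to accumulated samples
local total = 0

local function cb(thread, samples, vmstate)
  -- One dumpstack call and one table update; avoid heavier work here.
  local stack = profile.dumpstack(thread, "fZ;", -20)
  samples_by_stack[stack] = (samples_by_stack[stack] or 0) + samples
  total = total + samples
end

-- Hypothetical workload; replace with the code you want to profile.
local function workload()
  local s = 0
  for i = 1, 5e7 do s = s + math.sqrt(i) end
  return s
end

profile.start("fi5", cb)
workload()
profile.stop()

-- Analysis happens after the profiler has stopped.
local list = {}
for stack, n in pairs(samples_by_stack) do
  list[#list + 1] = { stack = stack, n = n }
end
table.sort(list, function(a, b) return a.n > b.n end)

for i = 1, math.min(5, #list) do
  print(string.format("%5.1f%%  %s", 100 * list[i].n / total, list[i].stack))
end
</pre>
<p>
Deferring sorting and formatting until after <tt>profile.stop()</tt> keeps
the time spent in the callback small, which matters because a slow callback
may skew the statistics.
</p>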
  292. <h2 id="ll_c_api">Low-level C API</h2>
  293. <p>
  294. The profiler can be controlled directly from C&nbsp;code, e.g. for
  295. use by IDEs. The declarations are in <tt>"luajit.h"</tt> (see
  296. <a href="ext_c_api.html">Lua/C API</a> extensions).
  297. </p>
  298. <h3 id="luaJIT_profile_start"><tt>luaJIT_profile_start(L, mode, cb, data)</tt>
  299. &mdash; Start profiler</h3>
  300. <p>
  301. This function starts the profiler. <a href="#profile_start">See
  302. above</a> for a description of the <tt>mode</tt> argument.
  303. </p>
  304. <p>
  305. The <tt>cb</tt> argument is a callback function with the following
  306. declaration:
  307. </p>
  308. <pre class="code">
  309. typedef void (*luaJIT_profile_callback)(void *data, lua_State *L,
  310. int samples, int vmstate);
  311. </pre>
  312. <p>
  313. <tt>data</tt> is available for use by the callback. <tt>L</tt> is the
  314. state that holds the stack to sample for profiling. Note: do
  315. <em>not</em> modify this stack or call functions on this stack &mdash;
  316. use a separate coroutine for this purpose. <a href="#profile_start">See
  317. above</a> for a description of <tt>samples</tt> and <tt>vmstate</tt>.
  318. </p>
  319. <h3 id="luaJIT_profile_stop"><tt>luaJIT_profile_stop(L)</tt>
  320. &mdash; Stop profiler</h3>
  321. <p>
  322. This function stops the profiler.
  323. </p>
  324. <h3 id="luaJIT_profile_dumpstack"><tt>p = luaJIT_profile_dumpstack(L, fmt, depth, len)</tt>
  325. &mdash; Dump stack </h3>
  326. <p>
  327. This function allows taking stack dumps in an efficient manner.
  328. <a href="#profile_dump">See above</a> for a description of <tt>fmt</tt>
  329. and <tt>depth</tt>.
  330. </p>
  331. <p>
  332. This function returns a <tt>const&nbsp;char&nbsp;*</tt> pointing to a
  333. private string buffer of the profiler. The <tt>int&nbsp;*len</tt>
  334. argument returns the length of the output string. The buffer is
  335. overwritten on the next call and deallocated when the profiler stops.
  336. You either need to consume the content immediately or copy it for later
  337. use.
  338. </p>
  339. <br class="flush">
  340. </div>
  341. <div id="foot">
  342. <hr class="hide">
  343. Copyright &copy; 2005-2022
  344. <span class="noprint">
  345. &middot;
  346. <a href="contact.html">Contact</a>
  347. </span>
  348. </div>
  349. </body>
  350. </html>