- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
- <html>
- <head>
- <title>Profiler</title>
- <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
- <meta name="Author" content="Mike Pall">
- <meta name="Copyright" content="Copyright (C) 2005-2017, Mike Pall">
- <meta name="Language" content="en">
- <link rel="stylesheet" type="text/css" href="bluequad.css" media="screen">
- <link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print">
- </head>
- <body>
- <div id="site">
- <a href="http://luajit.org"><span>Lua<span id="logo">JIT</span></span></a>
- </div>
- <div id="head">
- <h1>Profiler</h1>
- </div>
- <div id="nav">
- <ul><li>
- <a href="luajit.html">LuaJIT</a>
- <ul><li>
- <a href="http://luajit.org/download.html">Download <span class="ext">»</span></a>
- </li><li>
- <a href="install.html">Installation</a>
- </li><li>
- <a href="running.html">Running</a>
- </li></ul>
- </li><li>
- <a href="extensions.html">Extensions</a>
- <ul><li>
- <a href="ext_ffi.html">FFI Library</a>
- <ul><li>
- <a href="ext_ffi_tutorial.html">FFI Tutorial</a>
- </li><li>
- <a href="ext_ffi_api.html">ffi.* API</a>
- </li><li>
- <a href="ext_ffi_semantics.html">FFI Semantics</a>
- </li></ul>
- </li><li>
- <a href="ext_jit.html">jit.* Library</a>
- </li><li>
- <a href="ext_c_api.html">Lua/C API</a>
- </li><li>
- <a class="current" href="ext_profiler.html">Profiler</a>
- </li></ul>
- </li><li>
- <a href="status.html">Status</a>
- <ul><li>
- <a href="changes.html">Changes</a>
- </li></ul>
- </li><li>
- <a href="faq.html">FAQ</a>
- </li><li>
- <a href="http://luajit.org/performance.html">Performance <span class="ext">»</span></a>
- </li><li>
- <a href="http://wiki.luajit.org/">Wiki <span class="ext">»</span></a>
- </li><li>
- <a href="http://luajit.org/list.html">Mailing List <span class="ext">»</span></a>
- </li></ul>
- </div>
- <div id="main">
- <p>
- LuaJIT has an integrated statistical profiler with very low overhead. It
- allows sampling the currently executing stack and other parameters at
- regular intervals.
- </p>
- <p>
- The integrated profiler can be accessed from three levels:
- </p>
- <ul>
- <li>The <a href="#hl_profiler">bundled high-level profiler</a>, invoked by the
- <a href="#j_p"><tt>-jp</tt></a> command line option.</li>
- <li>A <a href="#ll_lua_api">low-level Lua API</a> to control the profiler.</li>
- <li>A <a href="#ll_c_api">low-level C API</a> to control the profiler.</li>
- </ul>
- <h2 id="hl_profiler">High-Level Profiler</h2>
- <p>
- The bundled high-level profiler offers basic profiling functionality. It
- generates simple textual summaries or source code annotations. It can be
- accessed with the <a href="#j_p"><tt>-jp</tt></a> command line option
- or from Lua code by loading the underlying <tt>jit.p</tt> module.
- </p>
- <p>
- To cut to the chase — run this to get a CPU usage profile by
- function name:
- </p>
- <pre class="code">
- luajit -jp myapp.lua
- </pre>
- <p>
- It's <em>not</em> a stated goal of the bundled profiler to add every
- possible option or to cater for special profiling needs. The low-level
- profiler APIs are documented below. They may be used by third-party
- authors to implement advanced functionality, e.g. IDE integration or
- graphical profilers.
- </p>
- <p>
- Note: Sampling works for both interpreted and JIT-compiled code. The
- results for JIT-compiled code may sometimes be surprising. LuaJIT
- heavily optimizes and inlines Lua code — there's no simple
- one-to-one correspondence between source code lines and the sampled
- machine code.
- </p>
- <h3 id="j_p"><tt>-jp=[options[,output]]</tt></h3>
- <p>
- The <tt>-jp</tt> command line option starts the high-level profiler.
- When the application run by the command line terminates, the profiler
- stops and writes the results to <tt>stdout</tt> or to the specified
- <tt>output</tt> file.
- </p>
- <p>
- The <tt>options</tt> argument specifies how the profiling is to be
- performed:
- </p>
- <ul>
- <li><tt>f</tt> — Stack dump: function name, otherwise module:line.
- This is the default mode.</li>
- <li><tt>F</tt> — Stack dump: ditto, but dump module:name.</li>
- <li><tt>l</tt> — Stack dump: module:line.</li>
- <li><tt>&lt;number&gt;</tt> — Stack dump depth (callee ←
- caller). Default: 1.</li>
- <li><tt>-&lt;number&gt;</tt> — Inverse stack dump depth (caller
- → callee).</li>
- <li><tt>s</tt> — Split stack dump after first stack level. Implies
- depth ≥ 2 or depth ≤ -2.</li>
- <li><tt>p</tt> — Show full path for module names.</li>
- <li><tt>v</tt> — Show VM states.</li>
- <li><tt>z</tt> — Show <a href="#jit_zone">zones</a>.</li>
- <li><tt>r</tt> — Show raw sample counts. Default: show percentages.</li>
- <li><tt>a</tt> — Annotate excerpts from source code files.</li>
- <li><tt>A</tt> — Annotate complete source code files.</li>
- <li><tt>G</tt> — Produce raw output suitable for graphical tools.</li>
- <li><tt>m&lt;number&gt;</tt> — Minimum sample percentage to be shown.
- Default: 3%.</li>
- <li><tt>i&lt;number&gt;</tt> — Sampling interval in milliseconds.
- Default: 10ms.<br>
- Note: The actual sampling precision is OS-dependent.</li>
- </ul>
- <p>
- The default output for <tt>-jp</tt> is a list of the most CPU-consuming
- spots in the application. Increasing the stack dump depth with (say)
- <tt>-jp=2</tt> may help to point out the main callers or callees of
- hotspots. But sample aggregation is still flat per unique stack dump.
- </p>
- <p>
- To get a two-level view (split view) of callers/callees, use
- <tt>-jp=s</tt> or <tt>-jp=-s</tt>. The percentages shown for the second
- level are relative to the first level.
- </p>
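<p>
Here is a worked illustration of the split view, using made-up numbers rather than
real profiler output. Assume a run with 200 samples, of which 40 land in one
hotspot, and 30 of those 40 come from a single caller. The first level shows
the hotspot at 20%. The second level shows the caller at 75%, even though it
accounts for only 15% of all samples. The following C sketch spells out that
arithmetic; the helper names are hypothetical.
</p>

```c
#include <assert.h>

/* Percentage of `part` relative to `whole` (0.0 if `whole` is zero). */
static double pct(unsigned part, unsigned whole)
{
  assert(part <= whole);  /* a subset never exceeds its parent */
  return whole ? 100.0 * part / whole : 0.0;
}

/* First level: share of ALL samples that landed in one hotspot. */
static double first_level_pct(unsigned hotspot_samples, unsigned total_samples)
{
  return pct(hotspot_samples, total_samples);
}

/* Second level of a split view: share of the hotspot's OWN samples
   attributed to one caller (or callee), not a share of all samples. */
static double second_level_pct(unsigned sub_samples, unsigned hotspot_samples)
{
  return pct(sub_samples, hotspot_samples);
}
```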
- <p>
- To see how much time is spent in each line relative to a function, use
- <tt>-jp=fl</tt>.
- </p>
- <p>
- To see how much time is spent in different VM states or
- <a href="#jit_zone">zones</a>, use <tt>-jp=v</tt> or <tt>-jp=z</tt>.
- </p>
- <p>
- Combinations of <tt>v/z</tt> with <tt>f/F/l</tt> produce two-level
- views, e.g. <tt>-jp=vf</tt> or <tt>-jp=fv</tt>. This shows the time
- spent in a VM state or zone vs. hotspots. This can be used to answer
- questions like "Which time consuming functions are only interpreted?" or
- "What's the garbage collector overhead for a specific function?".
- </p>
- <p>
- Multiple options can be combined — but not all combinations make
- sense, see above. E.g. <tt>-jp=3si4m1</tt> samples three stack levels
- deep in 4ms intervals and shows a split view of the CPU-consuming
- functions and their callers with a 1% threshold.
- </p>
- <p>
- Source code annotations produced by <tt>-jp=a</tt> or <tt>-jp=A</tt> are
- always flat and at the line level. Obviously, the source code files need
- to be readable by the profiler script.
- </p>
- <p>
- The high-level profiler can also be started and stopped from Lua code with:
- </p>
- <pre class="code">
- require("jit.p").start(options, output)
- ...
- require("jit.p").stop()
- </pre>
- <h3 id="jit_zone"><tt>jit.zone</tt> — Zones</h3>
- <p>
- Zones can be used to provide information about different parts of an
- application to the high-level profiler. E.g. a game could make use of an
- <tt>"AI"</tt> zone, a <tt>"PHYS"</tt> zone, etc. Zones are hierarchical,
- organized as a stack.
- </p>
- <p>
- The <tt>jit.zone</tt> module needs to be loaded explicitly:
- </p>
- <pre class="code">
- local zone = require("jit.zone")
- </pre>
- <ul>
- <li><tt>zone("name")</tt> pushes a named zone to the zone stack.</li>
- <li><tt>zone()</tt> pops the current zone from the zone stack and
- returns its name.</li>
- <li><tt>zone:get()</tt> returns the current zone name or <tt>nil</tt>.</li>
- <li><tt>zone:flush()</tt> flushes the zone stack.</li>
- </ul>
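<p>
The zone functions above behave like operations on a plain stack of names.
The C sketch below models only those semantics: push, pop returning the name,
a peek that returns nothing when the stack is empty, and flush. It is not how
<tt>jit.zone</tt> is implemented, and all names in it are hypothetical.
</p>

```c
#include <assert.h>
#include <stddef.h>

#define ZONE_MAX 16  /* arbitrary fixed capacity for this sketch */

/* Hypothetical model of the zone stack: an array of zone names. */
typedef struct {
  const char *names[ZONE_MAX];
  size_t top;  /* number of zones currently pushed */
} zone_stack;

/* Like zone("name"): push a named zone. */
static void zone_push(zone_stack *zs, const char *name)
{
  assert(zs->top < ZONE_MAX);
  zs->names[zs->top++] = name;
}

/* Like zone(): pop the current zone and return its name.
   Popping an empty stack is treated as a bug here. */
static const char *zone_pop(zone_stack *zs)
{
  assert(zs->top > 0);
  return zs->names[--zs->top];
}

/* Like zone:get(): the current zone name, or NULL (nil) if empty. */
static const char *zone_get(const zone_stack *zs)
{
  return zs->top ? zs->names[zs->top - 1] : NULL;
}

/* Like zone:flush(): discard all zones at once. */
static void zone_flush(zone_stack *zs)
{
  zs->top = 0;
}
```

<p>
For example, after pushing <tt>"AI"</tt> and then <tt>"PHYS"</tt>,
<tt>zone_get()</tt> returns <tt>"PHYS"</tt>. A single <tt>zone_pop()</tt>
returns <tt>"PHYS"</tt> again and leaves <tt>"AI"</tt> as the current zone.
</p>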
- <p>
- To show the time spent in each zone use <tt>-jp=z</tt>. To show the time
- spent relative to hotspots use e.g. <tt>-jp=zf</tt> or <tt>-jp=fz</tt>.
- </p>
- <h2 id="ll_lua_api">Low-level Lua API</h2>
- <p>
- The <tt>jit.profile</tt> module gives access to the low-level API of the
- profiler from Lua code. This module needs to be loaded explicitly:
- </p>
- <pre class="code">
- local profile = require("jit.profile")
- </pre>
- <p>
- This module can be used to implement your own higher-level profiler.
- A typical profiling run starts the profiler, captures stack dumps in
- the profiler callback, adds them to a hash table to aggregate the number
- of samples, stops the profiler and then analyzes all of the captured
- stack dumps. Other parameters can be sampled in the profiler callback,
- too. But it's important not to spend too much time in the callback,
- since this may skew the statistics.
- </p>
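<p>
The aggregation step is essentially a counting hash table keyed by the stack
dump string. The C sketch below illustrates it with a fixed-size table that
uses open addressing. The table size, the FNV-1a hash and all function names
are assumptions for illustration. In a real profiler the keys would come from
<tt>profile.dumpstack()</tt> (or <tt>luaJIT_profile_dumpstack()</tt> from C),
and the <tt>samples</tt> value passed to the callback would be added for each
dump.
</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define AGG_SLOTS 1024  /* fixed capacity; a real tool would grow the table */

typedef struct {
  char *key;       /* owned copy of a stack dump string, NULL if empty */
  unsigned count;  /* accumulated samples for this unique dump */
} agg_slot;

typedef struct {
  agg_slot slots[AGG_SLOTS];
  size_t used;     /* number of occupied slots */
} agg_table;

/* FNV-1a string hash. */
static uint32_t agg_hash(const char *s)
{
  uint32_t h = 2166136261u;
  for (; *s; s++) {
    h ^= (unsigned char)*s;
    h *= 16777619u;
  }
  return h;
}

/* Add `samples` to the counter for `dump`, inserting it if it is new.
   The key is copied because dump buffers may be reused by the caller. */
static void agg_add(agg_table *t, const char *dump, unsigned samples)
{
  size_t i = agg_hash(dump) % AGG_SLOTS;
  while (t->slots[i].key && strcmp(t->slots[i].key, dump) != 0)
    i = (i + 1) % AGG_SLOTS;  /* linear probing */
  if (!t->slots[i].key) {
    size_t len = strlen(dump) + 1;
    assert(t->used + 1 < AGG_SLOTS);  /* always keep one slot empty */
    t->slots[i].key = malloc(len);
    assert(t->slots[i].key != NULL);
    memcpy(t->slots[i].key, dump, len);
    t->used++;
  }
  t->slots[i].count += samples;
}

/* Sample count recorded for `dump` (0 if it was never seen). */
static unsigned agg_get(const agg_table *t, const char *dump)
{
  size_t i = agg_hash(dump) % AGG_SLOTS;
  while (t->slots[i].key) {
    if (strcmp(t->slots[i].key, dump) == 0)
      return t->slots[i].count;
    i = (i + 1) % AGG_SLOTS;
  }
  return 0;
}

/* Release all owned keys and reset the table. */
static void agg_free(agg_table *t)
{
  for (size_t i = 0; i < AGG_SLOTS; i++)
    free(t->slots[i].key);
  memset(t, 0, sizeof *t);
}
```

<p>
The callback only updates counts. Sorting and reporting can wait until the
profiler has stopped, which keeps the time spent per sample low.
</p>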
- <h3 id="profile_start"><tt>profile.start(mode, cb)</tt>
- — Start profiler</h3>
- <p>
- This function starts the profiler. The <tt>mode</tt> argument is a
- string holding options:
- </p>
- <ul>
- <li><tt>f</tt> — Profile with precision down to the function level.</li>
- <li><tt>l</tt> — Profile with precision down to the line level.</li>
- <li><tt>i&lt;number&gt;</tt> — Sampling interval in milliseconds (default
- 10ms).<br>
- Note: The actual sampling precision is OS-dependent.
- </li>
- </ul>
- <p>
- The <tt>cb</tt> argument is a callback function which is called with
- three arguments: <tt>(thread, samples, vmstate)</tt>. The callback is
- called on a separate coroutine; the <tt>thread</tt> argument is the
- state that holds the stack to sample for profiling. Note: do
- <em>not</em> modify the stack of that state or call functions on it.
- </p>
- <p>
- <tt>samples</tt> gives the number of accumulated samples since the last
- callback (usually 1).
- </p>
- <p>
- <tt>vmstate</tt> holds the VM state at the time the profiling timer
- triggered. This may or may not correspond to the state of the VM when
- the profiling callback is called. The state is one of <tt>'N'</tt> for
- native (compiled) code, <tt>'I'</tt> for interpreted code, <tt>'C'</tt>
- for C code, <tt>'G'</tt> for the garbage collector, or <tt>'J'</tt> for
- the JIT compiler.
- </p>
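<p>
A profiler that wants a per-state breakdown can keep one counter per VM state
and add <tt>samples</tt> to it in the callback. The C sketch below assumes the
state is available as a character code, which is how <tt>vmstate</tt> arrives
in the C callback described in the <a href="#ll_c_api">Low-level C API</a>
section. The enum, array layout and function names are hypothetical.
</p>

```c
#include <assert.h>

/* Documented VM states, in the index order used below. */
enum { VM_N, VM_I, VM_C, VM_G, VM_J, VM_NSTATES };

static const char *const vmstate_names[VM_NSTATES] = {
  "native (compiled) code", "interpreted code", "C code",
  "garbage collector", "JIT compiler"
};

/* Map a vmstate character to an index, or -1 if it is not documented. */
static int vmstate_index(int vmstate)
{
  switch (vmstate) {
  case 'N': return VM_N;
  case 'I': return VM_I;
  case 'C': return VM_C;
  case 'G': return VM_G;
  case 'J': return VM_J;
  default:  return -1;
  }
}

/* Add `samples` to the counter of `vmstate`; unknown states are ignored. */
static void vmstate_tally(unsigned long counts[VM_NSTATES], int vmstate,
                          int samples)
{
  int i = vmstate_index(vmstate);
  assert(samples >= 0);  /* a negative sample count would be a bug */
  if (i >= 0)
    counts[i] += (unsigned long)samples;
}
```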
- <h3 id="profile_stop"><tt>profile.stop()</tt>
- — Stop profiler</h3>
- <p>
- This function stops the profiler.
- </p>
- <h3 id="profile_dump"><tt>dump = profile.dumpstack([thread,] fmt, depth)</tt>
- — Dump stack </h3>
- <p>
- This function allows taking stack dumps in an efficient manner. It
- returns a string with a stack dump for the <tt>thread</tt> (coroutine),
- formatted according to the <tt>fmt</tt> argument:
- </p>
- <ul>
- <li><tt>p</tt> — Preserve the full path for module names. Otherwise
- only the file name is used.</li>
- <li><tt>f</tt> — Dump the function name if it can be derived. Otherwise
- use module:line.</li>
- <li><tt>F</tt> — Ditto, but dump module:name.</li>
- <li><tt>l</tt> — Dump module:line.</li>
- <li><tt>Z</tt> — Zap the following characters for the last dumped
- frame.</li>
- <li>All other characters are added verbatim to the output string.</li>
- </ul>
- <p>
- The <tt>depth</tt> argument gives the number of frames to dump, starting
- at the topmost frame of the thread. A negative number dumps the frames in
- inverse order.
- </p>
- <p>
- The first example prints a list of the current module names and line
- numbers of up to 10 frames on separate lines. The second example prints
- semicolon-separated module names and line numbers for all frames (up to
- 100) in inverse order:
- </p>
- <pre class="code">
- print(profile.dumpstack(thread, "l\n", 10))
- print(profile.dumpstack(thread, "lZ;", -100))
- </pre>
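<p>
<tt>Z</tt> drops the characters that follow it for the last frame. A dump
taken with a format such as <tt>"lZ;"</tt> therefore has no trailing separator
and can be split on <tt>;</tt> without producing an empty last entry. The C
sketch below splits such a string in place; the sample input and the helper
name are hypothetical.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split a dump such as "main.lua:12;util.lua:4" (format "lZ;") into
   frames. Stores up to `max` pointers into `frames`, each pointing into
   `buf`, which is modified in place. Returns the number of frames. */
static size_t split_frames(char *buf, const char *frames[], size_t max)
{
  size_t n = 0;
  char *p = buf;
  assert(buf != NULL);
  if (*p == '\0')
    return 0;  /* empty dump: no frames */
  while (n < max) {
    char *sep = strchr(p, ';');
    frames[n++] = p;
    if (!sep)
      break;  /* last frame: Z removed its trailing ';' */
    *sep = '\0';
    p = sep + 1;
  }
  return n;
}
```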
- <h2 id="ll_c_api">Low-level C API</h2>
- <p>
- The profiler can be controlled directly from C code, e.g. for
- use by IDEs. The declarations are in <tt>"luajit.h"</tt> (see
- <a href="ext_c_api.html">Lua/C API</a> extensions).
- </p>
- <h3 id="luaJIT_profile_start"><tt>luaJIT_profile_start(L, mode, cb, data)</tt>
- — Start profiler</h3>
- <p>
- This function starts the profiler. <a href="#profile_start">See
- above</a> for a description of the <tt>mode</tt> argument.
- </p>
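<p>
From C, the <tt>mode</tt> string is often assembled from a precision flag and
an optional interval. The C sketch below builds strings such as <tt>"f"</tt>
or <tt>"li4"</tt> with <tt>snprintf()</tt>. The helper
<tt>make_profile_mode()</tt> is hypothetical; its result would be passed as
the <tt>mode</tt> argument of <tt>luaJIT_profile_start()</tt>.
</p>

```c
#include <assert.h>
#include <stdio.h>

/* Build a mode string: 'l' for line-level or 'f' for function-level
   precision, followed by "i<ms>" if interval_ms > 0 (0 keeps the
   default interval). Returns 0 on success, -1 if `buf` is too small. */
static int make_profile_mode(char *buf, size_t size, int line_level,
                             int interval_ms)
{
  int n;
  char precision = line_level ? 'l' : 'f';
  assert(interval_ms >= 0);
  if (interval_ms > 0)
    n = snprintf(buf, size, "%ci%d", precision, interval_ms);
  else
    n = snprintf(buf, size, "%c", precision);
  return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```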
- <p>
- The <tt>cb</tt> argument is a callback function with the following
- declaration:
- </p>
- <pre class="code">
- typedef void (*luaJIT_profile_callback)(void *data, lua_State *L,
- int samples, int vmstate);
- </pre>
- <p>
- <tt>data</tt> is available for use by the callback. <tt>L</tt> is the
- state that holds the stack to sample for profiling. Note: do
- <em>not</em> modify this stack or call functions on this stack —
- use a separate coroutine for this purpose. <a href="#profile_start">See
- above</a> for a description of <tt>samples</tt> and <tt>vmstate</tt>.
- </p>
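<p>
The <tt>data</tt> pointer is the natural place to keep the profiler's own
state. The C sketch below shows a callback with the declared signature that
only updates a counter structure. So that it compiles without LuaJIT, it
declares <tt>lua_State</tt> and the callback typedef itself; a real program
would include <tt>"luajit.h"</tt> instead. The structure and function names
are hypothetical.
</p>

```c
#include <assert.h>

/* Stand-ins so the sketch compiles on its own; "luajit.h" (via "lua.h")
   provides the real lua_State type and luaJIT_profile_callback typedef. */
typedef struct lua_State lua_State;
typedef void (*luaJIT_profile_callback)(void *data, lua_State *L,
                                        int samples, int vmstate);

/* Hypothetical per-run statistics, reached through the `data` pointer. */
typedef struct {
  unsigned long callbacks;  /* number of callback invocations */
  unsigned long samples;    /* total samples (usually 1 per callback) */
  int last_vmstate;         /* 'N', 'I', 'C', 'G' or 'J' */
} prof_stats;

/* Has the luaJIT_profile_callback signature. It keeps the work per call
   minimal and never touches the stack of L. */
static void prof_cb(void *data, lua_State *L, int samples, int vmstate)
{
  prof_stats *st = (prof_stats *)data;
  (void)L;  /* a fuller profiler might call luaJIT_profile_dumpstack(L, ...) */
  assert(samples >= 0);
  st->callbacks++;
  st->samples += (unsigned long)samples;
  st->last_vmstate = vmstate;
}

/* With LuaJIT linked in, the callback would be installed like this:
 *
 *   prof_stats st = {0, 0, 0};
 *   luaJIT_profile_start(L, "f", prof_cb, &st);
 *   ... run Lua code ...
 *   luaJIT_profile_stop(L);
 */
```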
- <h3 id="luaJIT_profile_stop"><tt>luaJIT_profile_stop(L)</tt>
- — Stop profiler</h3>
- <p>
- This function stops the profiler.
- </p>
- <h3 id="luaJIT_profile_dumpstack"><tt>p = luaJIT_profile_dumpstack(L, fmt, depth, len)</tt>
- — Dump stack </h3>
- <p>
- This function allows taking stack dumps in an efficient manner.
- <a href="#profile_dump">See above</a> for a description of <tt>fmt</tt>
- and <tt>depth</tt>.
- </p>
- <p>
- This function returns a <tt>const char *</tt> pointing to a
- private string buffer of the profiler. The <tt>int *len</tt>
- argument returns the length of the output string. The buffer is
- overwritten on the next call and deallocated when the profiler stops.
- You need to either consume the content immediately or copy it for later
- use.
- </p>
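<p>
When a dump has to outlive the next call, it can be copied into caller-owned
memory right away. The C sketch below copies a pointer and length pair, as
reported by the dump function, into a freshly allocated, NUL-terminated
string. The helper <tt>copy_dump()</tt> is hypothetical.
</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy `len` bytes starting at `p` into a new NUL-terminated string
   owned by the caller (release it with free()). Returns NULL if the
   allocation fails. */
static char *copy_dump(const char *p, size_t len)
{
  char *s;
  assert(p != NULL);
  s = malloc(len + 1);
  if (s == NULL)
    return NULL;
  memcpy(s, p, len);
  s[len] = '\0';
  return s;
}
```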
- <br class="flush">
- </div>
- <div id="foot">
- <hr class="hide">
- Copyright © 2005-2017 Mike Pall
- <span class="noprint">
- ·
- <a href="contact.html">Contact</a>
- </span>
- </div>
- </body>
- </html>