|
|
@@ -1,403 +0,0 @@
|
|
|
-QUICK INTRODUCTION
|
|
|
-
|
|
|
-PStats is Panda's built-in performance analysis tool. It can graph
|
|
|
-frame rate over time, and can further graph the work spent within each
|
|
|
-frame into user-defined subdivisions of the frame (for instance, app,
|
|
|
-cull and draw), and thus can be an invaluable tool in identifying
|
|
|
-performance bottlenecks. It can also show frame-based data that
|
|
|
-reflects any arbitrary quantity other than time intervals, for
|
|
|
-instance, texture memory in use or number of vertices drawn.
|
|
|
-
|
|
|
-The performance graphs may be drawn on the same computer that is
|
|
|
-running the Panda client, or they may be drawn on another computer on
|
|
|
-the same LAN, which is useful for analyzing fullscreen applications.
|
|
|
-The remote computer need not be running the same operating system as
|
|
|
-the client computer.
|
|
|
-
|
|
|
-To use PStats, you first need to build the PStats server program,
|
|
|
-which is part of the Pandatool tree (it's called pstats.exe on
|
|
|
-Windows, and gtk-stats on a Unix platform). Start by running the
|
|
|
-PStats server program (it runs in the background), and then start your
|
|
|
-Direct/Panda client with the following in your Config.prc file:
|
|
|
-
|
|
|
- want-pstats 1
|
|
|
-
|
|
|
-Or, at runtime, issue the Python command:
|
|
|
-
|
|
|
- PStatClient.connect()
|
|
|
-
|
|
|
-Or if you're running pview, press shift-S.
|
|
|
-
|
|
|
-Any of the above will contact your running PStats server program,
|
|
|
-which will proceed to open a window and start a running graph of your
|
|
|
-client's performance. If you are running the server on a different
|
|
|
-machine than the client, add the pstats-host variable to your client's
|
|
|
-Config.prc file, naming the hostname or IP address of the machine
|
|
|
-running the PStats server.
|
|
|
-
|
|
|
-If you are developing Python code, you may be interested in reporting
|
|
|
-the relative time spent within each Python task (by subdividing the
|
|
|
-total time spent in Python, as reported under "Show Code"). To do
|
|
|
-this, add the following lines to your Config.prc file before you start
|
|
|
-ShowBase:
|
|
|
-
|
|
|
- task-timer-verbose 1
|
|
|
- pstats-tasks 1
|
|
|
-
|
|
|
-
|
|
|
-THE PSTATS SERVER (The user interface)
|
|
|
-
|
|
|
-The GUI for managing the graphs and drilling down to view more detail
|
|
|
-is entirely controlled by the PStats server program. At the time of
|
|
|
-this writing, there are two different versions of the PStats server,
|
|
|
-one for Unix called gtk-stats and one for Windows called simply
|
|
|
-pstats. The interfaces are similar but not identical; the following
|
|
|
-paragraphs describe the Windows version.
|
|
|
-
|
|
|
-When you run pstats.exe, it adds a program to the taskbar but does not
|
|
|
-immediately open a window. The program name is typically "PStats
|
|
|
-5185", showing the default PStats TCP port number of 5185; see "HOW IT
|
|
|
-WORKS" below for more details about the TCP communication system. For
|
|
|
-the most part you don't need to worry about the port number, as long
|
|
|
-as server and client agree (and the port is not already being used by
|
|
|
-another application).
|
|
|
-
|
|
|
-Each time a client connects to the PStats server, a new monitor window
|
|
|
-is created. This monitor window owns all of the graphs that you
|
|
|
-create to view the performance data from that particular connection.
|
|
|
-Initially, a strip chart showing the frame time of the main thread is
|
|
|
-created by default; you can create additional graphs by selecting from
|
|
|
-the Graphs pulldown menu.
|
|
|
-
|
|
|
-Time-based Strip Charts
|
|
|
-
|
|
|
-This is the graph type you will use most frequently to examine
|
|
|
-performance data. The horizontal axis represents the passage of time;
|
|
|
-each frame is represented as a vertical slice on the graph. The
|
|
|
-overall height of the colored bands represents the total amount of
|
|
|
-time spent on each frame; within the frame, the time is further
|
|
|
-divided into the primary subdivisions represented by different color
|
|
|
-bands (and labeled on the left). These subdivisions are called
|
|
|
-"collectors" in the PStats terminology, since they represent time
|
|
|
-collected by different tasks.
|
|
|
-
|
|
|
-Normally, the three primary collectors are App, Cull, and Draw, the
|
|
|
-three stages of the graphics pipeline. Atop these three colored
|
|
|
-collectors is the label "Frame", which represents any remaining time
|
|
|
-spent in the frame that was not specifically allocated to one of the
|
|
|
-three child collectors (normally, there should not be significant time
|
|
|
-reported here).
|
|
|
-
|
|
|
-The frame time in milliseconds, averaged over the past three seconds,
|
|
|
-is drawn above the upper right corner of the graph. The labels on the
|
|
|
-guide bars on the right are also shown in milliseconds; if you prefer
|
|
|
-to think about a target frame rate rather than an elapsed time in
|
|
|
-milliseconds, you may find it useful to select "Hz" from the Units
|
|
|
-pulldown menu, which changes the time units accordingly.
|
|
|
-
|
|
|
-The running Panda client suggests its target frame rate, as well as
|
|
|
-the initial vertical scale of the graph (that is, the height of the
|
|
|
-colored bars). You can change the scale freely by clicking within the
|
|
|
-graph itself and dragging the mouse up or down as necessary. One of
|
|
|
-the horizontal guide bars is drawn in a lighter shade of gray; this
|
|
|
-one represents the actual target frame rate suggested by the client.
|
|
|
-The other, darker, guide bars are drawn automatically at harmonic
|
|
|
-subdvisions of the target frame rate. You can change the target frame
|
|
|
-rate with the Config.prc variable pstats-target-frame-rate on the
|
|
|
-client.
|
|
|
-
|
|
|
-You can also create any number of user-defined guide bars by dragging
|
|
|
-them into the graph from the gray space immediately above or below the
|
|
|
-graph. These are drawn in a dashed blue line. It is sometimes useful
|
|
|
-to place one of these to mark a performance level so it may be
|
|
|
-compared to future values (or to alternate configurations).
|
|
|
-
|
|
|
-The primary collectors labeled on the left might themselves be further
|
|
|
-subdivided, if the data is provided by the client. For instance, App
|
|
|
-is often divided into Show Code, Animation, and Collisions, where Show
|
|
|
-Code is the time spent executing any Python code, Animation is the
|
|
|
-time used to compute any animated characters, and Collisions is the
|
|
|
-time spent in the collision traverser(s).
|
|
|
-
|
|
|
-To see any of these further breakdowns, double-click on the
|
|
|
-corresponding colored label (or on the colored band within the graph
|
|
|
-itself). This narrows the focus of the strip chart from the overall
|
|
|
-frame to just the selected collector, which has two advantages.
|
|
|
-Firstly, it may be easier to observe the behavior of one particular
|
|
|
-collector when it is drawn alone (as opposed to being stacked on top
|
|
|
-of some other color bars), and the time in the upper-right corner will
|
|
|
-now reflect just the total time spent within just this collector.
|
|
|
-Secondly, if there are further breakdowns to this collector, they will
|
|
|
-now be shown as further colored bars. As in the Frame chart, the
|
|
|
-topmost label is the name of the parent collector, and any time shown
|
|
|
-in this color represents time allocated to the parent collector that
|
|
|
-is not accounted for by any of the child collectors.
|
|
|
-
|
|
|
-You can further drill down by double-clicking on any of the new
|
|
|
-labels; or double-click on the top label, or the white part of the
|
|
|
-graph, to return back up to the previous level.
|
|
|
-
|
|
|
-Value-based Strip Charts
|
|
|
-
|
|
|
-There are other strip charts you may create, which show arbitrary
|
|
|
-kinds of data per frame other than elapsed time. These can only be
|
|
|
-accessed from the Graphs pulldown menu, and include things such as
|
|
|
-texture memory in use and vertices drawn. They behave similarly to
|
|
|
-the time-based strip charts described above.
|
|
|
-
|
|
|
-Piano Roll Charts
|
|
|
-
|
|
|
-This graph is used less frequently, but when it is needed it is a
|
|
|
-valuable tool to reveal exactly how the time is spent within a frame.
|
|
|
-The PStats server automatically collects together all the time spent
|
|
|
-within each collector and shows it as a single total, but in reality
|
|
|
-it may not all have been spent in one continuous block of time.
|
|
|
-
|
|
|
-For instance, when Panda draws each display region in single-threaded
|
|
|
-mode, it performs a cull traversal followed by a draw traversal for
|
|
|
-each display region. Thus, if your Panda client includes multiple
|
|
|
-display regions, it will alternate its time spent culling and drawing
|
|
|
-as it processes each of them. The strip chart, however, reports only
|
|
|
-the total cull time and draw time spent.
|
|
|
-
|
|
|
-Sometimes you really need to know the sequence of events in the frame,
|
|
|
-not just the total time spent in each collector. The piano roll chart
|
|
|
-shows this kind of data. It is so named because it is similar to the
|
|
|
-paper music roll for an old-style player piano, with holes punched
|
|
|
-down the roll for each note that is to be played. The longer the
|
|
|
-hole, the longer the piano key is held down. (Think of the chart as
|
|
|
-rotated 90 degrees from an actual piano roll. A player piano roll
|
|
|
-plays from bottom to top; the piano roll chart reads from left to
|
|
|
-right.)
|
|
|
-
|
|
|
-Unlike a strip chart, a piano roll chart does not show trends; the
|
|
|
-chart shows only the current frame's data. The horizontal axis shows
|
|
|
-time within the frame, and the individual collectors are stacked up in
|
|
|
-an arbitrary ordering along the vertical axis.
|
|
|
-
|
|
|
-The time spent within the frame is drawn from left to right; at any
|
|
|
-given time, the collector(s) that are active will be drawn with a
|
|
|
-horizontal bar. You can observe the CPU behavior within a frame by
|
|
|
-reading the graph from left to right. You may find it useful to
|
|
|
-select "pause" from the Speed pulldown menu to freeze the graph on
|
|
|
-just one frame while you read it.
|
|
|
-
|
|
|
-Note that the piano roll chart shows time spent within the frame on
|
|
|
-the horizontal axis, instead of the vertical axis, as it is on the
|
|
|
-strip charts. Thus, the guide bars on the piano roll chart are
|
|
|
-vertical lines instead of horizontal lines, and they may be dragged in
|
|
|
-from the left or the right sides (instead of from the top or bottom,
|
|
|
-as on the strip charts). Apart from this detail, these are the same
|
|
|
-guide bars that appear on the strip charts.
|
|
|
-
|
|
|
-The piano roll chart may be created from the Graphs pulldown menu.
|
|
|
-
|
|
|
-Additional threads
|
|
|
-
|
|
|
-If the panda client has multiple threads that generate PStats data,
|
|
|
-the PStats server can open up graphs for these threads as well. Each
|
|
|
-separate thread is considered unrelated to the main thread, and may
|
|
|
-have the same or an independent frame rate. Each separate thread will
|
|
|
-be given its own pulldown menu to create graphs associated with that
|
|
|
-thread; these auxiliary thread menus will appear on the menu bar
|
|
|
-following the Graphs menu. At the time of this writing, support for
|
|
|
-multiple threads within the PStats graph is largely theoretical and
|
|
|
-untested.
|
|
|
-
|
|
|
-
|
|
|
-HOW TO DEFINE YOUR OWN COLLECTORS
|
|
|
-
|
|
|
-The PStats client code is designed to be generic enough to allow users
|
|
|
-to define their own collectors to time any arbitrary blocks of code
|
|
|
-(or record additional non-time-based data), from either the C++ or the
|
|
|
-Python level.
|
|
|
-
|
|
|
-The general idea is to create a PStatCollector for each separate block
|
|
|
-of code you wish to time. The name which is passed to the
|
|
|
-PStatCollector constructor is a unique identifier: all collectors that
|
|
|
-share the same name are deemed to be the same collector.
|
|
|
-
|
|
|
-Furthermore, the collector's name can be used to define the
|
|
|
-hierarchical relationship of each collector with other existing
|
|
|
-collectors. To do this, prefix the collector's name with the name of
|
|
|
-its parent(s), followed by a colon separator. For instance,
|
|
|
-PStatCollector("Draw:Flip") defines a collector named "Flip", which is
|
|
|
-a child of the "Draw" collector, defined elsewhere.
|
|
|
-
|
|
|
-You can also define a collector as a child of another collector by
|
|
|
-giving the parent collector explicitly followed by the name of the
|
|
|
-child collector alone, which is handy for dynamically-defined
|
|
|
-collectors. For instance, PStatCollector(draw, "Flip") defines the
|
|
|
-same collector named above, assuming that draw is the result of the
|
|
|
-PStatCollector("Draw") constructor.
|
|
|
-
|
|
|
-Note that, because of an unfortunate limitation with the interrogate
|
|
|
-parser, statically-defined PStatCollector objects can't be parsed by
|
|
|
-interrogate. (In general, interrogate can't parse C++ objects that
|
|
|
-are constructed with parameters at the outermost scoping level.) As a
|
|
|
-workaround, we usually protect these declarations from interrogate by
|
|
|
-using the syntax #ifndef CPPPARSER .. #endif.
|
|
|
-
|
|
|
-Once you have a collector, simply bracket the region of code you wish
|
|
|
-to time with collector.start() and collector.stop(). It is important
|
|
|
-to ensure that each call to start() is matched by exactly one call to
|
|
|
-stop(). If you are programming in C++, it is highly recommended that
|
|
|
-you use the PStatTimer class to make these calls automatically, which
|
|
|
-guarantees the correct pairing; the PStatTimer's constructor calls
|
|
|
-start() and its destructor calls stop(), so you may simply define a
|
|
|
-PStatTimer object at the beginning of the block of code you wish to
|
|
|
-time. If you are programming in Python, you must call start() and
|
|
|
-stop() explicitly.
|
|
|
-
|
|
|
-When you call start() and there was another collector already started,
|
|
|
-that previous collector is paused until you call the matching stop()
|
|
|
-(at which time the previous collector is resumed). That is, time is
|
|
|
-accumulated only towards the collector indicated by the innermost
|
|
|
-start() .. stop() pair.
|
|
|
-
|
|
|
-Time accumulated towards any collector is also counted towards that
|
|
|
-collector's parent, as defined in the collector's constructor
|
|
|
-(described above).
|
|
|
-
|
|
|
-It is important to understand the difference between collectors nested
|
|
|
-implicitly by runtime start/stop invocations, and the static hierarchy
|
|
|
-implicit in the collector definition. Time is accumulated in parent
|
|
|
-collectors according to the statically-defined parents of the
|
|
|
-innermost active collector only, without regard to the runtime stack
|
|
|
-of paused collectors.
|
|
|
-
|
|
|
-For example, suppose you are in the middle of processing the "Draw"
|
|
|
-task and have therefore called start() on the "Draw" collector. While
|
|
|
-in the middle of processing this block of code, you call a function
|
|
|
-that has its own collector called "Cull:Sort". As soon as you start
|
|
|
-the new collector, you have paused the "Draw" collector and are now
|
|
|
-accumulating time in the "Cull:Sort" collector. Once this new
|
|
|
-collector stops, you will automatically return to accumulating time in
|
|
|
-the "Draw" collector. The time spent within the nested "Cull:Sort"
|
|
|
-collector will be counted towards the "Cull" total time, not the
|
|
|
-"Draw" total time.
|
|
|
-
|
|
|
-Color and Other Optional Collector Properties
|
|
|
-
|
|
|
-If you do not specify a color for a particular collector, it will be
|
|
|
-assigned a random color at runtime. At present, the only way to
|
|
|
-specify a color is to modify
|
|
|
-panda/src/pstatclient/pStatProperties.cxx, and add a line to the table
|
|
|
-for your new collector(s). You can also define additional properties
|
|
|
-here such as a suggested initial scale for the graph and, for
|
|
|
-non-time-based collectors, a unit name and/or scale factor. The order
|
|
|
-in which these collectors are listed in this table is also relevant;
|
|
|
-they will appear in the same order on the graphs. The first column
|
|
|
-should be set to 1 for your new collectors unless you wish them to be
|
|
|
-disabled by default. You must recompile the client (but not the
|
|
|
-server) to reflect changes to this table.
|
|
|
-
|
|
|
-
|
|
|
-HOW IT WORKS (What's actually happening)
|
|
|
-
|
|
|
-The PStats code is divided into two main parts: the client code and
|
|
|
-the server code.
|
|
|
-
|
|
|
-The PStats Client
|
|
|
-
|
|
|
-The client code is in panda/src/pstatclient, and is available to run
|
|
|
-in every Panda client unless it is compiled out. (It will be compiled
|
|
|
-out if OPTIMIZE is set to level 4, unless DO_PSTATS is also explicitly
|
|
|
-set to non-empty. It will also be compiled out if NSPR is not
|
|
|
-available, since both client and server depend on the NSPR library to
|
|
|
-exchange data, even when running the server on the same machine as the
|
|
|
-client.)
|
|
|
-
|
|
|
-The client code is designed for minimal runtime overhead when it is
|
|
|
-compiled in but not enabled (that is, when the client is not in
|
|
|
-contact with a PStats server), as well as when it is enabled (when the
|
|
|
-client is in contact with a PStats server). It is also designed for
|
|
|
-zero runtime overhead when it is compiled out.
|
|
|
-
|
|
|
-There is one global PStatClient class object, which manages all of the
|
|
|
-communications on the client side. Each PStatCollector is simply an
|
|
|
-index into an array stored within the PStatClient object, although the
|
|
|
-interface is intended to hide this detail from the programmer.
|
|
|
-
|
|
|
-Initially, before the PStatClient has established a connection, calls
|
|
|
-to start() and stop() simply return immediately.
|
|
|
-
|
|
|
-When you call PStatClient.connect(), the client attempts to contact
|
|
|
-the PStatServer via a TCP connection to the hostname and port named in
|
|
|
-the pstats-host and pstats-port Config.prc variables, respectively.
|
|
|
-(The default hostname and port are localhost and 5185.) You can also
|
|
|
-pass in a specific hostname and/or port to the connect() call. Upon
|
|
|
-successful connection and handshake with the server, the PStatClient
|
|
|
-sends a list of the available collectors, along with their names,
|
|
|
-colors, and hierarchical relationships, on the TCP channel.
|
|
|
-
|
|
|
-Once connected, each call to start() and stop() adds a collector
|
|
|
-number and timestamp to an array maintained by the PStatClient. At
|
|
|
-the end of each frame, the PStatClient boils this array into a
|
|
|
-datagram for shipping to the server. Each start() and stop() event
|
|
|
-requires 6 bytes; if the resulting datagram will fit within a UDP
|
|
|
-packet (1K bytes, or about 84 start/stop pairs), it is sent via UDP;
|
|
|
-otherwise, it is sent on the TCP channel. (Some fraction of the
|
|
|
-packets that are eligible for UDP, from 0% to 100%, may be sent via
|
|
|
-TCP instead; you can specify this with the pstats-tcp-ratio Config.prc
|
|
|
-variable.)
|
|
|
-
|
|
|
-Also, to prevent flooding the network and/or overwhelming the PStats
|
|
|
-server, only so many frames of data will be sent per second. This
|
|
|
-parameter is controlled by the pstats-max-rate Config.prc variable and
|
|
|
-is set to 30 by default. (If the packets are larger than 1K, the max
|
|
|
-transmission rate is also automatically reduced further in
|
|
|
-proportion.) If the frame rate is higher than this limit, some frames
|
|
|
-will simply not be transmitted. The server is designed to cope with
|
|
|
-missing frames and will assume missing frames are similar to their
|
|
|
-neighbors.
|
|
|
-
|
|
|
-The server does all the work of analyzing the data after that. The
|
|
|
-client's next job is simply to clear its array and prepare itself for
|
|
|
-the next frame.
|
|
|
-
|
|
|
-
|
|
|
-The PStats Server
|
|
|
-
|
|
|
-The generic server code is in pandatool/src/pstatserver, and the
|
|
|
-GUI-specific server code is in pandatool/src/gtk-stats and
|
|
|
-pandatool/src/win-stats, for Unix and Windows, respectively. (There
|
|
|
-is also an OS-independent text-stats subdirectory, which builds a
|
|
|
-trivial PStats server that presents a scrolling-text interface. This
|
|
|
-is mainly useful as a proof of technology rather than as a usable
|
|
|
-tool.)
|
|
|
-
|
|
|
-The GUI-specific code is the part that manages the interaction with
|
|
|
-the user via the creation of windows and the handling of mouse input,
|
|
|
-etc.; most of the real work of interpreting the data is done in the
|
|
|
-generic code in the pstatserver directory.
|
|
|
-
|
|
|
-The PStatServer owns all of the connections, and interfaces with the
|
|
|
-NSPR library to communicate with the clients. It listens on the
|
|
|
-specified port for new connections, using the pstats-port Config.prc
|
|
|
-variable to determine the port number (this is the same variable that
|
|
|
-specifies the port to the client). Usually you can leave this at its
|
|
|
-default value of 5185, but there may be some cases in which that port
|
|
|
-is already in use on a particular machine (for instance, maybe someone
|
|
|
-else is running another PStats server on another display of the same
|
|
|
-machine).
|
|
|
-
|
|
|
-Once a connection is received, it creates a PStatMonitor class (this
|
|
|
-class is specialized for each of the different GUI variants) that
|
|
|
-handles all the data for this particular connection. In the case of
|
|
|
-the windows pstats.exe program, each new monitor instance is
|
|
|
-represented by a new toplevel window. Multiple monitors can be
|
|
|
-active at once.
|
|
|
-
|
|
|
-The work of digesting the data from the client is performed by the
|
|
|
-PStatView class, which analyzes the pattern of start and stop
|
|
|
-timestamps, along with the relationship data of the various
|
|
|
-collectors, and boils it down into a list of the amount of time spent
|
|
|
-in each collector per frame.
|
|
|
-
|
|
|
-Finally, a PStatStripChart or PStatPianoRoll class object defines the
|
|
|
-actual graph output of colored lines and bars; the generic versions of
|
|
|
-these include virtual functions to do the actual drawing (the GUI
|
|
|
-specializations of these redefine these methods to make the
|
|
|
-appropriate calls).
|
|
|
-
|