22 years ago · c4f3f0b8e4
--- a/panda/src/doc/howto.use_pstats
+++ b/panda/src/doc/howto.use_pstats
@@ -0,0 +1,397 @@
 
															+QUICK INTRODUCTION
														
 
															+
														
 
															+PStats is Panda's built-in performance analysis tool.  It can graph
														
 
															+frame rate over time, and can further graph the work spent within each
														
 
															+frame into user-defined subdivisions of the frame (for instance, app,
														
 
															+cull and draw), and thus can be an invaluable tool in identifying
														
 
															+performance bottlenecks.  It can also show frame-based data that
														
 
															+reflects any arbitrary quantity other than time intervals, for
														
 
															+instance, texture memory in use or number of vertices drawn.
														
 
															+
														
 
															+The performance graphs may be drawn on the same computer that is
														
 
															+running the Panda client, or they may be drawn on another computer on
														
 
															+the same LAN, which is useful for analyzing fullscreen applications.
														
 
															+The remote computer need not be running the same operating system as
														
 
															+the client computer.
														
 
															+
														
 
															+To use PStats, you first need to build the PStats server program,
														
 
															+which is part of the Pandatool tree (it's called pstats.exe on
														
 
															+Windows, and gtk-stats on a Unix platform).  Start by running the
														
 
															+PStats server program (it runs in the background), and then start your
														
 
															+Direct/Panda client with the following in your Configrc file:
														
 
															+
														
 
															+  want-pstats 1
														
 
															+
														
 
															+Or, at runtime, issue the Python command:
														
 
															+
														
 
															+  PStatClient.connect()
														
 
															+
														
 
															+Or if you're running pview, press shift-S.
														
 
															+
														
 
															+Any of the above will contact your running PStats server program,
														
 
															+which will proceed to open a window and start a running graph of your
														
 
															+client's performance.  If you are running the server on a different
														
 
															+machine than the client, add the pstats-host variable to your client's
														
 
															+Configrc file, naming the hostname or IP address of the machine
														
 
															+running the PStats server.
														
 
															+
														
 
															+If you are developing Python code, you may be interested in reporting
														
 
															+the relative time spent within each Python task (by subdividing the
														
 
															+total time spent in Python, as reported under "Show Code").  To do
														
 
															+this, add the following lines to your Configrc file before you start
														
 
															+ShowBase:
														
 
															+
														
 
															+  task-timer-verbose 1
														
 
															+  pstats-tasks 1
														
 
															+
														
 
															+
														
 
															+THE PSTATS SERVER (The user interface)
														
 
															+
														
 
															+The GUI for managing the graphs and drilling down to view more detail
														
 
															+is entirely controlled by the PStats server program.  At the time of
														
 
															+this writing, there are two different versions of the PStats server,
														
 
															+one for Unix called gtk-stats and one for Windows called simply
														
 
															+pstats.  The interfaces are similar but not identical; the following
														
 
															+paragraphs describe the Windows version.
														
 
															+
														
 
															+When you run pstats.exe, it adds a program to the taskbar but does not
														
 
															+immediately open a window.  The program name is typically "PStats
														
 
															+5180", showing the default PStats TCP port number of 5180; see "HOW IT
														
 
															+WORKS" below for more details about the TCP communication system.  For
														
 
															+the most part you don't need to worry about the port number, as long
														
 
															+as server and client agree.
														
 
															+
														
 
															+Each time a client connects to the PStats server, a new monitor window
														
 
															+is created.  This monitor window owns all of the graphs that you
														
 
															+create to view the performance data from that particular connection.
														
 
															+Initially, a strip chart showing the frame time of the main thread is
														
 
															+created by default; you can create additional graphs by selecting from
														
 
															+the Graphs pulldown menu.
														
 
															+
														
 
															+Time-based Strip Charts
														
 
															+
														
 
															+This is the graph type you will use most frequently to examine
														
 
															+performance data.  The horizontal axis represents the passage of
														
 
															+frames; each subsequent frame is represented as a vertical slice on
														
 
															+the graph.  The overall height of the colored bands represents the
														
 
															+total amount of time spent on each frame; within the frame, the time
														
 
															+is further divided into the primary subdivisions represented by
														
 
															+different color bands (and labeled on the left).  These subdivisions
														
 
															+are called "collectors" in the PStats terminology, since they
														
 
															+represent time collected by different tasks.
														
 
															+
														
 
															+Normally, the three primary collectors are App, Cull, and Draw, the
														
 
															+three stages of the graphics pipeline.  Atop these three colored
														
 
															+collectors is the label "Frame", which represents any remaining time
														
 
															+spent in the frame that was not specifically allocated to one of the
														
 
															+three child collectors (normally, there should not be significant time
														
 
															+reported here).
														
 
															+
														
 
															+The frame time in milliseconds, averaged over the past three seconds,
														
 
															+is drawn above the upper right corner of the graph.  The labels on the
														
 
															+guide bars on the right are also shown in milliseconds; if you prefer
														
 
															+to think about a target frame rate rather than an elapsed time in
														
 
															+milliseconds, you may find it useful to select "Hz" from the Units
														
 
															+pulldown menu, which changes the time units accordingly.
														
 
															+
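The relationship between the two unit settings is a plain reciprocal, which can be sketched as follows.  The function names are invented for this illustration and are not part of PStats.

```python
def ms_to_hz(frame_time_ms):
    """Convert a per-frame time in milliseconds to a frame rate in Hz."""
    return 1000.0 / frame_time_ms


def hz_to_ms(frame_rate_hz):
    """Convert a frame rate in Hz to the time available per frame, in ms."""
    return 1000.0 / frame_rate_hz


# Print the per-frame time budget for two common target frame rates.
for rate in (30, 60):
    print(f"{rate} Hz -> {hz_to_ms(rate):.2f} ms per frame")
```

A guide bar labeled 33.3 ms in the default units therefore corresponds to roughly 30 Hz.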
														
 
															+The running Panda client suggests its target frame rate, as well as
														
 
															+the initial vertical scale of the graph (that is, the height of the
														
 
															+colored bars).  You can change the scale freely by clicking within the
														
 
															+graph itself and dragging the mouse up or down as necessary.  One of
														
 
															+the horizontal guide bars is drawn in a lighter shade of gray; this
														
 
															+one represents the actual target frame rate suggested by the client.
														
 
															+The other, darker, guide bars are drawn automatically at harmonic
														
 
															+subdivisions of the target frame rate.  You can change the target frame
														
 
															+rate with the Configrc variable pstats-target-frame-rate on the
														
 
															+client.
														
 
															+
														
 
															+You can also create any number of user-defined guide bars by dragging
														
 
															+them into the graph from the gray space immediately above or below the
														
 
															+graph.  These are drawn in a dashed blue line.  It is sometimes useful
														
 
															+to place one of these to mark a performance level so it may be
														
 
															+compared to future values (or to alternate configurations).
														
 
															+
														
 
															+The primary collectors labeled on the left might themselves be further
														
 
															+subdivided, if the data is provided by the client.  For instance, App
														
 
															+is often divided into Show Code, Animation, and Collisions, where Show
														
 
															+Code is the time spent executing any Python code, Animation is the
														
 
															+time used to compute any animated characters, and Collisions is the
														
 
															+time spent in the collision traverser(s).
														
 
															+
														
 
															+To see any of these further breakdowns, double-click on the
														
 
															+corresponding colored label (or on the colored band within the graph
														
 
															+itself).  This narrows the focus of the strip chart from the overall
														
 
															+frame to just the selected collector, which has two advantages.
														
 
															+Firstly, it may be easier to observe the behavior of one particular
														
 
															+collector when it is drawn alone (as opposed to being stacked on top
														
 
															+of some other color bars), and the time in the upper-right corner will
														
 
															+now reflect the total time spent within just this collector.
														
 
															+Secondly, if there are further breakdowns to this collector, they will
														
 
															+now be shown as further colored bars.  As in the Frame chart, the
														
 
															+topmost label is the name of the parent collector, and any time shown
														
 
															+in this color represents time allocated to the parent collector that
														
 
															+is not accounted for by any of the child collectors.
														
 
															+
														
 
															+You can further drill down by double-clicking on any of the new
														
 
															+labels; or double-click on the top label, or the white part of the
														
 
															+graph, to return to the previous level.
														
 
															+
														
 
															+Value-based Strip Charts
														
 
															+
														
 
															+There are other strip charts you may create, which show arbitrary
														
 
															+kinds of data per frame other than elapsed time.  These can only be
														
 
															+accessed from the Graphs pulldown menu, and include things such as
														
 
															+texture memory in use and vertices drawn.  They behave similarly to
														
 
															+the time-based strip charts described above.
														
 
															+
														
 
															+Piano Roll Charts
														
 
															+
														
 
															+This graph is used less frequently, but when it is needed it is a
														
 
															+valuable tool to reveal exactly how the time is spent within a frame.
														
 
															+The PStats server automatically collects together all the time spent
														
 
															+within each collector and shows it as a single total, but in reality
														
 
															+it may not all have been spent in one continuous block of time.
														
 
															+
														
 
															+For instance, when Panda draws each display region in single-threaded
														
 
															+mode, it performs a cull traversal followed by a draw traversal for
														
 
															+each display region.  Thus, if your Panda client includes multiple
														
 
															+display regions, it will alternate its time spent culling and drawing
														
 
															+as it processes each of them.  The strip chart, however, reports only
														
 
															+the total cull time and draw time spent.
														
 
															+
														
 
															+Sometimes you really need to know the sequence of events in the frame,
														
 
															+not just the total time spent in each collector.  The piano roll chart
														
 
															+shows this kind of data.  It is so named because it is similar to the
														
 
															+paper music roll for an old-style player piano, with holes punched
														
 
															+down the roll for each note that is to be played.  The longer the
														
 
															+hole, the longer the piano key is held down.  (Think of the chart as
														
 
															+rotated 90 degrees from an actual piano roll.  A player piano roll
														
 
															+plays from bottom to top; the piano roll chart reads from left to
														
 
															+right.)
														
 
															+
														
 
															+Unlike a strip chart, a piano roll chart does not show trends; the
														
 
															+chart shows only the current frame's data.  The horizontal axis shows
														
 
															+time within the frame, and the individual collectors are stacked up in
														
 
															+an arbitrary ordering along the vertical axis.
														
 
															+
														
 
															+The time spent within the frame is drawn from left to right; at any
														
 
															+given time, the collector(s) that are active will be drawn with a
														
 
															+horizontal bar.  You can observe the CPU behavior within a frame by
														
 
															+reading the graph from left to right.  You may find it useful to
														
 
															+select "pause" from the Speed pulldown menu to freeze the graph on
														
 
															+just one frame while you read it.
														
 
															+
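The difference between what a strip chart reports and what a piano roll shows can be sketched with a small model.  The frame data below is invented for illustration (two display regions, times in milliseconds), and the function names are not part of PStats.

```python
from collections import defaultdict

# Hypothetical single frame: (collector, start ms, stop ms), in time order.
frame = [
    ("Cull", 0.0, 2.0),
    ("Draw", 2.0, 5.0),
    ("Cull", 5.0, 6.0),
    ("Draw", 6.0, 10.0),
]


def strip_chart_totals(frame):
    """One number per collector: the total time, with the ordering discarded."""
    totals = defaultdict(float)
    for name, start, stop in frame:
        totals[name] += stop - start
    return dict(totals)


def piano_roll_rows(frame):
    """One row per collector: every (start, stop) interval within the frame."""
    rows = defaultdict(list)
    for name, start, stop in frame:
        rows[name].append((start, stop))
    return dict(rows)


totals = strip_chart_totals(frame)
rows = piano_roll_rows(frame)
print(totals)
print(rows)
```

The totals show only that Cull took 3 ms and Draw took 7 ms; the rows preserve the fact that the two alternated.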
														
 
															+Note that the piano roll chart shows time spent within the frame on
														
 
															+the horizontal axis, instead of the vertical axis, as it is on the
														
 
															+strip charts.  Thus, the guide bars on the piano roll chart are
														
 
															+vertical lines instead of horizontal lines, and they may be dragged in
														
 
															+from the left or the right sides (instead of from the top or bottom,
														
 
															+as on the strip charts).  Apart from this detail, these are the same
														
 
															+guide bars that appear on the strip charts.
														
 
															+
														
 
															+The piano roll chart may be created from the Graphs pulldown menu.
														
 
															+
														
 
															+Additional threads
														
 
															+
														
 
															+If the Panda client has multiple threads that generate PStats data,
														
 
															+the PStats server can open up graphs for these threads as well.  Each
														
 
															+separate thread is considered unrelated to the main thread, and may
														
 
															+have the same or an independent frame rate.  Each separate thread will
														
 
															+be given its own pulldown menu to create graphs associated with that
														
 
															+thread; these auxiliary thread menus will appear on the menu bar
														
 
															+following the Graphs menu.
														
 
															+
														
 
															+
														
 
															+HOW TO DEFINE YOUR OWN COLLECTORS
														
 
															+
														
 
															+The PStats client code is designed to be generic enough to allow users
														
 
															+to define their own collectors to time any arbitrary blocks of code
														
 
															+(or record additional non-time-based data), from either the C++ or the
														
 
															+Python level.
														
 
															+
														
 
															+The general idea is to create a PStatCollector for each separate block
														
 
															+of code you wish to time.  The name which is passed to the
														
 
															+PStatCollector constructor is a unique identifier: all collectors that
														
 
															+share the same name are deemed to be the same collector.
														
 
															+
														
 
															+Furthermore, the collector's name can be used to define the
														
 
															+hierarchical relationship of each collector with other existing
														
 
															+collectors.  To do this, prefix the collector's name with the name of
														
 
															+its parent(s), followed by a colon separator.  For instance,
														
 
															+PStatCollector("Draw:Flip") defines a collector named "Flip", which is
														
 
															+a child of the "Draw" collector, defined elsewhere.
														
 
															+
														
 
															+You can also define a collector as a child of another collector by
														
 
															+giving the parent collector explicitly followed by the name of the
														
 
															+child collector alone, which is handy for dynamically-defined
														
 
															+collectors.  For instance, PStatCollector(draw, "Flip") defines the
														
 
															+same collector named above, assuming that draw is the result of the
														
 
															+PStatCollector("Draw") constructor.
														
 
															+
														
 
															+Note that, because of an unfortunate limitation with the interrogate
														
 
															+parser, statically-defined PStatCollector objects can't be parsed by
														
 
															+interrogate.  (In general, interrogate can't parse C++ objects that
														
 
															+are constructed with parameters at the outermost scoping level.)  As a
														
 
															+workaround, we usually protect these declarations from interrogate by
														
 
															+using the syntax #ifndef CPPPARSER .. #endif.
														
 
															+
														
 
															+Once you have a collector, simply bracket the region of code you wish
														
 
															+to time with collector.start() and collector.stop().  It is important
														
 
															+to ensure that each call to start() is matched by exactly one call to
														
 
															+stop().  If you are programming in C++, it is highly recommended that
														
 
															+you use the PStatTimer class to make these calls automatically, which
														
 
															+guarantees the correct pairing; the PStatTimer's constructor calls
														
 
															+start() and its destructor calls stop(), so you may simply define a
														
 
															+PStatTimer object at the beginning of the block of code you wish to
														
 
															+time.  If you are programming in Python, you must call start() and
														
 
															+stop() explicitly.
														
 
															+
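In Python, one way to keep start() and stop() paired even when the timed code raises an exception is a small context manager.  This is a sketch, not part of the PStats interface: timed() and RecordingCollector are names invented here, and RecordingCollector is only a stand-in so the example runs without Panda.  It assumes nothing about the collector beyond the start() and stop() methods described above.

```python
from contextlib import contextmanager


@contextmanager
def timed(collector):
    """Call collector.start() on entry and collector.stop() on exit."""
    collector.start()
    try:
        yield collector
    finally:
        # Runs on normal exit and on exceptions, so the pairing is preserved.
        collector.stop()


class RecordingCollector:
    """Hypothetical stand-in for a collector; it only records the calls made."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def start(self):
        self.calls.append("start")

    def stop(self):
        self.calls.append("stop")


collector = RecordingCollector("App:Show Code:My Task")
try:
    with timed(collector):
        raise RuntimeError("simulated failure inside the timed block")
except RuntimeError:
    pass
print(collector.calls)
```

With a real collector, the body of the with statement would be the block of code you wish to time.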
														
 
															+When you call start() and there was another collector already started,
														
 
															+that previous collector is paused until you call the matching stop()
														
 
															+(at which time the previous collector is resumed).  That is, time is
														
 
															+accumulated only towards the collector indicated by the innermost
														
 
															+start() .. stop() pair.
														
 
															+
														
 
															+Time accumulated towards any collector is also counted towards that
														
 
															+collector's parent, as defined in the collector's constructor
														
 
															+(described above).
														
 
															+
														
 
															+It is important to understand the difference between collectors nested
														
 
															+implicitly by runtime start/stop invocations, and the static hierarchy
														
 
															+implicit in the collector definition.  Time is accumulated in parent
														
 
															+collectors according to the statically-defined parents of the
														
 
															+innermost active collector only, without regard to the runtime stack
														
 
															+of paused collectors.
														
 
															+
														
 
															+For example, suppose you are in the middle of processing the "Draw"
														
 
															+task and have therefore called start() on the "Draw" collector.  While
														
 
															+in the middle of processing this block of code, you call a function
														
 
															+that has its own collector called "Cull:Sort".  As soon as you start
														
 
															+the new collector, you have paused the "Draw" collector and are now
														
 
															+accumulating time in the "Cull:Sort" collector.  Once this new
														
 
															+collector stops, you will automatically return to accumulating time in
														
 
															+the "Draw" collector.  The time spent within the nested "Cull:Sort"
														
 
															+collector will be counted towards the "Cull" total time, not the
														
 
															+"Draw" total time.
														
 
															+
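The accounting in this example can be modeled in a few lines of Python.  This is a simplified sketch of the rules described above, not the actual PStats implementation: the event format, the timestamps and the function names are invented for illustration, and the top-level Frame collector is ignored.

```python
from collections import defaultdict


def ancestors(name):
    """'Cull:Sort' -> ['Cull:Sort', 'Cull']: a collector and its static parents."""
    parts = name.split(":")
    return [":".join(parts[:i]) for i in range(len(parts), 0, -1)]


def accumulate(events):
    """events: (timestamp, collector name, 'start' or 'stop'), in time order."""
    totals = defaultdict(float)
    stack = []        # runtime stack; stack[-1] is the innermost active collector
    last_time = None
    for timestamp, name, kind in events:
        if stack:
            # Elapsed time goes to the innermost collector and to its static
            # parents only; paused collectors lower on the stack get nothing.
            for collector in ancestors(stack[-1]):
                totals[collector] += timestamp - last_time
        if kind == "start":
            stack.append(name)
        else:
            assert stack and stack[-1] == name, "unmatched stop()"
            stack.pop()
        last_time = timestamp
    return dict(totals)


events = [
    (0.0, "Draw", "start"),
    (2.0, "Cull:Sort", "start"),   # pauses "Draw"
    (5.0, "Cull:Sort", "stop"),    # resumes "Draw"
    (6.0, "Draw", "stop"),
]
totals = accumulate(events)
print(totals)
```

Of the six time units in this run, three count towards "Draw" and three towards "Cull:Sort" and its parent "Cull", even though "Cull:Sort" was started while "Draw" was active.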
														
 
															+Color and Other Optional Collector Properties
														
 
															+
														
 
															+If you do not specify a color for a particular collector, it will be
														
 
															+assigned a random color at runtime.  At present, the only way to
														
 
															+specify a color is to modify
														
 
															+panda/src/pstatclient/pStatProperties.cxx, and add a line to the table
														
 
															+for your new collector(s).  You can also define additional properties
														
 
															+here such as a suggested initial scale for the graph and, for
														
 
															+non-time-based collectors, a unit name and/or scale factor.  The order
														
 
															+in which these collectors are listed in this table is also relevant;
														
 
															+they will appear in the same order on the graphs.  The first column
														
 
															+should be set to 1 for your new collectors unless you wish them to be
														
 
															+disabled by default.  You must recompile the client (but not the
														
 
															+server) to reflect changes to this table.
														
 
															+
														
 
															+
														
 
															+HOW IT WORKS (What's actually happening)
														
 
															+
														
 
															+The PStats code is divided into two main parts: the client code and
														
 
															+the server code.
														
 
															+
														
 
															+The PStats Client
														
 
															+
														
 
															+The client code is in panda/src/pstatclient, and is available to run
														
 
															+in every Panda client unless it is compiled out.  (It will be compiled
														
 
															+out if OPTIMIZE is set to level 4, unless DO_PSTATS is also explicitly
														
 
															+set to non-empty.  It will also be compiled out if NSPR is not
														
 
															+available, since both client and server depend on the NSPR library to
														
 
															+exchange data, even when running the server on the same machine as the
														
 
															+client.)
														
 
															+
														
 
															+The client code is designed for minimal runtime overhead when it is
														
 
															+compiled in but not enabled (that is, when the client is not in
														
 
															+contact with a PStats server), as well as when it is enabled (when the
														
 
															+client is in contact with a PStats server).  It is also designed for
														
 
															+zero runtime overhead when it is compiled out.
														
 
															+
														
 
															+There is one global PStatClient class object, which manages all of the
														
 
															+communications on the client side.  Each PStatCollector is simply an
														
 
															+index into an array stored within the PStatClient object, although the
														
 
															+interface is intended to hide this detail from the programmer.
														
 
															+
														
 
															+Initially, before the PStatClient has established a connection, calls
														
 
															+to start() and stop() simply return immediately.
														
 
															+
														
 
															+When you call PStatClient.connect(), the client attempts to contact
														
 
															+the PStatServer via a TCP connection to the hostname and port named in
														
 
															+the pstats-host and pstats-port Configrc variables, respectively.
														
 
															+(The default hostname and port are localhost and 5180.)  You can also
														
 
															+pass in a specific hostname and/or port to the connect() call.  Upon
														
 
															+successful connection and handshake with the server, the PStatClient
														
 
															+sends a list of the available collectors, along with their names,
														
 
															+colors, and hierarchical relationships, on the TCP channel.
														
 
															+
														
 
															+Once connected, each call to start() and stop() adds a collector
														
 
															+number and timestamp to an array maintained by the PStatClient.  At
														
 
															+the end of each frame, the PStatClient boils this array into a
														
 
															+datagram for shipping to the server.  Each start() and stop() event
														
 
															+requires 6 bytes; if the resulting datagram will fit within a UDP
														
 
															+packet (1K bytes, or about 84 start/stop pairs), it is sent via UDP;
														
 
															+otherwise, it is sent on the TCP channel.
														
 
															+
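The size arithmetic behind this choice can be sketched as follows.  The 6 bytes per event and the 1K limit come from the text above; the overhead_bytes parameter is an assumption added here, since any header size is not stated, and the function names are invented for illustration.

```python
BYTES_PER_EVENT = 6       # from the text: each start() or stop() event
UDP_LIMIT_BYTES = 1024    # from the text: "1K bytes"


def datagram_bytes(num_pairs, overhead_bytes=0):
    """Size of one frame's datagram holding num_pairs start/stop pairs."""
    return overhead_bytes + num_pairs * 2 * BYTES_PER_EVENT


def transport_for(num_pairs, overhead_bytes=0):
    """Pick UDP when the frame's datagram fits within the limit, else TCP."""
    if datagram_bytes(num_pairs, overhead_bytes) <= UDP_LIMIT_BYTES:
        return "UDP"
    return "TCP"


for pairs in (10, 84, 100):
    print(pairs, datagram_bytes(pairs), transport_for(pairs))
```

At 84 pairs the events alone occupy 1008 bytes, which leaves a small margin under 1024 bytes and is consistent with the "about 84" figure.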
														
 
															+Also, to prevent flooding the network and/or overwhelming the PStats
														
 
															+server, the client limits how many frames of data it sends per second.  This
														
 
															+parameter is controlled by the pstats-max-rate Configrc variable and
														
 
															+is set to 30 by default.  (If the packets are larger than 1K, the max
														
 
															+transmission rate is also automatically reduced further in
														
 
															+proportion.)  If the frame rate is higher than this limit, some frames
														
 
															+will simply not be transmitted.  The server is designed to cope with
														
 
															+missing frames and will assume missing frames are similar to their
														
 
															+neighbors.
														
 
															+
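The effect of the rate limit can be illustrated with a simple minimum-interval throttle.  This is only one plausible way to implement such a limit and is not necessarily how the PStatClient does it; the function name and the frame times are invented for illustration, and the further reduction for packets larger than 1K is not modeled.

```python
def frames_to_send(frame_times_ms, max_rate=30):
    """Return the frame times that would be transmitted under the limit."""
    min_interval_ms = 1000.0 / max_rate
    sent = []
    last_sent = None
    for t in frame_times_ms:
        # Transmit only if enough time has passed since the last transmission.
        if last_sent is None or t - last_sent >= min_interval_ms:
            sent.append(t)
            last_sent = t
    return sent


# One second of a client running at 100 frames per second.
frame_times = [i * 10 for i in range(100)]
sent = frames_to_send(frame_times)
print(len(sent), sent[:4])
```

In this model every fourth frame is transmitted, so the effective rate (25 per second) stays at or below the configured maximum of 30, and the remaining frames are the ones the server must treat as missing.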
														
 
															+The server does all the work of analyzing the data after that.  The
														
 
															+client's next job is simply to clear its array and prepare itself for
														
 
															+the next frame.
														
 
															+
														
 
															+
														
 
															+The PStats Server
														
 
															+
														
 
															+The generic server code is in pandatool/src/pstatserver, and the
														
 
															+GUI-specific server code is in pandatool/src/gtk-stats and
														
 
															+pandatool/src/win-stats, for Unix and Windows, respectively.  (There
														
 
															+is also an OS-independent text-stats subdirectory, which builds a
														
 
															+trivial PStats server that presents a scrolling-text interface.  This
														
 
															+is mainly useful as a proof of technology rather than as a usable
														
 
															+tool.)
														
 
															+
														
 
															+The GUI-specific code is the part that manages the interaction with
														
 
															+the user via the creation of windows and the handling of mouse input,
														
 
															+etc.; most of the real work of interpreting the data is done in the
														
 
															+generic code in the pstatserver directory.
														
 
															+
														
 
															+The PStatServer owns all of the connections, and interfaces with the
														
 
															+NSPR library to communicate with the clients.  It listens on the
														
 
															+specified port for new connections, using the pstats-port Configrc
														
 
															+variable to determine the port number (this is the same variable that
														
 
															+specifies the port to the client).  Usually you can leave this at its
														
 
															+default value of 5180, but there may be some cases in which that port
														
 
															+is already in use on a particular machine (for instance, maybe someone
														
 
															+else is running another PStats server on another display of the same
														
 
															+machine).
														
 
															+
														
 
															+Once a connection is received, the PStatServer creates a PStatMonitor object
														
 
															+(the class is specialized for each of the different GUI variants) that
														
 
															+handles all the data for this particular connection.  In the case of
														
 
															+the Windows pstats.exe program, each new monitor instance is
														
 
															+represented by a new toplevel window.  Multiple monitors can be
														
 
															+active at once.
														
 
															+
														
 
															+The work of digesting the data from the client is performed by the
														
 
															+PStatView class, which analyzes the pattern of start and stop
														
 
															+timestamps, along with the relationship data of the various
														
 
															+collectors, and boils it down into a list of the amount of time spent
														
 
															+in each collector per frame.
														
 
															+
														
 
															+Finally, a PStatStripChart or PStatPianoRoll class object defines the
														
 
															+actual graph output of colored lines and bars; the generic versions of
														
 
															+these include virtual functions to do the actual drawing (the GUI
														
 
															+specializations of these redefine these methods to make the
														
 
															+appropriate calls).
														
 
															+