Browse Source

add new document

David Rose 22 years ago
parent
commit
c4f3f0b8e4
1 changed files with 397 additions and 0 deletions
  1. 397 0
      panda/src/doc/howto.use_pstats

+ 397 - 0
panda/src/doc/howto.use_pstats

@@ -0,0 +1,397 @@
+QUICK INTRODUCTION
+
+PStats is Panda's built-in performance analysis tool.  It can graph
+frame rate over time, and can further graph the work spent within each
+frame into user-defined subdivisions of the frame (for instance, app,
+cull and draw), and thus can be an invaluable tool in identifying
+performance bottlenecks.  It can also show frame-based data that
+reflects any arbitrary quantity other than time intervals, for
+instance, texture memory in use or number of vertices drawn.
+
+The performance graphs may be drawn on the same computer that is
+running the Panda client, or they may be drawn on another computer on
+the same LAN, which is useful for analyzing fullscreen applications.
+The remote computer need not be running the same operating system as
+the client computer.
+
+To use PStats, you first need to build the PStats server program,
+which is part of the Pandatool tree (it's called pstats.exe on
+Windows, and gtk-stats on a Unix platform).  Start by running the
+PStats server program (it runs in the background), and then start your
+Direct/Panda client with the following in your Configrc file:
+
+  want-pstats 1
+
+Or, at runtime, issue the Python command:
+
+  PStatClient.connect()
+
+Or if you're running pview, press shift-S.
+
+Any of the above will contact your running PStats server program,
+which will proceed to open a window and start a running graph of your
+client's performance.  If you are running the server on a different
+machine than the client, add the pstats-host variable to your client's
+Configrc file, naming the hostname or IP address of the machine
+running the PStats server.
+
+If you are developing Python code, you may be interested in reporting
+the relative time spent within each Python task (by subdividing the
+total time spent in Python, as reported under "Show Code").  To do
+this, add the following lines to your Configrc file before you start
+ShowBase:
+
+  task-timer-verbose 1
+  pstats-tasks 1
+
+
+THE PSTATS SERVER (The user interface)
+
+The GUI for managing the graphs and drilling down to view more detail
+is entirely controlled by the PStats server program.  At the time of
+this writing, there are two different versions of the PStats server,
+one for Unix called gtk-stats and one for Windows called simply
+pstats.  The interfaces are similar but not identical; the following
+paragraphs describe the Windows version.
+
+When you run pstats.exe, it adds a program to the taskbar but does not
+immediately open a window.  The program name is typically "PStats
+5180", showing the default PStats TCP port number of 5180; see "HOW IT
+WORKS" below for more details about the TCP communication system.  For
+the most part you don't need to worry about the port number, as long
+as server and client agree.
+
+Each time a client connects to the PStats server, a new monitor window
+is created.  This monitor window owns all of the graphs that you
+create to view the performance data from that particular connection.
+Initially, a strip chart showing the frame time of the main thread is
+created by default; you can create additional graphs by selecting from
+the Graphs pulldown menu.
+
+Time-based Strip Charts
+
+This is the graph type you will use most frequently to examine
+performance data.  The horizontal axis represents the passage of
+frames; each subsequent frame is represented as a vertical slice on
+the graph.  The overall height of the colored bands represents the
+total amount of time spent on each frame; within the frame, the time
+is further divided into the primary subdivisions represented by
+different color bands (and labeled on the left).  These subdivisions
+are called "collectors" in the PStats terminology, since they
+represent time collected by different tasks.
+
+Normally, the three primary collectors are App, Cull, and Draw, the
+three stages of the graphics pipeline.  Atop these three colored
+collectors is the label "Frame", which represents any remaining time
+spent in the frame that was not specifically allocated to one of the
+three child collectors (normally, there should not be significant time
+reported here).
+
+The frame time in milliseconds, averaged over the past three seconds,
+is drawn above the upper right corner of the graph.  The labels on the
+guide bars on the right are also shown in milliseconds; if you prefer
+to think about a target frame rate rather than an elapsed time in
+milliseconds, you may find it useful to select "Hz" from the Units
+pulldown menu, which changes the time units accordingly.
+
+The running Panda client suggests its target frame rate, as well as
+the initial vertical scale of the graph (that is, the height of the
+colored bars).  You can change the scale freely by clicking within the
+graph itself and dragging the mouse up or down as necessary.  One of
+the horizontal guide bars is drawn in a lighter shade of gray; this
+one represents the actual target frame rate suggested by the client.
+The other, darker, guide bars are drawn automatically at harmonic
+subdvisions of the target frame rate.  You can change the target frame
+rate with the Configrc variable pstats-target-frame-rate on the
+client.
+
+You can also create any number of user-defined guide bars by dragging
+them into the graph from the gray space immediately above or below the
+graph.  These are drawn in a dashed blue line.  It is sometimes useful
+to place one of these to mark a performance level so it may be
+compared to future values (or to alternate configurations).
+
+The primary collectors labeled on the left might themselves be further
+subdivided, if the data is provided by the client.  For instance, App
+is often divided into Show Code, Animation, and Collisions, where Show
+Code is the time spent executing any Python code, Animation is the
+time used to compute any animated characters, and Collisions is the
+time spent in the collision traverser(s).
+
+To see any of these further breakdowns, double-click on the
+corresponding colored label (or on the colored band within the graph
+itself).  This narrows the focus of the strip chart from the overall
+frame to just the selected collector, which has two advantages.
+Firstly, it may be easier to observe the behavior of one particular
+collector when it is drawn alone (as opposed to being stacked on top
+of some other color bars), and the time in the upper-right corner will
+now reflect just the total time spent within just this collector.
+Secondly, if there are further breakdowns to this collector, they will
+now be shown as further colored bars.  As in the Frame chart, the
+topmost label is the name of the parent collector, and any time shown
+in this color represents time allocated to the parent collector that
+is not accounted for by any of the child collectors.
+
+You can further drill down by double-clicking on any of the new
+labels; or double-click on the top label, or the white part of the
+graph, to return back up to the previous level.
+
+Value-based Strip Charts
+
+There are other strip charts you may create, which show arbitrary
+kinds of data per frame other than elapsed time.  These can only be
+accessed from the Graphs pulldown menu, and include things such as
+texture memory in use and vertices drawn.  They behave similarly to
+the time-based strip charts described above.
+
+Piano Roll Charts
+
+This graph is used less frequently, but when it is needed it is a
+valuable tool to reveal exactly how the time is spent within a frame.
+The PStats server automatically collects together all the time spent
+within each collector and shows it as a single total, but in reality
+it may not all have been spent in one continuous block of time.
+
+For instance, when Panda draws each display region in single-threaded
+mode, it performs a cull traversal followed by a draw traversal for
+each display region.  Thus, if your Panda client includes multiple
+display regions, it will alternate its time spent culling and drawing
+as it processes each of them.  The strip chart, however, reports only
+the total cull time and draw time spent.
+
+Sometimes you really need to know the sequence of events in the frame,
+not just the total time spent in each collector.  The piano roll chart
+shows this kind of data.  It is so named because it is similar to the
+paper music roll for an old-style player piano, with holes punched
+down the roll for each note that is to be played.  The longer the
+hole, the longer the piano key is held down.  (Think of the chart as
+rotated 90 degrees from an actual piano roll.  A player piano roll
+plays from bottom to top; the piano roll chart reads from left to
+right.)
+
+Unlike a strip chart, a piano roll chart does not show trends; the
+chart shows only the current frame's data.  The horizontal axis shows
+time within the frame, and the individual collectors are stacked up in
+an arbitrary ordering along the vertical axis.
+
+The time spent within the frame is drawn from left to right; at any
+given time, the collector(s) that are active will be drawn with a
+horizontal bar.  You can observe the CPU behavior within a frame by
+reading the graph from left to right.  You may find it useful to
+select "pause" from the Speed pulldown menu to freeze the graph on
+just one frame while you read it.
+
+Note that the piano roll chart shows time spent within the frame on
+the horizontal axis, instead of the vertical axis, as it is on the
+strip charts.  Thus, the guide bars on the piano roll chart are
+vertical lines instead of horizontal lines, and they may be dragged in
+from the left or the right sides (instead of from the top or bottom,
+as on the strip charts).  Apart from this detail, these are the same
+guide bars that appear on the strip charts.
+
+The piano roll chart may be created from the Graphs pulldown menu.
+
+Additional threads
+
+If the panda client has multiple threads that generate PStats data,
+the PStats server can open up graphs for these threads as well.  Each
+separate thread is considered unrelated to the main thread, and may
+have the same or an independent frame rate.  Each separate thread will
+be given its own pulldown menu to create graphs associated with that
+thread; these auxiliary thread menus will appear on the menu bar
+following the Graphs menu.
+
+
+HOW TO DEFINE YOUR OWN COLLECTORS
+
+The PStats client code is designed to be generic enough to allow users
+to define their own collectors to time any arbitrary blocks of code
+(or record additional non-time-based data), from either the C++ or the
+Python level.
+
+The general idea is to create a PStatCollector for each separate block
+of code you wish to time.  The name which is passed to the
+PStatCollector constructor is a unique identifier: all collectors that
+share the same name are deemed to be the same collector.
+
+Furthermore, the collector's name can be used to define the
+hierarchical relationship of each collector with other existing
+collectors.  To do this, prefix the collector's name with the name of
+its parent(s), followed by a colon separator.  For instance,
+PStatCollector("Draw:Flip") defines a collector named "Flip", which is
+a child of the "Draw" collector, defined elsewhere.
+
+You can also define a collector as a child of another collector by
+giving the parent collector explicitly followed by the name of the
+child collector alone, which is handy for dynamically-defined
+collectors.  For instance, PStatCollector(draw, "Flip") defines the
+same collector named above, assuming that draw is the result of the
+PStatCollector("Draw") constructor.
+
+Note that, because of an unfortunate limitation with the interrogate
+parser, statically-defined PStatCollector objects can't be parsed by
+interrogate.  (In general, interrogate can't parse C++ objects that
+are constructed with parameters at the outermost scoping level.)  As a
+workaround, we usually protect these declarations from interrogate by
+using the syntax #ifndef CPPPARSER .. #endif.
+
+Once you have a collector, simply bracket the region of code you wish
+to time with collector.start() and collector.stop().  It is important
+to ensure that each call to start() is matched by exactly one call to
+stop().  If you are programming in C++, it is highly recommended that
+you use the PStatTimer class to make these calls automatically, which
+guarantees the correct pairing; the PStatTimer's constructor calls
+start() and its destructor calls stop(), so you may simply define a
+PStatTimer object at the beginning of the block of code you wish to
+time.  If you are programming in Python, you must call start() and
+stop() explicitly.
+
+When you call start() and there was another collector already started,
+that previous collector is paused until you call the matching stop()
+(at which time the previous collector is resumed).  That is, time is
+accumulated only towards the collector indicated by the innermost
+start() .. stop() pair.
+
+Time accumulated towards any collector is also counted towards that
+collector's parent, as defined in the collector's constructor
+(described above).
+
+It is important to understand the difference between collectors nested
+implicitly by runtime start/stop invocations, and the static hierarchy
+implicit in the collector definition.  Time is accumulated in parent
+collectors according to the statically-defined parents of the
+innermost active collector only, without regard to the runtime stack
+of paused collectors.
+
+For example, suppose you are in the middle of processing the "Draw"
+task and have therefore called start() on the "Draw" collector.  While
+in the middle of processing this block of code, you call a function
+that has its own collector called "Cull:Sort".  As soon as you start
+the new collector, you have paused the "Draw" collector and are now
+accumulating time in the "Cull:Sort" collector.  Once this new
+collector stops, you will automatically return to accumulating time in
+the "Draw" collector.  The time spent within the nested "Cull:Sort"
+collector will be counted towards the "Cull" total time, not the
+"Draw" total time.
+
+Color and Other Optional Collector Properties
+
+If you do not specify a color for a particular collector, it will be
+assigned a random color at runtime.  At present, the only way to
+specify a color is to modify
+panda/src/pstatclient/pStatProperties.cxx, and add a line to the table
+for your new collector(s).  You can also define additional properties
+here such as a suggested initial scale for the graph and, for
+non-time-based collectors, a unit name and/or scale factor.  The order
+in which these collectors are listed in this table is also relevant;
+they will appear in the same order on the graphs.  The first column
+should be set to 1 for your new collectors unless you wish them to be
+disabled by default.  You must recompile the client (but not the
+server) to reflect changes to this table.
+
+
+HOW IT WORKS (What's actually happening)
+
+The PStats code is divided into two main parts: the client code and
+the server code.
+
+The PStats Client
+
+The client code is in panda/src/pstatclient, and is available to run
+in every Panda client unless it is compiled out.  (It will be compiled
+out if OPTIMIZE is set to level 4, unless DO_PSTATS is also explicitly
+set to non-empty.  It will also be compiled out if NSPR is not
+available, since both client and server depend on the NSPR library to
+exchange data, even when running the server on the same machine as the
+client.)
+
+The client code is designed for minimal runtime overhead when it is
+compiled in but not enabled (that is, when the client is not in
+contact with a PStats server), as well as when it is enabled (when the
+client is in contact with a PStats server).  It is also designed for
+zero runtime overhead when it is compiled out.
+
+There is one global PStatClient class object, which manages all of the
+communications on the client side.  Each PStatCollector is simply an
+index into an array stored within the PStatClient object, although the
+interface is intended to hide this detail from the programmer.
+
+Initially, before the PStatClient has established a connection, calls
+to start() and stop() simply return immediately.
+
+When you call PStatClient.connect(), the client attempts to contact
+the PStatServer via a TCP connection to the hostname and port named in
+the pstats-host and pstats-port Configrc variables, respectively.
+(The default hostname and port are localhost and 5180.)  You can also
+pass in a specific hostname and/or port to the connect() call.  Upon
+successful connection and handshake with the server, the PStatClient
+sends a list of the available collectors, along with their names,
+colors, and hierarchical relationships, on the TCP channel.
+
+Once connected, each call to start() and stop() adds a collector
+number and timestamp to an array maintained by the PStatClient.  At
+the end of each frame, the PStatClient boils this array into a
+datagram for shipping to the server.  Each start() and stop() event
+requires 6 bytes; if the resulting datagram will fit within a UDP
+packet (1K bytes, or about 84 start/stop pairs), it is sent via UDP;
+otherwise, it is sent on the TCP channel.
+
+Also, to prevent flooding the network and/or overwhelming the PStats
+server, only so many frames of data will be sent per second.  This
+parameter is controlled by the pstats-max-rate Configrc variable and
+is set to 30 by default.  (If the packets are larger than 1K, the max
+transmission rate is also automatically reduced further in
+proportion.)  If the frame rate is higher than this limit, some frames
+will simply not be transmitted.  The server is designed to cope with
+missing frames and will assume missing frames are similar to their
+neighbors.
+
+The server does all the work of analyzing the data after that.  The
+client's next job is simply to clear its array and prepare itself for
+the next frame.
+
+
+The PStats Server
+
+The generic server code is in pandatool/src/pstatserver, and the
+GUI-specific server code is in pandatool/src/gtk-stats and
+pandatool/src/win-stats, for Unix and Windows, respectively.  (There
+is also an OS-independent text-stats subdirectory, which builds a
+trivial PStats server that presents a scrolling-text interface.  This
+is mainly useful as a proof of technology rather than as a usable
+tool.)
+
+The GUI-specific code is the part that manages the interaction with
+the user via the creation of windows and the handling of mouse input,
+etc.; most of the real work of interpreting the data is done in the
+generic code in the pstatserver directory.
+
+The PStatServer owns all of the connections, and interfaces with the
+NSPR library to communicate with the clients.  It listens on the
+specified port for new connections, using the pstats-port Configrc
+variable to determine the port number (this is the same variable that
+specifies the port to the client).  Usually you can leave this at its
+default value of 5180, but there may be some cases in which that port
+is already in use on a particular machine (for instance, maybe someone
+else is running another PStats server on another display of the same
+machine).
+
+Once a connection is received, it creates a PStatMonitor class (this
+class is specialized for each of the different GUI variants) that
+handles all the data for this particular connection.  In the case of
+the windows pstats.exe program, each new monitor instance is
+represented by a new toplevel window.  Multiple monitors can be
+active at once.
+
+The work of digesting the data from the client is performed by the
+PStatView class, which analyzes the pattern of start and stop
+timestamps, along with the relationship data of the various
+collectors, and boils it down into a list of the amount of time spent
+in each collector per frame.
+
+Finally, a PStatStripChart or PStatPianoRoll class object defines the
+actual graph output of colored lines and bars; the generic versions of
+these include virtual functions to do the actual drawing (the GUI
+specializations of these redefine these methods to make the
+appropriate calls).
+