general_optimization.rst 10 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291
  1. .. _doc_general_optimization:
  2. General optimization tips
  3. =========================
  4. Introduction
  5. ~~~~~~~~~~~~
  6. In an ideal world, computers would run at infinite speed, and the only limit to
  7. what we could achieve would be our imagination. In the real world, however, it
  8. is all too easy to produce software that will bring even the fastest computer to
  9. its knees.
  10. Designing games and other software is thus a compromise between what we would
  11. like to be possible, and what we can realistically achieve while maintaining
  12. good performance.
  13. To achieve the best results, we have two approaches:
  14. * Work faster
  15. * Work smarter
  16. And preferably, we will use a blend of the two.
  17. Smoke and Mirrors
  18. ^^^^^^^^^^^^^^^^^
  19. Part of working smarter is recognizing that, especially in games, we can often
  20. get the player to believe they are in a world that is far more complex,
  21. interactive, and graphically exciting than it really is. A good programmer is a
  22. magician, and should strive to learn the tricks of the trade, and try to invent
  23. new ones.
  24. The nature of slowness
  25. ^^^^^^^^^^^^^^^^^^^^^^
  26. To the outside observer, performance problems are often lumped together. But in
  27. reality, there are several different kinds of performance problem:
  28. * A slow process that occurs every frame, leading to a continuously low frame
  29. rate
  30. * An intermittent process that causes 'spikes' of slowness, leading to
  31. stalls
  32. * A slow process that occurs outside of normal gameplay, for instance, on
  33. level load
  34. Each of these are annoying to the user, but in different ways.
  35. Measuring performance
  36. =====================
  37. Probably the most important tool for optimization is the ability to measure
  38. performance - to identify where bottlenecks are, and to measure the success of
  39. our attempts to speed them up.
  40. There are several methods of measuring performance, including :
  41. * Putting a start / stop timer around code of interest
  42. * Using the Godot profiler
  43. * Using external third party profilers
  44. * Using GPU profilers / debuggers
  45. * Checking the frame rate (with vsync disabled)
  46. Be very aware that the relative performance of different areas can vary on
  47. different hardware. Often it is a good idea to make timings on more than one
  48. device, especially including mobile as well as desktop, if you are targeting
  49. mobile.
  50. Limitations
  51. ~~~~~~~~~~~
  52. CPU Profilers are often the 'go to' method for measuring performance, however
  53. they don't always tell the whole story.
  54. - Bottlenecks are often on the GPU, `as a result` of instructions given by the
  55. CPU
  56. - Spikes can occur in the Operating System processes (outside of Godot) `as a
  57. result` of instructions used in Godot (for example dynamic memory allocation)
  58. - You may not be able to profile e.g. a mobile phone
  59. - You may have to solve performance problems that occur on hardware you don't
  60. have access to
  61. As a result of these limitations, you often need to use detective work to find
  62. out where bottlenecks are.
  63. Detective work
  64. ~~~~~~~~~~~~~~
  65. Detective work is a crucial skill for developers (both in terms of performance,
  66. and also in terms of bug fixing). This can include hypothesis testing, and
  67. binary search.
  68. Hypothesis testing
  69. ^^^^^^^^^^^^^^^^^^
  70. Say for example you believe that sprites are slowing down your game. You can
  71. test this hypothesis for example by:
  72. * Measuring the performance when you add more sprites, or take some away.
  73. This may lead to a further hypothesis - does the size of the sprite determine
  74. the performance drop?
  75. * You can test this by keeping everything the same, but changing the sprite
  76. size, and measuring performance
  77. Binary search
  78. ^^^^^^^^^^^^^
  79. Say you know that frames are taking much longer than they should, but you are
  80. not sure where the bottleneck lies. You could begin by commenting out
  81. approximately half the routines that occur on a normal frame. Has the
  82. performance improved more or less than expected?
  83. Once you know which of the two halves contains the bottleneck, you can then
  84. repeat this process, until you have pinned down the problematic area.
  85. Profilers
  86. =========
  87. Profilers allow you to time your program while running it. Profilers then
  88. provide results telling you what percentage of time was spent in different
  89. functions and areas, and how often functions were called.
  90. This can be very useful both to identify bottlenecks and to measure the results
  91. of your improvements. Sometimes attempts to improve performance can backfire and
  92. lead to slower performance, so always use profiling and timing to guide your
  93. efforts.
  94. For more info about using the profiler within Godot see
  95. :ref:`doc_debugger_panel`.
  96. Principles
  97. ==========
  98. Donald Knuth:
  99. *Programmers waste enormous amounts of time thinking about, or worrying
  100. about, the speed of noncritical parts of their programs, and these attempts
  101. at efficiency actually have a strong negative impact when debugging and
  102. maintenance are considered. We should forget about small efficiencies, say
  103. about 97% of the time: premature optimization is the root of all evil. Yet
  104. we should not pass up our opportunities in that critical 3%.*
  105. The messages are very important:
  106. * Programmer / Developer time is limited. Instead of blindly trying to speed up
  107. all aspects of a program we should concentrate our efforts on the aspects that
  108. really matter.
  109. * Efforts at optimization often end up with code that is harder to read and
  110. debug than non-optimized code. It is in our interests to limit this to areas
  111. that will really benefit.
  112. Just because we `can` optimize a particular bit of code, it doesn't necessarily
  113. mean that we should. Knowing when, and when not to optimize is a great skill to
  114. develop.
  115. One misleading aspect of the quote is that people tend to focus on the subquote
  116. "premature optimization is the root of all evil". While `premature` optimization
  117. is (by definition) undesirable, performant software is the result of performant
  118. design.
  119. Performant design
  120. ~~~~~~~~~~~~~~~~~
  121. The danger with encouraging people to ignore optimization until necessary, is
  122. that it conveniently ignores that the most important time to consider
  123. performance is at the design stage, before a key has even hit a keyboard. If the
  124. design / algorithms of a program are inefficient, then no amount of polishing the
  125. details later will make it run fast. It may run `faster`, but it will never run
  126. as fast as a program designed for performance.
  127. This tends to be far more important in game / graphics programming than in
  128. general programming. A performant design, even without low level optimization,
  129. will often run many times faster than a mediocre design with low level
  130. optimization.
  131. Incremental design
  132. ~~~~~~~~~~~~~~~~~~
  133. Of course, in practice, unless you have prior knowledge, you are unlikely to
  134. come up with the best design first time. So you will often make a series of
  135. versions of a particular area of code, each taking a different approach to the
  136. problem, until you come to a satisfactory solution. It is important not to spend
  137. too much time on the details at this stage until you have finalized the overall
  138. design, otherwise much of your work will be thrown out.
  139. It is difficult to give general guidelines for performant design because this is
  140. so dependent on the problem. One point worth mentioning though, on the CPU
  141. side, is that modern CPUs are nearly always limited by memory bandwidth. This
  142. has led to a resurgence in data orientated design, which involves designing data
  143. structures and algorithms for locality of data and linear access, rather than
  144. jumping around in memory.
  145. The optimization process
  146. ~~~~~~~~~~~~~~~~~~~~~~~~
  147. Assuming we have a reasonable design, and taking our lessons from Knuth, our
  148. first step in optimization should be to identify the biggest bottlenecks - the
  149. slowest functions, the low hanging fruit.
  150. Once we have successfully improved the speed of the slowest area, it may no
  151. longer be the bottleneck. So we should test / profile again, and find the next
  152. bottleneck on which to focus.
  153. The process is thus:
  154. 1. Profile / Identify bottleneck
  155. 2. Optimize bottleneck
  156. 3. Return to step 1
  157. Optimizing bottlenecks
  158. ~~~~~~~~~~~~~~~~~~~~~~
  159. Some profilers will even tell you which part of a function (which data accesses,
  160. calculations) are slowing things down.
  161. As with design you should concentrate your efforts first on making sure the
  162. algorithms and data structures are the best they can be. Data access should be
  163. local (to make best use of CPU cache), and it can often be better to use compact
  164. storage of data (again, always profile to test results). Often you precalculate
  165. heavy computations ahead of time (e.g. at level load, or loading precalculated
  166. data files).
  167. Once algorithms and data are good, you can often make small changes in routines
  168. which improve performance, things like moving calculations outside of loops.
  169. Always retest your timing / bottlenecks after making each change. Some changes
  170. will increase speed, others may have a negative effect. Sometimes a small
  171. positive effect will be outweighed by the negatives of more complex code, and
  172. you may choose to leave out that optimization.
  173. Appendix
  174. ========
  175. Bottleneck math
  176. ~~~~~~~~~~~~~~~
  177. The proverb "a chain is only as strong as its weakest link" applies directly to
  178. performance optimization. If your project is spending 90% of the time in
  179. function 'A', then optimizing A can have a massive effect on performance.
  180. .. code-block:: none
  181. A: 9 ms
  182. Everything else: 1 ms
  183. Total frame time: 10 ms
  184. .. code-block:: none
  185. A: 1 ms
  186. Everything else: 1ms
  187. Total frame time: 2 ms
  188. So in this example improving this bottleneck A by a factor of 9x, decreases
  189. overall frame time by 5x, and increases frames per second by 5x.
  190. If however, something else is running slowly and also bottlenecking your
  191. project, then the same improvement can lead to less dramatic gains:
  192. .. code-block:: none
  193. A: 9 ms
  194. Everything else: 50 ms
  195. Total frame time: 59 ms
  196. .. code-block:: none
  197. A: 1 ms
  198. Everything else: 50 ms
  199. Total frame time: 51 ms
  200. So in this example, even though we have hugely optimized functionality A, the
  201. actual gain in terms of frame rate is quite small.
  202. In games, things become even more complicated because the CPU and GPU run
  203. independently of one another. Your total frame time is determined by the slower
  204. of the two.
  205. .. code-block:: none
  206. CPU: 9 ms
  207. GPU: 50 ms
  208. Total frame time: 50 ms
  209. .. code-block:: none
  210. CPU: 1 ms
  211. GPU: 50 ms
  212. Total frame time: 50 ms
  213. In this example, we optimized the CPU hugely again, but the frame time did not
  214. improve, because we are GPU-bottlenecked.