.. _doc_cpu_optimization:

CPU Optimizations
=================

Measuring performance
=====================

To know how to speed up our program, we have to know where the "bottlenecks"
are. Bottlenecks are the slowest parts of the program that limit the rate at
which everything can progress. Knowing where they are allows us to concentrate
our efforts on optimizing the areas that will give us the greatest speed
improvement, instead of spending a lot of time optimizing functions that will
lead to only small performance improvements.

For the CPU, the easiest way to identify bottlenecks is to use a profiler.

CPU profilers
=============

Profilers run alongside your program and take timing measurements to work out
what proportion of time is spent in each function.

The Godot IDE conveniently has a built-in profiler. It does not run every time
you start your project, and must be manually started and stopped. This is
because, in common with most profilers, recording these timing measurements can
slow down your project significantly.

After profiling, you can look back at the results for a frame.

.. image:: img/godot_profiler.png

*These are the results of a profile of one of the demo projects.*

.. note:: We can see the cost of built-in processes such as physics and audio,
          as well as seeing the cost of our own scripting functions at the
          bottom.

When a project is running slowly, you will often see an obvious function or
process taking a lot more time than others. This is your primary bottleneck, and
you can usually increase speed by optimizing this area.

For more information about using the profiler within Godot, see
:ref:`doc_debugger_panel`.

External profilers
~~~~~~~~~~~~~~~~~~

Although the Godot IDE profiler is very convenient and useful, sometimes you
need more power, and the ability to profile the Godot engine source code itself.
You can use a number of third-party profilers to do this, including Valgrind,
VerySleepy, Visual Studio and Intel VTune.

.. note:: You may need to compile Godot from source in order to use a
          third-party profiler, so that you have program database (debug
          symbol) information available. You can also use a debug build;
          however, note that the results of profiling a debug build will be
          different from those of a release build, because debug builds are
          less optimized. Bottlenecks are often in a different place in debug
          builds, so you should profile release builds wherever possible.

.. image:: img/valgrind.png

*These are example results from Callgrind, part of Valgrind, on Linux.*

From the left, Callgrind is listing the percentage of time within a function and
its children (Inclusive), the percentage of time spent within the function
itself, excluding child functions (Self), the number of times the function is
called, the function name, and the file or module.

In this example, we can see that nearly all time is spent under the
``Main::iteration()`` function. This is the master function in the Godot source
code that is called repeatedly, and it causes frames to be drawn, physics ticks
to be simulated, and nodes and scripts to be updated. A large proportion of the
time is spent in the functions to render a canvas (66%), because this example
uses a 2D benchmark. Below this, we see that almost 50% of the time is spent
outside Godot code in ``libglapi`` and ``i965_dri`` (the graphics driver). This
tells us that a large proportion of CPU time is being spent in the graphics
driver.

This is actually an excellent example because, in an ideal world, only a very
small proportion of time would be spent in the graphics driver. A share this
large indicates that there is a problem with too much communication and work
being done in the graphics API. This profiling led to the development of 2D
batching, which greatly speeds up 2D rendering by reducing bottlenecks in this
area.

Manually timing functions
=========================

Another handy technique, especially once you have identified the bottleneck
using a profiler, is to manually time the function or area under test. The
specifics vary according to language, but in GDScript, you would do the
following:

::

    var time_start = OS.get_system_time_msecs()

    # The function you want to time
    update_enemies()

    var time_end = OS.get_system_time_msecs()
    print("update_enemies() took %d ms" % (time_end - time_start))

You may want to consider using other functions for time if another time unit is
more suitable, for example :ref:`OS.get_system_time_secs
<class_OS_method_get_system_time_secs>` if the function will take many seconds.

When manually timing functions, it is usually a good idea to run the function
many times (say ``1000`` or more times), instead of just once (unless it is a
very slow function). A large part of the reason for this is that timers often
have limited accuracy, and the operating system schedules processes in an
unpredictable manner, so an average over a series of runs is more accurate than
a single measurement.

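
As a minimal sketch of this idea, the hypothetical ``average_usec()`` helper
below (GDScript, Godot 3.x assumed) calls a method many times and returns the
average duration of one call. It uses ``OS.get_ticks_usec()``, which returns
the microseconds elapsed since the engine started, so it is not affected by
changes to the system clock. Calling through ``Object.call()`` adds a little
overhead of its own, so the result is best used to compare versions of the same
function rather than as an absolute cost.

::

    extends Node

    # Hypothetical helper: calls `method` on `target` `runs` times and
    # returns the average duration of one call, in microseconds.
    func average_usec(target: Object, method: String, runs: int = 1000) -> float:
        var time_start = OS.get_ticks_usec()
        for _i in range(runs):
            target.call(method)
        var time_end = OS.get_ticks_usec()
        return float(time_end - time_start) / runs

For example, ``print(average_usec(self, "update_enemies"))`` would time the
``update_enemies()`` placeholder from the previous example over 1000 runs.
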
As you attempt to optimize functions, be sure to either repeatedly profile or
time them as you go. This will give you crucial feedback as to whether the
optimization is working (or not).

Caches
======

Something else to be particularly aware of, especially when comparing timing
results of two different versions of a function, is that the results can be
highly dependent on whether the data is in the CPU cache or not. CPUs don't load
data directly from main memory, because although main memory can be huge (many
GBs), it is very slow to access. Instead, CPUs load data from a smaller,
higher-speed bank of memory, called cache. Loading data from cache is very fast,
but every time you try to load a memory address that is not stored in cache, the
cache must make a trip to main memory and slowly load in some data. This delay
can result in the CPU sitting around idle for a long time, and is referred to as
a "cache miss".

This means that the first time you run a function, it may run slowly, because
the data is not in cache. The second and later times, it may run much faster
because the data is in cache. So always use averages when timing, and be aware
of the effects of cache.

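
To see whether the first run really differs, a variation of the same idea can
time the first call on its own and then average the calls that follow. This is
a hypothetical sketch assuming Godot 3.x; ``cold_and_warm_usec()`` is an
illustrative name. A much slower first call is consistent with a cold cache, but
other one-off costs, such as lazy initialization or loading a resource on first
use, produce the same pattern, so treat the gap as a hint rather than proof.

::

    extends Node

    # Hypothetical helper: times the first call on its own ("cold"), then
    # the average of the next `runs` calls ("warm"), in microseconds.
    func cold_and_warm_usec(target: Object, method: String, runs: int = 1000) -> Dictionary:
        var start = OS.get_ticks_usec()
        target.call(method)
        var cold = OS.get_ticks_usec() - start

        start = OS.get_ticks_usec()
        for _i in range(runs):
            target.call(method)
        var warm = float(OS.get_ticks_usec() - start) / runs

        return {"cold": cold, "warm": warm}
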
Understanding caching is also crucial to CPU optimization. If you have an
algorithm (routine) that loads small bits of data from randomly spread out areas
of main memory, this can result in a lot of cache misses, and a lot of the time
the CPU will be waiting around for data instead of doing any work. Instead, if
you can make your data accesses localized, or even better, access memory in a
linear fashion (like a continuous list), then the cache will work optimally and
the CPU will be able to work as fast as possible.

Godot usually takes care of such low-level details for you. For example, the
Server APIs already make sure data is optimized for caching for things like
rendering and physics. But you should be especially aware of caching when using
GDNative.

Languages
=========

Godot supports a number of different languages, and it is worth bearing in mind
that there are trade-offs involved: some languages are designed for ease of
use at the cost of speed, and others are faster but more difficult to work
with.

Built-in engine functions run at the same speed regardless of the scripting
language you choose. If your project is making a lot of calculations in its own
code, consider moving those calculations to a faster language.

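
As an illustration of the point about built-in functions (a sketch assuming
Godot 3.x; both function names are hypothetical), the two functions below
perform the same linear search. In the first, every iteration of the loop is
executed by the GDScript interpreter; in the second, ``Array.has()`` runs the
whole loop inside the engine's C++ code.

::

    extends Node

    # Linear search written in GDScript: every iteration is interpreted.
    # Assumes the items have the same type as `value`, since comparing
    # incompatible types with == can raise an error in Godot 3.x.
    func contains_scripted(items: Array, value) -> bool:
        for item in items:
            if item == value:
                return true
        return false

    # The same search using the built-in method, which loops in engine code.
    func contains_builtin(items: Array, value) -> bool:
        return items.has(value)

On large arrays the built-in version is generally the faster of the two, but
measure with your own data before relying on the difference.
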
GDScript
~~~~~~~~

GDScript is designed to be easy to use and iterate on, and is ideal for making
many types of games. However, ease of use is considered more important than
performance, so if you need to make heavy calculations, consider moving some of
your project to one of the other languages.

C#
~~

C# is popular and has first-class support in Godot. It offers a good compromise
between speed and ease of use.

Other languages
~~~~~~~~~~~~~~~

Third parties provide support for several other languages, including `Rust
<https://github.com/godot-rust/godot-rust>`_ and `JavaScript
<https://github.com/GodotExplorer/ECMAScript>`_.

C++
~~~

Godot is written in C++. Using C++ will usually result in the fastest code.
However, on a practical level, it is the most difficult to deploy to end users'
machines on different platforms. Options for using C++ include GDNative and
custom modules.

Threads
=======

Consider using threads when making a lot of calculations that can run in
parallel with one another. Modern CPUs have multiple cores, each one capable of
doing a limited amount of work. By spreading work over multiple threads, you can
move further towards peak CPU efficiency.

The disadvantage of threads is that you have to be incredibly careful. As each
CPU core operates independently, they can end up trying to access the same
memory at the same time. One thread can be reading from a variable while another
is writing to it. Before you use threads, make sure you understand the dangers
and how to prevent these race conditions.

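
The following is a minimal sketch of one way to avoid such races, assuming the
Godot 3.x ``Thread`` and ``Mutex`` classes (in Godot 3.x, ``Thread.start()``
takes an object, a method name and an optional userdata value that is passed to
the method). The worker does its calculations on local data and holds the lock
only while touching the shared ``results`` array, and the main thread also
locks before reading it. The names ``_worker()``, ``results`` and
``get_result_count()`` are hypothetical.

::

    extends Node

    var thread = Thread.new()
    var mutex = Mutex.new()
    var results = []  # shared between threads, only accessed under `mutex`

    func _ready():
        # Runs _worker(1000) on a separate thread.
        thread.start(self, "_worker", 1000)

    func _worker(count):
        # Work on local data first, so the lock is held as briefly as possible.
        var local = []
        for i in range(count):
            local.append(i * i)  # stand-in for an expensive calculation
        mutex.lock()
        results += local
        mutex.unlock()

    func get_result_count() -> int:
        # Reads of the shared array are locked too.
        mutex.lock()
        var count = results.size()
        mutex.unlock()
        return count

    func _exit_tree():
        # Always wait for a started thread to finish before the node goes away.
        if thread.is_active():
            thread.wait_to_finish()
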
For more information on threads, see :ref:`doc_using_multiple_threads`.

SceneTree
=========

Although Nodes are an incredibly powerful and versatile concept, be aware that
every node has a cost. Built-in functions such as ``_process()`` and
``_physics_process()`` propagate through the tree. This housekeeping can reduce
performance when you have very large numbers of nodes.

Each node is handled individually in the Godot renderer, so sometimes a smaller
number of nodes, each doing more, can lead to better performance.

One quirk of the :ref:`SceneTree <class_SceneTree>` is that you can sometimes
get much better performance by removing nodes from the SceneTree, rather than
by pausing or hiding them. You don't have to delete a detached node. You can,
for example, keep a reference to a node, detach it from the scene tree, then
reattach it later. This can be very useful for adding and removing areas from a
game.

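
A sketch of detaching and reattaching an area, assuming Godot 3.x;
``detached_area``, ``detach_area()`` and ``reattach_area()`` are hypothetical
names. A node removed with ``remove_child()`` no longer receives processing
callbacks and drops out of rendering and physics, but it is not freed, so the
script keeps a reference to it and frees it itself if it is never reattached.
If you remove physics nodes from within a physics callback, such as a
``body_entered`` signal, you may need to defer the change with
``call_deferred()``.

::

    extends Node

    # Hypothetical reference that keeps a detached area alive.
    var detached_area: Node = null

    func detach_area(area: Node) -> void:
        # Removes the area (and its children) from the SceneTree without freeing it.
        area.get_parent().remove_child(area)
        detached_area = area

    func reattach_area(parent: Node) -> void:
        if detached_area != null:
            parent.add_child(detached_area)
            detached_area = null

    func _notification(what):
        # Detached nodes are not freed automatically, so free the area
        # when this node itself is deleted.
        if what == NOTIFICATION_PREDELETE and detached_area != null:
            detached_area.free()
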
You can avoid the SceneTree altogether by using Server APIs. For more
information, see :ref:`doc_using_servers`.

Physics
=======

In some situations, physics can end up becoming a bottleneck, particularly with
complex worlds and large numbers of physics objects.

Some techniques to speed up physics:

* Try using simplified versions of your rendered geometry for physics. Often
  this won't be noticeable for end users, but can greatly increase performance.
* Try removing objects from physics when they are out of view / outside the
  current area, or reusing physics objects (maybe you allow 8 monsters per area,
  for example, and reuse these).

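
The reuse idea in the last bullet can be sketched as a small fixed-size object
pool. This assumes Godot 3.x; the pool size of 8 follows the example in the
text, and ``init_pool()``, ``spawn_monster()`` and ``despawn_monster()`` are
hypothetical names. Pooled monsters that are not in use are detached from the
SceneTree, which also removes their physics bodies from the physics world until
they are spawned again.

::

    extends Node

    const POOL_SIZE = 8
    var free_monsters = []  # detached instances waiting to be reused

    func init_pool(scene: PackedScene) -> void:
        # Instance every monster once, up front.
        for _i in range(POOL_SIZE):
            free_monsters.append(scene.instance())

    func spawn_monster(parent: Node) -> Node:
        if free_monsters.empty():
            return null  # every monster in the pool is already in use
        var monster = free_monsters.pop_back()
        parent.add_child(monster)
        return monster

    func despawn_monster(monster: Node) -> void:
        # Detach instead of freeing, so the instance can be spawned again.
        monster.get_parent().remove_child(monster)
        free_monsters.append(monster)

    func _notification(what):
        # Idle monsters are outside the tree, so free them manually.
        if what == NOTIFICATION_PREDELETE:
            for monster in free_monsters:
                monster.free()

A real game would also reset each monster's state (position, health and so on)
when it is spawned again.
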
Another crucial aspect to physics is the physics tick rate. In some games, you
can greatly reduce the tick rate: instead of updating physics 60 times per
second, for example, you may update it only 20 or even 10 times per second. This
can greatly reduce the CPU load.

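
In Godot 3.x, the tick rate is set by the ``physics/common/physics_fps``
project setting (60 by default), and it can also be changed at runtime through
``Engine.iterations_per_second``. A minimal sketch using the 20 ticks per
second mentioned above:

::

    extends Node

    func _ready():
        # Run physics 20 times per second instead of the default 60.
        Engine.iterations_per_second = 20
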
The downside of changing the physics tick rate is that you can get jerky
movement or jitter when the physics update rate does not match the frames
rendered.

The solution to this problem is *fixed timestep interpolation*, which involves
smoothing the rendered positions and rotations over multiple frames to match the
physics. You can either implement this yourself or use a third-party addon.
Performance-wise, interpolation is a very cheap operation compared to running a
physics tick (orders of magnitude faster), so this can be a significant win, as
well as reducing jitter.
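
A minimal sketch of fixed timestep interpolation for a purely visual 2D node,
assuming Godot 3.x, where ``Engine.get_physics_interpolation_fraction()``
returns how far the current frame lies between the last physics tick and the
next. The simulated position only advances in ``_physics_process()``, while
``_process()`` displays a blend of the two most recent physics positions. The
constant ``velocity`` is a stand-in for real game logic.

::

    extends Node2D

    # Stand-in for real game logic: move right at 100 pixels per second.
    var velocity = Vector2(100, 0)
    # The two most recent physics states.
    var previous_position = Vector2()
    var current_position = Vector2()

    func _ready():
        previous_position = position
        current_position = position

    func _physics_process(delta):
        # Advance the simulation only on physics ticks.
        previous_position = current_position
        current_position += velocity * delta

    func _process(_delta):
        # 0.0 at the last physics tick, approaching 1.0 just before the next.
        var fraction = Engine.get_physics_interpolation_fraction()
        position = previous_position.linear_interpolate(current_position, fraction)

Because it blends the previous and current ticks, the displayed position trails
the most recent physics state by up to one tick; this small delay is the usual
price for smooth motion at a low tick rate.
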