.. _doc_batching:

Optimization using batching
===========================

Introduction
~~~~~~~~~~~~

Game engines have to send a set of instructions to the GPU in order to tell the
GPU what and where to draw. These instructions are sent through common
interfaces called APIs (Application Programming Interfaces), examples of
which are OpenGL, OpenGL ES, and Vulkan.

Different APIs incur different costs when drawing objects. OpenGL handles a lot
of work for the user in the GPU driver at the cost of more expensive draw calls.
As a result, applications can often be sped up by reducing the number of draw
calls.

Draw calls
^^^^^^^^^^

In 2D, we need to tell the GPU to render a series of primitives (rectangles,
lines, polygons, etc.). The most obvious technique is to tell the GPU to render
one primitive at a time, telling it some information such as the texture used,
the material, the position, size, etc., then saying "Draw!" (this is called a
draw call).

It turns out that while this is conceptually simple from the engine side, GPUs
operate very slowly when used in this manner. GPUs work much more efficiently
if, instead of telling them to draw a single primitive, you tell them to draw a
number of similar primitives all in one draw call, which we will call a "batch".

And it turns out that they don't just work a bit faster when used in this
manner, they work a *lot* faster.

As Godot is designed to be a general purpose engine, the primitives coming into
the Godot renderer can be in any order, sometimes similar, and sometimes
dissimilar. In order to match the general purpose nature of Godot with the
batching preferences of GPUs, Godot features an intermediate layer which can
automatically group together primitives wherever possible, and send these
batches on to the GPU. This can give an increase in rendering performance while
requiring few, if any, changes to your Godot project.

How it works
~~~~~~~~~~~~

Instructions come into the renderer from your game in the form of a series of
items, each of which can contain one or more commands. The items correspond to
Nodes in the scene tree, and the commands correspond to primitives such as
rectangles or polygons. Some items, such as tilemaps and text, can contain a
large number of commands (tiles and letters respectively). Others, such as
sprites, may only contain a single command (rectangle).

The batcher uses two main techniques to group together primitives:

* Consecutive items can be joined together
* Consecutive commands within an item can be joined to form a batch

Breaking batching
^^^^^^^^^^^^^^^^^

Batching can only take place if the items or commands are similar enough to be
rendered in one draw call. Certain changes (or techniques), by necessity, prevent
the formation of a contiguous batch; this is referred to as 'breaking batching'.

Batching will be broken by (amongst other things):

* Change of texture
* Change of material
* Change of primitive type (say going from rectangles to lines)

.. note::

    If, for example, you draw a series of sprites each with a different texture,
    there is no way they can be batched.

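As a rough illustration of the rule above, the sketch below counts draw calls
when only consecutive, compatible primitives are joined. It is a simplified
model, not Godot's implementation: each primitive is reduced to a hypothetical
``(texture, material, primitive_type)`` tuple, and any change in that tuple is
assumed to break the batch.

```python
from itertools import groupby


def count_draw_calls(primitives):
    """Count batches when only consecutive, identical-state primitives join.

    Each primitive is a hypothetical (texture, material, primitive_type)
    tuple. A change in any of the three fields starts a new batch.
    """
    # groupby() only groups *adjacent* equal elements, which mirrors the
    # "contiguous batch" requirement described in the text.
    return sum(1 for _ in groupby(primitives))


same_state = [("wood", "default", "rect")] * 4
mixed_state = [
    ("wood", "default", "rect"),
    ("wood", "default", "rect"),
    ("grass", "default", "rect"),   # texture change breaks the batch
    ("grass", "default", "line"),   # primitive type change breaks it again
]

print(count_draw_calls(same_state))
print(count_draw_calls(mixed_state))
```

Four identical rectangles collapse into a single batch, while the mixed list
needs one draw call per change of state.
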
Render order
^^^^^^^^^^^^

The question arises: if only similar items can be drawn together in a batch, why
don't we look through all the items in a scene, group together all the similar
items, and draw them together?

In 3D, this is often exactly how engines work. However, in Godot 2D, items are
drawn in 'painter's order', from back to front. This ensures that items at the
front are drawn on top of earlier items when they overlap.

This also means that if we try to draw objects in order of, for example,
texture, then this painter's order may break and objects will be drawn in the
wrong order.

In Godot this back to front order is determined by:

* The order of objects in the scene tree
* The Z index of objects
* The canvas layer
* Y sort nodes

.. note::

    You can group similar objects together for easier batching. While doing so
    is not a requirement on your part, think of it as an optional approach that
    can improve performance in some cases. See the diagnostics section to help
    you make this decision.

A trick
^^^^^^^

And now a sleight of hand. Although the idea of painter's order is that objects
are rendered from back to front, consider 3 objects A, B and C, that contain 2
different textures, grass and wood.

.. image:: img/overlap1.png

In painter's order they are ordered:

::

    A - wood
    B - grass
    C - wood

Because the texture changes, they cannot be batched, and will be rendered in 3
draw calls.

However, painter's order is only needed on the assumption that they will be
drawn *on top* of each other. If we relax that assumption, i.e. if none of these
3 objects are overlapping, there is *no need* to preserve painter's order. The
rendered result will be the same. What if we could take advantage of this?

Item reordering
^^^^^^^^^^^^^^^

.. image:: img/overlap2.png

It turns out that we can reorder items. However, we can only do this if the
items satisfy the conditions of an overlap test, to ensure that the end result
will be the same as if they were not reordered. The overlap test is very cheap
in performance terms, but not absolutely free, so there is a slight cost to
looking ahead to decide whether items can be reordered. The number of items to
look ahead for reordering can be set in project settings (see below), in order
to balance the costs and benefits in your project.

::

    A - wood
    C - wood
    B - grass

Because the texture only changes once, we can render the above in only 2
draw calls.

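The reordering idea can be sketched as follows. This is an illustrative model
only, under stated assumptions: items are hypothetical ``(name, texture, rect)``
tuples with axis-aligned ``(x, y, width, height)`` rectangles, and the
``lookahead`` argument stands in for the project setting. The engine's real
overlap test and bookkeeping may differ.

```python
def overlaps(a, b):
    """Axis-aligned overlap test for (x, y, width, height) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def reorder(items, lookahead):
    """Pull a later item with a matching texture forward when it is safe.

    An item may only jump ahead of the items it skips if it overlaps none
    of them, so the rendered result matches painter's order.
    """
    items = list(items)
    for i in range(len(items) - 1):
        texture = items[i][1]
        if items[i + 1][1] == texture:
            continue  # already batchable with the next item
        end = min(len(items), i + 2 + lookahead)
        for j in range(i + 2, end):
            skipped = items[i + 1:j]
            if items[j][1] == texture and not any(
                overlaps(items[j][2], s[2]) for s in skipped
            ):
                items.insert(i + 1, items.pop(j))
                break
    return items


def count_batches(items):
    """One draw call per run of consecutive items sharing a texture."""
    return sum(
        1 for k, it in enumerate(items) if k == 0 or it[1] != items[k - 1][1]
    )


scene = [
    ("A", "wood", (0, 0, 10, 10)),
    ("B", "grass", (20, 0, 10, 10)),
    ("C", "wood", (40, 0, 10, 10)),
]
reordered = reorder(scene, lookahead=1)
print([name for name, _, _ in reordered], count_batches(reordered))
```

With no overlap, C moves ahead of B and the scene needs 2 draw calls instead
of 3. If C overlapped B, the test would fail and the original order would be
kept.
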
Lights
~~~~~~

Although the job for the batching system is normally quite straightforward, it
becomes considerably more complex when 2D lights are used, because lights are
drawn using extra passes, one for each light affecting the primitive. Consider 2
sprites A and B, with identical texture and material. Without lights they would
be batched together and drawn in one draw call. But with 3 lights, they would be
drawn as follows, each line a draw call:

.. image:: img/lights_overlap.png

::

    A
    A - light 1
    A - light 2
    A - light 3
    B
    B - light 1
    B - light 2
    B - light 3

That is a lot of draw calls: 8 for only 2 sprites. Now consider that we are
drawing 1000 sprites. The number of draw calls quickly becomes astronomical, and
performance suffers. This is partly why lights have the potential to drastically
slow down 2D.

However, if you remember our magician's trick from item reordering, it turns out
we can use the same trick to get around painter's order for lights!

If A and B are not overlapping, we can render them together in a batch, so the
draw process is as follows:

.. image:: img/lights_separate.png

::

    AB
    AB - light 1
    AB - light 2
    AB - light 3

That is 4 draw calls. Not bad, that is a 50% reduction. However, consider that
in a real game, you might be drawing closer to 1000 sprites.

- Before: 1000 * 4 = 4000 draw calls.
- After: 1 * 4 = 4 draw calls.

That is a 1000x decrease in draw calls, and should give a huge increase in
performance.

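The arithmetic above follows one simple rule: every batch needs one base pass
plus one extra pass per light. A minimal sketch, where ``lit_draw_calls`` is a
hypothetical helper name and every light is assumed to affect every batch:

```python
def lit_draw_calls(batches, lights):
    """Draw calls for lit 2D rendering: one base pass plus one per light."""
    return batches * (1 + lights)


# Two unjoined sprites with 3 lights, then the same sprites joined.
print(lit_draw_calls(batches=2, lights=3))
print(lit_draw_calls(batches=1, lights=3))

# 1000 sprites, unjoined versus joined into a single batch.
print(lit_draw_calls(batches=1000, lights=3))
```
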
Overlap test
^^^^^^^^^^^^

However, as with item reordering, things are not that simple. We must first
perform the overlap test to determine whether we can join these primitives, and
the overlap test has a small cost. So again you can choose the number of
primitives to look ahead in the overlap test to balance the benefits against the
cost. Usually with lights the benefits far outweigh the costs.

Also consider that depending on the arrangement of primitives in the viewport,
the overlap test will sometimes fail (because the primitives overlap and thus
should not be joined). So in practice the decrease in draw calls may be less
dramatic than the perfect situation of no overlap. However, performance is
usually far higher than without this lighting optimization.

Light Scissoring
~~~~~~~~~~~~~~~~

Batching can make it more difficult to cull out objects that are not affected or
partially affected by a light. This can increase the fill rate requirements
quite a bit, and slow rendering. Fill rate is the rate at which pixels are
colored; it is another potential bottleneck unrelated to draw calls.

In order to counter this problem (and also speed up lighting in general),
batching introduces light scissoring. This enables the use of the OpenGL command
``glScissor()``, which identifies an area outside of which the GPU will not
render any pixels. We can thus greatly optimize fill rate by identifying the
intersection area between a light and a primitive, and limiting rendering of the
light to *that area only*.

Light scissoring is controlled with the :ref:`scissor_area_threshold
<class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>`
project setting. This value is between 1.0 and 0.0, with 1.0 being off (no
scissoring), and 0.0 being scissoring in every circumstance. The reason for the
setting is that there may be some small cost to scissoring on some hardware.
Generally though, when you are using lighting, it should result in some
performance gains.

The relationship between the threshold and whether a scissor operation takes
place is not altogether straightforward, but generally it represents the pixel
area that is potentially 'saved' by a scissor operation (i.e. the fill rate
saved). At 1.0, the entire screen's pixels would need to be saved, which rarely
if ever happens, so it is switched off. In practice the useful values are
bunched towards zero, as only a small percentage of pixels need to be saved for
the operation to be useful.

Most users do not need to worry about the exact relationship, but it is
included in the appendix for reference.

.. image:: img/scissoring.png

*Bottom right is a light, the red area is the pixels saved by the scissoring
operation. Only the intersection needs to be rendered.*

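To make the 'pixels saved' idea concrete, the sketch below intersects a
primitive's rectangle with a light's rectangle and reports how many pixels fall
outside the intersection. It is a simplified model that assumes axis-aligned
``(x, y, width, height)`` rectangles in screen pixels; the helper names are
hypothetical and the engine's actual calculation may differ.

```python
def intersect(a, b):
    """Return the overlap of two (x, y, width, height) rects, or None."""
    x1 = max(a[0], b[0])
    y1 = max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    if x2 <= x1 or y2 <= y1:
        return None  # the light does not touch the primitive at all
    return (x1, y1, x2 - x1, y2 - y1)


def pixels_saved(item_rect, light_rect):
    """Pixels of the item that a scissor to the intersection would skip."""
    item_area = item_rect[2] * item_rect[3]
    overlap = intersect(item_rect, light_rect)
    if overlap is None:
        return item_area
    return item_area - overlap[2] * overlap[3]


item = (0, 0, 100, 100)
light = (50, 50, 100, 100)
print(intersect(item, light))     # the only region the light pass must fill
print(pixels_saved(item, light))  # the region a scissor lets the GPU skip
```

In this example only a quarter of the item lies under the light, so scissoring
to the intersection avoids filling the remaining three quarters.
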
Vertex baking
~~~~~~~~~~~~~

The GPU shader receives instructions on what to draw in 2 main ways:

* Shader uniforms (e.g. modulate color, item transform)
* Vertex attributes (vertex color, local transform)

However, within a single draw call (batch) we cannot change uniforms. This means
that naively, we would not be able to batch together items or commands that
change ``final_modulate`` or the item transform. Unfortunately that is an awful
lot of cases. Sprites, for instance, are typically individual nodes with their
own item transform, and they may have their own color modulate.

To get around this problem, the batcher can "bake" some of the uniforms into
the vertex attributes.

* The item transform can be combined with the local transform and sent in a
  vertex attribute.
* The final modulate color can be combined with the vertex colors, and sent in a
  vertex attribute.

In most cases this works fine, but this shortcut breaks down if a shader expects
these values to be available individually, rather than combined. This can happen
in custom shaders.

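The two baking steps can be illustrated on the CPU side as follows. This is a
sketch under stated assumptions rather than the engine's code: the item
transform is modelled as a 2x3 affine matrix, colors as RGBA tuples in the
0.0 to 1.0 range, and ``bake_vertex`` is a hypothetical helper.

```python
def bake_vertex(position, color, item_transform, modulate):
    """Fold per-item uniforms into per-vertex attributes.

    item_transform is a 2x3 affine matrix ((a, b, tx), (c, d, ty)).
    Returns the transformed position and the modulated color.
    """
    (a, b, tx), (c, d, ty) = item_transform
    x, y = position
    baked_position = (a * x + b * y + tx, c * x + d * y + ty)
    # Component-wise multiply: the shader no longer sees the two colors
    # separately, which is why some custom shaders prevent baking.
    baked_color = tuple(v * m for v, m in zip(color, modulate))
    return baked_position, baked_color


# Scale by 2 and translate by (10, 20); tint the red channel by half.
transform = ((2.0, 0.0, 10.0), (0.0, 2.0, 20.0))
baked = bake_vertex((1.0, 2.0), (1.0, 0.5, 1.0, 1.0), transform,
                    (0.5, 1.0, 1.0, 1.0))
print(baked)
```

After baking, vertices from items with different transforms and modulate colors
can share one draw call, because nothing item-specific remains in the uniforms.
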
Custom Shaders
^^^^^^^^^^^^^^

As a result, certain operations in custom shaders will prevent baking, and thus
decrease the potential for batching. While we are working to decrease these
cases, currently the following conditions apply:

* Reading or writing ``COLOR`` or ``MODULATE`` - disables vertex color baking
* Reading ``VERTEX`` - disables vertex position baking

Project Settings
~~~~~~~~~~~~~~~~

In order to fine tune batching, a number of project settings are available. You
can usually leave these at default during development, but it is a good idea to
experiment to ensure you are getting maximum performance. Spending a little time
tweaking parameters can often give a considerable performance gain for very
little effort. See the tooltips in the project settings for more info.

rendering/batching/options
^^^^^^^^^^^^^^^^^^^^^^^^^^

* :ref:`use_batching
  <class_ProjectSettings_property_rendering/batching/options/use_batching>` -
  Turns batching on and off

* :ref:`use_batching_in_editor
  <class_ProjectSettings_property_rendering/batching/options/use_batching_in_editor>`

* :ref:`single_rect_fallback
  <class_ProjectSettings_property_rendering/batching/options/single_rect_fallback>`
  - This is a faster way of drawing unbatchable rectangles. However, it may lead
  to flicker on some hardware, so it is not recommended

rendering/batching/parameters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* :ref:`max_join_item_commands
  <class_ProjectSettings_property_rendering/batching/parameters/max_join_item_commands>`
  - One of the most important ways of achieving batching is to join suitable
  adjacent items (nodes) together. However, they can only be joined if the
  commands they contain are compatible. The system must therefore do a lookahead
  through the commands in an item to determine whether it can be joined. This
  has a small cost per command, and items with a large number of commands are
  not worth joining, so the best value may be project dependent.

* :ref:`colored_vertex_format_threshold
  <class_ProjectSettings_property_rendering/batching/parameters/colored_vertex_format_threshold>`
  - Baking colors into vertices results in a larger vertex format. This is not
  necessarily worth doing unless there are a lot of color changes going on
  within a joined item. This parameter represents the proportion of commands
  containing color changes / the total commands, above which it switches to
  baked colors.

* :ref:`batch_buffer_size
  <class_ProjectSettings_property_rendering/batching/parameters/batch_buffer_size>`
  - This determines the maximum size of a batch. It doesn't have a huge effect
  on performance but can be worth decreasing for mobile if RAM is at a premium.

* :ref:`item_reordering_lookahead
  <class_ProjectSettings_property_rendering/batching/parameters/item_reordering_lookahead>`
  - Item reordering can help especially with interleaved sprites using different
  textures. The lookahead for the overlap test has a small cost, so the best
  value may change per project.

rendering/batching/lights
^^^^^^^^^^^^^^^^^^^^^^^^^

* :ref:`scissor_area_threshold
  <class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>`
  - See light scissoring.

* :ref:`max_join_items
  <class_ProjectSettings_property_rendering/batching/lights/max_join_items>` -
  Joining items before lighting can significantly increase performance. This
  requires an overlap test, which has a small cost, so the costs and benefits,
  and hence the best value to use here, may be project dependent.

rendering/batching/debug
^^^^^^^^^^^^^^^^^^^^^^^^

* :ref:`flash_batching
  <class_ProjectSettings_property_rendering/batching/debug/flash_batching>` -
  This is purely a debugging feature to identify regressions between the
  batching and legacy renderer. When it is switched on, the batching and legacy
  renderer are used alternately on each frame. This will decrease performance,
  and should not be used for your final export, only for testing.

* :ref:`diagnose_frame
  <class_ProjectSettings_property_rendering/batching/debug/diagnose_frame>` -
  This will periodically print a diagnostic batching log to the Godot IDE /
  console.

rendering/batching/precision
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* :ref:`uv_contract
  <class_ProjectSettings_property_rendering/batching/precision/uv_contract>` -
  On some hardware (notably some Android devices) there have been reports of
  tilemap tiles drawing slightly outside their UV range, leading to edge
  artifacts such as lines around tiles. If you see this problem, try enabling UV
  contract. This makes a small contraction in the UV coordinates to compensate
  for precision errors on devices.

* :ref:`uv_contract_amount
  <class_ProjectSettings_property_rendering/batching/precision/uv_contract_amount>`
  - The default amount should cure artifacts on most devices, but the value is
  editable in case it does not.

Diagnostics
~~~~~~~~~~~

Although you can change parameters and examine the effect on frame rate, this
can feel like working blindly, with no idea of what is going on under the hood.
To help with this, batching offers a diagnostic mode, which will periodically
print out (to the IDE or console) a list of the batches that are being
processed. This can help pinpoint situations where batching is not occurring as
intended, and help you to fix them, in order to get the best possible
performance.

Reading a diagnostic
^^^^^^^^^^^^^^^^^^^^

.. code-block:: cpp

    canvas_begin FRAME 2604
    items
        joined_item 1 refs
            batch D 0-0
            batch D 0-2 n n
            batch R 0-1 [0 - 0] {255 255 255 255 }
        joined_item 1 refs
            batch D 0-0
            batch R 0-1 [0 - 146] {255 255 255 255 }
            batch D 0-0
            batch R 0-1 [0 - 146] {255 255 255 255 }
        joined_item 1 refs
            batch D 0-0
            batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
            batch D 0-0
            batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
            batch D 0-0
            batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
    canvas_end

This is a typical diagnostic.

* **joined_item** - A joined item can contain 1 or more references to items
  (nodes). Generally, joined_items containing many references are preferable to
  many joined_items containing a single reference. Whether items can be joined
  will be determined by their contents and compatibility with the previous item.
* **batch R** - a batch containing rectangles. The second number is the number of
  rects. The second number in square brackets is the Godot texture ID, and the
  numbers in curly braces are the color. If the batch contains more than one
  rect, MULTI is added to the line to make it easy to identify. Seeing MULTI is
  good, because this indicates successful batching.
* **batch D** - a default batch, containing everything else that is not currently
  batched.

Default Batches
^^^^^^^^^^^^^^^

The second number following default batches is the number of commands in the
batch, and it is followed by a brief summary of the contents:

::

    l - line
    PL - polyline
    r - rect
    n - ninepatch
    PR - primitive
    p - polygon
    m - mesh
    MM - multimesh
    PA - particles
    c - circle
    t - transform
    CI - clip_ignore

You may see "dummy" default batches containing no commands. You can ignore
these.

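When a diagnostic log gets long, a small script can summarize it. The sketch
below assumes the log lines look exactly like the sample shown earlier; the
regular expression and the ``summarize`` helper are illustrative and not part
of Godot, and the real log format may vary between versions.

```python
import re

# Matches lines such as: batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
RECT_BATCH = re.compile(r"batch R \d+-(\d+) \[\d+ - (\d+)\] \{[\d ]*\}\s*(MULTI)?")


def summarize(log):
    """Count joined items, default batches, rect batches and MULTI batches."""
    counts = {"joined_items": 0, "default": 0, "rect": 0, "multi": 0}
    for raw in log.splitlines():
        line = raw.strip()
        match = RECT_BATCH.match(line)
        if match:
            counts["rect"] += 1
            if match.group(3):
                counts["multi"] += 1
        elif line.startswith("batch D"):
            counts["default"] += 1
        elif line.startswith("joined_item"):
            counts["joined_items"] += 1
    return counts


sample = """canvas_begin FRAME 2604
items
    joined_item 1 refs
        batch D 0-0
        batch D 0-2 n n
        batch R 0-1 [0 - 0] {255 255 255 255 }
    joined_item 1 refs
        batch D 0-0
        batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
canvas_end"""

print(summarize(sample))
```

A high proportion of MULTI lines among the rect batches suggests batching is
working well for that frame.
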
FAQ
~~~

I don't get a large performance increase from switching on batching
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Try the diagnostics, see how much batching is occurring, and whether it can be
  improved
* Try changing parameters
* Consider that batching may not be your bottleneck (see bottlenecks)

I get a decrease in performance with batching
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Try steps to increase batching given above
* Try switching :ref:`single_rect_fallback
  <class_ProjectSettings_property_rendering/batching/options/single_rect_fallback>`
  to on

  * The single rect fallback method is the default used without batching, and it
    is approximately twice as fast. However, it can result in flicker on some
    hardware, so its use is discouraged

* After trying the above, if your scene is still performing worse, consider
  turning off batching.

I use custom shaders and the items are not batching
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Custom shaders can be problematic for batching, see the custom shaders section

I am seeing line artifacts appear on certain hardware
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* See the :ref:`uv_contract
  <class_ProjectSettings_property_rendering/batching/precision/uv_contract>`
  project setting which can be used to solve this problem.

I use a large number of textures, so few items are being batched
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Consider the use of texture atlases. As well as allowing batching, these
  reduce the need for state changes associated with changing texture.

Appendix
~~~~~~~~

Light scissoring threshold calculation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The actual proportion of screen pixel area used as the threshold is the
:ref:`scissor_area_threshold
<class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>`
value to the power of 4.

For example, on a screen size ``1920 x 1080`` there are ``2,073,600`` pixels.

At a threshold of ``1000`` pixels, the proportion would be:

::

    1000 / 2073600 = 0.00048225
    0.00048225 ^ 0.25 = 0.14819

.. note:: Raising to the power of 0.25 is the inverse of raising to the power of 4.

So a :ref:`scissor_area_threshold
<class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>`
of 0.15 would be a reasonable value to try.

Going the other way, for instance with a :ref:`scissor_area_threshold
<class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>`
of ``0.5``:

::

    0.5 ^ 4 = 0.0625
    0.0625 * 2073600 = 129600 pixels

If the number of pixels saved is more than this threshold, the scissor is
activated.

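The two conversions above can be written as a pair of small helpers. The
function names are hypothetical; only the power-of-4 relationship comes from
the text.

```python
def threshold_to_pixels(threshold, width, height):
    """Pixel area that must be saved before the scissor is activated."""
    return threshold ** 4 * width * height


def pixels_to_threshold(pixels, width, height):
    """Setting value whose activation area equals the given pixel count."""
    return (pixels / (width * height)) ** 0.25


print(threshold_to_pixels(0.5, 1920, 1080))
print(round(pixels_to_threshold(1000, 1920, 1080), 5))
```

Because of the fourth power, small changes near zero move the pixel threshold
very little, which is why the useful setting values are bunched towards zero.
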