123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549 |
- .. _doc_batching:
- Optimization using batching
- ===========================
- Introduction
- ~~~~~~~~~~~~
- Game engines have to send a set of instructions to the GPU in order to tell the
- GPU what and where to draw. These instructions are sent using common
- instructions, called APIs (Application Programming Interfaces), examples of
- which are OpenGL, OpenGL ES, and Vulkan.
- Different APIs incur different costs when drawing objects. OpenGL handles a lot
- of work for the user in the GPU driver at the cost of more expensive draw calls.
- As a result, applications can often be sped up by reducing the number of draw
- calls.
- Draw calls
- ^^^^^^^^^^
- In 2D, we need to tell the GPU to render a series of primitives (rectangles,
- lines, polygons etc). The most obvious technique is to tell the GPU to render
- one primitive at a time, telling it some information such as the texture used,
- the material, the position, size, etc. then saying "Draw!" (this is called a
- draw call).
- It turns out that while this is conceptually simple from the engine side, GPUs
- operate very slowly when used in this manner. GPUs work much more efficiently
- if, instead of telling them to draw a single primitive, you tell them to draw a
- number of similar primitives all in one draw call, which we will call a "batch".
- And it turns out that they don't just work a bit faster when used in this
- manner, they work a *lot* faster.
- As Godot is designed to be a general purpose engine, the primitives coming into
- the Godot renderer can be in any order, sometimes similar, and sometimes
- dissimilar. In order to match the general purpose nature of Godot with the
- batching preferences of GPUs, Godot features an intermediate layer which can
- automatically group together primitives wherever possible, and send these
- batches on to the GPU. This can give an increase in rendering performance while
- requiring few, if any, changes to your Godot project.
- How it works
- ~~~~~~~~~~~~
- Instructions come into the renderer from your game in the form of a series of
- items, each of which can contain one or more commands. The items correspond to
- Nodes in the scene tree, and the commands correspond to primitives such as
- rectangles or polygons. Some items, such as tilemaps, and text, can contain a
- large number of commands (tiles and letters respectively). Others, such as
- sprites, may only contain a single command (rectangle).
- The batcher uses two main techniques to group together primitives:
- * Consecutive items can be joined together
- * Consecutive commands within an item can be joined to form a batch
- Breaking batching
- ^^^^^^^^^^^^^^^^^
- Batching can only take place if the items or commands are similar enough to be
- rendered in one draw call. Certain changes (or techniques), by necessity, prevent
- the formation of a contiguous batch, this is referred to as 'breaking batching'.
- Batching will be broken by (amongst other things):
- * Change of texture
- * Change of material
- * Change of primitive type (say going from rectangles to lines)
- .. note::
-
- If for example, you draw a series of sprites each with a different texture,
- there is no way they can be batched.
- Render order
- ^^^^^^^^^^^^
- The question arises, if only similar items can be drawn together in a batch, why
- don't we look through all the items in a scene, group together all the similar
- items, and draw them together?
- In 3D, this is often exactly how engines work. However, in Godot 2D, items are
- drawn in 'painter's order', from back to front. This ensures that items at the
- front are drawn on top of earlier items, when they overlap.
- This also means that if we try and draw objects in order of, for example,
- texture, then this painter's order may break and objects will be drawn in the
- wrong order.
- In Godot this back to front order is determined by:
- * The order of objects in the scene tree
- * The Z index of objects
- * The canvas layer
- * Y sort nodes
- .. note::
-
- You can group similar objects together for easier batching. While doing so
- is not a requirement on your part, think of it as an optional approach that
- can improve performance in some cases. See the diagnostics section in order
- to help you make this decision.
- A trick
- ^^^^^^^
- And now a sleight of hand. Although the idea of painter's order is that objects
- are rendered from back to front, consider 3 objects A, B and C, that contain 2
- different textures, grass and wood.
- .. image:: img/overlap1.png
- In painter's order they are ordered:
- ::
- A - wood
- B - grass
- C - wood
- Because the texture changes, they cannot be batched, and will be rendered in 3
- draw calls.
- However, painter's order is only needed on the assumption that they will be
- drawn *on top* of each other. If we relax that assumption, i.e. if none of these
- 3 objects are overlapping, there is *no need* to preserve painter's order. The
- rendered result will be the same. What if we could take advantage of this?
- Item reordering
- ^^^^^^^^^^^^^^^
- .. image:: img/overlap2.png
- It turns out that we can reorder items. However, we can only do this if the
- items satisfy the conditions of an overlap test, to ensure that the end result
- will be the same as if they were not reordered. The overlap test is very cheap
- in performance terms, but not absolutely free, so there is a slight cost to
- looking ahead to decide whether items can be reordered. The number of items to
- lookahead for reordering can be set in project settings (see below), in order to
- balance the costs and benefits in your project.
- ::
- A - wood
- C - wood
- B - grass
-
- Because the texture only changes once, we can render the above in only 2
- draw calls.
- Lights
- ~~~~~~
- Although the job for the batching system is normally quite straightforward, it
- becomes considerably more complex when 2D lights are used, because lights are
- drawn using extra passes, one for each light affecting the primitive. Consider 2
- sprites A and B, with identical texture and material. Without lights they would
- be batched together and drawn in one draw call. But with 3 lights, they would be
- drawn as follows, each line a draw call:
- .. image:: img/lights_overlap.png
- ::
- A
- A - light 1
- A - light 2
- A - light 3
- B
- B - light 1
- B - light 2
- B - light 3
- That is a lot of draw calls, 8 for only 2 sprites. Now consider we are drawing
- 1000 sprites, the number of draw calls quickly becomes astronomical, and
- performance suffers. This is partly why lights have the potential to drastically
- slow down 2D.
- However, if you remember our magician's trick from item reordering, it turns out
- we can use the same trick to get around painter's order for lights!
- If A and B are not overlapping, we can render them together in a batch, so the
- draw process is as follows:
- .. image:: img/lights_separate.png
- ::
- AB
- AB - light 1
- AB - light 2
- AB - light 3
- That is 4 draw calls. Not bad, that is a 50% improvement. However consider that
- in a real game, you might be drawing closer to 1000 sprites.
- - Before: 1000 * 4 = 4000 draw calls.
- - After: 1 * 4 = 4 draw calls.
- That is 1000x decrease in draw calls, and should give a huge increase in
- performance.
- Overlap test
- ^^^^^^^^^^^^
- However, as with the item reordering, things are not that simple, we must first
- perform the overlap test to determine whether we can join these primitives, and
- the overlap test has a small cost. So again you can choose the number of
- primitives to lookahead in the overlap test to balance the benefits against the
- cost. Usually with lights the benefits far outweigh the costs.
- Also consider that depending on the arrangement of primitives in the viewport,
- the overlap test will sometimes fail (because the primitives overlap and thus
- should not be joined). So in practice the decrease in draw calls may be less
- dramatic than the perfect situation of no overlap. However performance is
- usually far higher than without this lighting optimization.
- Light Scissoring
- ~~~~~~~~~~~~~~~~
- Batching can make it more difficult to cull out objects that are not affected or
- partially affected by a light. This can increase the fill rate requirements
- quite a bit, and slow rendering. Fill rate is the rate at which pixels are
- colored, it is another potential bottleneck unrelated to draw calls.
- In order to counter this problem, (and also speedup lighting in general),
- batching introduces light scissoring. This enables the use of the OpenGL command
- ``glScissor()``, which identifies an area, outside of which, the GPU will not
- render any pixels. We can thus greatly optimize fill rate by identifying the
- intersection area between a light and a primitive, and limit rendering the light
- to *that area only*.
- Light scissoring is controlled with the :ref:`scissor_area_threshold
- <class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>`
- project setting. This value is between 1.0 and 0.0, with 1.0 being off (no
- scissoring), and 0.0 being scissoring in every circumstance. The reason for the
- setting is that there may be some small cost to scissoring on some hardware.
- Generally though, when you are using lighting, it should result in some
- performance gains.
- The relationship between the threshold and whether a scissor operation takes
- place is not altogether straight forward, but generally it represents the pixel
- area that is potentially 'saved' by a scissor operation (i.e. the fill rate
- saved). At 1.0, the entire screens pixels would need to be saved, which rarely
- if ever happens, so it is switched off. In practice the useful values are
- bunched towards zero, as only a small percentage of pixels need to be saved for
- the operation to be useful.
- The exact relationship is probably not necessary for users to worry about, but
- out of interest is included in the appendix.
- .. image:: img/scissoring.png
- *Bottom right is a light, the red area is the pixels saved by the scissoring
- operation. Only the intersection needs to be rendered.*
- Vertex baking
- ~~~~~~~~~~~~~
- The GPU shader receives instructions on what to draw in 2 main ways:
- * Shader uniforms (e.g. modulate color, item transform)
- * Vertex attributes (vertex color, local transform)
- However, within a single draw call (batch) we cannot change uniforms. This means
- that naively, we would not be able to batch together items or commands that
- change final_modulate, or item transform. Unfortunately that is an awful lot of
- cases. Sprites for instance typically are individual nodes with their own item
- transform, and they may have their own color modulate.
- To get around this problem, the batching can "bake" some of the uniforms into
- the vertex attributes.
- * The item transform can be combined with the local transform and sent in a
- vertex attribute.
- * The final modulate color can be combined with the vertex colors, and sent in a
- vertex attribute.
- In most cases this works fine, but this shortcut breaks down if a shader expects
- these values to be available individually, rather than combined. This can happen
- in custom shaders.
- Custom Shaders
- ^^^^^^^^^^^^^^
- As a result certain operations in custom shaders will prevent baking, and thus
- decrease the potential for batching. While we are working to decrease these
- cases, currently the following conditions apply:
- * Reading or writing ``COLOR`` or ``MODULATE`` - disables vertex color baking
- * Reading ``VERTEX`` - disables vertex position baking
- Project Settings
- ~~~~~~~~~~~~~~~~
- In order to fine tune batching, a number of project settings are available. You
- can usually leave these at default during development, but it is a good idea to
- experiment to ensure you are getting maximum performance. Spending a little time
- tweaking parameters can often give considerable performance gain, for very
- little effort. See the tooltips in the project settings for more info.
- rendering/batching/options
- ^^^^^^^^^^^^^^^^^^^^^^^^^^
- * :ref:`use_batching
- <class_ProjectSettings_property_rendering/batching/options/use_batching>` -
- Turns batching on and off
- * :ref:`use_batching_in_editor
- <class_ProjectSettings_property_rendering/batching/options/use_batching_in_editor>`
- * :ref:`single_rect_fallback
- <class_ProjectSettings_property_rendering/batching/options/single_rect_fallback>`
- - This is a faster way of drawing unbatchable rectangles, however it may lead
- to flicker on some hardware so is not recommended
- rendering/batching/parameters
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * :ref:`max_join_item_commands <class_ProjectSettings_property_rendering/batching/parameters/max_join_item_commands>` -
- One of the most important ways of achieving
- batching is to join suitable adjacent items (nodes) together, however they can
- only be joined if the commands they contain are compatible. The system must
- therefore do a lookahead through the commands in an item to determine whether
- it can be joined. This has a small cost per command, and items with a large
- number of commands are not worth joining, so the best value may be project
- dependent.
- * :ref:`colored_vertex_format_threshold
- <class_ProjectSettings_property_rendering/batching/parameters/colored_vertex_format_threshold>` - Baking colors into
- vertices results in a
- larger vertex format. This is not necessarily worth doing unless there are a
- lot of color changes going on within a joined item. This parameter represents
- the proportion of commands containing color changes / the total commands,
- above which it switches to baked colors.
- * :ref:`batch_buffer_size
- <class_ProjectSettings_property_rendering/batching/parameters/batch_buffer_size>`
- - This determines the maximum size of a batch, it doesn't have a huge effect
- on performance but can be worth decreasing for mobile if RAM is at a premium.
- * :ref:`item_reordering_lookahead
- <class_ProjectSettings_property_rendering/batching/parameters/item_reordering_lookahead>`
- - Item reordering can help especially with
- interleaved sprites using different textures. The lookahead for the overlap
- test has a small cost, so the best value may change per project.
- rendering/batching/lights
- ^^^^^^^^^^^^^^^^^^^^^^^^^
- * :ref:`scissor_area_threshold
- <class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>`
- - See light scissoring.
- * :ref:`max_join_items
- <class_ProjectSettings_property_rendering/batching/lights/max_join_items>` -
- Joining items before lighting can significantly increase
- performance. This requires an overlap test, which has a small cost, so the
- costs and benefits may be project dependent, and hence the best value to use
- here.
- rendering/batching/debug
- ^^^^^^^^^^^^^^^^^^^^^^^^
- * :ref:`flash_batching
- <class_ProjectSettings_property_rendering/batching/debug/flash_batching>` -
- This is purely a debugging feature to identify regressions between the
- batching and legacy renderer. When it is switched on, the batching and legacy
- renderer are used alternately on each frame. This will decrease performance,
- and should not be used for your final export, only for testing.
- * :ref:`diagnose_frame
- <class_ProjectSettings_property_rendering/batching/debug/diagnose_frame>` -
- This will periodically print a diagnostic batching log to
- the Godot IDE / console.
- rendering/batching/precision
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * :ref:`uv_contract
- <class_ProjectSettings_property_rendering/batching/precision/uv_contract>` -
- On some hardware (notably some Android devices) there have been reports of
- tilemap tiles drawing slightly outside their UV range, leading to edge
- artifacts such as lines around tiles. If you see this problem, try enabling uv
- contract. This makes a small contraction in the UV coordinates to compensate
- for precision errors on devices.
- * :ref:`uv_contract_amount
- <class_ProjectSettings_property_rendering/batching/precision/uv_contract_amount>`
- - Hopefully the default amount should cure artifacts on most devices, but just
- in case, this value is editable.
- Diagnostics
- ~~~~~~~~~~~
- Although you can change parameters and examine the effect on frame rate, this
- can feel like working blindly, with no idea of what is going on under the hood.
- To help with this, batching offers a diagnostic mode, which will periodically
- print out (to the IDE or console) a list of the batches that are being
- processed. This can help pin point situations where batching is not occurring as
- intended, and help you to fix them, in order to get the best possible
- performance.
- Reading a diagnostic
- ^^^^^^^^^^^^^^^^^^^^
- .. code-block:: cpp
- canvas_begin FRAME 2604
- items
- joined_item 1 refs
- batch D 0-0
- batch D 0-2 n n
- batch R 0-1 [0 - 0] {255 255 255 255 }
- joined_item 1 refs
- batch D 0-0
- batch R 0-1 [0 - 146] {255 255 255 255 }
- batch D 0-0
- batch R 0-1 [0 - 146] {255 255 255 255 }
- joined_item 1 refs
- batch D 0-0
- batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
- batch D 0-0
- batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
- batch D 0-0
- batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
- canvas_end
- This is a typical diagnostic.
- * **joined_item** - A joined item can contain 1 or
- more references to items (nodes). Generally joined_items containing many
- references is preferable to many joined_items containing a single reference.
- Whether items can be joined will be determined by their contents and
- compatibility with the previous item.
- * **batch R** - a batch containing rectangles. The second number is the number of
- rects. The second number in square brackets is the Godot texture ID, and the
- numbers in curly braces is the color. If the batch contains more than one rect,
- MULTI is added to the line to make it easy to identify. Seeing MULTI is good,
- because this indicates successful batching.
- * **batch D** - a default batch, containing everything else that is not currently
- batched.
- Default Batches
- ^^^^^^^^^^^^^^^
- The second number following default batches is the number of commands in the
- batch, and it is followed by a brief summary of the contents:
- ::
- l - line
- PL - polyline
- r - rect
- n - ninepatch
- PR - primitive
- p - polygon
- m - mesh
- MM - multimesh
- PA - particles
- c - circle
- t - transform
- CI - clip_ignore
- You may see "dummy" default batches containing no commands, you can ignore
- these.
- FAQ
- ~~~
- I don't get a large performance increase from switching on batching
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * Try the diagnostics, see how much batching is occurring, and whether it can be
- improved
- * Try changing parameters
- * Consider that batching may not be your bottleneck (see bottlenecks)
- I get a decrease in performance with batching
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * Try steps to increase batching given above
- * Try switching :ref:`single_rect_fallback
- <class_ProjectSettings_property_rendering/batching/options/single_rect_fallback>`
- to on
- * The single rect fallback method is the default used without batching, and it
- is approximately twice as fast, however it can result in flicker on some
- hardware, so its use is discouraged
- * After trying the above, if your scene is still performing worse, consider
- turning off batching.
- I use custom shaders and the items are not batching
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * Custom shaders can be problematic for batching, see the custom shaders section
- I am seeing line artifacts appear on certain hardware
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * See the :ref:`uv_contract
- <class_ProjectSettings_property_rendering/batching/precision/uv_contract>`
- project setting which can be used to solve this problem.
- I use a large number of textures, so few items are being batched
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- * Consider the use of texture atlases. As well as allowing batching, these
- reduce the need for state changes associated with changing texture.
- Appendix
- ~~~~~~~~
- Light scissoring threshold calculation
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- The actual proportion of screen pixel area used as the threshold is the
- :ref:`scissor_area_threshold
- <class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>`
- value to the power of 4.
- For example, on a screen size ``1920 x 1080`` there are ``2,073,600`` pixels.
- At a threshold of ``1000`` pixels, the proportion would be:
- ::
- 1000 / 2073600 = 0.00048225
- 0.00048225 ^ 0.25 = 0.14819
- .. note:: The power of 0.25 is the opposite of power of 4).
- So a :ref:`scissor_area_threshold
- <class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>`
- of 0.15 would be a reasonable value to try.
- Going the other way, for instance with a :ref:`scissor_area_threshold
- <class_ProjectSettings_property_rendering/batching/lights/scissor_area_threshold>`
- of ``0.5``:
- ::
- 0.5 ^ 4 = 0.0625
- 0.0625 * 2073600 = 129600 pixels
- If the number of pixels saved is more than this threshold, the scissor is
- activated.
|