.. _doc_gpu_optimization:

=================
GPU Optimizations
=================

Introduction
============

The demand for new graphics features and progress almost guarantees that you
will encounter graphics bottlenecks. Some of these can be on the CPU side, for
instance in calculations inside the Godot engine to prepare objects for
rendering. Bottlenecks can also occur on the CPU in the graphics driver, which
sorts instructions to pass to the GPU, and in the transfer of these
instructions. And finally, bottlenecks also occur on the GPU itself.

Where bottlenecks occur in rendering is highly hardware-specific. Mobile GPUs in
particular may struggle with scenes that run easily on desktop.

Understanding and investigating GPU bottlenecks is slightly different from the
situation on the CPU. Often you can only change performance indirectly, by
changing the instructions you give to the GPU, and it may be more difficult to
take measurements. Often the only way of measuring performance is by examining
changes in frame rate.

Drawcalls, state changes, and APIs
==================================

.. note:: The following section is not relevant to end-users, but is useful to
          provide background information that is relevant in later sections.

Godot sends instructions to the GPU via a graphics API (OpenGL, GLES2, GLES3,
Vulkan). The communication and driver activity involved can be quite costly,
especially in OpenGL. If we can provide these instructions in a way that is
preferred by the driver and GPU, we can greatly increase performance.

Nearly every API command in OpenGL requires a certain amount of validation to
make sure the GPU is in the correct state. Even seemingly simple commands can
lead to a flurry of behind-the-scenes housekeeping. Therefore, the name of the
game is to reduce these instructions to a bare minimum and to group similar
objects together as much as possible, so they can be rendered together, or with
the minimum number of these expensive state changes.

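As a rough illustration of why grouping matters, the following Python sketch counts state changes for a list of draw items, first in arbitrary order and then sorted by shader, material and texture. The `DrawItem` type and the rule that each differing field costs one state change are hypothetical simplifications. They are not how Godot tracks GPU state.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DrawItem:
    # Hypothetical per-item render state; real renderers track far more.
    shader: str
    material: str
    texture: str


def count_state_changes(items: List[DrawItem]) -> int:
    """Count state fields that change between consecutive draw items.

    Simplifying assumption: every differing field costs one state change.
    """
    changes = 0
    previous = None
    for item in items:
        if previous is None:
            changes += 3  # the first item must set every piece of state
        else:
            changes += item.shader != previous.shader
            changes += item.material != previous.material
            changes += item.texture != previous.texture
        previous = item
    return changes


def sort_by_state(items: List[DrawItem]) -> List[DrawItem]:
    """Group similar items so each kind of state change happens less often."""
    return sorted(items, key=lambda i: (i.shader, i.material, i.texture))


if __name__ == "__main__":
    unsorted_items = [
        DrawItem("lit", "rock", "rock.png"),
        DrawItem("unlit", "sky", "sky.png"),
        DrawItem("lit", "rock", "rock.png"),
        DrawItem("unlit", "sky", "sky.png"),
    ]
    print(count_state_changes(unsorted_items))                 # 12
    print(count_state_changes(sort_by_state(unsorted_items)))  # 6
```

Sorting like this is only safe when draw order does not affect the result, as with opaque geometry.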
2D batching
~~~~~~~~~~~

In 2D, the cost of treating each item individually can be prohibitively high,
as there can easily be thousands of items on screen. This is why 2D batching is
used: multiple similar items are grouped together and rendered in a batch via a
single draw call, rather than with a separate draw call for each item. This also
keeps state changes, including material and texture changes, to a minimum.

For more information on 2D batching, see :ref:`doc_batching`.

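The grouping step can be sketched in Python as follows. For simplicity, the sketch assumes that only items adjacent in draw order and sharing the same texture and material can be joined. The `(texture, material)` tuples are hypothetical stand-ins for canvas items, and Godot's real batcher considers more state than this.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical 2D item: (texture, material). Real canvas items carry more state.
Item = Tuple[str, str]


def build_batches(items: List[Item]) -> List[List[Item]]:
    """Join runs of consecutive items that share texture and material.

    Draw order is preserved (items are never reordered), so two similar items
    separated by a different item end up in separate batches.
    """
    return [list(run) for _, run in groupby(items)]


if __name__ == "__main__":
    sprites = [("coin.png", "default")] * 1000 + [("ui.png", "default")] * 10
    batches = build_batches(sprites)
    print(len(sprites), "items ->", len(batches), "draw calls")  # 1010 items -> 2 draw calls
```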
3D batching
~~~~~~~~~~~

In 3D, we still aim to minimize draw calls and state changes, but it can be
more difficult to batch several objects together into a single draw call. 3D
meshes tend to comprise hundreds or thousands of triangles, and combining large
meshes at runtime is prohibitively expensive: the cost of joining them quickly
exceeds any benefit as the number of triangles per mesh grows. A much better
alternative is to join meshes ahead of time, for meshes that are static
relative to each other. This can be done either by artists or programmatically
within Godot.

There is also a cost to batching objects together in 3D. Several objects
rendered as one cannot be individually culled. An entire city that is off screen
will still be rendered if it is joined to a single blade of grass that is on
screen. Attempts to batch 3D objects together should therefore take into
account their location and the effect on culling. Despite this, the benefits of
joining static objects often outweigh other considerations, especially for large
numbers of low-poly objects.

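Joining static meshes ahead of time, and the effect this has on culling, can be sketched in Python. The sketch assumes a plain vertex/index layout, not Godot's `ArrayMesh` format. It also assumes meshes that only need a translation baked in, with no rotation or scale. The merged bounding box shows why one distant piece drags the whole batch into view.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Mesh:
    # Hypothetical minimal mesh: positions plus 3 indices per triangle.
    vertices: List[Vec3] = field(default_factory=list)
    indices: List[int] = field(default_factory=list)


def merge_static(meshes: List[Tuple[Mesh, Vec3]]) -> Mesh:
    """Bake each mesh's translation (assumed: no rotation/scale) into its
    vertices and append it to one mesh.

    Indices are offset by the number of vertices already merged, so every
    triangle still refers to its own vertices.
    """
    merged = Mesh()
    for mesh, (ox, oy, oz) in meshes:
        base = len(merged.vertices)
        merged.vertices.extend((x + ox, y + oy, z + oz) for x, y, z in mesh.vertices)
        merged.indices.extend(i + base for i in mesh.indices)
    return merged


def aabb(mesh: Mesh) -> Tuple[Vec3, Vec3]:
    """Axis-aligned bounding box used for culling: (min corner, max corner)."""
    xs, ys, zs = zip(*mesh.vertices)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


if __name__ == "__main__":
    triangle = Mesh([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [0, 1, 2])
    # A "blade of grass" at the origin joined to a "city" 1000 units away.
    merged = merge_static([(triangle, (0.0, 0.0, 0.0)), (triangle, (1000.0, 0.0, 0.0))])
    print(merged.indices)  # [0, 1, 2, 3, 4, 5]
    print(aabb(merged))    # ((0.0, 0.0, 0.0), (1001.0, 1.0, 0.0))
```

The merged box spans about 1000 units. As a result, the whole batch is drawn whenever any part of it is visible.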
For more information on 3D-specific optimizations, see
:ref:`doc_optimizing_3d_performance`.

Reuse Shaders and Materials
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Godot renderer is a little different from other renderers: it is designed
to minimize GPU state changes as much as possible. :ref:`SpatialMaterial
<class_SpatialMaterial>` does a good job of reusing materials that need similar
shaders, but if custom shaders are used, make sure to reuse them as much as
possible. Godot's priorities are:

- **Reusing Materials**: The fewer different materials in the
  scene, the faster the rendering will be. If a scene has a huge number
  of objects (in the hundreds or thousands), try reusing the materials
  or, in the worst case, use texture atlases.
- **Reusing Shaders**: If materials can't be reused, at least try to
  reuse shaders (or SpatialMaterials with different parameters but the same
  configuration).

If a scene has, for example, ``20,000`` objects, each with its own material,
rendering will be slow. If the same scene has ``20,000`` objects but only uses
``100`` materials, rendering will be much faster.

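A common way to follow this advice is to share one material per unique set of parameters instead of creating a new one per object. The Python sketch below uses a hypothetical `MaterialCache`, with plain dictionaries standing in for materials. In Godot itself, the equivalent is assigning the same `Material` resource to many nodes.

```python
from typing import Dict, Tuple

# Hypothetical material description: (shader name, albedo colour as RGB).
MaterialKey = Tuple[str, Tuple[float, float, float]]


class MaterialCache:
    """Return one shared (stand-in) material per unique set of parameters."""

    def __init__(self) -> None:
        self._materials: Dict[MaterialKey, dict] = {}

    def get(self, shader: str, albedo: Tuple[float, float, float]) -> dict:
        key = (shader, albedo)
        if key not in self._materials:
            # Stand-in for creating a real material resource.
            self._materials[key] = {"shader": shader, "albedo": albedo}
        return self._materials[key]

    def unique_count(self) -> int:
        return len(self._materials)


if __name__ == "__main__":
    palette = [(i / 100.0, 0.5, 0.5) for i in range(100)]
    cache = MaterialCache()
    # 20,000 objects drawing from a palette of 100 colours share 100 materials.
    objects = [cache.get("standard", palette[n % 100]) for n in range(20_000)]
    print(len(objects), cache.unique_count())  # 20000 100
```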
Pixel cost vs vertex cost
=========================

You may have heard that the lower the number of polygons in a model, the faster
it will be rendered. This is *really* relative and depends on many factors.

On a modern PC and console, vertex cost is low. GPUs originally only rendered
triangles, so every frame, all the vertices:

1. Had to be transformed by the CPU (including clipping).
2. Had to be sent to the GPU memory from the main RAM.

Now all this is handled inside the GPU, so the performance is much higher. 3D
artists often have the wrong intuition about polycount performance because 3D
DCCs (digital content creation tools such as Blender or 3ds Max) need to keep
geometry in CPU memory in order for it to be edited, which reduces actual
performance. Game engines rely on the GPU more, so they can render many
triangles much more efficiently.

On mobile devices, the story is different. PC and console GPUs are
brute-force monsters that can pull as much electricity as they need from
the power grid. Mobile GPUs are limited to a tiny battery, so they need
to be a lot more power-efficient.

To be more efficient, mobile GPUs attempt to avoid *overdraw*, which means the
same pixel on the screen being rendered more than once. Imagine a town with
several buildings. GPUs don't know what is visible and what is hidden until they
draw it. A house might be drawn and then another house drawn in front of it, so
rendering happened twice for the same pixel. PC GPUs normally don't care much
about this and just add more pixel processors to the hardware to increase
performance (but this also increases power consumption).

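Overdraw can be quantified as the average number of times each covered pixel is shaded. This Python sketch rasterizes hypothetical axis-aligned rectangles into a grid of counters. It assumes the worst case described above, in which every fragment is shaded and none is rejected by depth testing.

```python
from typing import List, Tuple

# Hypothetical opaque rectangle in screen space: (x0, y0, x1, y1), max exclusive.
Rect = Tuple[int, int, int, int]


def overdraw_factor(width: int, height: int, rects: List[Rect]) -> float:
    """Average number of times each covered pixel is shaded when rects are
    drawn in order with no depth rejection (an assumed worst case)."""
    shaded = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in rects:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                shaded[y][x] += 1
    covered = [count for row in shaded for count in row if count > 0]
    return sum(covered) / len(covered) if covered else 0.0


if __name__ == "__main__":
    # Two "houses": the second is drawn in front of the first and covers half of it.
    houses = [(0, 0, 8, 8), (4, 0, 12, 8)]
    print(overdraw_factor(16, 8, houses))  # 1.3333333333333333
```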
Using more power is not an option on mobile, so mobile devices use a technique
called *tile-based rendering*, which divides the screen into a grid. Each cell
keeps the list of triangles drawn to it and sorts them by depth to minimize
*overdraw*. This technique improves performance and reduces power consumption,
but takes a toll on vertex performance. As a result, fewer vertices and
triangles can be processed for drawing.

Additionally, tile-based rendering struggles when there are small objects with a
lot of geometry within a small portion of the screen. This forces mobile GPUs to
put a lot of strain on a single screen tile, which considerably decreases
performance, because all the other cells must wait for it to complete before the
frame can be displayed.

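Binning triangles into tiles makes the problem with concentrated geometry easy to see. This Python sketch assumes a hypothetical 16-pixel tile size. It bins each triangle by its screen-space bounding box, which simplifies what real tile-based GPUs do. Tile sizes and binning strategies vary by vendor.

```python
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]
TILE_SIZE = 16  # pixels; an illustrative assumption, real tile sizes vary by GPU


def bin_triangles(triangles: List[Tuple[Point, Point, Point]],
                  width: int, height: int) -> Counter:
    """Count how many triangles touch each screen tile.

    Each triangle is conservatively binned by its bounding box, clamped to
    the screen.
    """
    tiles: Counter = Counter()
    max_tx = (width - 1) // TILE_SIZE
    max_ty = (height - 1) // TILE_SIZE
    for tri in triangles:
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        tx0 = max(int(min(xs)) // TILE_SIZE, 0)
        tx1 = min(int(max(xs)) // TILE_SIZE, max_tx)
        ty0 = max(int(min(ys)) // TILE_SIZE, 0)
        ty1 = min(int(max(ys)) // TILE_SIZE, max_ty)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                tiles[(tx, ty)] += 1
    return tiles


if __name__ == "__main__":
    # Two triangles covering the whole 64x64 screen as a background.
    background = [((0, 0), (63, 0), (0, 63)), ((63, 0), (63, 63), (0, 63))]
    # A far-away, high-poly character squeezed into a few pixels.
    character = [((5, 5), (6, 5), (5, 6))] * 500
    counts = bin_triangles(background + character, 64, 64)
    print(counts.most_common(1))  # [((0, 0), 502)]
```

Every other tile only has to process 2 triangles, but the frame cannot finish until tile `(0, 0)` is done.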
In summary, do not worry too much about vertex count on mobile, but avoid
concentrating vertices in small parts of the screen. If a character, NPC,
vehicle, etc. is far away (so it looks tiny), use a lower level of detail (LOD)
model.

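Distance-based LOD selection can be as simple as the following Python sketch. The switch distances are hypothetical values that would need tuning per asset. Real systems often use projected screen size and hysteresis instead, to avoid popping between levels.

```python
from bisect import bisect_right
from typing import List


def select_lod(distance: float, thresholds: List[float]) -> int:
    """Return an LOD index: 0 is the full-detail model, higher is coarser.

    `thresholds` must be sorted ascending; a distance at or beyond
    thresholds[i] selects at least LOD i + 1.
    """
    return bisect_right(thresholds, distance)


if __name__ == "__main__":
    # Hypothetical switch distances (in metres) for a character with 3 LODs.
    lod_distances = [20.0, 60.0]
    for d in (5.0, 20.0, 45.0, 200.0):
        print(d, select_lod(d, lod_distances))
    # 5.0 0
    # 20.0 1
    # 45.0 1
    # 200.0 2
```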
Pay attention to the additional vertex processing required when using:

- Skinning (skeletal animation)
- Morphs (shape keys)
- Vertex-lit objects (common on mobile)

Pixel / fragment shaders - fill rate
====================================

In contrast to vertex processing, the cost of fragment shading has increased
dramatically over the years. Screen resolutions have increased (the area of a
4K screen is ``8,294,400`` pixels, versus ``307,200`` for an old ``640x480`` VGA
screen, which is 27 times the area), and the complexity of fragment shaders has
also exploded. Physically based rendering requires complex calculations for
each fragment.

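The resolution figures above, and the fragment workload that results, can be checked with a few lines of Python. `fragments_per_second` is a hypothetical back-of-the-envelope helper that ignores multisampling, shadows and post processing. Its `overdraw` argument is an assumed average.

```python
def pixels(width: int, height: int) -> int:
    return width * height


def fragments_per_second(width: int, height: int, fps: int,
                         overdraw: float = 1.0) -> float:
    """Rough count of fragment shader invocations per second for a
    full-screen scene (hypothetical estimate, no MSAA or extra passes)."""
    return width * height * fps * overdraw


if __name__ == "__main__":
    uhd = pixels(3840, 2160)  # 8,294,400
    vga = pixels(640, 480)    # 307,200
    print(uhd / vga)          # 27.0
    print(f"{fragments_per_second(3840, 2160, 60):,.0f}")  # 497,664,000
```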
You can test whether a project is fill rate limited quite easily. Turn off vsync
to prevent capping the frames per second, then compare the frames per second
when running with a large window to running with a postage-stamp-sized window
(you may also benefit from similarly reducing your shadow map size if using
shadows). Usually you will find the FPS increases quite a bit with a small
window, which indicates you are to some extent fill rate limited. If, on the
other hand, there is little to no increase in FPS, then your bottleneck lies
elsewhere.

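When comparing results from this test, frame times are easier to reason about than frames per second, because they add up linearly. The Python sketch below uses hypothetical measurements. It estimates what share of the large-window frame time disappeared when the window shrank. This is a rough heuristic, not a Godot tool.

```python
def frame_time_ms(fps: float) -> float:
    return 1000.0 / fps


def fill_rate_share(fps_large: float, fps_small: float) -> float:
    """Fraction of the large-window frame time that disappeared when the
    window was shrunk: a rough indication of fill rate limitation."""
    large = frame_time_ms(fps_large)
    small = frame_time_ms(fps_small)
    return (large - small) / large


if __name__ == "__main__":
    # Hypothetical measurements with vsync off.
    print(round(fill_rate_share(40.0, 100.0), 2))  # 0.6 -> largely fill rate limited
    print(round(fill_rate_share(40.0, 42.0), 2))   # 0.05 -> bottleneck lies elsewhere
```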
You can increase performance in a fill rate limited project by reducing the
amount of work the GPU has to do. You can do this by simplifying the shader
(perhaps turning off expensive options if you are using a :ref:`SpatialMaterial
<class_SpatialMaterial>`), or by reducing the number and size of textures used.
Consider shipping simpler shaders for mobile.

Reading textures
~~~~~~~~~~~~~~~~

The other factor in fragment shaders is the cost of reading textures. Reading
textures is an expensive operation, especially when reading from several
textures in a single fragment shader. Filtering can add further expense
(trilinear filtering between mipmaps, and averaging). Reading textures is also
expensive in power terms, which is a big issue on mobile.

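To get a feel for the cost, the Python sketch below counts texel reads per frame for common filter modes. Nearest filtering reads 1 texel, bilinear blends a 2x2 footprint (4 texels), and trilinear performs a bilinear lookup in each of two mipmap levels (8 texels). The result is an upper bound that ignores texture caches and anisotropic filtering.

```python
# Texels read per texture sample for common filtering modes.
TEXELS_PER_SAMPLE = {"nearest": 1, "bilinear": 4, "trilinear": 8}


def texel_reads(fragments: int, textures_per_fragment: int, mode: str) -> int:
    """Upper-bound estimate of texel reads, ignoring texture caches."""
    return fragments * textures_per_fragment * TEXELS_PER_SAMPLE[mode]


if __name__ == "__main__":
    frame = 1920 * 1080  # one full-screen layer, no overdraw
    print(texel_reads(frame, 1, "nearest"))    # 2073600
    print(texel_reads(frame, 3, "trilinear"))  # 49766400
```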
Texture compression
~~~~~~~~~~~~~~~~~~~

By default, Godot compresses textures of 3D models when they are imported (VRAM
compression). Video RAM compression is not as efficient in size as PNG or JPG
when stored, but increases performance enormously when drawing. This is because
the main goal of texture compression is to reduce bandwidth between memory and
the GPU.

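The bandwidth saving comes from the fixed bits per pixel of block-compressed formats. The Python sketch below compares the size of a 1024x1024 texture in a few common formats. Which format Godot actually uses depends on the import settings and the target platform, so the table is illustrative rather than Godot's import logic. Note that ETC1, which is widely supported on Android, has no alpha channel.

```python
# Bits per pixel of some common uncompressed and block-compressed formats.
# Illustrative table only; not Godot's import logic.
BITS_PER_PIXEL = {
    "RGBA8": 32,       # uncompressed
    "BC1/DXT1": 4,     # opaque (or 1-bit alpha), desktop
    "BC3/DXT5": 8,     # with alpha, desktop
    "ETC1": 4,         # opaque only, mobile
    "ETC2 RGBA8": 8,   # with alpha, mobile
}


def texture_bytes(width: int, height: int, fmt: str) -> int:
    """Size of the top mipmap level in bytes (mipmaps not included)."""
    return width * height * BITS_PER_PIXEL[fmt] // 8


if __name__ == "__main__":
    for fmt in ("RGBA8", "BC3/DXT5", "BC1/DXT1"):
        print(fmt, texture_bytes(1024, 1024, fmt))
    # RGBA8 4194304
    # BC3/DXT5 1048576
    # BC1/DXT1 524288
```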
In 3D, the shapes of objects depend more on the geometry than on the texture, so
compression is generally not noticeable. In 2D, the shapes are defined mostly by
the textures themselves, so the artifacts resulting from compression are more
noticeable.

As a warning, most Android devices do not support compression of textures with
transparency (only opaque textures), so keep this in mind.

Post processing / shadows
~~~~~~~~~~~~~~~~~~~~~~~~~

Post processing effects and shadows can also be expensive in terms of fragment
shading activity. Always test the impact of these on different hardware.

Reducing the size of shadow maps can increase performance, both when writing
and when reading the maps.

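Shadow map cost scales with the square of its resolution, so halving the size quarters the texels written and read. The Python sketch below assumes a 32-bit depth format purely for illustration. The format Godot uses for its shadow maps may differ.

```python
BYTES_PER_TEXEL = 4  # assumption: 32-bit depth; the real format may differ


def shadow_map_mib(size: int) -> float:
    """Memory for one square `size` x `size` shadow map, in MiB."""
    return size * size * BYTES_PER_TEXEL / (1024 * 1024)


if __name__ == "__main__":
    for size in (4096, 2048, 1024):
        print(size, shadow_map_mib(size))
    # 4096 64.0
    # 2048 16.0
    # 1024 4.0
```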
Transparency / blending
=======================

Transparent items present particular problems for rendering efficiency. Opaque
items (especially in 3D) can essentially be rendered in any order, and the
Z-buffer will ensure that only the front-most objects get shaded. Transparent or
blended objects are different: in most cases they cannot rely on the Z-buffer
and must be rendered in "painter's order" (i.e. from back to front) to look
correct.

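Painter's order can be sketched as a sort by distance from the camera, farthest first. This Python sketch assumes each item is sorted by a single hypothetical `position` (its centre). Per-object sorting like this cannot resolve items that intersect each other.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class TransparentItem:
    name: str
    position: Vec3  # hypothetical: one sort point per object (its centre)


def distance_sq(a: Vec3, b: Vec3) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def painters_order(items: List[TransparentItem], camera: Vec3) -> List[TransparentItem]:
    """Sort transparent items back to front (farthest from the camera first)."""
    return sorted(items, key=lambda item: distance_sq(item.position, camera), reverse=True)


if __name__ == "__main__":
    camera = (0.0, 0.0, 0.0)
    glass = [
        TransparentItem("near window", (0.0, 0.0, -2.0)),
        TransparentItem("far window", (0.0, 0.0, -30.0)),
        TransparentItem("smoke", (5.0, 0.0, -10.0)),
    ]
    print([item.name for item in painters_order(glass, camera)])
    # ['far window', 'smoke', 'near window']
```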
Transparent items are also particularly bad for fill rate, because every item
has to be drawn, even if other transparent items will later be drawn on top.
Opaque items don't have to do this. They can usually take advantage of the
Z-buffer by writing only to the Z-buffer first, then performing the fragment
shader only on the 'winning' fragment, the item that is at the front at a
particular pixel.

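The difference in shading work can be modelled per pixel. In this simplified Python sketch, an opaque depth prepass keeps only the nearest depth and then shades the fragments that match it, while every blended fragment is shaded. Real GPUs add early depth rejection and other hardware optimizations that this model ignores.

```python
from typing import List


def shaded_per_pixel(depths: List[float], transparent: bool) -> int:
    """Fragment shader invocations for surfaces covering one pixel
    (simplified model, no hardware early-Z)."""
    if not depths:
        return 0
    if transparent:
        # Blended fragments all contribute to the final colour, so each is shaded.
        return len(depths)
    # Opaque with a depth prepass: pass 1 writes only depth, keeping the nearest
    # value; pass 2 shades only fragments whose depth equals it.
    nearest = min(depths)
    return sum(1 for d in depths if d == nearest)


if __name__ == "__main__":
    layers = [0.9, 0.5, 0.7, 0.2]  # depths of four surfaces covering every pixel
    pixels = 1920 * 1080
    print(pixels * shaded_per_pixel(layers, transparent=False))  # 2073600
    print(pixels * shaded_per_pixel(layers, transparent=True))   # 8294400
```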
Transparency is particularly expensive where multiple transparent items overlap.
It is usually better to use as small a transparent area as possible to minimize
these fill rate requirements, especially on mobile, where fill rate is very
expensive. Indeed, in many situations, rendering more complex opaque geometry
can end up being faster than using transparency to "cheat".

Multi-Platform Advice
=====================

If you are aiming to release on multiple platforms, test *early* and test
*often* on all your platforms, especially mobile. Developing a game on desktop
but attempting to port it to mobile at the last minute is a recipe for disaster.

In general, you should design your game for the lowest common denominator, then
add optional enhancements for more powerful platforms. For example, you may want
to use the GLES2 backend for both desktop and mobile platforms where you target
both.

Mobile / tile renderers
=======================

GPUs on mobile devices work in dramatically different ways from GPUs on desktop.
Most mobile devices use tile renderers. Tile renderers split up the screen into
regular-sized tiles that fit into super fast cache memory, which reduces the
number of reads and writes to main memory.

There are some downsides, though. Tile rendering can make certain techniques
much more complicated and expensive to perform. Operations that rely on the
results of rendering in other tiles, or on the results of earlier operations
being preserved, can be very slow. Be very careful to test the performance of
shaders, viewport textures and post processing.