@@ -1,12 +1,10 @@
-:article_outdated: True
-
 .. _doc_gpu_optimization:

 GPU optimization
 ================

 Introduction
-~~~~~~~~~~~~
+------------

 The demand for new graphics features and progress almost guarantees that you
 will encounter graphics bottlenecks. Some of these can be on the CPU side, for
@@ -25,16 +23,16 @@ more difficult to take measurements. In many cases, the only way of measuring
 performance is by examining changes in the time spent rendering each frame.

 Draw calls, state changes, and APIs
-===================================
+-----------------------------------

 .. note:: The following section is not relevant to end-users, but is useful to
 provide background information that is relevant in later sections.

-Godot sends instructions to the GPU via a graphics API (OpenGL, OpenGL ES or
-Vulkan). The communication and driver activity involved can be quite costly,
-especially in OpenGL and OpenGL ES. If we can provide these instructions in a
-way that is preferred by the driver and GPU, we can greatly increase
-performance.
+Godot sends instructions to the GPU via a graphics API (Vulkan, OpenGL, OpenGL
+ES or WebGL). The communication and driver activity involved can be quite
+costly, especially in OpenGL, OpenGL ES and WebGL. If we can provide these
+instructions in a way that is preferred by the driver and GPU, we can greatly
+increase performance.

 Nearly every API command in OpenGL requires a certain amount of validation to
 make sure the GPU is in the correct state. Even seemingly simple commands can
@@ -44,17 +42,21 @@ as much as possible so they can be rendered together, or with the minimum number
 of these expensive state changes.

 2D batching
-~~~~~~~~~~~
+^^^^^^^^^^^

 In 2D, the costs of treating each item individually can be prohibitively high -
 there can easily be thousands of them on the screen. This is why 2D *batching*
-is used. Multiple similar items are grouped together and rendered in a batch,
-via a single draw call, rather than making a separate draw call for each item.
-In addition, this means state changes, material and texture changes can be kept
-to a minimum.
+is used with OpenGL-based rendering methods. Multiple similar items are grouped
+together and rendered in a batch, via a single draw call, rather than making a
+separate draw call for each item. In addition, this means state changes,
+material and texture changes can be kept to a minimum.
+
+Vulkan-based rendering methods do not use 2D batching yet. Since draw calls are
+much cheaper with Vulkan compared to OpenGL, there is less of a need to have 2D
+batching (although it can still be beneficial in some cases).

 3D batching
-~~~~~~~~~~~
+^^^^^^^^^^^

 In 3D, we still aim to minimize draw calls and state changes. However, it can be
 more difficult to batch together several objects into a single draw call. 3D
@@ -62,7 +64,7 @@ meshes tend to comprise hundreds or thousands of triangles, and combining large
 meshes in real-time is prohibitively expensive. The costs of joining them quickly
 exceeds any benefits as the number of triangles grows per mesh. A much better
 alternative is to **join meshes ahead of time** (static meshes in relation to each
-other). This can either be done by artists, or programmatically within Godot.
+other). This can be done by artists, or programmatically within Godot using an add-on.

 There is also a cost to batching together objects in 3D. Several objects
 rendered as one cannot be individually culled. An entire city that is off-screen
@@ -75,8 +77,8 @@ numbers of distant or low-poly objects.
 For more information on 3D specific optimizations, see
 :ref:`doc_optimizing_3d_performance`.

-Reuse Shaders and Materials
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Reuse shaders and materials
+^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The Godot renderer is a little different to what is out there. It's designed to
 minimize GPU state changes as much as possible. :ref:`StandardMaterial3D
@@ -94,12 +96,12 @@ possible. Godot's priorities are:
 that are enabled or disabled with a check box) even if they have different
 parameters.

-If a scene has, for example, ``20,000`` objects with ``20,000`` different
-materials each, rendering will be slow. If the same scene has ``20,000``
-objects, but only uses ``100`` materials, rendering will be much faster.
+If a scene has, for example, 20,000 objects with 20,000 different
+materials each, rendering will be slow. If the same scene has 20,000
+objects, but only uses 100 materials, rendering will be much faster.

 Pixel cost versus vertex cost
-=============================
+-----------------------------

 You may have heard that the lower the number of polygons in a model, the faster
 it will be rendered. This is *really* relative and depends on many factors.
@@ -152,15 +154,17 @@ Pay attention to the additional vertex processing required when using:

 - Skinning (skeletal animation)
 - Morphs (shape keys)
-- Vertex-lit objects (common on mobile)
+
+.. Not implemented in Godot 4.x yet. Uncomment when this is implemented.
+ - Vertex-lit objects (common on mobile)

 Pixel/fragment shaders and fill rate
-====================================
+------------------------------------

 In contrast to vertex processing, the costs of fragment (per-pixel) shading have
-increased dramatically over the years. Screen resolutions have increased (the
+increased dramatically over the years. Screen resolutions have increased: the
 area of a 4K screen is 8,294,400 pixels, versus 307,200 for an old 640×480 VGA
-screen, that is 27x the area), but also the complexity of fragment shaders has
+screen. That is 27 times the area! Also, the complexity of fragment shaders has
 exploded. Physically-based rendering requires complex calculations for each
 fragment.

@@ -190,7 +194,7 @@ their material to decrease the shading cost.
 you can reasonably afford to use.**

 Reading textures
-~~~~~~~~~~~~~~~~
+^^^^^^^^^^^^^^^^

 The other factor in fragment shaders is the cost of reading textures. Reading
 textures is an expensive operation, especially when reading from several
@@ -203,7 +207,7 @@ mobiles.
 algorithms that require as few texture reads as possible.**

 Texture compression
-~~~~~~~~~~~~~~~~~~~
+^^^^^^^^^^^^^^^^^^^

 By default, Godot compresses textures of 3D models when imported using video RAM
 (VRAM) compression. Video RAM compression isn't as efficient in size as PNG or
@@ -227,9 +231,8 @@ textures with transparency (only opaque), so keep this in mind.
 will negatively affect their appearance, without improving performance
 significantly due to their low resolution.

-
 Post-processing and shadows
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Post-processing effects and shadows can also be expensive in terms of fragment
 shading activity. Always test the impact of these on different hardware.
@@ -241,7 +244,7 @@ possible. Smaller or distant OmniLights/SpotLights can often have their shadows
 disabled with only a small visual impact.

 Transparency and blending
-=========================
+-------------------------

 Transparent objects present particular problems for rendering efficiency. Opaque
 objects (especially in 3D) can be essentially rendered in any order and the
@@ -266,7 +269,7 @@ very expensive. Indeed, in many situations, rendering more complex opaque
 geometry can end up being faster than using transparency to "cheat".

 Multi-platform advice
-=====================
+---------------------

 If you are aiming to release on multiple platforms, test *early* and test
 *often* on all your platforms, especially mobile. Developing a game on desktop
@@ -278,7 +281,7 @@ to use the Compatibility rendering method for both desktop and mobile platforms
 where you target both.

 Mobile/tiled renderers
-======================
+----------------------

 As described above, GPUs on mobile devices work in dramatically different ways
 from GPUs on desktop. Most mobile devices use tile renderers. Tile renderers