.. _doc_gpu_optimization:

GPU optimization
================

Introduction
------------

The demand for new graphics features and progress almost guarantees that you
will encounter graphics bottlenecks. Some of these can be on the CPU side, for
instance in calculations inside the Godot engine to prepare objects for
rendering. Bottlenecks can also occur on the CPU in the graphics driver, which
sorts instructions to pass to the GPU, and in the transfer of these
instructions. And finally, bottlenecks also occur on the GPU itself.

Where bottlenecks occur in rendering is highly hardware-specific.
Mobile GPUs in particular may struggle with scenes that run easily on desktop.

Understanding and investigating GPU bottlenecks is slightly different from the
situation on the CPU. This is because, often, you can only change performance
indirectly by changing the instructions you give to the GPU. Also, it may be
more difficult to take measurements. In many cases, the only way of measuring
performance is by examining changes in the time spent rendering each frame.

Draw calls, state changes, and APIs
-----------------------------------

.. note:: The following section is not relevant to end-users, but is useful to
          provide background information that is relevant in later sections.

Godot sends instructions to the GPU via a graphics API (Vulkan, OpenGL, OpenGL
ES or WebGL). The communication and driver activity involved can be quite
costly, especially in OpenGL, OpenGL ES and WebGL. If we can provide these
instructions in a way that is preferred by the driver and GPU, we can greatly
increase performance.

Nearly every API command in OpenGL requires a certain amount of validation to
make sure the GPU is in the correct state. Even seemingly simple commands can
lead to a flurry of behind-the-scenes housekeeping. Therefore, the goal is to
reduce these instructions to a bare minimum and group together similar objects
as much as possible so they can be rendered together, or with the minimum number
of these expensive state changes.

2D batching
~~~~~~~~~~~

In 2D, the cost of treating each item individually can be prohibitively high,
as there can easily be thousands of them on the screen. This is why 2D
*batching* is used. Multiple similar items are grouped together and rendered in
a batch, via a single draw call, rather than making a separate draw call for
each item. In addition, this means state changes, material changes and texture
changes can be kept to a minimum.

3D batching
~~~~~~~~~~~

In 3D, we still aim to minimize draw calls and state changes. However, it can be
more difficult to batch together several objects into a single draw call. 3D
meshes tend to comprise hundreds or thousands of triangles, and combining large
meshes in real-time is prohibitively expensive. The cost of joining them quickly
exceeds any benefit as the number of triangles per mesh grows. A much better
alternative is to **join meshes ahead of time** (meshes that are static in
relation to each other). This can be done by artists, or programmatically within
Godot using an add-on.

There is also a cost to batching together objects in 3D. Several objects
rendered as one cannot be individually culled. An entire city that is off-screen
will still be rendered if it is joined to a single blade of grass that is on
screen. Thus, you should always take objects' locations and culling into account
when attempting to batch 3D objects together. Despite this, the benefits of
joining static objects often outweigh other considerations, especially for large
numbers of distant or low-poly objects.

For more information on 3D specific optimizations, see
:ref:`doc_optimizing_3d_performance`.

Reuse shaders and materials
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Godot renderer is a little different from what is out there. It's designed
to minimize GPU state changes as much as possible. :ref:`StandardMaterial3D
<class_StandardMaterial3D>` does a good job at reusing materials that need similar
shaders. If custom shaders are used, make sure to reuse them as much as
possible. Godot's priorities are:

-  **Reusing Materials:** The fewer different materials in the
   scene, the faster the rendering will be. If a scene has a huge number
   of objects (in the hundreds or thousands), try reusing the materials.
   In the worst case, use atlases to decrease the number of texture changes.
-  **Reusing Shaders:** If materials can't be reused, at least try to reuse
   shaders. Note: shaders are automatically reused between
   StandardMaterial3Ds that share the same configuration (features
   that are enabled or disabled with a check box) even if they have different
   parameters.

If a scene has, for example, 20,000 objects that each use a different
material, rendering will be slow. If the same scene has 20,000
objects, but only uses 100 materials, rendering will be much faster.

Pixel cost versus vertex cost
-----------------------------

You may have heard that the lower the number of polygons in a model, the faster
it will be rendered. This is *really* relative and depends on many factors.

On a modern PC and console, vertex cost is low. GPUs originally only rendered
triangles. This meant that every frame:

1. All vertices had to be transformed by the CPU (including clipping).
2. All vertices had to be sent to the GPU memory from the main RAM.

Nowadays, all this is handled inside the GPU, greatly increasing performance. 3D
artists often have a skewed impression of polycount performance because 3D
modeling software (such as Blender or 3ds Max) needs to keep geometry in CPU
memory so that it can be edited, which reduces rendering performance. Game
engines rely on the GPU more, so they can render many triangles much more
efficiently.

On mobile devices, the story is different. PC and console GPUs are
brute-force monsters that can pull as much electricity as they need from
the power grid. Mobile GPUs are limited to a tiny battery, so they need
to be a lot more power efficient.

To be more efficient, mobile GPUs attempt to avoid *overdraw*. Overdraw occurs
when the same pixel on the screen is being rendered more than once. Imagine a
town with several buildings. GPUs don't know what is visible and what is hidden
until they draw it. For example, a house might be drawn and then another house
in front of it (which means rendering happened twice for the same pixel). PC
GPUs normally don't care much about this and just throw more pixel processors at
the problem to increase performance (which also increases power consumption).

Using more power is not an option on mobile, so mobile devices use a technique
called *tile-based rendering* which divides the screen into a grid. Each cell
keeps the list of triangles drawn to it and sorts them by depth to minimize
*overdraw*. This technique improves performance and reduces power consumption,
but takes a toll on vertex performance. As a result, fewer vertices and
triangles can be processed for drawing.

Additionally, tile-based rendering struggles when there are small objects with a
lot of geometry within a small portion of the screen. This forces mobile GPUs to
put a lot of strain on a single screen tile, which considerably decreases
performance as all the other cells must wait for it to complete before
displaying the frame.

To summarize, don't worry too much about raw vertex count on mobile, but
**avoid concentration of vertices in small parts of the screen**.
If a character, NPC, vehicle, etc. is far away (which means it looks tiny), use
a lower level of detail (LOD) model. Even on desktop GPUs, it's preferable to
avoid having triangles smaller than the size of a pixel on screen.

Pay attention to the additional vertex processing required when using:

-  Skinning (skeletal animation)
-  Morphs (shape keys)
-  Vertex-lit objects (common on mobile)

Pixel/fragment shaders and fill rate
------------------------------------

In contrast to vertex processing, the costs of fragment (per-pixel) shading have
increased dramatically over the years. Screen resolutions have increased: the
area of a 4K screen is 8,294,400 pixels, versus 307,200 for an old 640×480 VGA
screen. That is 27 times the area! Also, the complexity of fragment shaders has
exploded. Physically-based rendering requires complex calculations for each
fragment.

You can test whether a project is fill rate-limited quite easily. Turn off
V-Sync to prevent capping the frames per second, then compare the frames per
second when running with a large window to the frames per second when running
with a very small window. You may also benefit from similarly reducing your
shadow map size if using shadows. Usually, you will find the FPS increases quite
a bit using a small window, which indicates you are to some extent fill
rate-limited. On the other hand, if there is little to no increase in FPS, then
your bottleneck lies elsewhere.

You can increase performance in a fill rate-limited project by reducing the
amount of work the GPU has to do. You can do this by simplifying the shader
(perhaps turn off expensive options if you are using a :ref:`StandardMaterial3D
<class_StandardMaterial3D>`), or reducing the number and size of textures used.
Also, when using particles that are not unshaded, consider forcing vertex
shading in their material to decrease the shading cost.

.. seealso::

    On supported hardware, :ref:`doc_variable_rate_shading` can be used to
    reduce shading processing costs without impacting the sharpness of edges on
    the final image.

**When targeting mobile devices, consider using the simplest possible shaders
you can reasonably afford to use.**

Reading textures
~~~~~~~~~~~~~~~~

The other factor in fragment shaders is the cost of reading textures. Reading
textures is an expensive operation, especially when reading from several
textures in a single fragment shader. Also, consider that filtering may slow it
down further (trilinear filtering between mipmaps, and averaging). Reading
textures is also expensive in terms of power usage, which is a big issue on
mobiles.

**If you use third-party shaders or write your own shaders, try to use
algorithms that require as few texture reads as possible.**

Texture compression
~~~~~~~~~~~~~~~~~~~

By default, Godot compresses textures of 3D models when imported using video RAM
(VRAM) compression. Video RAM compression isn't as efficient in size as PNG or
JPG when stored, but increases performance enormously when drawing large enough
textures.

This is because the main goal of texture compression is bandwidth reduction
between memory and the GPU.

In 3D, the shapes of objects depend more on the geometry than the texture, so
compression is generally not noticeable. In 2D, compression depends more on
shapes inside the textures, so the artifacts resulting from 2D compression are
more noticeable.

As a warning, most Android devices do not support texture compression of
textures with transparency (only opaque), so keep this in mind.

.. note::

   Even in 3D, "pixel art" textures should have VRAM compression disabled as it
   will negatively affect their appearance, without improving performance
   significantly due to their low resolution.

Post-processing and shadows
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Post-processing effects and shadows can also be expensive in terms of fragment
shading activity. Always test the impact of these on different hardware.

**Reducing the size of shadow maps can increase performance**, both in terms of
writing and reading the shadow maps. On top of that, the best way to improve
performance of shadows is to turn shadows off for as many lights and objects as
possible. Smaller or distant OmniLights/SpotLights can often have their shadows
disabled with only a small visual impact.

Transparency and blending
-------------------------

Transparent objects present particular problems for rendering efficiency. Opaque
objects (especially in 3D) can be essentially rendered in any order and the
Z-buffer will ensure that only the frontmost objects get shaded. Transparent or
blended objects are different. In most cases, they cannot rely on the Z-buffer
and must be rendered in "painter's order" (i.e. from back to front) to look
correct.

Transparent objects are also particularly bad for fill rate, because every item
has to be drawn even if other transparent objects will be drawn on top
later on.

Opaque objects don't have to do this. They can usually take advantage of the
Z-buffer by writing only to the Z-buffer first, then only running the
fragment shader on the "winning" fragment, the object that is at the front at a
particular pixel.

Transparency is particularly expensive where multiple transparent objects
overlap. It is usually better to keep transparent areas as small as possible to
minimize these fill rate requirements, especially on mobile, where fill rate is
especially costly. Indeed, in many situations, rendering more complex opaque
geometry can end up being faster than using transparency to "cheat".

Multi-platform advice
---------------------

If you are aiming to release on multiple platforms, test *early* and test
*often* on all your platforms, especially mobile. Developing a game on desktop
but attempting to port it to mobile at the last minute is a recipe for disaster.

In general, you should design your game for the lowest common denominator, then
add optional enhancements for more powerful platforms. For example, you may want
to use the Compatibility rendering method for both desktop and mobile platforms
where you target both.

Mobile/tiled renderers
----------------------

As described above, GPUs on mobile devices work in dramatically different ways
from GPUs on desktop. Most mobile devices use tile renderers. Tile renderers
split up the screen into regular-sized tiles that fit into super fast cache
memory, which reduces the number of read/write operations to the main memory.

There are some downsides though. Tiled rendering can make certain techniques
much more complicated and expensive to perform. Tiles that rely on the results
of rendering in different tiles or on the results of earlier operations being
preserved can be very slow. Be very careful to test the performance of shaders,
viewport textures and post-processing.