.. _doc_compute_shaders:

Using compute shaders
=====================

This tutorial will walk you through the process of creating a minimal compute
shader. But first, a bit of background on compute shaders and how they work with
Godot.

.. note::

    This tutorial assumes you are familiar with shaders generally. If you are new
    to shaders please read :ref:`doc_introduction_to_shaders` and :ref:`your
    first shader <toc-your-first-shader>` before proceeding with this tutorial.

A compute shader is a special type of shader program that is oriented towards
general-purpose programming. In other words, compute shaders are more flexible
than vertex and fragment shaders because they don't have a fixed purpose (i.e.
transforming vertices or writing colors to an image). Unlike fragment and vertex
shaders, compute shaders have very little going on behind the scenes. The code
you write is what the GPU runs and very little else. This can make them a very
useful tool for offloading heavy calculations to the GPU.

Now let's get started by creating a short compute shader.

First, in the **external** text editor of your choice, create a new file called
``compute_example.glsl`` in your project folder. When you write compute shaders
in Godot, you write them in GLSL directly. The Godot shader language is based on
GLSL, so if you are familiar with normal shaders in Godot, the syntax below will
look somewhat familiar.

.. note::

    Compute shaders can only be used from RenderingDevice-based renderers (the
    Forward+ or Mobile renderer). To follow along with this tutorial, ensure that
    you are using the Forward+ or Mobile renderer. This setting is located in the
    top right-hand corner of the editor.

Let's take a look at this compute shader code:

.. code-block:: glsl

    #[compute]
    #version 450

    // Invocations in the (x, y, z) dimension
    layout(local_size_x = 2, local_size_y = 1, local_size_z = 1) in;

    // A binding to the buffer we create in our script
    layout(set = 0, binding = 0, std430) restrict buffer MyDataBuffer {
        float data[];
    }
    my_data_buffer;

    // The code we want to execute in each invocation
    void main() {
        // gl_GlobalInvocationID.x uniquely identifies this invocation across all work groups
        my_data_buffer.data[gl_GlobalInvocationID.x] *= 2.0;
    }

This code takes an array of floats, multiplies each element by 2 and stores the
results back in the buffer array. Now let's look at it line-by-line.

.. code-block:: glsl

    #[compute]
    #version 450

These two lines communicate two things:

1. The following code is a compute shader. This is a Godot-specific hint that is
   needed for the editor to properly import the shader file.
2. The code is using GLSL version 450.

You should never have to change these two lines for your custom compute shaders.

.. code-block:: glsl

    // Invocations in the (x, y, z) dimension
    layout(local_size_x = 2, local_size_y = 1, local_size_z = 1) in;

Next, we communicate the number of invocations to be used in each workgroup.
Invocations are instances of the shader that are running within the same
workgroup. When we launch a compute shader from the CPU, we tell it how many
workgroups to run. Workgroups run in parallel to each other. While running one
workgroup, you cannot access information in another workgroup. However,
invocations in the same workgroup can have some limited access to other
invocations.

Think about workgroups and invocations as a giant nested ``for`` loop.

.. code-block:: glsl

    for (int x = 0; x < workgroup_size_x; x++) {
        for (int y = 0; y < workgroup_size_y; y++) {
            for (int z = 0; z < workgroup_size_z; z++) {
                // Each workgroup runs independently and in parallel.
                for (int local_x = 0; local_x < invocation_size_x; local_x++) {
                    for (int local_y = 0; local_y < invocation_size_y; local_y++) {
                        for (int local_z = 0; local_z < invocation_size_z; local_z++) {
                            // Compute shader runs here.
                        }
                    }
                }
            }
        }
    }

Workgroups and invocations are an advanced topic. For now, remember that we will
be running two invocations per workgroup.
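
As a rough illustration of how the nested loop relates to the example shader,
the following GDScript sketch (plain CPU code, not a shader) lists the global ID
each invocation would see along the X axis. The variable names are hypothetical,
and the workgroup count of 5 is an assumption chosen for illustration. The
formula mirrors how GLSL derives ``gl_GlobalInvocationID`` from the workgroup ID,
the workgroup size, and the local invocation ID.

.. code-block:: gdscript

    func _ready():
        # CPU-side illustration only; on the GPU these iterations run in parallel.
        var workgroup_count_x := 5  # Hypothetical number of workgroups in X.
        var local_size_x := 2       # Matches local_size_x in the shader.

        for workgroup_id in workgroup_count_x:
            for local_id in local_size_x:
                # Equivalent of gl_GlobalInvocationID.x for this invocation.
                var global_id: int = workgroup_id * local_size_x + local_id
                print("workgroup ", workgroup_id, ", local ", local_id, " -> global ", global_id)

With these values, the global IDs run from 0 to 9, one per invocation.
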

.. code-block:: glsl

    // A binding to the buffer we create in our script
    layout(set = 0, binding = 0, std430) restrict buffer MyDataBuffer {
        float data[];
    }
    my_data_buffer;

Here we provide information about the memory that the compute shader will have
access to. The ``layout`` qualifier allows us to tell the shader where to look
for the buffer. We will need to match these ``set`` and ``binding`` positions
from the CPU side later.

The ``restrict`` keyword tells the shader that this buffer is only going to be
accessed from one place in this shader. In other words, we won't bind this
buffer in another ``set`` or ``binding`` index. This is important as it allows
the shader compiler to optimize the shader code. Always use ``restrict`` when
you can.

This is an *unsized* buffer, which means it can be any size. So we need to be
careful not to read from or write to an index outside the bounds of the buffer.

.. code-block:: glsl

    // The code we want to execute in each invocation
    void main() {
        // gl_GlobalInvocationID.x uniquely identifies this invocation across all work groups
        my_data_buffer.data[gl_GlobalInvocationID.x] *= 2.0;
    }

Finally, we write the ``main`` function, which is where all the logic happens. We
access a position in the storage buffer using the ``gl_GlobalInvocationID``
built-in variable. ``gl_GlobalInvocationID`` gives you the globally unique ID for
the current invocation.

To continue, write the code above into your newly created ``compute_example.glsl``
file.

Create a local RenderingDevice
------------------------------

To interact with and execute a compute shader, we need a script.
Create a new script in the language of your choice and attach it to any Node
in your scene.

Now to execute our shader we need a local :ref:`class_RenderingDevice`
which can be created using the :ref:`class_RenderingServer`:

.. tabs::
 .. code-tab:: gdscript GDScript

    # Create a local rendering device.
    var rd := RenderingServer.create_local_rendering_device()

 .. code-tab:: csharp

    // Create a local rendering device.
    var rd = RenderingServer.CreateLocalRenderingDevice();

After that, we can load the newly created shader file ``compute_example.glsl``
and create a precompiled version of it using this:

.. tabs::
 .. code-tab:: gdscript GDScript

    # Load GLSL shader
    var shader_file := load("res://compute_example.glsl")
    var shader_spirv: RDShaderSPIRV = shader_file.get_spirv()
    var shader := rd.shader_create_from_spirv(shader_spirv)

 .. code-tab:: csharp

    // Load GLSL shader
    var shaderFile = GD.Load<RDShaderFile>("res://compute_example.glsl");
    var shaderBytecode = shaderFile.GetSpirv();
    var shader = rd.ShaderCreateFromSpirv(shaderBytecode);
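
If the GLSL source contains a mistake, the loading step above may yield an
unusable shader. A minimal sketch of a defensive variant is shown below. The
helper name ``load_compute_shader`` is hypothetical, and the sketch assumes
that ``RDShaderSPIRV.compile_error_compute`` holds the compiler message for the
compute stage and is empty on success.

.. code-block:: gdscript

    # Hypothetical helper: returns an empty RID if the shader failed to compile.
    func load_compute_shader(rd: RenderingDevice, path: String) -> RID:
        var shader_file: RDShaderFile = load(path)
        var shader_spirv: RDShaderSPIRV = shader_file.get_spirv()
        # Assumed to be an empty string when the compute stage compiled cleanly.
        if shader_spirv.compile_error_compute != "":
            push_error(shader_spirv.compile_error_compute)
            return RID()
        return rd.shader_create_from_spirv(shader_spirv)

The caller can then test the result with ``is_valid()`` before creating a
pipeline from it.
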

Provide input data
------------------

As you might remember, we want to pass an input array to our shader, multiply
each element by 2 and get the results.

We need to create a buffer to pass values to a compute shader. We are dealing
with an array of floats, so we will use a storage buffer for this example. A
storage buffer takes an array of bytes and allows the CPU to transfer data to
and from the GPU.

So let's initialize an array of floats and create a storage buffer:

.. tabs::
 .. code-tab:: gdscript GDScript

    # Prepare our data. We use floats in the shader, so we need 32 bit.
    var input := PackedFloat32Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    var input_bytes := input.to_byte_array()

    # Create a storage buffer that can hold our float values.
    # Each float has 4 bytes (32 bit) so 10 x 4 = 40 bytes
    var buffer := rd.storage_buffer_create(input_bytes.size(), input_bytes)

 .. code-tab:: csharp

    // Prepare our data. We use floats in the shader, so we need 32 bit.
    var input = new float[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    var inputBytes = new byte[input.Length * sizeof(float)];
    Buffer.BlockCopy(input, 0, inputBytes, 0, inputBytes.Length);

    // Create a storage buffer that can hold our float values.
    // Each float has 4 bytes (32 bit) so 10 x 4 = 40 bytes
    var buffer = rd.StorageBufferCreate((uint)inputBytes.Length, inputBytes);
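
To make the byte arithmetic concrete, here is a small sketch that checks the
size of the byte array and converts it back to floats. It only uses the packed
array conversions already shown in this tutorial; the variable ``round_trip`` is
introduced purely for illustration.

.. code-block:: gdscript

    func _ready():
        var input := PackedFloat32Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
        var input_bytes := input.to_byte_array()

        # 10 floats x 4 bytes each.
        print(input_bytes.size())  # 40

        # Converting the bytes back gives a float array that can be
        # compared with the original values.
        var round_trip := input_bytes.to_float32_array()
        print(round_trip == input)
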

With the buffer in place we need to tell the rendering device to use this
buffer. To do that we will need to create a uniform (like in normal shaders) and
assign it to a uniform set which we can pass to our shader later.

.. tabs::
 .. code-tab:: gdscript GDScript

    # Create a uniform to assign the buffer to the rendering device
    var uniform := RDUniform.new()
    uniform.uniform_type = RenderingDevice.UNIFORM_TYPE_STORAGE_BUFFER
    uniform.binding = 0 # this needs to match the "binding" in our shader file
    uniform.add_id(buffer)
    var uniform_set := rd.uniform_set_create([uniform], shader, 0) # the last parameter (the 0) needs to match the "set" in our shader file

 .. code-tab:: csharp

    // Create a uniform to assign the buffer to the rendering device
    var uniform = new RDUniform
    {
        UniformType = RenderingDevice.UniformType.StorageBuffer,
        Binding = 0
    };
    uniform.AddId(buffer);
    var uniformSet = rd.UniformSetCreate(new Array<RDUniform> { uniform }, shader, 0);

Defining a compute pipeline
---------------------------

The next step is to create a set of instructions our GPU can execute.
We need a pipeline and a compute list for that.

The steps we need to do to compute our result are:

1. Create a new pipeline.
2. Begin a list of instructions for our GPU to execute.
3. Bind our pipeline to the compute list.
4. Bind our uniform set to the compute list.
5. Specify how many workgroups to use.
6. End the list of instructions.

.. tabs::
 .. code-tab:: gdscript GDScript

    # Create a compute pipeline
    var pipeline := rd.compute_pipeline_create(shader)
    var compute_list := rd.compute_list_begin()
    rd.compute_list_bind_compute_pipeline(compute_list, pipeline)
    rd.compute_list_bind_uniform_set(compute_list, uniform_set, 0)
    rd.compute_list_dispatch(compute_list, 5, 1, 1)
    rd.compute_list_end()

 .. code-tab:: csharp

    // Create a compute pipeline
    var pipeline = rd.ComputePipelineCreate(shader);
    var computeList = rd.ComputeListBegin();
    rd.ComputeListBindComputePipeline(computeList, pipeline);
    rd.ComputeListBindUniformSet(computeList, uniformSet, 0);
    rd.ComputeListDispatch(computeList, xGroups: 5, yGroups: 1, zGroups: 1);
    rd.ComputeListEnd();

Note that we are dispatching the compute shader with 5 workgroups in the
X axis, and one in the others. Since we have 2 local invocations in the X axis
(specified in our shader), 10 compute shader invocations will be launched in
total. If you read from or write to indices outside the range of your buffer,
you may access memory outside of your shader's control or parts of other
variables, which may cause issues on some hardware.
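
The workgroup count of 5 follows from dividing the 10 elements by the 2
invocations per workgroup. As a sketch, the count for an arbitrary array length
could be computed with a rounding-up division. The helper name
``workgroup_count`` is hypothetical.

.. code-block:: gdscript

    # Hypothetical helper: number of workgroups needed along one axis.
    func workgroup_count(element_count: int, local_size: int) -> int:
        # Round up so that every element is covered by at least one invocation.
        return ceili(element_count / float(local_size))

    func _ready():
        print(workgroup_count(10, 2))  # 5
        print(workgroup_count(11, 2))  # 6

Note that rounding up can launch more invocations than there are elements. With
11 elements and a local size of 2, 12 invocations would run, and the last one
would index past the end of the buffer unless the shader checks its index first.
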

Execute a compute shader
------------------------

After all of this we are almost done, but we still need to execute our pipeline.
So far we have only recorded what we would like the GPU to do; we have not
actually run the shader program.

To execute our compute shader we need to submit the pipeline to the GPU and
wait for the execution to finish:

.. tabs::
 .. code-tab:: gdscript GDScript

    # Submit to GPU and wait for sync
    rd.submit()
    rd.sync()

 .. code-tab:: csharp

    // Submit to GPU and wait for sync
    rd.Submit();
    rd.Sync();

Ideally, you would not call ``sync()`` immediately after submitting, as it
causes the CPU to wait for the GPU to finish working. In our example, we
synchronize right away because we want our data available for reading
immediately. In general, you will want to wait *at least* 2 or 3 frames before
synchronizing so that the GPU is able to run in parallel with the CPU.
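
One possible way to delay synchronization is to count frames in ``_process()``
after submitting. The following is only a sketch under several assumptions:
``rd`` is stored in a member variable and was set up as described above, and
the names ``start_compute`` and ``frames_since_submit`` are hypothetical.

.. code-block:: gdscript

    extends Node

    # Assumed to be created as shown in the previous sections.
    var rd: RenderingDevice

    # -1 means no work is currently in flight.
    var frames_since_submit := -1

    func start_compute():
        # Record the compute list as shown earlier, then submit without syncing.
        rd.submit()
        frames_since_submit = 0

    func _process(_delta):
        if frames_since_submit < 0:
            return
        frames_since_submit += 1
        # Wait a few frames so the GPU can work while the CPU continues.
        if frames_since_submit >= 3:
            rd.sync()
            frames_since_submit = -1
            # The results can now be read back from the buffer.
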

Retrieving results
------------------

You may have noticed that, in the example shader, we modified the contents of the
storage buffer. In other words, the shader read from our array and stored the data
in the same array again, so our results are already there. Let's retrieve
the data and print the results to our console.

.. tabs::
 .. code-tab:: gdscript GDScript

    # Read back the data from the buffer
    var output_bytes := rd.buffer_get_data(buffer)
    var output := output_bytes.to_float32_array()
    print("Input: ", input)
    print("Output: ", output)

 .. code-tab:: csharp

    // Read back the data from the buffer
    var outputBytes = rd.BufferGetData(buffer);
    var output = new float[input.Length];
    Buffer.BlockCopy(outputBytes, 0, output, 0, outputBytes.Length);
    GD.Print("Input: ", string.Join(", ", input));
    GD.Print("Output: ", string.Join(", ", output));
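
Once the results have been read, the resources created on the local rendering
device can be released. The sketch below assumes the variable names used in the
GDScript snippets of this tutorial and that RIDs created through a local
``RenderingDevice`` are not freed automatically. Dependent objects are freed
before the objects they were created from.

.. code-block:: gdscript

    # Free the RIDs we created, dependents first.
    rd.free_rid(uniform_set)
    rd.free_rid(pipeline)
    rd.free_rid(buffer)
    rd.free_rid(shader)

    # Assumption: the local rendering device itself must also be freed manually.
    rd.free()
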

With that, you have everything you need to get started working with compute
shaders.