.. _doc_compute_shaders:

Using compute shaders
=====================

Think of compute shaders as blocks of code that are executed on the GPU for any purpose we want.
Compute shaders are independent from the graphics pipeline and do not have much fixed functionality.
Contrast this with fragment shaders, which are used specifically for assigning a color to a fragment in a render target.
The big benefit of compute shaders over code executed on a CPU is the high amount of parallelization that GPUs provide.

Because compute shaders are independent of the graphics pipeline, we don't have any user-defined inputs or outputs
(like a mesh going into the vertex shader or a texture coming out of a fragment shader). Instead, compute shaders
make changes directly to memory stored on the GPU, which we can read from and write to using scripts.

How they work
-------------

Compute shaders can be thought of as a mass of small computers called work groups.
Much like supercomputers, they are aligned in rows and columns but also stacked on top of each other,
essentially forming a 3D array of them.

When creating a compute shader we can specify the number of work groups we wish to use.
Keep in mind that these work groups are independent from each other and therefore cannot depend on the results from other work groups.

In each work group we have another 3D array of threads called invocations, but unlike work groups, invocations can communicate with each other. The number of invocations in each work group is specified inside the shader.

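
The relationship between work groups and invocations can be sketched on the CPU. The script below is only an illustration, not something the GPU runs. It assumes a hypothetical layout of 5 work groups with 2 invocations each along the x-axis, and the helper name ``global_invocation_id`` is made up for this example. It mirrors how GLSL derives the built-in ``gl_GlobalInvocationID`` from the work group index, the work group size and the local invocation index.

.. code-block:: gdscript

    extends Node

    # CPU-side illustration only. On the GPU these values are the GLSL built-ins
    # gl_WorkGroupID, gl_WorkGroupSize and gl_LocalInvocationID.
    func global_invocation_id(work_group_id: Vector3i, work_group_size: Vector3i, local_id: Vector3i) -> Vector3i:
        # GLSL defines gl_GlobalInvocationID as
        # gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID (component-wise).
        return work_group_id * work_group_size + local_id


    func _ready() -> void:
        # Hypothetical layout: 5 work groups of 2 invocations each along x.
        var work_group_size := Vector3i(2, 1, 1)
        for group_x in 5:
            for local_x in work_group_size.x:
                var id := global_invocation_id(
                        Vector3i(group_x, 0, 0), work_group_size, Vector3i(local_x, 0, 0))
                # The global x index runs from 0 to 9, one value per invocation.
                print("group ", group_x, ", local ", local_x, " -> global ", id.x)

Each invocation ends up with a distinct global index, which is what lets every invocation work on its own element of a buffer.
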
So now let's work with a compute shader to see how it really works.

Creating a ComputeShader
------------------------

To begin using compute shaders, create a new text file called "compute_example.glsl". When you write compute shaders in Godot, you write them in GLSL directly. The Godot shader language is based on GLSL, so if you are familiar with normal shaders in Godot, the syntax below will look somewhat familiar.

Let's take a look at this compute shader code:

.. code-block:: glsl

    #[compute]
    #version 450

    // Invocations in the (x, y, z) dimension
    layout(local_size_x = 2, local_size_y = 1, local_size_z = 1) in;

    // A binding to the buffer we create in our script
    layout(set = 0, binding = 0, std430) restrict buffer MyDataBuffer {
        double data[];
    }
    my_data_buffer;

    // The code we want to execute in each invocation
    void main() {
        // gl_GlobalInvocationID.x uniquely identifies this invocation across all work groups
        my_data_buffer.data[gl_GlobalInvocationID.x] *= 2.0;
    }

This code takes an array of doubles, multiplies each element by 2 and stores the results back in the buffer array.

To continue, copy the code above into your newly created "compute_example.glsl" file.

Create a local RenderingDevice
------------------------------

To interact with and execute a compute shader we need a script. So go ahead and create a new script in the language of your choice and attach it to any Node in your scene.

Now, to execute our shader we need a local :ref:`RenderingDevice <class_RenderingDevice>`, which can be created using the :ref:`RenderingServer <class_RenderingServer>`:

.. tabs::
 .. code-tab:: gdscript GDScript

    # Create a local rendering device.
    var rd := RenderingServer.create_local_rendering_device()

 .. code-tab:: csharp

    // Create a local rendering device.
    var rd = RenderingServer.CreateLocalRenderingDevice();

After that we can load the newly created shader file "compute_example.glsl" and create a pre-compiled version of it using this:

.. tabs::
 .. code-tab:: gdscript GDScript

    # Load GLSL shader
    var shader_file := load("res://compute_example.glsl")
    var shader_spirv: RDShaderSPIRV = shader_file.get_spirv()
    var shader := rd.shader_create_from_spirv(shader_spirv)

 .. code-tab:: csharp

    // Load GLSL shader
    var shaderFile = GD.Load<RDShaderFile>("res://compute_example.glsl");
    var shaderBytecode = shaderFile.GetSpirv();
    var shader = rd.ShaderCreateFromSpirv(shaderBytecode);

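
If the shader contains a typo, the loading steps above can fail quietly. The following is a minimal defensive sketch, not part of the original walkthrough. It assumes that ``create_local_rendering_device()`` can return ``null`` when the active renderer does not provide a RenderingDevice, and it reads the ``compile_error_compute`` property of ``RDShaderSPIRV`` to surface compiler messages. The error texts are made up for this example.

.. code-block:: gdscript

    extends Node

    func _ready() -> void:
        var rd := RenderingServer.create_local_rendering_device()
        if rd == null:
            # Assumption: no local device is available with the current renderer.
            push_error("Could not create a local RenderingDevice.")
            return

        var shader_file: RDShaderFile = load("res://compute_example.glsl")
        var shader_spirv: RDShaderSPIRV = shader_file.get_spirv()

        # compile_error_compute holds the compiler message for the compute stage
        # and is empty when compilation succeeded.
        if shader_spirv.compile_error_compute != "":
            push_error("Shader failed to compile: " + shader_spirv.compile_error_compute)
            return

        var shader := rd.shader_create_from_spirv(shader_spirv)
        if not shader.is_valid():
            push_error("Could not create the shader on the rendering device.")
            return
        print("Shader created successfully.")

Checking at this point tends to be easier than debugging an empty result after the dispatch.
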
Provide input data
------------------

As you might remember, we want to pass an input array to our shader, multiply each element by 2 and get the results.

To pass values to a compute shader we need to create a buffer. We are dealing with an array of doubles, so we will use a storage buffer for this example.
A storage buffer takes an array of bytes and allows the CPU to transfer data to and from the GPU.

So let's initialize an array of doubles and create a storage buffer:

.. tabs::
 .. code-tab:: gdscript GDScript

    # Prepare our data. We use doubles in the shader, so we need 64 bit.
    var input := PackedFloat64Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    var input_bytes := input.to_byte_array()

    # Create a storage buffer that can hold our double values.
    # Each double has 8 bytes (64 bit) so 10 x 8 = 80 bytes
    var buffer := rd.storage_buffer_create(input_bytes.size(), input_bytes)

 .. code-tab:: csharp

    // Prepare our data. We use doubles in the shader, so we need 64 bit.
    var input = new double[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    var inputBytes = new byte[input.Length * sizeof(double)];
    Buffer.BlockCopy(input, 0, inputBytes, 0, inputBytes.Length);

    // Create a storage buffer that can hold our double values.
    // Each double has 8 bytes (64 bit) so 10 x 8 = 80 bytes
    var buffer = rd.StorageBufferCreate((uint)inputBytes.Length, inputBytes);

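
To make the byte layout more concrete, here is a small standalone sketch that needs no GPU. It only inspects the byte array on the CPU, using ``PackedByteArray.decode_double()`` to read a single value at a byte offset. The chosen offset is just an example.

.. code-block:: gdscript

    extends Node

    func _ready() -> void:
        var input := PackedFloat64Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
        var input_bytes := input.to_byte_array()

        # 10 doubles x 8 bytes each = 80 bytes.
        print(input_bytes.size())

        # decode_double() reads one 64-bit float starting at a byte offset,
        # so offset 8 is the second element (the value 2).
        print(input_bytes.decode_double(8))

        # Converting the bytes back yields the original values.
        var round_trip := input_bytes.to_float64_array()
        print(round_trip == input)

The same conversion is used later, in the opposite direction, to turn the bytes read back from the GPU into numbers again.
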
With the buffer in place we need to tell the rendering device to use this buffer.
To do that we will need to create a uniform (like in normal shaders) and assign it to a uniform set which we can pass to our shader later.

.. tabs::
 .. code-tab:: gdscript GDScript

    # Create a uniform to assign the buffer to the rendering device
    var uniform := RDUniform.new()
    uniform.uniform_type = RenderingDevice.UNIFORM_TYPE_STORAGE_BUFFER
    uniform.binding = 0 # this needs to match the "binding" in our shader file
    uniform.add_id(buffer)
    var uniform_set := rd.uniform_set_create([uniform], shader, 0) # the last parameter (the 0) needs to match the "set" in our shader file

 .. code-tab:: csharp

    // Create a uniform to assign the buffer to the rendering device
    var uniform = new RDUniform
    {
        UniformType = RenderingDevice.UniformType.StorageBuffer,
        Binding = 0
    };
    uniform.AddId(buffer);
    var uniformSet = rd.UniformSetCreate(new Array<RDUniform> { uniform }, shader, 0);

Defining a compute pipeline
---------------------------

The next step is to create a set of instructions our GPU can execute.
We need a pipeline and a compute list for that.

The steps we need to do to compute our result are:

1. Create a new pipeline.
2. Begin a list of instructions for our GPU to execute.
3. Bind our compute list to our pipeline.
4. Bind our buffer uniform to our pipeline.
5. Execute the logic of our shader.
6. End the list of instructions.

.. tabs::
 .. code-tab:: gdscript GDScript

    # Create a compute pipeline
    var pipeline := rd.compute_pipeline_create(shader)
    var compute_list := rd.compute_list_begin()
    rd.compute_list_bind_compute_pipeline(compute_list, pipeline)
    rd.compute_list_bind_uniform_set(compute_list, uniform_set, 0)
    rd.compute_list_dispatch(compute_list, 5, 1, 1)
    rd.compute_list_end()

 .. code-tab:: csharp

    // Create a compute pipeline
    var pipeline = rd.ComputePipelineCreate(shader);
    var computeList = rd.ComputeListBegin();
    rd.ComputeListBindComputePipeline(computeList, pipeline);
    rd.ComputeListBindUniformSet(computeList, uniformSet, 0);
    rd.ComputeListDispatch(computeList, xGroups: 5, yGroups: 1, zGroups: 1);
    rd.ComputeListEnd();

Note that we are dispatching the compute shader with 5 work groups in the x-axis, and one in the others.
Since we have 2 local invocations in the x-axis (specified in our shader), 10 compute shader invocations will be launched in total.

If you read or write to indices outside of the range of your buffer, you may access memory outside of your shader's control or parts of other variables, which may cause issues on some hardware.

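
One way to keep the dispatch size and the buffer size in agreement is to derive the work group count from the element count instead of hard-coding it. The sketch below is an assumption-laden illustration: ``LOCAL_SIZE_X`` and ``work_groups_for`` are hypothetical names, and padding the input with zeros is just one possible way to give surplus invocations a valid element to touch.

.. code-block:: gdscript

    extends Node

    # Must match local_size_x in the shader.
    const LOCAL_SIZE_X := 2

    func work_groups_for(element_count: int) -> int:
        # Round up so that every element is covered by an invocation.
        return ceili(element_count / float(LOCAL_SIZE_X))


    func _ready() -> void:
        # 10 elements fit exactly: 5 groups x 2 invocations.
        print(work_groups_for(10))

        # 11 elements need 6 groups, which launches 12 invocations.
        var input := PackedFloat64Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
        var groups := work_groups_for(input.size())

        # Pad the data so the surplus invocation stays inside the buffer.
        while input.size() < groups * LOCAL_SIZE_X:
            input.append(0.0)
        print(groups, " groups for ", input.size(), " padded elements")

The resulting ``groups`` value would then be passed as the x argument of ``compute_list_dispatch()``.
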
Execute a compute shader
------------------------

After all of this we are done, kind of.
We still need to execute our pipeline. Everything we did so far was only definition, not execution.

To execute our compute shader we just need to submit the pipeline to the GPU and wait for the execution to finish:

.. tabs::
 .. code-tab:: gdscript GDScript

    # Submit to GPU and wait for sync
    rd.submit()
    rd.sync()

 .. code-tab:: csharp

    // Submit to GPU and wait for sync
    rd.Submit();
    rd.Sync();

Ideally, you would not synchronize the RenderingDevice right away, as that causes the CPU to wait for the GPU to finish working. In our example we synchronize immediately because we want our data available for reading at once. In general, you will want to wait at least a few frames before synchronizing so that the GPU is able to run in parallel with the CPU.

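
A delayed synchronization could look roughly like the sketch below. It assumes that ``rd`` and ``buffer`` were set up as in the previous sections and that a compute list has already been recorded. ``FRAMES_TO_WAIT``, ``frames_left`` and ``start_compute`` are hypothetical names, and three frames is an arbitrary choice rather than a recommendation.

.. code-block:: gdscript

    extends Node

    # Hypothetical delay; tune it for your own workload.
    const FRAMES_TO_WAIT := 3

    # Assumed to be assigned elsewhere, as shown earlier in this tutorial.
    var rd: RenderingDevice
    var buffer: RID

    # -1 means no compute job is currently pending.
    var frames_left := -1

    func start_compute() -> void:
        # Hand the recorded work to the GPU, but do not block on it yet.
        rd.submit()
        frames_left = FRAMES_TO_WAIT


    func _process(_delta: float) -> void:
        if frames_left < 0:
            return
        if frames_left > 0:
            # Let the GPU keep working while the CPU renders more frames.
            frames_left -= 1
            return

        # The wait is over: block until the GPU is done, then read the data.
        rd.sync()
        var output := rd.buffer_get_data(buffer).to_float64_array()
        print("Output: ", output)
        frames_left = -1

The call to ``sync()`` still blocks if the GPU has not finished, but after a few frames the work is more likely to be complete already.
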
Congratulations, you created and executed a compute shader. But wait, where are the results now?

Retrieving results
------------------

You may remember from the beginning of this tutorial that compute shaders don't have inputs and outputs; they simply change memory. This means we can retrieve the data from the buffer we created earlier in this tutorial.
The shader read from our array and stored the data in the same array again, so our results are already there.
Let's retrieve the data and print the results to our console.

.. tabs::
 .. code-tab:: gdscript GDScript

    # Read back the data from the buffer
    var output_bytes := rd.buffer_get_data(buffer)
    var output := output_bytes.to_float64_array()
    print("Input: ", input)
    print("Output: ", output)

 .. code-tab:: csharp

    // Read back the data from the buffer
    var outputBytes = rd.BufferGetData(buffer);
    var output = new double[input.Length];
    Buffer.BlockCopy(outputBytes, 0, output, 0, outputBytes.Length);
    // Join the elements so the values are printed rather than the array type name
    GD.Print("Input: ", string.Join(", ", input));
    GD.Print("Output: ", string.Join(", ", output));

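
Once the results have been read, the GPU resources created along the way can be released. This step is not covered by the walkthrough above, so treat the following as a hedged sketch. It assumes the RIDs were created on the local rendering device as shown earlier, and ``free_compute_resources`` is a hypothetical helper name. Dependent resources are freed before the resources they were created from.

.. code-block:: gdscript

    extends Node

    # Hypothetical helper: pass in the objects created earlier in this tutorial.
    func free_compute_resources(rd: RenderingDevice, uniform_set: RID, pipeline: RID, buffer: RID, shader: RID) -> void:
        # The uniform set and pipeline depend on the buffer and shader,
        # so release them first.
        rd.free_rid(uniform_set)
        rd.free_rid(pipeline)
        rd.free_rid(buffer)
        rd.free_rid(shader)

        # A local RenderingDevice is not reference counted,
        # so it is freed explicitly as well.
        rd.free()

A natural place to call such a helper would be when the node leaves the scene tree, after the last read-back has finished.
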
Conclusion
----------

Working with compute shaders is a little cumbersome to start, but once you have the basics working in your program you can scale up the complexity of your shader without making many changes to your script.