Added most of the information. Need to add timing diagram.

Signed-off-by: galibzon <[email protected]>
galibzon 1 year ago
parent
commit
a665353d41
1 changed files with 115 additions and 49 deletions
  1. 115 49
      rfcs/SubpassesSupportInRPI/RFC_SubpassesSupportInRPI.md

+ 115 - 49
rfcs/SubpassesSupportInRPI/RFC_SubpassesSupportInRPI.md

@@ -1,49 +1,115 @@
-Background History:
-Vulkan requires a valid VkRenderPass when creating a Pipeline State Object for any given shader.
-PSO creation usually occurs way earlier than when the RHI creates a VkRenderPass for RHI::Scopes.
-Vulkan "only" requires both VkRenderPasses to be compatible, but they don't have to be the same.
-
-Pseudo runtime example with Two INDEPENDENT VkRenderPasses:
-At time 1, during RPI::RasterPass Initialization:
-    Create dummy "VkRenderPass_A" for a shader named "PSO_A".
-    Create "PSO_A" using "VkRenderPass_A".
-At time 2, during another RPI::RasterPass Initialization:
-    Create dummy "VkRenderPass_B" for a shader named "PSO_B".
-    Create "PSO_B" using "VkRenderPass_B".
-At time 3, in the future, when the RHI builds the Frame Graph:
-    Create Scope_A with "VkRenderPass_C"
-    Create Scope_B with "VkRenderPass_D"
-At time 4, when the RHI submits commands
-    VkCmdBeginRenderPass (VkRenderPass_C)
-        VkCmdBindPipeline(PSO_A that was created with VkRenderPass_A)  // VkRenderPass_A must be "compatible" with VkRenderPass_C.
-        VkCmdDraw(...)
-    VkCmdEndRenderPass (VkRenderPass_C)
-    VkCmdBeginRenderPass (VkRenderPass_D)
-        VkCmdBindPipeline(PSO_B that was created with VkRenderPass_B) // VkRenderPass_D must be "compatible" with VkRenderPass_B.
-        VkCmdDraw(...)
-    VkCmdEndRenderPass (VkRenderPass_D)
-In the example above, because it is NOT using subpasses, it is relatively easy to make compatible VkRenderPasses.
-Also, this abstract class did NOT exist and WAS NOT necessary in that scenario.
-Subpasses is the mechanism by which Vulkan provides programmable controls to get maximum performance of Tiled Based GPUs.
-So, assuming the pipeline mentioned above can be merged as subpasses the timeline would look like this.
-At time 1, during RPI::RasterPass Initialization:
-    Create dummy "VkRenderPass_A" for a shader named "PSO_A".
-    Create "PSO_A" using "VkRenderPass_A".
-At time 2, during another RPI::RasterPass Initialization:
-    Create dummy "VkRenderPass_B" for a shader named "PSO_B".
-    Create "PSO_B" using "VkRenderPass_B".
-At time 3, in the future, when the RHI builds the Frame Graph:
-    Create Scope_A and Scope_B as subpasses with "VkRenderPass_C"
-At time 4, when the RHI submits commands
-    VkCmdBeginRenderPass (VkRenderPass_C)
-        VkCmdBindPipeline(PSO_A that was created with VkRenderPass_A)  // VkRenderPass_A must be "compatible" with VkRenderPass_C.
-        VkCmdDraw(...)
-    VkCmdNextSubpass(1)
-        VkCmdBindPipeline(PSO_B that was created with VkRenderPass_B) // VkRenderPass_B must be "compatible" with VkRenderPass_C
-        VkCmdDraw(...)
-    VkCmdEndRenderPass (VkRenderPass_C)
-When declaring Subpass Dependencies (Which is required during VkRenderPass creation), it was found that Vulkan is very strict
-when validating Compatibility. In the end it boiled down to make sure that all VkRenderPasses must have IDENTICAL Subpass Dependencies
-otherwise vulkan would trigger validation errors. And this is why this abstract class was introduced. Because it helps to encapsulate
-all the details required for creating VkSubpassDependency (for Vulkan) and share the exact same information when creating all related
-VkRenderPasses.
+# A Solution To Expose Subpasses To The RPI
+
+**Why is this important?**  
+To get maximum graphics performance for the least amount of bandwidth on GPUs
+architected as Tile Based Rasterizers (TBR, for short), you need to group as many Raster Passes
+as possible into a series of Subpasses.
+
+Most Mobile and XR devices have TBR GPUs. Both bandwidth and power consumption should be minimized.
+Merging several Raster Passes into a sequence of Subpasses is one available option that typically achieves this goal. To learn more about Subpasses (in Vulkan):  
+1. https://arm-software.github.io/vulkan_best_practice_for_mobile_developers/samples/performance/render_subpasses/render_subpasses_tutorial.html  
+2. https://developer.qualcomm.com/sites/default/files/docs/adreno-gpu/snapdragon-game-toolkit/gdg/gpu/best_practices_tiling.html  
+3. https://www.saschawillems.de/blog/2018/07/19/vulkan-input-attachments-and-sub-passes/
+
+
+# O3DE Background History - Vulkan RHI
+In order to understand the proposed solution, it is very important to go over a few touch points on how the RPI and RHI work together to define the Frame Graph and how it gets executed.  
+  
+Each time the RPI initializes a Shader, it also instantiates a Pipeline State Object (aka PSO) for a given Shader (for a given Shader Variant to be precise). Vulkan requires a VkRenderPass to instantiate a PSO:  
+```
+// Internally, the RHI creates or re-uses a VkRenderPass to create ShaderPSO.
+ShaderPSO = Shader->AcquirePipelineState()
+```
+In the pseudocode above, let's assume the Vulkan RHI created a private `VkRenderPass0`.
+
+Later, when the Vulkan RHI compiles the Scopes, it creates a new `VkRenderPass1`, and when submitting Draw commands it runs the following pseudocode:
+```
+CmdBeginRenderPass(VkRenderPass1)
+    CmdBindPSO(ShaderPSO)
+    CmdDraw(...)
+CmdEndRenderPass() 
+```
+And here comes the first lesson: Vulkan does NOT require `VkRenderPass0` and `VkRenderPass1` to be identical, BUT they have to be **compatible**. Compatible usually means that both VkRenderPasses must be created by referring to the exact same type and number of Frame Attachments, and also that **if there are Subpass Dependency declarations, those declarations must be identical**; otherwise validation errors or crashes may occur.  
+
+In the example above, `VkRenderPass0` is created with the help of the API `AZ::RHI::Vulkan::RenderPass::ConvertLayoutAttachment()`, while `VkRenderPass1` is created by a different code path: `AZ::RHI::Vulkan::RenderPassBuilder`. The key takeaway is that, because there was no subpass support exposed to the RPI, it was relatively easy for two different code paths to end up creating compatible VkRenderPasses.  
+
+Let's go over a slightly more complicated scenario where two consecutive Raster Passes are executed by the RHI.
+Also, let's assume each Raster Pass has its own shader:  
+```
+// Somewhere in the RPI at initialization time:
+
+// Shader used under Raster Pass A
+ShaderPSOA = ShaderA->AcquirePipelineState() // Internally VkRenderPassA is created.
+
+// Shader used under Raster Pass B
+ShaderPSOB = ShaderB->AcquirePipelineState() // Internally VkRenderPassB is created.
+// ...
+// A few seconds later
+// ...
+CmdBeginRenderPass(VkRenderPassC) //Scope of Raster Pass A
+    CmdBindPSO(ShaderPSOA)
+    CmdDraw(...)
+CmdEndRenderPass() 
+CmdBeginRenderPass(VkRenderPassD) //Scope of Raster Pass B
+    CmdBindPSO(ShaderPSOB)
+    CmdDraw(...)
+CmdEndRenderPass() 
+```
+The example above worked fine because `VkRenderPassC` was compatible with `VkRenderPassA`, and `VkRenderPassD` was compatible with `VkRenderPassB` (the VkRenderPass used to create `ShaderPSOB`).  
+  
+Let's go over the final example, where we assume that `Raster Pass A` & `Raster Pass B` are mergeable as subpasses:  
+```
+// Somewhere in the RPI at initialization time:
+
+// Shader used under Raster Pass A (as Subpass 0)
+ShaderPSOA = ShaderA->AcquirePipelineState() // Internally VkRenderPassA is created.
+
+// Shader used under Raster Pass B (as Subpass 1)
+ShaderPSOB = ShaderB->AcquirePipelineState() // Internally VkRenderPassB is created.
+// ...
+// A few seconds later
+// ...
+CmdBeginRenderPass(VkRenderPassC) //Merged Scope of Raster Pass A & B
+    CmdBindPSO(ShaderPSOA)
+    CmdDraw(...)
+CmdNextSubpass() //Scope of Raster Pass B
+    CmdBindPSO(ShaderPSOB)
+    CmdDraw(...)
+CmdEndRenderPass() 
+```
+For this final example to work well, all VkRenderPasses must be compatible: `VkRenderPassA`, `VkRenderPassB` and `VkRenderPassC`, and **most importantly** the VkSubpassDependency declarations must be identical. The challenge is that a VkSubpassDependency is nothing more than a combination of Bit Flags, and it can be tedious to write two different code paths that generate the exact same Bit Flags.  
+
+
+# The Solution
+  
+There are many considerations in this solution:  
+
+## Solution to Shareable Render Attachment Layouts
+So far, each RPI::Pass has constructed its Render Attachment Layout in isolation (see `AZ::RPI::RenderPass::GetRenderAttachmentConfiguration()`), and it is precisely the Render Attachment Layout that contains all the data describing the Render Attachments and dependencies for each subpass.  
+The new change in the Pass asset is that `PassData` contains a new field called `"MergeChildrenAsSubpasses"`, which the `RPI::ParentPass` class can use to merge a list of `RPI::RasterPass` children as subpasses. The benefit of delegating the responsibility to `RPI::ParentPass` is that it can sequentially build a single `AZ::RHI::RenderAttachmentLayout` for all the child passes that it is supposed to merge.  
+
+The major piece of work is encompassed by the new function: `AZ::RPI::ParentPass::CreateRenderAttachmentConfigurationForSubpasses()`.  
+This function is called just after `BuildInternal()` is called for all Child passes.  
+This is the best time to call it because the RPI has not yet called `AZ::RPI::RenderPass::GetRenderAttachmentConfiguration()` for any of the PSOs.  
+  
+Also, `AZ::RPI::RenderPass::GetRenderAttachmentConfiguration()` was changed to `virtual`. This allows `RPI::RasterPass` to override `GetRenderAttachmentConfiguration()` and return the Render Attachment Configuration that was built with the help of `RPI::ParentPass`.
+
+## Solution to Subpass Dependencies
+As mentioned already, it is a burden to maintain two different code paths and guarantee that both create the exact same set of bit flags that make up a set of VkSubpassDependency structures. Another problem is that we need APIs that the RPI can use and that are decoupled from the intricacies of the Vulkan RHI. Introducing `AZ::RHI::SubpassDependencies`: this abstract class will serve as an opaque handle returned by each RHI that supports Subpasses. In particular, for Vulkan, a concrete implementation will be `AZ::Vulkan::SubpassDependencies: public AZ::RHI::SubpassDependencies`.  
+
+The `RPI::ParentPass` will build the RenderAttachmentLayout for each subpass with the help of `RHI::RenderAttachmentLayoutBuilder`, which in turn will invoke the NEW `AZ::Interface` named `AZ::RHI::SubpassDependenciesBuilderInterface`, which provides the function:  
+```cpp
+virtual AZStd::shared_ptr<SubpassDependencies> BuildSubpassDependencies(const RHI::RenderAttachmentLayout& layout) const = 0;
+```
+The returned shared pointer will be shared by all Child Passes of type `RPI::RasterPass`, which will later use it when the FrameScheduler calls `RPI::Pass::SetupFrameGraphDependencies(RHI::FrameGraphInterface frameGraph)`.  
+RasterPasses that are being merged will call: 
+```cpp
+    if (m_subpassDependencies)
+    {
+        frameGraph.UseSubpassDependencies(m_subpassDependencies);
+    }
+```
+Further down the line, each `AZ::Vulkan::Scope` stores the shared pointer, and `AZ::Vulkan::RenderPassBuilder` checks whether the shared pointer is valid and, if so, uses it to create the VkRenderPass.  
+
+The following timeline diagram illustrates how the pieces fit together:  
+