
Add partial derivative test cases for pixel shaders (#220)

This change adds another execution test case for partial derivative operations specific to pixel shaders.
The test passes a texture resource to the shader and takes the partial derivatives of the loaded texture values.
The test assumes an arithmetic precision of 1 ULP for derivative operations.
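
As a rough illustration of what a 1 ULP tolerance means, the sketch below compares two floats by the distance between their bit patterns. The names `OrderedBits` and `WithinUlps` are hypothetical and are not the repository's `CompareFloatULP`, which may treat denormals and other special values differently.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: maps a float's bit pattern to an integer that
// increases monotonically with the float's value (negatives mirrored).
inline std::int64_t OrderedBits(float f) {
  std::int32_t i;
  std::memcpy(&i, &f, sizeof i);
  return i < 0 ? -static_cast<std::int64_t>(i & 0x7fffffff)
               : static_cast<std::int64_t>(i);
}

// Hypothetical helper: true when a and b are at most `ulps`
// representable floats apart. NaN never compares as close.
inline bool WithinUlps(float a, float b, int ulps) {
  if (std::isnan(a) || std::isnan(b)) return false;
  if (a == b) return true; // also covers +0.0f versus -0.0f
  return std::llabs(OrderedBits(a) - OrderedBits(b)) <= ulps;
}
```

With this definition, `WithinUlps(x, std::nextafter(x, 2 * x), 1)` holds for a positive finite `x`, while a value two representable steps away does not pass.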

This change also updates the existing execution test infrastructure:
 - enable passing in a Texture2D resource in a default heap
 - read the primitive topology from the XML file
 - clean up data-driven tests: use DirectXMath structures and remove unused structures
Young Kim, 8 years ago
parent
commit
c50dd745d2

+ 42 - 4
docs/DXIL.rst

@@ -1985,10 +1985,10 @@ ID  Name                           Description
 80  Barrier_                       inserts a memory barrier in the shader
 81  CalculateLOD_                  calculates the level of detail
 82  Discard_                       discard the current pixel
-83  DerivCoarseX_                  computes the rate of change of components per stamp
-84  DerivCoarseY_                  computes the rate of change of components per stamp
-85  DerivFineX_                    computes the rate of change of components per pixel
-86  DerivFineY_                    computes the rate of change of components per pixel
+83  DerivCoarseX_                  computes the rate of change per stamp in x direction.
+84  DerivCoarseY_                  computes the rate of change per stamp in y direction.
+85  DerivFineX_                    computes the rate of change per pixel in x direction.
+86  DerivFineY_                    computes the rate of change per pixel in y direction.
 87  EvalSnapped_                   evaluates an input attribute at pixel center with an offset
 88  EvalSampleIndex_               evaluates an input attribute at a sample location
 89  EvalCentroid_                  evaluates an input attribute at pixel center
@@ -2109,6 +2109,44 @@ Countbits
 
 Counts the number of bits in the input integer.
 
+DerivCoarseX
+~~~~~~~~~~~~
+
+dst = DerivCoarseX(src);
+
+Computes the rate of change per stamp in x direction. Only a single x derivative pair is computed for each 2x2 stamp of pixels.
+The data in the current Pixel Shader invocation may or may not participate in the calculation of the requested derivative, given the derivative will be calculated only once per 2x2 quad:
+As an example, the x derivative could be a delta from the top row of pixels.
+The exact calculation is up to the hardware vendor. There is also no specification dictating how the 2x2 quads will be aligned/tiled over a primitive.
+
+DerivCoarseY
+~~~~~~~~~~~~
+
+dst = DerivCoarseY(src);
+
+Computes the rate of change per stamp in y direction. Only a single y derivative pair is computed for each 2x2 stamp of pixels.
+The data in the current Pixel Shader invocation may or may not participate in the calculation of the requested derivative, given the derivative will be calculated only once per 2x2 quad:
+As an example, the y derivative could be a delta from the left column of pixels.
+The exact calculation is up to the hardware vendor. There is also no specification dictating how the 2x2 quads will be aligned/tiled over a primitive.
+
+DerivFineX
+~~~~~~~~~~
+
+dst = DerivFineX(src);
+
+Computes the rate of change per pixel in x direction. Each pixel in the 2x2 stamp gets a unique pair of x derivative calculations.
+The data in the current Pixel Shader invocation always participates in the calculation of the requested derivative.
+There is no specification dictating how the 2x2 quads will be aligned/tiled over a primitive.
+
+DerivFineY
+~~~~~~~~~~
+
+dst = DerivFineY(src);
+
+Computes the rate of change per pixel in y direction. Each pixel in the 2x2 stamp gets a unique pair of y derivative calculations.
+The data in the current Pixel Shader invocation always participates in the calculation of the requested derivative.
+There is no specification dictating how the 2x2 quads will be aligned/tiled over a primitive.
+
 Dot2
 ~~~~
 

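The difference between the fine and coarse variants documented above can be sketched on the CPU for a single 2x2 quad. This is only an illustration under stated assumptions: `Quad`, `Deriv`, `FineDeriv` and `CoarseDeriv` are hypothetical names, and the coarse version picks the top row and left column, which is just one of the choices the text says a hardware vendor may make.

```cpp
#include <array>

// One 2x2 quad of scalar values, indexed as v[row][col];
// row 0 is the top row and col 0 is the left column.
using Quad = std::array<std::array<float, 2>, 2>;

struct Deriv {
  float ddx;
  float ddy;
};

// Fine derivatives: each pixel uses the delta along its own row (x)
// and its own column (y), so its own value always participates.
inline Deriv FineDeriv(const Quad &v, int row, int col) {
  return {v[row][1] - v[row][0], v[1][col] - v[0][col]};
}

// Coarse derivatives: one pair for the whole quad. Here the top row
// and left column are used (an assumption; hardware may differ).
inline Deriv CoarseDeriv(const Quad &v) {
  return {v[0][1] - v[0][0], v[1][0] - v[0][0]};
}
```

For the quad `{{256, 32}, {2048, 256}}`, the bottom-right pixel gets fine derivatives of -1792 and 224, while this particular coarse choice yields -224 and 1792 for every pixel in the quad.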
+ 4 - 4
include/dxc/HLSL/DxilConstants.h

@@ -326,10 +326,10 @@ namespace DXIL {
     // Pixel shader
     CalculateLOD = 81, // calculates the level of detail
     Coverage = 91, // returns the coverage mask input in a pixel shader
-    DerivCoarseX = 83, // computes the rate of change of components per stamp
-    DerivCoarseY = 84, // computes the rate of change of components per stamp
-    DerivFineX = 85, // computes the rate of change of components per pixel
-    DerivFineY = 86, // computes the rate of change of components per pixel
+    DerivCoarseX = 83, // computes the rate of change per stamp in x direction.
+    DerivCoarseY = 84, // computes the rate of change per stamp in y direction.
+    DerivFineX = 85, // computes the rate of change per pixel in x direction.
+    DerivFineY = 86, // computes the rate of change per pixel in y direction.
     Discard = 82, // discard the current pixel
     EvalCentroid = 89, // evaluates an input attribute at pixel center
     EvalSampleIndex = 88, // evaluates an input attribute at a sample location

+ 4 - 4
include/dxc/HLSL/DxilInstructions.h

@@ -2364,7 +2364,7 @@ struct DxilInst_Discard {
   llvm::Value *get_condition() const { return Instr->getOperand(1); }
 };
 
-/// This instruction computes the rate of change of components per stamp
+/// This instruction computes the rate of change per stamp in x direction.
 struct DxilInst_DerivCoarseX {
   const llvm::Instruction *Instr;
   // Construction and identification
@@ -2382,7 +2382,7 @@ struct DxilInst_DerivCoarseX {
   llvm::Value *get_value() const { return Instr->getOperand(1); }
 };
 
-/// This instruction computes the rate of change of components per stamp
+/// This instruction computes the rate of change per stamp in y direction.
 struct DxilInst_DerivCoarseY {
   const llvm::Instruction *Instr;
   // Construction and identification
@@ -2400,7 +2400,7 @@ struct DxilInst_DerivCoarseY {
   llvm::Value *get_value() const { return Instr->getOperand(1); }
 };
 
-/// This instruction computes the rate of change of components per pixel
+/// This instruction computes the rate of change per pixel in x direction.
 struct DxilInst_DerivFineX {
   const llvm::Instruction *Instr;
   // Construction and identification
@@ -2418,7 +2418,7 @@ struct DxilInst_DerivFineX {
   llvm::Value *get_value() const { return Instr->getOperand(1); }
 };
 
-/// This instruction computes the rate of change of components per pixel
+/// This instruction computes the rate of change per pixel in y direction.
 struct DxilInst_DerivFineY {
   const llvm::Instruction *Instr;
   // Construction and identification

+ 68 - 0
tools/clang/test/HLSL/ShaderOpArith.xml

@@ -1,5 +1,73 @@
 <?xml version="1.0" encoding="utf-8" standalone="yes"?>
 <ShaderOpSet xmlns="http://schemas.microsoft.com/test/ShaderOp">
+  <ShaderOp Name="DerivFine" PS="PS" VS="VS" TopologyType="TRIANGLE">
+    <RootSignature>RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT), DescriptorTable(SRV(t0,numDescriptors=1))</RootSignature>
+    <Resource Name="VBuffer" Dimension="BUFFER" InitialResourceState="COPY_DEST" Init="FromBytes" Topology="TRIANGLELIST">
+      { { -1.0f, 1.0f, 0.0f }, { 0.0f, 0.0f } },
+      { { 1.0f, 1.0f, 0.0f }, { 1.0f, 0.0f } },
+      { { -1.0f, -1.0f, 0.0f }, { 0.0f, 1.0f } },
+
+      { { -1.0f, -1.0f, 0.0f }, { 0.0f, 1.0f } },
+      { { 1.0f, 1.0f, 0.0f }, { 1.0f, 0.0f } },
+      { { 1.0f, -1.0f, 0.0f }, { 1.0f, 1.0f } }
+    </Resource>
+    <Resource Name="T0" Dimension="Texture2D" Width="4" Height="4" InitialResourceState="COPY_DEST" Init="FromBytes" Format="R32_FLOAT">
+      {.125f, .25f, .5f, 1.0f},
+      {2.0f, 4.0f, 16.0f, 32.0f},
+      {32.0f, 64.0f, 128.0f, 256.0f},
+      {256.0f, 512.0f, 1024.0f, 2048.0f}
+    </Resource>
+    <Resource Name="RTarget" Dimension="TEXTURE2D" Width="64" Height="64" Format="R32G32B32A32_FLOAT" Flags="ALLOW_RENDER_TARGET" InitialResourceState="COPY_DEST" ReadBack="true" />
+
+    <RootValues>
+      <RootValue HeapName="ResHeap" />
+    </RootValues>
+    <DescriptorHeap Name="ResHeap" Type="CBV_SRV_UAV">
+      <Descriptor Name='T0' Kind='SRV' ResName='T0' />
+    </DescriptorHeap>
+    <DescriptorHeap Name="RtvHeap" NumDescriptors="1" Type="RTV">
+      <Descriptor Name="RTarget" Kind="RTV"/>
+    </DescriptorHeap>
+
+    <InputElements>
+      <InputElement SemanticName="POSITION" Format="R32G32B32_FLOAT" AlignedByteOffset="0" />
+      <InputElement SemanticName="TEXCOORD" Format="R32G32_FLOAT" AlignedByteOffset="12" />
+    </InputElements>
+    <RenderTargets>
+      <RenderTarget Name="RTarget"/>
+    </RenderTargets>
+    <Shader Name="VS" Target="vs_6_0">
+      <![CDATA[
+        struct PSInput {
+          float4 position : SV_POSITION;
+          float2 uv : TEXCOORD;
+        };
+        PSInput main(float3 position : POSITION, float2 uv : TEXCOORD) {
+          PSInput result;
+          result.position = float4(position, 1.0);
+          result.uv = uv;
+          return result;
+        }
+      ]]>
+    </Shader>
+    <Shader Name="PS" Target="ps_6_0">
+      <![CDATA[
+      struct PSInput {
+        float4 position : SV_POSITION;
+        float2 uv : TEXCOORD;
+      };
+
+      Texture2D<float> g_tex : register(t0);
+
+      float4 main(PSInput input) : SV_TARGET {
+        int3 offset = int3((input.uv * 64.0) % 4, 0);
+        float val = g_tex.Load(offset);
+        return float4(ddx_fine(val), ddy_fine(val), ddx_coarse(val), ddy_coarse(val));
+      }
+      ]]>
+    </Shader>
+  </ShaderOp>
+
   <ShaderOp Name="WriteFloat4" CS="CS" DispatchX="8" DispatchY="8">
     <RootSignature>RootFlags(0), UAV(u0)</RootSignature>
 

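The texel lookup in the `DerivFine` pixel shader above can be mirrored on the CPU to see which texture value each pixel loads. The sketch assumes the interpolated `uv` at a pixel center equals `(pixel + 0.5) / 64` exactly and that each row of the `T0` initializer is one texture row; `kTexture` and `TexelForPixel` are hypothetical names used only for this illustration.

```cpp
#include <cmath>

// Copy of the 4x4 R32_FLOAT texture data from the T0 resource.
constexpr float kTexture[4][4] = {
    {.125f, .25f, .5f, 1.0f},
    {2.0f, 4.0f, 16.0f, 32.0f},
    {32.0f, 64.0f, 128.0f, 256.0f},
    {256.0f, 512.0f, 1024.0f, 2048.0f}};

// Mirrors `int3((input.uv * 64.0) % 4, 0)` followed by g_tex.Load:
// uv * 64 at the pixel center is (px + 0.5, py + 0.5), wrapped to
// [0, 4) and truncated to integer texel coordinates.
inline float TexelForPixel(int px, int py) {
  int x = static_cast<int>(std::fmod(px + 0.5f, 4.0f));
  int y = static_cast<int>(std::fmod(py + 0.5f, 4.0f));
  return kTexture[y][x];
}
```

Under these assumptions the pixel at column 32, row 31 of the 64x64 render target loads 256, with 2048 to its left and 512 to its right, which matches the neighborhood described in the test's comments.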
+ 88 - 60
tools/clang/unittests/HLSL/ExecutionTest.cpp

@@ -202,6 +202,7 @@ public:
   TEST_METHOD(Int64Test);
   TEST_METHOD(WaveIntrinsicsTest);
   TEST_METHOD(WaveIntrinsicsInPSTest);
+  TEST_METHOD(PartialDerivTest);
 
   // TAEF data-driven tests.
   BEGIN_TEST_METHOD(UnaryFloatOpTest)
@@ -1842,18 +1843,6 @@ struct SPrimitives {
   float f_float2_o;
 };
 
-static float g_SinCosFloats[] = {
-  -(INFINITY),
-  -1.0f,
-  -(FLT_MIN/2),
-  -0.0f,
-  0.0f,
-  FLT_MIN / 2,
-  1.0f,
-  INFINITY,
-  NAN
-};
-
 std::shared_ptr<ShaderOpTestResult>
 RunShaderOpTest(ID3D12Device *pDevice, dxc::DxcDllSupport &support,
                 IStream *pStream, LPCSTR pName,
@@ -2003,6 +1992,83 @@ TEST_F(ExecutionTest, BasicTriangleOpTest) {
   ReportLiveObjects();
 }
 
+// Renders two right triangles forming a square and assigns a texture value
+// to each pixel to calculate derivatives.
+TEST_F(ExecutionTest, PartialDerivTest) {
+  WEX::TestExecution::SetVerifyOutput verifySettings(WEX::TestExecution::VerifyOutputSettings::LogOnlyFailures);
+  CComPtr<IStream> pStream;
+  ReadHlslDataIntoNewStream(L"ShaderOpArith.xml", &pStream);
+
+  CComPtr<ID3D12Device> pDevice;
+  if (!CreateDevice(&pDevice))
+      return;
+
+  std::shared_ptr<ShaderOpTestResult> test = RunShaderOpTest(pDevice, m_support, pStream, "DerivFine", nullptr);
+  MappedData data;
+  D3D12_RESOURCE_DESC &D = test->ShaderOp->GetResourceByName("RTarget")->Desc;
+  UINT width = (UINT64)D.Width;
+  UINT height = (UINT64)D.Height;
+  UINT pixelSize = GetByteSizeForFormat(D.Format) / 4;
+
+  test->Test->GetReadBackData("RTarget", &data);
+  const float *pPixels = (float *)data.data();
+
+  UINT centerIndex = (UINT64)width * height / 2 - width / 2;
+
+  // pixel at the center
+  UINT offsetCenter = centerIndex * pixelSize;
+  float CenterDDXFine = pPixels[offsetCenter];
+  float CenterDDYFine = pPixels[offsetCenter + 1];
+  float CenterDDXCoarse = pPixels[offsetCenter + 2];
+  float CenterDDYCoarse = pPixels[offsetCenter + 3];
+
+  LogCommentFmt(
+      L"center  ddx_fine: %8f, ddy_fine: %8f, ddx_coarse: %8f, ddy_coarse: %8f",
+      CenterDDXFine, CenterDDYFine, CenterDDXCoarse, CenterDDYCoarse);
+
+  // The texture for the 9 pixels in the center should look like the following
+
+  // 256   32  64
+  // 2048 256 512
+  // 1   .125 .25
+
+  // In D3D12 there is no guarantee of how the adapter groups pixels into 2x2 quads.
+  // So for fine derivatives there can be up to two possible results for the center pixel,
+  // while for coarse derivatives there can be up to six possible results.
+  int ulpTolerance = 1;
+  // 256 - 2048 (paired with the left neighbor) or 512 - 256 (paired with the right neighbor)
+  bool left = CompareFloatULP(CenterDDXFine, -1792.0f, ulpTolerance);
+  VERIFY_IS_TRUE(left || CompareFloatULP(CenterDDXFine, 256.0f, ulpTolerance));
+  // 256 - 32 (paired with the top neighbor) or .125 - 256 (paired with the bottom neighbor)
+  bool top = CompareFloatULP(CenterDDYFine, 224.0f, ulpTolerance);
+  VERIFY_IS_TRUE(top || CompareFloatULP(CenterDDYFine, -255.875, ulpTolerance));
+
+  if (top && left) {
+    VERIFY_IS_TRUE((CompareFloatULP(CenterDDXCoarse, -224.0f, ulpTolerance) ||
+                   CompareFloatULP(CenterDDXCoarse, -1792.0f, ulpTolerance)) &&
+                   (CompareFloatULP(CenterDDYCoarse, 224.0f, ulpTolerance) ||
+                   CompareFloatULP(CenterDDYCoarse, 1792.0f, ulpTolerance)));
+  }
+  else if (top) { // top right quad
+    VERIFY_IS_TRUE((CompareFloatULP(CenterDDXCoarse, 256.0f, ulpTolerance)  ||
+                   CompareFloatULP(CenterDDXCoarse, 32.0f, ulpTolerance))   &&
+                   (CompareFloatULP(CenterDDYCoarse, 224.0f, ulpTolerance) ||
+                   CompareFloatULP(CenterDDYCoarse, 448.0f, ulpTolerance)));
+  }
+  else if (left) { // bottom left quad
+    VERIFY_IS_TRUE((CompareFloatULP(CenterDDXCoarse, -1792.0f, ulpTolerance) ||
+                   CompareFloatULP(CenterDDXCoarse, -.875f, ulpTolerance))   &&
+                   (CompareFloatULP(CenterDDYCoarse, -2047.0f, ulpTolerance) ||
+                   CompareFloatULP(CenterDDYCoarse, -255.875f, ulpTolerance)));
+  }
+  else { // bottom right
+    VERIFY_IS_TRUE((CompareFloatULP(CenterDDXCoarse, 256.0f, ulpTolerance) ||
+                   CompareFloatULP(CenterDDXCoarse, .125f, ulpTolerance))  &&
+                   (CompareFloatULP(CenterDDYCoarse, -255.875f, ulpTolerance) ||
+                   CompareFloatULP(CenterDDYCoarse, -511.75f, ulpTolerance)));
+  }
+}
+
 // Resource structure for data-driven tests.
 
 struct SUnaryFPOp {
@@ -2063,57 +2129,19 @@ struct STertiaryUintOp {
 };
 
 // representation for HLSL float vectors
-
-struct float2 {
-    float x;
-    float y;
-};
-
-struct float3 {
-    float x;
-    float y;
-    float z;
-};
-
-struct float4 {
-    float x;
-    float y;
-    float z;
-    float w;
-};
-
 struct SDotOp {
-    float4 input1;
-    float4 input2;
+    XMFLOAT4 input1;
+    XMFLOAT4 input2;
     float o_dot2;
     float o_dot3;
     float o_dot4;
 };
 
-// HLSL representation for unsigned int vectors 
-struct uint2 {
-    unsigned int x;
-    unsigned int y;
-};
-
-struct uint3 {
-    unsigned int x;
-    unsigned int y;
-    unsigned int z;
-};
-
-struct uint4 {
-    unsigned int x;
-    unsigned int y;
-    unsigned int z;
-    unsigned int w;
-};
-
 struct SMsad4 {
     unsigned int ref;
-    uint2 src;
-    uint4 accum;
-    uint4 result;
+    XMUINT2 src;
+    XMUINT4 accum;
+    XMUINT4 result;
 };
 
 // Parameter representation for taef data-driven tests
@@ -2274,7 +2302,7 @@ static TableParameter DotOpParameters[] = {
     { L"Validation.NumInput", TableParameter::UINT, true }
 };
 
-static TableParameter Msad4OpParameters[] = { 
+static TableParameter Msad4OpParameters[] = {
     { L"ShaderOp.Text", TableParameter::STRING, true },
     { L"Validation.Tolerance", TableParameter::DOUBLE, true },
     { L"Validation.NumInput", TableParameter::UINT, true },
@@ -3255,7 +3283,7 @@ TEST_F(ExecutionTest, DotTest) {
         SDotOp *pPrimitives = (SDotOp*)Data.data();
         for (size_t i = 0; i < count; ++i) {
             SDotOp *p = &pPrimitives[i];
-            float4 val1,val2;
+            XMFLOAT4 val1,val2;
             VERIFY_SUCCEEDED(ParseDataToVectorFloat((*Validation_Input1)[i],
                                                     (float *)&val1, 4));
             VERIFY_SUCCEEDED(ParseDataToVectorFloat((*Validation_Input2)[i],
@@ -3332,8 +3360,8 @@ TEST_F(ExecutionTest, Msad4Test) {
         SMsad4 *pPrimitives = (SMsad4*)Data.data();
         for (size_t i = 0; i < count; ++i) {
             SMsad4 *p = &pPrimitives[i];
-            uint2 src;
-            uint4 accum;
+            XMUINT2 src;
+            XMUINT4 accum;
             VERIFY_SUCCEEDED(ParseDataToVectorUint((*Validation_Source)[i], (unsigned int*)&src, 2));
             VERIFY_SUCCEEDED(ParseDataToVectorUint((*Validation_Accum)[i], (unsigned int*)&accum, 4));
             p->ref = (*Validation_Reference)[i];
@@ -3351,7 +3379,7 @@ TEST_F(ExecutionTest, Msad4Test) {
     WEX::TestExecution::DisableVerifyExceptions dve;
     for (size_t i = 0; i < count; ++i) {
         SMsad4 *p = &pPrimitives[i];
-        uint4 result;
+        XMUINT4 result;
         VERIFY_SUCCEEDED(ParseDataToVectorUint((*Validation_Expected)[i],
                                                (unsigned int *)&result, 4));
         LogCommentFmt(

+ 78 - 0
tools/clang/unittests/HLSL/HlslTestUtils.h

@@ -13,6 +13,7 @@
 #include <sstream>
 #include <fstream>
 #include "dxc/Support/Unicode.h"
+#include <dxgiformat.h>
 
 // If TAEF verify macros are available, use them to alias other legacy
 // comparison macros that don't have a direct translation.
@@ -205,6 +206,83 @@ inline bool CompareFloatRelativeEpsilon(const float &fsrc, const float &fref, in
     return CompareFloatULP(fsrc, fref, 23 - nRelativeExp);
 }
 
+// Returns the number of bytes per pixel for a given DXGI format.
+// Add more cases if a different format is needed to copy back resources.
+inline UINT GetByteSizeForFormat(DXGI_FORMAT value) {
+    switch (value) {
+    case DXGI_FORMAT_R32G32B32A32_TYPELESS: return 16;
+    case DXGI_FORMAT_R32G32B32A32_FLOAT: return 16;
+    case DXGI_FORMAT_R32G32B32A32_UINT: return 16;
+    case DXGI_FORMAT_R32G32B32A32_SINT: return 16;
+    case DXGI_FORMAT_R32G32B32_TYPELESS: return 12;
+    case DXGI_FORMAT_R32G32B32_FLOAT: return 12;
+    case DXGI_FORMAT_R32G32B32_UINT: return 12;
+    case DXGI_FORMAT_R32G32B32_SINT: return 12;
+    case DXGI_FORMAT_R16G16B16A16_TYPELESS: return 8;
+    case DXGI_FORMAT_R16G16B16A16_FLOAT: return 8;
+    case DXGI_FORMAT_R16G16B16A16_UNORM: return 8;
+    case DXGI_FORMAT_R16G16B16A16_UINT: return 8;
+    case DXGI_FORMAT_R16G16B16A16_SNORM: return 8;
+    case DXGI_FORMAT_R16G16B16A16_SINT: return 8;
+    case DXGI_FORMAT_R32G32_TYPELESS: return 8;
+    case DXGI_FORMAT_R32G32_FLOAT: return 8;
+    case DXGI_FORMAT_R32G32_UINT: return 8;
+    case DXGI_FORMAT_R32G32_SINT: return 8;
+    case DXGI_FORMAT_R32G8X24_TYPELESS: return 8;
+    case DXGI_FORMAT_D32_FLOAT_S8X24_UINT: return 4;
+    case DXGI_FORMAT_R32_FLOAT_X8X24_TYPELESS: return 4;
+    case DXGI_FORMAT_X32_TYPELESS_G8X24_UINT: return 4;
+    case DXGI_FORMAT_R10G10B10A2_TYPELESS: return 4;
+    case DXGI_FORMAT_R10G10B10A2_UNORM: return 4;
+    case DXGI_FORMAT_R10G10B10A2_UINT: return 4;
+    case DXGI_FORMAT_R11G11B10_FLOAT: return 4;
+    case DXGI_FORMAT_R8G8B8A8_TYPELESS: return 4;
+    case DXGI_FORMAT_R8G8B8A8_UNORM: return 4;
+    case DXGI_FORMAT_R8G8B8A8_UNORM_SRGB: return 4;
+    case DXGI_FORMAT_R8G8B8A8_UINT: return 4;
+    case DXGI_FORMAT_R8G8B8A8_SNORM: return 4;
+    case DXGI_FORMAT_R8G8B8A8_SINT: return 4;
+    case DXGI_FORMAT_R16G16_TYPELESS: return 4;
+    case DXGI_FORMAT_R16G16_FLOAT: return 4;
+    case DXGI_FORMAT_R16G16_UNORM: return 4;
+    case DXGI_FORMAT_R16G16_UINT: return 4;
+    case DXGI_FORMAT_R16G16_SNORM: return 4;
+    case DXGI_FORMAT_R16G16_SINT: return 4;
+    case DXGI_FORMAT_R32_TYPELESS: return 4;
+    case DXGI_FORMAT_D32_FLOAT: return 4;
+    case DXGI_FORMAT_R32_FLOAT: return 4;
+    case DXGI_FORMAT_R32_UINT: return 4;
+    case DXGI_FORMAT_R32_SINT: return 4;
+    case DXGI_FORMAT_R24G8_TYPELESS: return 4;
+    case DXGI_FORMAT_D24_UNORM_S8_UINT: return 4;
+    case DXGI_FORMAT_R24_UNORM_X8_TYPELESS: return 4;
+    case DXGI_FORMAT_X24_TYPELESS_G8_UINT: return 4;
+    case DXGI_FORMAT_R8G8_TYPELESS: return 2;
+    case DXGI_FORMAT_R8G8_UNORM: return 2;
+    case DXGI_FORMAT_R8G8_UINT: return 2;
+    case DXGI_FORMAT_R8G8_SNORM: return 2;
+    case DXGI_FORMAT_R8G8_SINT: return 2;
+    case DXGI_FORMAT_R16_TYPELESS: return 2;
+    case DXGI_FORMAT_R16_FLOAT: return 2;
+    case DXGI_FORMAT_D16_UNORM: return 2;
+    case DXGI_FORMAT_R16_UNORM: return 2;
+    case DXGI_FORMAT_R16_UINT: return 2;
+    case DXGI_FORMAT_R16_SNORM: return 2;
+    case DXGI_FORMAT_R16_SINT: return 2;
+    case DXGI_FORMAT_R8_TYPELESS: return 1;
+    case DXGI_FORMAT_R8_UNORM: return 1;
+    case DXGI_FORMAT_R8_UINT: return 1;
+    case DXGI_FORMAT_R8_SNORM: return 1;
+    case DXGI_FORMAT_R8_SINT: return 1;
+    case DXGI_FORMAT_A8_UNORM: return 1;
+    case DXGI_FORMAT_R1_UNORM: return 1;
+    default:
+        VERIFY_FAILED(E_INVALIDARG);
+        return 0;
+    }
+}
+
+
 #define SIMPLE_IUNKNOWN_IMPL1(_IFACE_) \
   private: volatile ULONG m_dwRef; \
   public:\

+ 89 - 84
tools/clang/unittests/HLSL/ShaderOpTest.cpp

@@ -103,80 +103,6 @@ bool UseHardwareDevice(const DXGI_ADAPTER_DESC1 &desc, LPCWSTR AdapterName) {
                                    desc.Description, wcslen(desc.Description));
 }
 
-UINT GetByteSizeForFormat(DXGI_FORMAT value) {
-  switch (value) {
-  case DXGI_FORMAT_R32G32B32A32_TYPELESS: return 16;
-  case DXGI_FORMAT_R32G32B32A32_FLOAT: return 16;
-  case DXGI_FORMAT_R32G32B32A32_UINT: return 16;
-  case DXGI_FORMAT_R32G32B32A32_SINT: return 16;
-  case DXGI_FORMAT_R32G32B32_TYPELESS: return 12;
-  case DXGI_FORMAT_R32G32B32_FLOAT: return 12;
-  case DXGI_FORMAT_R32G32B32_UINT: return 12;
-  case DXGI_FORMAT_R32G32B32_SINT: return 12;
-  case DXGI_FORMAT_R16G16B16A16_TYPELESS: return 8;
-  case DXGI_FORMAT_R16G16B16A16_FLOAT: return 8;
-  case DXGI_FORMAT_R16G16B16A16_UNORM: return 8;
-  case DXGI_FORMAT_R16G16B16A16_UINT: return 8;
-  case DXGI_FORMAT_R16G16B16A16_SNORM: return 8;
-  case DXGI_FORMAT_R16G16B16A16_SINT: return 8;
-  case DXGI_FORMAT_R32G32_TYPELESS: return 8;
-  case DXGI_FORMAT_R32G32_FLOAT: return 8;
-  case DXGI_FORMAT_R32G32_UINT: return 8;
-  case DXGI_FORMAT_R32G32_SINT: return 8;
-  case DXGI_FORMAT_R32G8X24_TYPELESS: return 8;
-  case DXGI_FORMAT_D32_FLOAT_S8X24_UINT: return 4;
-  case DXGI_FORMAT_R32_FLOAT_X8X24_TYPELESS: return 4;
-  case DXGI_FORMAT_X32_TYPELESS_G8X24_UINT: return 4;
-  case DXGI_FORMAT_R10G10B10A2_TYPELESS: return 4;
-  case DXGI_FORMAT_R10G10B10A2_UNORM: return 4;
-  case DXGI_FORMAT_R10G10B10A2_UINT: return 4;
-  case DXGI_FORMAT_R11G11B10_FLOAT: return 4;
-  case DXGI_FORMAT_R8G8B8A8_TYPELESS: return 4;
-  case DXGI_FORMAT_R8G8B8A8_UNORM: return 4;
-  case DXGI_FORMAT_R8G8B8A8_UNORM_SRGB: return 4;
-  case DXGI_FORMAT_R8G8B8A8_UINT: return 4;
-  case DXGI_FORMAT_R8G8B8A8_SNORM: return 4;
-  case DXGI_FORMAT_R8G8B8A8_SINT: return 4;
-  case DXGI_FORMAT_R16G16_TYPELESS: return 4;
-  case DXGI_FORMAT_R16G16_FLOAT: return 4;
-  case DXGI_FORMAT_R16G16_UNORM: return 4;
-  case DXGI_FORMAT_R16G16_UINT: return 4;
-  case DXGI_FORMAT_R16G16_SNORM: return 4;
-  case DXGI_FORMAT_R16G16_SINT: return 4;
-  case DXGI_FORMAT_R32_TYPELESS: return 4;
-  case DXGI_FORMAT_D32_FLOAT: return 4;
-  case DXGI_FORMAT_R32_FLOAT: return 4;
-  case DXGI_FORMAT_R32_UINT: return 4;
-  case DXGI_FORMAT_R32_SINT: return 4;
-  case DXGI_FORMAT_R24G8_TYPELESS: return 4;
-  case DXGI_FORMAT_D24_UNORM_S8_UINT: return 4;
-  case DXGI_FORMAT_R24_UNORM_X8_TYPELESS: return 4;
-  case DXGI_FORMAT_X24_TYPELESS_G8_UINT: return 4;
-  case DXGI_FORMAT_R8G8_TYPELESS: return 2;
-  case DXGI_FORMAT_R8G8_UNORM: return 2;
-  case DXGI_FORMAT_R8G8_UINT: return 2;
-  case DXGI_FORMAT_R8G8_SNORM: return 2;
-  case DXGI_FORMAT_R8G8_SINT: return 2;
-  case DXGI_FORMAT_R16_TYPELESS: return 2;
-  case DXGI_FORMAT_R16_FLOAT: return 2;
-  case DXGI_FORMAT_D16_UNORM: return 2;
-  case DXGI_FORMAT_R16_UNORM: return 2;
-  case DXGI_FORMAT_R16_UINT: return 2;
-  case DXGI_FORMAT_R16_SNORM: return 2;
-  case DXGI_FORMAT_R16_SINT: return 2;
-  case DXGI_FORMAT_R8_TYPELESS: return 1;
-  case DXGI_FORMAT_R8_UNORM: return 1;
-  case DXGI_FORMAT_R8_UINT: return 1;
-  case DXGI_FORMAT_R8_SNORM: return 1;
-  case DXGI_FORMAT_R8_SINT: return 1;
-  case DXGI_FORMAT_A8_UNORM: return 1;
-  case DXGI_FORMAT_R1_UNORM: return 1;
-  default:
-    CHECK_HR(E_INVALIDARG);
-    return 0;
-  }
-}
-
 void GetHardwareAdapter(IDXGIFactory2 *pFactory, LPCWSTR AdapterName,
                                IDXGIAdapter1 **ppAdapter) {
   CComPtr<IDXGIAdapter1> adapter;
@@ -331,12 +257,12 @@ void ShaderOpTest::CopyBackResources() {
       pList->CopyResource(D.ReadBack, D.Resource);
     }
     else {
-      UINT rowPitch = Desc.Width * 4;
+      UINT rowPitch = Desc.Width * GetByteSizeForFormat(Desc.Format);
       if (rowPitch % D3D12_TEXTURE_DATA_PITCH_ALIGNMENT)
         rowPitch += D3D12_TEXTURE_DATA_PITCH_ALIGNMENT - (rowPitch % D3D12_TEXTURE_DATA_PITCH_ALIGNMENT);
       D3D12_PLACED_SUBRESOURCE_FOOTPRINT Footprint;
       Footprint.Offset = 0;
-      Footprint.Footprint = CD3DX12_SUBRESOURCE_FOOTPRINT(DXGI_FORMAT_R8G8B8A8_UNORM, Desc.Width, Desc.Height, 1, rowPitch);
+      Footprint.Footprint = CD3DX12_SUBRESOURCE_FOOTPRINT(Desc.Format, Desc.Width, Desc.Height, 1, rowPitch);
       CD3DX12_TEXTURE_COPY_LOCATION DstLoc(D.ReadBack, Footprint);
       CD3DX12_TEXTURE_COPY_LOCATION SrcLoc(D.Resource, 0);
       pList->CopyTextureRegion(&DstLoc, 0, 0, 0, &SrcLoc, nullptr);
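
The row pitch rounding in the hunk above can be written as a small standalone function. This is a sketch, not the repository's code: `AlignedRowPitch` and `kPitchAlignment` are hypothetical names, and the constant 256 is the value of `D3D12_TEXTURE_DATA_PITCH_ALIGNMENT` hard-coded so the example compiles without the D3D12 headers.

```cpp
#include <cstdint>

// Stand-in for D3D12_TEXTURE_DATA_PITCH_ALIGNMENT (256 bytes).
constexpr std::uint32_t kPitchAlignment = 256;

// Rounds width * bytesPerPixel up to the next multiple of the
// required pitch alignment, as the readback footprint expects.
inline std::uint32_t AlignedRowPitch(std::uint32_t width,
                                     std::uint32_t bytesPerPixel) {
  std::uint32_t pitch = width * bytesPerPixel;
  std::uint32_t rem = pitch % kPitchAlignment;
  return rem ? pitch + (kPitchAlignment - rem) : pitch;
}
```

For the 64-pixel-wide `R32G32B32A32_FLOAT` render target used by the new test, the unpadded pitch is 64 * 16 = 1024 bytes, already a multiple of 256, so the readback rows carry no padding.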
@@ -491,7 +417,7 @@ void ShaderOpTest::CreatePipelineState() {
     InitByteCode(&GDesc.PS, pPS);
     GDesc.InputLayout.NumElements = m_pShaderOp->InputElements.size();
     GDesc.InputLayout.pInputElementDescs = m_pShaderOp->InputElements.data();
-    GDesc.PrimitiveTopologyType = m_pShaderOp->PrimitiveTopology;
+    GDesc.PrimitiveTopologyType = m_pShaderOp->PrimitiveTopologyType;
     GDesc.NumRenderTargets = m_pShaderOp->RenderTargets.size();
     GDesc.SampleMask = m_pShaderOp->SampleMask;
     for (size_t i = 0; i < m_pShaderOp->RenderTargets.size(); ++i) {
@@ -581,11 +507,15 @@ void ShaderOpTest::CreateResources() {
       CComPtr<ID3D12Resource> pIntermediate;
       CD3DX12_HEAP_PROPERTIES upload(D3D12_HEAP_TYPE_UPLOAD);
       D3D12_RESOURCE_DESC uploadDesc = R.Desc;
+
+      // Calculate size required for intermediate buffer
+      UINT64 totalBytes;
+      m_pDevice->GetCopyableFootprints(&uploadDesc, 0, 1, 0, nullptr, nullptr, nullptr, &totalBytes);
+
       if (!isBuffer) {
         // Assuming a simple linear layout here.
         uploadDesc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
-        uploadDesc.Width *= uploadDesc.Height;
-        uploadDesc.Width *= GetByteSizeForFormat(uploadDesc.Format);
+        uploadDesc.Width = totalBytes;
         uploadDesc.Height = 1;
         uploadDesc.MipLevels = 1;
         uploadDesc.Format = DXGI_FORMAT_UNKNOWN;
@@ -607,8 +537,8 @@ void ShaderOpTest::CreateResources() {
 
       D3D12_SUBRESOURCE_DATA transferData;
       transferData.pData = values.data();
-      transferData.RowPitch = values.size();
-      transferData.SlicePitch = transferData.RowPitch;
+      transferData.RowPitch = values.size() / R.Desc.Height;
+      transferData.SlicePitch = values.size();
       UpdateSubresources<1>(pList, pResource.p, pIntermediate.p, 0, 0, 1,
                             &transferData);
     }
@@ -822,11 +752,19 @@ void ShaderOpTest::RunCommandList() {
 
     const float ClearColor[4] = { 0.0f, 0.2f, 0.4f, 1.0f };
     pList->ClearRenderTargetView(rtvHandles[0], ClearColor, 0, nullptr);
-    pList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);// TODO: set from m_pShaderOp
 
     // TODO: set all of this from m_pShaderOp.
     ShaderOpResourceData &VBufferData = this->m_ResourceData[m_pShaderOp->Strings.insert("VBuffer")];
 
+    D3D_PRIMITIVE_TOPOLOGY topology = D3D_PRIMITIVE_TOPOLOGY_UNDEFINED;
+    for (ShaderOpResource &resource : m_pShaderOp->Resources) {
+        if (_strcmpi(resource.Name, "VBuffer") == 0) {
+            topology = resource.PrimitiveTopology;
+            break;
+        }
+    }
+    pList->IASetPrimitiveTopology(topology);
+
     // Calculate the stride in bytes from the inputs, assuming linear & contiguous.
     UINT strideInBytes = 0;
     for (auto && IE : m_pShaderOp->InputElements) {
@@ -1068,7 +1006,9 @@ enum class ParserEnumKind {
   RESOURCE_STATE,
   DESCRIPTOR_HEAP_TYPE,
   DESCRIPTOR_HEAP_FLAG,
-  UAV_DIMENSION
+  UAV_DIMENSION,
+  PRIMITIVE_TOPOLOGY,
+  PRIMITIVE_TOPOLOGY_TYPE
 };
 
 struct ParserEnumValue {
@@ -1311,6 +1251,58 @@ static const ParserEnumValue UAV_DIMENSION_TABLE[] = {
   { L"TEXTURE3D", D3D12_UAV_DIMENSION_TEXTURE3D }
 };
 
+static const ParserEnumValue PRIMITIVE_TOPOLOGY_TABLE[] = {
+    { L"UNDEFINED",D3D_PRIMITIVE_TOPOLOGY_UNDEFINED },
+    { L"POINTLIST",D3D_PRIMITIVE_TOPOLOGY_POINTLIST },
+    { L"LINELIST",D3D_PRIMITIVE_TOPOLOGY_LINELIST },
+    { L"LINESTRIP",D3D_PRIMITIVE_TOPOLOGY_LINESTRIP },
+    { L"TRIANGLELIST",D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST },
+    { L"TRIANGLESTRIP",D3D_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP },
+    { L"LINELIST_ADJ",D3D_PRIMITIVE_TOPOLOGY_LINELIST_ADJ },
+    { L"LINESTRIP_ADJ",D3D_PRIMITIVE_TOPOLOGY_LINESTRIP_ADJ },
+    { L"TRIANGLELIST_ADJ",D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST_ADJ },
+    { L"TRIANGLESTRIP_ADJ",D3D_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP_ADJ },
+    { L"1_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_1_CONTROL_POINT_PATCHLIST },
+    { L"2_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_2_CONTROL_POINT_PATCHLIST },
+    { L"3_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_3_CONTROL_POINT_PATCHLIST },
+    { L"4_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_4_CONTROL_POINT_PATCHLIST },
+    { L"5_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_5_CONTROL_POINT_PATCHLIST },
+    { L"6_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_6_CONTROL_POINT_PATCHLIST },
+    { L"7_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_7_CONTROL_POINT_PATCHLIST },
+    { L"8_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_8_CONTROL_POINT_PATCHLIST },
+    { L"9_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_9_CONTROL_POINT_PATCHLIST },
+    { L"10_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_10_CONTROL_POINT_PATCHLIST },
+    { L"11_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_11_CONTROL_POINT_PATCHLIST },
+    { L"12_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_12_CONTROL_POINT_PATCHLIST },
+    { L"13_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_13_CONTROL_POINT_PATCHLIST },
+    { L"14_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_14_CONTROL_POINT_PATCHLIST },
+    { L"15_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_15_CONTROL_POINT_PATCHLIST },
+    { L"16_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_16_CONTROL_POINT_PATCHLIST },
+    { L"17_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_17_CONTROL_POINT_PATCHLIST },
+    { L"18_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_18_CONTROL_POINT_PATCHLIST },
+    { L"19_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_19_CONTROL_POINT_PATCHLIST },
+    { L"20_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_20_CONTROL_POINT_PATCHLIST },
+    { L"21_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_21_CONTROL_POINT_PATCHLIST },
+    { L"22_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_22_CONTROL_POINT_PATCHLIST },
+    { L"23_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_23_CONTROL_POINT_PATCHLIST },
+    { L"24_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_24_CONTROL_POINT_PATCHLIST },
+    { L"25_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_25_CONTROL_POINT_PATCHLIST },
+    { L"26_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_26_CONTROL_POINT_PATCHLIST },
+    { L"27_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_27_CONTROL_POINT_PATCHLIST },
+    { L"28_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_28_CONTROL_POINT_PATCHLIST },
+    { L"29_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_29_CONTROL_POINT_PATCHLIST },
+    { L"30_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_30_CONTROL_POINT_PATCHLIST },
+    { L"31_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_31_CONTROL_POINT_PATCHLIST },
+    { L"32_CONTROL_POINT_PATCHLIST",D3D_PRIMITIVE_TOPOLOGY_32_CONTROL_POINT_PATCHLIST }
+};
+
+static const ParserEnumValue PRIMITIVE_TOPOLOGY_TYPE_TABLE[] = {
+    { L"UNDEFINED", D3D12_PRIMITIVE_TOPOLOGY_TYPE_UNDEFINED },
+    { L"POINT", D3D12_PRIMITIVE_TOPOLOGY_TYPE_POINT },
+    { L"LINE", D3D12_PRIMITIVE_TOPOLOGY_TYPE_LINE },
+    { L"TRIANGLE", D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE },
+    { L"PATCH", D3D12_PRIMITIVE_TOPOLOGY_TYPE_PATCH }
+};
 
 static const ParserEnumTable g_ParserEnumTables[] = {
   { _countof(INPUT_CLASSIFICATION_TABLE), INPUT_CLASSIFICATION_TABLE, ParserEnumKind::INPUT_CLASSIFICATION },
@@ -1325,7 +1317,9 @@ static const ParserEnumTable g_ParserEnumTables[] = {
   { _countof(RESOURCE_STATE_TABLE), RESOURCE_STATE_TABLE, ParserEnumKind::RESOURCE_STATE },
   { _countof(DESCRIPTOR_HEAP_TYPE_TABLE), DESCRIPTOR_HEAP_TYPE_TABLE, ParserEnumKind::DESCRIPTOR_HEAP_TYPE },
   { _countof(DESCRIPTOR_HEAP_FLAG_TABLE), DESCRIPTOR_HEAP_FLAG_TABLE, ParserEnumKind::DESCRIPTOR_HEAP_FLAG },
-  { _countof(UAV_DIMENSION_TABLE), UAV_DIMENSION_TABLE, ParserEnumKind::UAV_DIMENSION }
+  { _countof(UAV_DIMENSION_TABLE), UAV_DIMENSION_TABLE, ParserEnumKind::UAV_DIMENSION },
+  { _countof(PRIMITIVE_TOPOLOGY_TABLE), PRIMITIVE_TOPOLOGY_TABLE, ParserEnumKind::PRIMITIVE_TOPOLOGY },
+  { _countof(PRIMITIVE_TOPOLOGY_TYPE_TABLE), PRIMITIVE_TOPOLOGY_TYPE_TABLE, ParserEnumKind::PRIMITIVE_TOPOLOGY_TYPE },
 };
 
 static HRESULT GetEnumValue(LPCWSTR name, ParserEnumKind K, UINT *pValue) {
@@ -1419,6 +1413,14 @@ static HRESULT ReadAttrUAV_DIMENSION(IXmlReader *pReader, LPCWSTR pAttrName, D3D
   return ReadAttrEnumT(pReader, pAttrName, ParserEnumKind::UAV_DIMENSION, pValue, D3D12_UAV_DIMENSION_BUFFER);
 }
 
+static HRESULT ReadAttrPRIMITIVE_TOPOLOGY(IXmlReader *pReader, LPCWSTR pAttrName, D3D_PRIMITIVE_TOPOLOGY *pValue) {
+  return ReadAttrEnumT(pReader, pAttrName, ParserEnumKind::PRIMITIVE_TOPOLOGY, pValue, D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
+}
+
+static HRESULT ReadAttrPRIMITIVE_TOPOLOGY_TYPE(IXmlReader *pReader, LPCWSTR pAttrName, D3D12_PRIMITIVE_TOPOLOGY_TYPE *pValue) {
+  return ReadAttrEnumT(pReader, pAttrName, ParserEnumKind::PRIMITIVE_TOPOLOGY_TYPE, pValue, D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE);
+}
+
 HRESULT ShaderOpParser::ReadAttrStr(IXmlReader *pReader, LPCWSTR pAttrName, LPCSTR *ppValue) {
   if (S_FALSE == CHECK_HR_RET(pReader->MoveToAttributeByName(pAttrName, nullptr))) {
     *ppValue = nullptr;
@@ -1749,6 +1751,7 @@ void ShaderOpParser::ParseShaderOp(IXmlReader *pReader, ShaderOp *pShaderOp) {
   CHECK_HR(ReadAttrUINT(pReader, L"DispatchX", &pShaderOp->DispatchX, 1));
   CHECK_HR(ReadAttrUINT(pReader, L"DispatchY", &pShaderOp->DispatchY, 1));
   CHECK_HR(ReadAttrUINT(pReader, L"DispatchZ", &pShaderOp->DispatchZ, 1));
+  CHECK_HR(ReadAttrPRIMITIVE_TOPOLOGY_TYPE(pReader, L"TopologyType", &pShaderOp->PrimitiveTopologyType));
   UINT startDepth;
   CHECK_HR(pReader->GetDepth(&startDepth));
   XmlNodeType nt = XmlNodeType_Element;
@@ -1841,6 +1844,8 @@ void ShaderOpParser::ParseResource(IXmlReader *pReader, ShaderOpResource *pResou
   CHECK_HR(ReadAttrRESOURCE_STATES(pReader, L"InitialResourceState", &pResource->InitialResourceState));
   CHECK_HR(ReadAttrRESOURCE_STATES(pReader, L"TransitionTo", &pResource->TransitionTo));
 
+  CHECK_HR(ReadAttrPRIMITIVE_TOPOLOGY(pReader, L"Topology", &pResource->PrimitiveTopology));
+
   // Set some fixed values.
   if (pResource->Desc.Dimension == D3D12_RESOURCE_DIMENSION_BUFFER) {
     pResource->Desc.Height = 1;
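
Reviewer note: the `TopologyType` attribute above is resolved through a name/value table with a fallback default. The sketch below illustrates that lookup pattern only. `TopologyType`, `EnumEntry`, `kTopologyTypeTable` and `LookupTopologyType` are hypothetical stand-ins, not the parser's real `ParserEnumValue` / `ReadAttrEnumT` code, and both the exact-match comparison and the "absent attribute yields the default" behaviour are assumptions.

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical stand-in for D3D12_PRIMITIVE_TOPOLOGY_TYPE (same numeric values).
enum class TopologyType : unsigned {
  Undefined = 0, Point = 1, Line = 2, Triangle = 3, Patch = 4
};

// Hypothetical stand-in for a parser enum table entry: XML name -> enum value.
struct EnumEntry {
  const wchar_t *Name;
  TopologyType Value;
};

static const EnumEntry kTopologyTypeTable[] = {
    {L"UNDEFINED", TopologyType::Undefined},
    {L"POINT", TopologyType::Point},
    {L"LINE", TopologyType::Line},
    {L"TRIANGLE", TopologyType::Triangle},
    {L"PATCH", TopologyType::Patch},
};

// Resolves an attribute string to an enum value.
// Assumption: a null name models a missing XML attribute and yields the default.
// Returns false when the name is present but matches no table entry.
bool LookupTopologyType(const wchar_t *name, TopologyType defaultValue,
                        TopologyType *pValue) {
  assert(pValue != nullptr);
  if (name == nullptr) {
    *pValue = defaultValue;
    return true;
  }
  for (const EnumEntry &entry : kTopologyTypeTable) {
    if (std::wcscmp(name, entry.Name) == 0) { // exact, case-sensitive match
      *pValue = entry.Value;
      return true;
    }
  }
  return false;
}
```

Under these assumptions, a ShaderOp element without a `TopologyType` attribute would keep the triangle default, while an unrecognized name would be reported as an error rather than silently mapped.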

+ 3 - 1
tools/clang/unittests/HLSL/ShaderOpTest.h

@@ -142,6 +142,7 @@ public:
   D3D12_RESOURCE_STATES TransitionTo;           // State to transition before running shader.
   BOOL                  ReadBack;               // TRUE to read back to CPU after operations are done.
   std::vector<BYTE>     InitBytes;              // Byte payload for initialization.
+  D3D_PRIMITIVE_TOPOLOGY PrimitiveTopology;     // Primitive topology.
 };
 
 // Use this class to represent a shader.
@@ -177,7 +178,8 @@ public:
   LPCWSTR AdapterName = nullptr;
   LPCSTR CS = nullptr, VS = nullptr, PS = nullptr;
   UINT DispatchX = 1, DispatchY = 1, DispatchZ = 1;
-  D3D12_PRIMITIVE_TOPOLOGY_TYPE PrimitiveTopology = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE; // TODO: parse from file
+  D3D12_PRIMITIVE_TOPOLOGY_TYPE PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;
+
   UINT SampleMask = UINT_MAX; // TODO: parse from file
   DXGI_FORMAT RTVFormats[8]; // TODO: parse from file
   bool IsCompute() const {

+ 35 - 1
utils/hct/hctdb_inst_docs.txt

@@ -70,6 +70,41 @@ The maximum absolute error is 0.0008 in the interval from -100*Pi to +100*Pi.
 
 Counts the number of bits in the input integer.
 
+* Inst: DerivCoarseX - computes the rate of change per stamp in x direction.
+
+dst = DerivCoarseX(src);
+
+Computes the rate of change per stamp in x direction. Only a single x derivative is computed for each 2x2 stamp of pixels.
+The data in the current Pixel Shader invocation may or may not participate in the calculation of the requested derivative, because the derivative is calculated only once per 2x2 quad.
+For example, the x derivative could be a delta from the top row of pixels.
+The exact calculation is up to the hardware vendor. There is also no specification dictating how the 2x2 quads are aligned/tiled over a primitive.
+
+
+* Inst: DerivCoarseY - computes the rate of change per stamp in y direction.
+
+dst = DerivCoarseY(src);
+
+Computes the rate of change per stamp in y direction. Only a single y derivative is computed for each 2x2 stamp of pixels.
+The data in the current Pixel Shader invocation may or may not participate in the calculation of the requested derivative, because the derivative is calculated only once per 2x2 quad.
+For example, the y derivative could be a delta from the left column of pixels.
+The exact calculation is up to the hardware vendor. There is also no specification dictating how the 2x2 quads are aligned/tiled over a primitive.
+
+* Inst: DerivFineX - computes the rate of change per pixel in x direction.
+
+dst = DerivFineX(src);
+
+Computes the rate of change per pixel in x direction. Each pixel in the 2x2 stamp gets its own x derivative calculation.
+The data in the current Pixel Shader invocation always participates in the calculation of the requested derivative.
+There is no specification dictating how the 2x2 quads will be aligned/tiled over a primitive.
+
+* Inst: DerivFineY - computes the rate of change per pixel in y direction.
+
+dst = DerivFineY(src);
+
+Computes the rate of change per pixel in y direction. Each pixel in the 2x2 stamp gets its own y derivative calculation.
+The data in the current Pixel Shader invocation always participates in the calculation of the requested derivative.
+There is no specification dictating how the 2x2 quads will be aligned/tiled over a primitive.
+
 * Inst: Dot2 - Two-dimensional vector dot-product
 
 Two-dimensional vector dot-product
@@ -549,4 +584,3 @@ Either of destHI or destLO may be specified as NULL instead of specifying a regi
 * Inst: USubb - unsigned subtract of 32-bit operands with the borrow
 
 dest0, dest1 = USubb(src0, src1)
-
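
Reviewer note: the coarse/fine distinction documented in hctdb_inst_docs.txt can be illustrated on a single 2x2 quad. This is a CPU-side sketch, not the test's reference implementation. `Quad`, `FineX`, `FineY`, `CoarseX` and `CoarseY` are hypothetical names, and the coarse variants assume the top row and left column are used, which is only one of the choices the text leaves to the hardware vendor.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Values of one 2x2 quad, indexed as:
//   0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right.
using Quad = std::array<float, 4>;

// Fine x derivative: each pixel uses the delta along its own row,
// so the current pixel's data always participates.
float FineX(const Quad &q, std::size_t pixel) {
  assert(pixel < 4);
  return pixel < 2 ? q[1] - q[0] : q[3] - q[2];
}

// Fine y derivative: each pixel uses the delta along its own column.
float FineY(const Quad &q, std::size_t pixel) {
  assert(pixel < 4);
  return (pixel % 2 == 0) ? q[2] - q[0] : q[3] - q[1];
}

// Coarse derivatives: one value shared by the whole quad.
// Assumption: top row for x and left column for y.
float CoarseX(const Quad &q) { return q[1] - q[0]; }
float CoarseY(const Quad &q) { return q[2] - q[0]; }
```

With the quad `{1, 3, 2, 7}`, the fine x derivative differs between the top row (2) and the bottom row (5), whereas the coarse x derivative under the stated assumption is 2 for all four pixels. A comparison against real hardware would need to accept any of the vendor-permitted coarse results.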