
Allow clip/cull elements to be declared as array [2] (#2109)

* Allow clip/cull elements to be declared as array [2]

- This approach fixes validation and packing to handle this case.
- There could be implications to runtime ViewID validation
- fix some issues found in packing related to rowsUsed result from Pack
  functions.  Make these return 0 on failure, instead of startRow.
- Split PackNext into FindNext and PackNext that uses it for greater
  flexibility.
Tex Riddell 6 years ago
parent
commit
faacd80b74

+ 35 - 34
docs/DXIL.rst

@@ -651,6 +651,7 @@ ID Name       Description
 6  Target     Special handling for SV_Target
 7  TessFactor Special handling for tessellation factors
 8  Shadow     Shadow element must be added to a signature for compatibility
+8  ClipCull   Special packing rules for SV_ClipDistance or SV_CullDistance
 == ========== =============================================================
 
 .. SEMINT-RST:END
@@ -662,40 +663,40 @@ Semantic Interpretations for each SemanticKind at each SigPointKind are as follo
 .. <py::lines('SEMINT-TABLE-RST')>hctdb_instrhelp.get_sem_interpretation_table_rst()</py>
 .. SEMINT-TABLE-RST:BEGIN
 
-====================== ============ ====== ============ ============ ====== ======= ========== ============ ====== ====== ====== ============ ====== ============= ============= ========
-Semantic               VSIn         VSOut  PCIn         HSIn         HSCPIn HSCPOut PCOut      DSIn         DSCPIn DSOut  GSVIn  GSIn         GSOut  PSIn          PSOut         CSIn
-====================== ============ ====== ============ ============ ====== ======= ========== ============ ====== ====== ====== ============ ====== ============= ============= ========
-Arbitrary              Arb          Arb    NA           NA           Arb    Arb     Arb        Arb          Arb    Arb    Arb    NA           Arb    Arb           NA            NA
-VertexID               SV           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           NA     NA            NA            NA
-InstanceID             SV           Arb    NA           NA           Arb    Arb     NA         NA           Arb    Arb    Arb    NA           Arb    Arb           NA            NA
-Position               Arb          SV     NA           NA           SV     SV      Arb        Arb          SV     SV     SV     NA           SV     SV            NA            NA
-RenderTargetArrayIndex Arb          SV     NA           NA           SV     SV      Arb        Arb          SV     SV     SV     NA           SV     SV            NA            NA
-ViewPortArrayIndex     Arb          SV     NA           NA           SV     SV      Arb        Arb          SV     SV     SV     NA           SV     SV            NA            NA
-ClipDistance           Arb          SV     NA           NA           SV     SV      Arb        Arb          SV     SV     SV     NA           SV     SV            NA            NA
-CullDistance           Arb          SV     NA           NA           SV     SV      Arb        Arb          SV     SV     SV     NA           SV     SV            NA            NA
-OutputControlPointID   NA           NA     NA           NotInSig     NA     NA      NA         NA           NA     NA     NA     NA           NA     NA            NA            NA
-DomainLocation         NA           NA     NA           NA           NA     NA      NA         NotInSig     NA     NA     NA     NA           NA     NA            NA            NA
-PrimitiveID            NA           NA     NotInSig     NotInSig     NA     NA      NA         NotInSig     NA     NA     NA     Shadow       SGV    SGV           NA            NA
-GSInstanceID           NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NotInSig     NA     NA            NA            NA
-SampleIndex            NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           NA     Shadow _41    NA            NA
-IsFrontFace            NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           SGV    SGV           NA            NA
-Coverage               NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           NA     NotInSig _50  NotPacked _41 NA
-InnerCoverage          NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           NA     NotInSig _50  NA            NA
-Target                 NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           NA     NA            Target        NA
-Depth                  NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           NA     NA            NotPacked     NA
-DepthLessEqual         NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           NA     NA            NotPacked _50 NA
-DepthGreaterEqual      NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           NA     NA            NotPacked _50 NA
-StencilRef             NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           NA     NA            NotPacked _50 NA
-DispatchThreadID       NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           NA     NA            NA            NotInSig
-GroupID                NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           NA     NA            NA            NotInSig
-GroupIndex             NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           NA     NA            NA            NotInSig
-GroupThreadID          NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           NA     NA            NA            NotInSig
-TessFactor             NA           NA     NA           NA           NA     NA      TessFactor TessFactor   NA     NA     NA     NA           NA     NA            NA            NA
-InsideTessFactor       NA           NA     NA           NA           NA     NA      TessFactor TessFactor   NA     NA     NA     NA           NA     NA            NA            NA
-ViewID                 NotInSig _61 NA     NotInSig _61 NotInSig _61 NA     NA      NA         NotInSig _61 NA     NA     NA     NotInSig _61 NA     NotInSig _61  NA            NA
-Barycentrics           NA           NA     NA           NA           NA     NA      NA         NA           NA     NA     NA     NA           NA     NotPacked _61 NA            NA
-ShadingRate            NA           SV _64 NA           NA           SV _64 SV _64  NA         NA           SV _64 SV _64 SV _64 NA           SV _64 SV _64        NA            NA
-====================== ============ ====== ============ ============ ====== ======= ========== ============ ====== ====== ====== ============ ====== ============= ============= ========
+====================== ============ ======== ============ ============ ======== ======== ========== ============ ======== ======== ======== ============ ======== ============= ============= ========
+Semantic               VSIn         VSOut    PCIn         HSIn         HSCPIn   HSCPOut  PCOut      DSIn         DSCPIn   DSOut    GSVIn    GSIn         GSOut    PSIn          PSOut         CSIn
+====================== ============ ======== ============ ============ ======== ======== ========== ============ ======== ======== ======== ============ ======== ============= ============= ========
+Arbitrary              Arb          Arb      NA           NA           Arb      Arb      Arb        Arb          Arb      Arb      Arb      NA           Arb      Arb           NA            NA
+VertexID               SV           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           NA       NA            NA            NA
+InstanceID             SV           Arb      NA           NA           Arb      Arb      NA         NA           Arb      Arb      Arb      NA           Arb      Arb           NA            NA
+Position               Arb          SV       NA           NA           SV       SV       Arb        Arb          SV       SV       SV       NA           SV       SV            NA            NA
+RenderTargetArrayIndex Arb          SV       NA           NA           SV       SV       Arb        Arb          SV       SV       SV       NA           SV       SV            NA            NA
+ViewPortArrayIndex     Arb          SV       NA           NA           SV       SV       Arb        Arb          SV       SV       SV       NA           SV       SV            NA            NA
+ClipDistance           Arb          ClipCull NA           NA           ClipCull ClipCull Arb        Arb          ClipCull ClipCull ClipCull NA           ClipCull ClipCull      NA            NA
+CullDistance           Arb          ClipCull NA           NA           ClipCull ClipCull Arb        Arb          ClipCull ClipCull ClipCull NA           ClipCull ClipCull      NA            NA
+OutputControlPointID   NA           NA       NA           NotInSig     NA       NA       NA         NA           NA       NA       NA       NA           NA       NA            NA            NA
+DomainLocation         NA           NA       NA           NA           NA       NA       NA         NotInSig     NA       NA       NA       NA           NA       NA            NA            NA
+PrimitiveID            NA           NA       NotInSig     NotInSig     NA       NA       NA         NotInSig     NA       NA       NA       Shadow       SGV      SGV           NA            NA
+GSInstanceID           NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NotInSig     NA       NA            NA            NA
+SampleIndex            NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           NA       Shadow _41    NA            NA
+IsFrontFace            NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           SGV      SGV           NA            NA
+Coverage               NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           NA       NotInSig _50  NotPacked _41 NA
+InnerCoverage          NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           NA       NotInSig _50  NA            NA
+Target                 NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           NA       NA            Target        NA
+Depth                  NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           NA       NA            NotPacked     NA
+DepthLessEqual         NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           NA       NA            NotPacked _50 NA
+DepthGreaterEqual      NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           NA       NA            NotPacked _50 NA
+StencilRef             NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           NA       NA            NotPacked _50 NA
+DispatchThreadID       NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           NA       NA            NA            NotInSig
+GroupID                NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           NA       NA            NA            NotInSig
+GroupIndex             NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           NA       NA            NA            NotInSig
+GroupThreadID          NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           NA       NA            NA            NotInSig
+TessFactor             NA           NA       NA           NA           NA       NA       TessFactor TessFactor   NA       NA       NA       NA           NA       NA            NA            NA
+InsideTessFactor       NA           NA       NA           NA           NA       NA       TessFactor TessFactor   NA       NA       NA       NA           NA       NA            NA            NA
+ViewID                 NotInSig _61 NA       NotInSig _61 NotInSig _61 NA       NA       NA         NotInSig _61 NA       NA       NA       NotInSig _61 NA       NotInSig _61  NA            NA
+Barycentrics           NA           NA       NA           NA           NA       NA       NA         NA           NA       NA       NA       NA           NA       NotPacked _61 NA            NA
+ShadingRate            NA           SV _64   NA           NA           SV _64   SV _64   NA         NA           SV _64   SV _64   SV _64   NA           SV _64   SV _64        NA            NA
+====================== ============ ======== ============ ============ ======== ======== ========== ============ ======== ======== ======== ============ ======== ============= ============= ========
 
 
 .. SEMINT-TABLE-RST:END
 

+ 1 - 0
include/dxc/DXIL/DxilConstants.h

@@ -212,6 +212,7 @@ namespace DXIL {
     Target, // Special handling for SV_Target
     TessFactor, // Special handling for tessellation factors
     Shadow, // Shadow element must be added to a signature for compatibility
+    ClipCull, // Special packing rules for SV_ClipDistance or SV_CullDistance
     Invalid,
   };
   // SemanticInterpretationKind-ENUM:END

+ 31 - 31
include/dxc/DXIL/DxilSigPoint.inl

@@ -49,38 +49,38 @@ const SigPoint SigPoint::ms_SigPoints[kNumSigPointRecords] = {
 
 
 // <py::lines('INTERPRETATION-TABLE')>hctdb_instrhelp.get_interpretation_table()</py>
 // INTERPRETATION-TABLE:BEGIN
-//   Semantic,               VSIn,         VSOut,  PCIn,         HSIn,         HSCPIn, HSCPOut, PCOut,      DSIn,         DSCPIn, DSOut,  GSVIn,  GSIn,         GSOut,  PSIn,          PSOut,         CSIn
+//   Semantic,               VSIn,         VSOut,    PCIn,         HSIn,         HSCPIn,   HSCPOut,  PCOut,      DSIn,         DSCPIn,   DSOut,    GSVIn,    GSIn,         GSOut,    PSIn,          PSOut,         CSIn
 #define DO_INTERPRETATION_TABLE(ROW) \
-  ROW(Arbitrary,              Arb,          Arb,    NA,           NA,           Arb,    Arb,     Arb,        Arb,          Arb,    Arb,    Arb,    NA,           Arb,    Arb,           NA,            NA) \
-  ROW(VertexID,               SV,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     NA,            NA,            NA) \
-  ROW(InstanceID,             SV,           Arb,    NA,           NA,           Arb,    Arb,     NA,         NA,           Arb,    Arb,    Arb,    NA,           Arb,    Arb,           NA,            NA) \
-  ROW(Position,               Arb,          SV,     NA,           NA,           SV,     SV,      Arb,        Arb,          SV,     SV,     SV,     NA,           SV,     SV,            NA,            NA) \
-  ROW(RenderTargetArrayIndex, Arb,          SV,     NA,           NA,           SV,     SV,      Arb,        Arb,          SV,     SV,     SV,     NA,           SV,     SV,            NA,            NA) \
-  ROW(ViewPortArrayIndex,     Arb,          SV,     NA,           NA,           SV,     SV,      Arb,        Arb,          SV,     SV,     SV,     NA,           SV,     SV,            NA,            NA) \
-  ROW(ClipDistance,           Arb,          SV,     NA,           NA,           SV,     SV,      Arb,        Arb,          SV,     SV,     SV,     NA,           SV,     SV,            NA,            NA) \
-  ROW(CullDistance,           Arb,          SV,     NA,           NA,           SV,     SV,      Arb,        Arb,          SV,     SV,     SV,     NA,           SV,     SV,            NA,            NA) \
-  ROW(OutputControlPointID,   NA,           NA,     NA,           NotInSig,     NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     NA,            NA,            NA) \
-  ROW(DomainLocation,         NA,           NA,     NA,           NA,           NA,     NA,      NA,         NotInSig,     NA,     NA,     NA,     NA,           NA,     NA,            NA,            NA) \
-  ROW(PrimitiveID,            NA,           NA,     NotInSig,     NotInSig,     NA,     NA,      NA,         NotInSig,     NA,     NA,     NA,     Shadow,       SGV,    SGV,           NA,            NA) \
-  ROW(GSInstanceID,           NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NotInSig,     NA,     NA,            NA,            NA) \
-  ROW(SampleIndex,            NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     Shadow _41,    NA,            NA) \
-  ROW(IsFrontFace,            NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           SGV,    SGV,           NA,            NA) \
-  ROW(Coverage,               NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     NotInSig _50,  NotPacked _41, NA) \
-  ROW(InnerCoverage,          NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     NotInSig _50,  NA,            NA) \
-  ROW(Target,                 NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     NA,            Target,        NA) \
-  ROW(Depth,                  NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     NA,            NotPacked,     NA) \
-  ROW(DepthLessEqual,         NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     NA,            NotPacked _50, NA) \
-  ROW(DepthGreaterEqual,      NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     NA,            NotPacked _50, NA) \
-  ROW(StencilRef,             NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     NA,            NotPacked _50, NA) \
-  ROW(DispatchThreadID,       NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     NA,            NA,            NotInSig) \
-  ROW(GroupID,                NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     NA,            NA,            NotInSig) \
-  ROW(GroupIndex,             NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     NA,            NA,            NotInSig) \
-  ROW(GroupThreadID,          NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     NA,            NA,            NotInSig) \
-  ROW(TessFactor,             NA,           NA,     NA,           NA,           NA,     NA,      TessFactor, TessFactor,   NA,     NA,     NA,     NA,           NA,     NA,            NA,            NA) \
-  ROW(InsideTessFactor,       NA,           NA,     NA,           NA,           NA,     NA,      TessFactor, TessFactor,   NA,     NA,     NA,     NA,           NA,     NA,            NA,            NA) \
-  ROW(ViewID,                 NotInSig _61, NA,     NotInSig _61, NotInSig _61, NA,     NA,      NA,         NotInSig _61, NA,     NA,     NA,     NotInSig _61, NA,     NotInSig _61,  NA,            NA) \
-  ROW(Barycentrics,           NA,           NA,     NA,           NA,           NA,     NA,      NA,         NA,           NA,     NA,     NA,     NA,           NA,     NotPacked _61, NA,            NA) \
-  ROW(ShadingRate,            NA,           SV _64, NA,           NA,           SV _64, SV _64,  NA,         NA,           SV _64, SV _64, SV _64, NA,           SV _64, SV _64,        NA,            NA)
+  ROW(Arbitrary,              Arb,          Arb,      NA,           NA,           Arb,      Arb,      Arb,        Arb,          Arb,      Arb,      Arb,      NA,           Arb,      Arb,           NA,            NA) \
+  ROW(VertexID,               SV,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       NA,            NA,            NA) \
+  ROW(InstanceID,             SV,           Arb,      NA,           NA,           Arb,      Arb,      NA,         NA,           Arb,      Arb,      Arb,      NA,           Arb,      Arb,           NA,            NA) \
+  ROW(Position,               Arb,          SV,       NA,           NA,           SV,       SV,       Arb,        Arb,          SV,       SV,       SV,       NA,           SV,       SV,            NA,            NA) \
+  ROW(RenderTargetArrayIndex, Arb,          SV,       NA,           NA,           SV,       SV,       Arb,        Arb,          SV,       SV,       SV,       NA,           SV,       SV,            NA,            NA) \
+  ROW(ViewPortArrayIndex,     Arb,          SV,       NA,           NA,           SV,       SV,       Arb,        Arb,          SV,       SV,       SV,       NA,           SV,       SV,            NA,            NA) \
+  ROW(ClipDistance,           Arb,          ClipCull, NA,           NA,           ClipCull, ClipCull, Arb,        Arb,          ClipCull, ClipCull, ClipCull, NA,           ClipCull, ClipCull,      NA,            NA) \
+  ROW(CullDistance,           Arb,          ClipCull, NA,           NA,           ClipCull, ClipCull, Arb,        Arb,          ClipCull, ClipCull, ClipCull, NA,           ClipCull, ClipCull,      NA,            NA) \
+  ROW(OutputControlPointID,   NA,           NA,       NA,           NotInSig,     NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       NA,            NA,            NA) \
+  ROW(DomainLocation,         NA,           NA,       NA,           NA,           NA,       NA,       NA,         NotInSig,     NA,       NA,       NA,       NA,           NA,       NA,            NA,            NA) \
+  ROW(PrimitiveID,            NA,           NA,       NotInSig,     NotInSig,     NA,       NA,       NA,         NotInSig,     NA,       NA,       NA,       Shadow,       SGV,      SGV,           NA,            NA) \
+  ROW(GSInstanceID,           NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NotInSig,     NA,       NA,            NA,            NA) \
+  ROW(SampleIndex,            NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       Shadow _41,    NA,            NA) \
+  ROW(IsFrontFace,            NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           SGV,      SGV,           NA,            NA) \
+  ROW(Coverage,               NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       NotInSig _50,  NotPacked _41, NA) \
+  ROW(InnerCoverage,          NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       NotInSig _50,  NA,            NA) \
+  ROW(Target,                 NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       NA,            Target,        NA) \
+  ROW(Depth,                  NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       NA,            NotPacked,     NA) \
+  ROW(DepthLessEqual,         NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       NA,            NotPacked _50, NA) \
+  ROW(DepthGreaterEqual,      NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       NA,            NotPacked _50, NA) \
+  ROW(StencilRef,             NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       NA,            NotPacked _50, NA) \
+  ROW(DispatchThreadID,       NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       NA,            NA,            NotInSig) \
+  ROW(GroupID,                NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       NA,            NA,            NotInSig) \
+  ROW(GroupIndex,             NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       NA,            NA,            NotInSig) \
+  ROW(GroupThreadID,          NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       NA,            NA,            NotInSig) \
+  ROW(TessFactor,             NA,           NA,       NA,           NA,           NA,       NA,       TessFactor, TessFactor,   NA,       NA,       NA,       NA,           NA,       NA,            NA,            NA) \
+  ROW(InsideTessFactor,       NA,           NA,       NA,           NA,           NA,       NA,       TessFactor, TessFactor,   NA,       NA,       NA,       NA,           NA,       NA,            NA,            NA) \
+  ROW(ViewID,                 NotInSig _61, NA,       NotInSig _61, NotInSig _61, NA,       NA,       NA,         NotInSig _61, NA,       NA,       NA,       NotInSig _61, NA,       NotInSig _61,  NA,            NA) \
+  ROW(Barycentrics,           NA,           NA,       NA,           NA,           NA,       NA,       NA,         NA,           NA,       NA,       NA,       NA,           NA,       NotPacked _61, NA,            NA) \
+  ROW(ShadingRate,            NA,           SV _64,   NA,           NA,           SV _64,   SV _64,   NA,         NA,           SV _64,   SV _64,   SV _64,   NA,           SV _64,   SV _64,        NA,            NA)
 // INTERPRETATION-TABLE:END
 
 const VersionedSemanticInterpretation SigPoint::ms_SemanticInterpretationTable[(unsigned)DXIL::SemanticKind::Invalid][(unsigned)SigPoint::Kind::Invalid] = {

+ 5 - 0
include/dxc/HLSL/DxilSignatureAllocator.h

@@ -82,6 +82,7 @@ public:
   static const uint8_t kEFSGV = 1 << 2;
   static const uint8_t kEFSV = 1 << 3;
   static const uint8_t kEFTessFactor = 1 << 4;
+  static const uint8_t kEFClipCull = 1 << 5;
   static const uint8_t kEFConflictsWithIndexed = kEFSGV | kEFSV;
   static uint8_t GetElementFlags(const PackElement *SE);
 
@@ -130,6 +131,10 @@ public:
   ConflictType DetectColConflict(const PackElement *SE, unsigned row, unsigned col);
   void PlaceElement(const PackElement *SE, unsigned row, unsigned col);
 
+  // FindNext/PackNext return found/packed location + element rows if found,
+  // otherwise, they return 0.
+  unsigned FindNext(unsigned &foundRow, unsigned &foundCol,
+                    PackElement* SE, unsigned startRow, unsigned numRows, unsigned startCol = 0);
   unsigned PackNext(PackElement* SE, unsigned startRow, unsigned numRows, unsigned startCol = 0);
 
   // Simple greedy in-order packer used by PackOptimized
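
The FindNext/PackNext split declared in this hunk can be illustrated with a small stand-alone sketch. It is not the DxilSignatureAllocator code: `ToyElement`, `ToyAllocator`, `IsFree`, and `Place` are hypothetical names, the 8x4 grid is an assumption, and no conflict rules are modelled. It only shows the return convention described in the comment above (location plus element rows on success, 0 on failure) and one plausible way for a packing call to reuse a search-only call.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for a signature element; not PackElement.
struct ToyElement {
  unsigned rows;
  unsigned cols;
  unsigned placedRow;
  unsigned placedCol;
};

// Hypothetical allocator over an assumed 8x4 register grid.
class ToyAllocator {
public:
  static const unsigned kRows = 8;
  static const unsigned kCols = 4;

  // Search only: never modifies the grid. On success writes the location to
  // foundRow/foundCol and returns foundRow + e.rows; returns 0 when the
  // element does not fit anywhere in [startRow, startRow + numRows).
  unsigned FindNext(unsigned &foundRow, unsigned &foundCol, const ToyElement &e,
                    unsigned startRow, unsigned numRows,
                    unsigned startCol = 0) const {
    if (e.rows == 0 || e.cols == 0)
      return 0;
    if (e.rows > numRows || startRow + numRows > kRows ||
        startCol + e.cols > kCols)
      return 0; // element will not fit
    for (unsigned row = startRow; row + e.rows <= startRow + numRows; ++row) {
      for (unsigned col = startCol; col + e.cols <= kCols; ++col) {
        if (IsFree(row, col, e)) {
          foundRow = row;
          foundCol = col;
          return row + e.rows;
        }
      }
    }
    return 0;
  }

  // Search, then commit the placement if the search succeeded.
  unsigned PackNext(ToyElement &e, unsigned startRow, unsigned numRows,
                    unsigned startCol = 0) {
    unsigned row = 0, col = 0;
    unsigned rowsUsed = FindNext(row, col, e, startRow, numRows, startCol);
    if (rowsUsed)
      Place(e, row, col);
    return rowsUsed;
  }

private:
  bool IsFree(unsigned row, unsigned col, const ToyElement &e) const {
    for (unsigned r = row; r < row + e.rows; ++r)
      for (unsigned c = col; c < col + e.cols; ++c)
        if (used_[r][c])
          return false;
    return true;
  }

  void Place(ToyElement &e, unsigned row, unsigned col) {
    assert(IsFree(row, col, e));
    for (unsigned r = row; r < row + e.rows; ++r)
      for (unsigned c = col; c < col + e.cols; ++c)
        used_[r][c] = true;
    e.placedRow = row;
    e.placedCol = col;
  }

  std::array<std::array<bool, kCols>, kRows> used_{};
};
```

With this convention a caller can test the result directly (`if (rowsUsed)`), and a failed search starting at a non-zero `startRow` cannot be mistaken for a successful one. Keeping the search separate also lets a caller probe for a location without committing to it.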

+ 202 - 103
include/dxc/HLSL/DxilSignatureAllocator.inl

@@ -36,6 +36,9 @@ uint8_t DxilSignatureAllocator::GetElementFlags(const PackElement *SE) {
     case DXIL::SemanticInterpretationKind::TessFactor:
       flags |= kEFTessFactor;
       break;
+    case DXIL::SemanticInterpretationKind::ClipCull:
+      flags |= kEFClipCull;
+      break;
     default:
       DXASSERT(false, "otherwise, unexpected interpretation for allocated element");
   }
@@ -49,11 +52,13 @@ uint8_t DxilSignatureAllocator::GetElementFlags(const PackElement *SE) {
 uint8_t DxilSignatureAllocator::GetConflictFlagsLeft(uint8_t flags) {
   uint8_t conflicts = 0;
   if (flags & kEFArbitrary)
-    conflicts |= kEFSGV | kEFSV | kEFTessFactor;
+    conflicts |= kEFSGV | kEFSV | kEFTessFactor | kEFClipCull;
   if (flags & kEFSV)
     conflicts |= kEFSGV;
   if (flags & kEFTessFactor)
     conflicts |= kEFSGV;
+  if (flags & kEFClipCull)
+    conflicts |= kEFSGV;
   return conflicts;
 }
 
@@ -61,11 +66,13 @@ uint8_t DxilSignatureAllocator::GetConflictFlagsLeft(uint8_t flags) {
 uint8_t DxilSignatureAllocator::GetConflictFlagsRight(uint8_t flags) {
   uint8_t conflicts = 0;
   if (flags & kEFSGV)
-    conflicts |= kEFArbitrary | kEFSV | kEFTessFactor;
+    conflicts |= kEFArbitrary | kEFSV | kEFTessFactor | kEFClipCull;
   if (flags & kEFSV)
     conflicts |= kEFArbitrary;
   if (flags & kEFTessFactor)
     conflicts |= kEFArbitrary;
+  if (flags & kEFClipCull)
+    conflicts |= kEFArbitrary;
   return conflicts;
 }
 
@@ -220,12 +227,13 @@ struct {
 
 } // anonymous namespace
 
-unsigned DxilSignatureAllocator::PackNext(PackElement* SE, unsigned startRow, unsigned numRows, unsigned startCol) {
-  unsigned rowsUsed = startRow;
+unsigned DxilSignatureAllocator::FindNext(
+    unsigned &foundRow, unsigned &foundCol,
+    PackElement* SE, unsigned startRow, unsigned numRows, unsigned startCol) {
 
   unsigned rows = SE->GetRows();
   if (rows > numRows)
-    return rowsUsed; // element will not fit
+    return 0; // element will not fit
 
   unsigned cols = SE->GetCols();
   DXASSERT_NOMSG(startCol + cols <= 4);
@@ -236,18 +244,27 @@ unsigned DxilSignatureAllocator::PackNext(PackElement* SE, unsigned startRow, un
     for (unsigned col = startCol; col <= 4 - cols; ++col) {
       if (DetectColConflict(SE, row, col))
         continue;
-      PlaceElement(SE, row, col);
-      SE->SetLocation(row, col);
+      foundRow = row;
+      foundCol = col;
       return row + rows;
     }
   }
+  return 0;
+}
 
+unsigned DxilSignatureAllocator::PackNext(PackElement* SE, unsigned startRow, unsigned numRows, unsigned startCol) {
+  unsigned row, col;
+  unsigned rowsUsed = FindNext(row, col, SE, startRow, numRows, startCol);
+  if (rowsUsed) {
+    PlaceElement(SE, row, col);
+    SE->SetLocation(row, col);
+  }
   return rowsUsed;
 }
 
 unsigned DxilSignatureAllocator::PackGreedy(std::vector<PackElement*> elements, unsigned startRow, unsigned numRows, unsigned startCol) {
   // Allocation failures should be caught by IsFullyAllocated()
-  unsigned rowsUsed = startRow;
+  unsigned rowsUsed = 0;
 
   for (auto &SE : elements) {
     rowsUsed = std::max(rowsUsed, PackNext(SE, startRow, numRows, startCol));
@@ -256,8 +273,11 @@ unsigned DxilSignatureAllocator::PackGreedy(std::vector<PackElement*> elements,
   return rowsUsed;
 }
 
+static_assert(DXIL::kMaxClipOrCullDistanceElementCount == 2,
+              "code here assumes this is 2");
+
 unsigned DxilSignatureAllocator::PackOptimized(std::vector<PackElement*> elements, unsigned startRow, unsigned numRows) {
-  unsigned rowsUsed = startRow;
+  unsigned rowsUsed = 0;
 
   // Clip/Cull needs special handling due to limitations unique to these.
   //  Otherwise, packer could easily pack across too many registers in available gaps.
@@ -266,29 +286,33 @@ unsigned DxilSignatureAllocator::PackOptimized(std::vector<PackElement*> element
   //  - both have a maximum of 8 components shared between them
   //  - you can have a combined maximum of two registers declared with clip or cull SV's
   // other SV rules still apply:
-  //  - no indexing allowed
+  //  - X no indexing allowed X - This rule has been changed to allow indexing
   //  - cannot come before arbitrary values in same register
-  // Strategy for dealing with these:
+  // Strategy for dealing with these with rows == 1 for all elements:
   //  - attempt to pack these into a two register allocator
   //    - if this fails, some constraint is blocking, or declaration order is preventing good packing
   //      for example: 2, 1, 2, 3 - total 8 components and packable, but if greedily packed, it will fail
   //      Packing largest to smallest would solve this.
   //  - track components used for each register and create temp elements for allocation tests
+  //  - iterate rows and look for a viable location for each temp element
+  //    When found, allocate original sub-elements associated with temp element.
+  // If one or more clip/cull elements have rows > 1:
+  //  - walk through each pair of adjacent rows, initializing a temp two-row allocator
+  //    with existing contents and trying to pack all elements into the remaining space.
+  //  - when successful, do real allocation into these rows.
 
   // Packing overview
   //  - pack 4-component elements first
   //  - pack indexed tessfactors to the right
   //  - pack arbitrary elements
+  //  - pack system value elements
   //  - pack clip/cull
-  //    - iterate rows and look for a viable location for each temp element
-  //      When found, allocate original sub-elements associated with temp element.
-  //  - next, pack system value elements
-  //  - finally, pack SGV elements
+  //  - pack SGV elements
 
   // ==========
   // Group elements
   std::vector<PackElement*>  clipcullElements,
-                                      clipcullElementsByRow[2],
+                                      clipcullElementsByRow[DXIL::kMaxClipOrCullDistanceElementCount],
                                       vec4Elements,
                                       arbElements,
                                       svElements,
@@ -308,15 +332,14 @@ unsigned DxilSignatureAllocator::PackOptimized(std::vector<PackElement*> element
         else
           arbElements.push_back(SE);
         break;
+      case DXIL::SemanticInterpretationKind::ClipCull:
+        clipcullElements.push_back(SE);
+        break;
       case DXIL::SemanticInterpretationKind::SV:
-        if (SE->GetKind() == DXIL::SemanticKind::ClipDistance || SE->GetKind() == DXIL::SemanticKind::CullDistance)
-          clipcullElements.push_back(SE);
-        else {
-          if (SE->GetCols() == 4)
-            vec4Elements.push_back(SE);
-          else
-            svElements.push_back(SE);
-        }
+        if (SE->GetCols() == 4)
+          vec4Elements.push_back(SE);
+        else
+          svElements.push_back(SE);
         break;
       case DXIL::SemanticInterpretationKind::SGV:
         sgvElements.push_back(SE);
@@ -332,96 +355,130 @@ unsigned DxilSignatureAllocator::PackOptimized(std::vector<PackElement*> element
     }
   }
 
-  // ==========
-  // Preallocate clip/cull elements
-  std::sort(clipcullElements.begin(), clipcullElements.end(), CmpElementsLess);
-  DxilSignatureAllocator clipcullAllocator(2, m_bUseMinPrecision);
-  unsigned clipcullRegUsed = clipcullAllocator.PackGreedy(clipcullElements, 0, 2);
-  unsigned clipcullComponentsByRow[2] = {0, 0};
-  for (auto &SE : clipcullElements) {
-    if (!SE->IsAllocated()) {
-      continue;
-    }
-    unsigned row = SE->GetStartRow();
-    DXASSERT_NOMSG(row < clipcullRegUsed);
-    clipcullElementsByRow[row].push_back(SE);
-    clipcullComponentsByRow[row] += SE->GetCols();
-    // Deallocate element, to be allocated later:
-    SE->ClearLocation();
-  }
-  // Init temp elements, used to find compatible spaces for subsets:
-  DummyElement clipcullTempElements[2];
-  for (unsigned row = 0; row < clipcullRegUsed; ++row) {
-    DXASSERT_NOMSG(!clipcullElementsByRow[row].empty());
-    clipcullTempElements[row].kind = clipcullElementsByRow[row][0]->GetKind();
-    clipcullTempElements[row].interpolation = clipcullElementsByRow[row][0]->GetInterpolationMode();
-    clipcullTempElements[row].interpretation = clipcullElementsByRow[row][0]->GetInterpretation();
-    clipcullTempElements[row].dataBitWidth = clipcullElementsByRow[row][0]->GetDataBitWidth();
-    clipcullTempElements[row].rows = 1;
-    clipcullTempElements[row].cols = clipcullComponentsByRow[row];
-  }
-
   // ==========
   // Allocate 4-component elements
   if (!vec4Elements.empty()) {
     std::sort(vec4Elements.begin(), vec4Elements.end(), CmpElementsLess);
-    unsigned used = PackGreedy(vec4Elements, startRow, numRows);
-    startRow += used;
-    numRows -= used;
-    if (rowsUsed < used)
-      rowsUsed = used;
+    rowsUsed = std::max(rowsUsed, PackGreedy(vec4Elements, startRow, numRows));
+    startRow = std::max(startRow, rowsUsed);
   }
 
   // ==========
   // Allocate indexed tessfactors in rightmost column
   if (!indexedtessElements.empty()) {
     std::sort(indexedtessElements.begin(), indexedtessElements.end(), CmpElementsLess);
-    unsigned used = PackGreedy(indexedtessElements, startRow, numRows, 3);
-    if (rowsUsed < used)
-      rowsUsed = used;
+    rowsUsed = std::max(rowsUsed, PackGreedy(indexedtessElements, startRow, numRows, 3));
   }
 
   // ==========
   // Allocate arbitrary
   if (!arbElements.empty()) {
     std::sort(arbElements.begin(), arbElements.end(), CmpElementsLess);
-    unsigned used = PackGreedy(arbElements, startRow, numRows);
-    if (rowsUsed < used)
-      rowsUsed = used;
+    rowsUsed = std::max(rowsUsed, PackGreedy(arbElements, startRow, numRows));
   }
 
   // ==========
   // Allocate system values
   if (!svElements.empty()) {
     std::sort(svElements.begin(), svElements.end(), CmpElementsLess);
-    unsigned used = PackGreedy(svElements, startRow, numRows);
-    if (rowsUsed < used)
-      rowsUsed = used;
+    rowsUsed = std::max(rowsUsed, PackGreedy(svElements, startRow, numRows));
   }
 
   // ==========
   // Allocate clip/cull
-  for (unsigned i = 0; i < clipcullRegUsed; ++i) {
-    bool bAllocated = false;
-    unsigned cols = clipcullComponentsByRow[i];
-    for (unsigned row = startRow; row < startRow + numRows; ++row) {
-      if (DetectRowConflict(&clipcullTempElements[i], row))
+  std::sort(clipcullElements.begin(), clipcullElements.end(), CmpElementsLess);
+  unsigned numClipCullComponents = 0;
+  unsigned clipCullMultiRowCols = 0;
+  for (auto &SE : clipcullElements) {
+    numClipCullComponents += SE->GetRows() * SE->GetCols();
+    if (SE->GetRows() > 1) {
+      clipCullMultiRowCols += SE->GetCols();
+    }
+  }
+  if (0 == clipCullMultiRowCols) {
+    // Preallocate clip/cull elements into two rows and allocate independently
+    DxilSignatureAllocator clipcullAllocator(DXIL::kMaxClipOrCullDistanceElementCount, m_bUseMinPrecision);
+    unsigned clipcullRegUsed = clipcullAllocator.PackGreedy(clipcullElements, 0, DXIL::kMaxClipOrCullDistanceElementCount);
+    unsigned clipcullComponentsByRow[DXIL::kMaxClipOrCullDistanceElementCount] = {0, 0};
+    for (auto &SE : clipcullElements) {
+      if (!SE->IsAllocated()) {
         continue;
-      for (unsigned col = 0; col <= 4 - cols; ++col) {
-        if (DetectColConflict(&clipcullTempElements[i], row, col))
+      }
+      unsigned row = SE->GetStartRow();
+      DXASSERT_NOMSG(row < clipcullRegUsed);
+      clipcullElementsByRow[row].push_back(SE);
+      clipcullComponentsByRow[row] += SE->GetCols();
+      // Deallocate element, to be allocated later:
+      SE->ClearLocation();
+    }
+
+    // Allocate rows independently
+    // Init temp elements, used to find compatible spaces for subsets:
+    DummyElement clipcullTempElements[DXIL::kMaxClipOrCullDistanceElementCount];
+    for (unsigned row = 0; row < clipcullRegUsed; ++row) {
+      DXASSERT_NOMSG(!clipcullElementsByRow[row].empty());
+      clipcullTempElements[row].kind = clipcullElementsByRow[row][0]->GetKind();
+      clipcullTempElements[row].interpolation = clipcullElementsByRow[row][0]->GetInterpolationMode();
+      clipcullTempElements[row].interpretation = clipcullElementsByRow[row][0]->GetInterpretation();
+      clipcullTempElements[row].dataBitWidth = clipcullElementsByRow[row][0]->GetDataBitWidth();
+      clipcullTempElements[row].rows = 1;
+      clipcullTempElements[row].cols = clipcullComponentsByRow[row];
+    }
+    for (unsigned i = 0; i < clipcullRegUsed; ++i) {
+      bool bAllocated = false;
+      unsigned cols = clipcullComponentsByRow[i];
+      for (unsigned row = startRow; row < startRow + numRows; ++row) {
+        if (DetectRowConflict(&clipcullTempElements[i], row))
           continue;
-        for (auto &SE : clipcullElementsByRow[i]) {
-          PlaceElement(SE, row, col);
-          SE->SetLocation(row, col);
-          col += SE->GetCols();
+        for (unsigned col = 0; col <= 4 - cols; ++col) {
+          if (DetectColConflict(&clipcullTempElements[i], row, col))
+            continue;
+          for (auto &SE : clipcullElementsByRow[i]) {
+            PlaceElement(SE, row, col);
+            SE->SetLocation(row, col);
+            col += SE->GetCols();
+          }
+          bAllocated = true;
+          if (rowsUsed < row + 1)
+            rowsUsed = row + 1;
+          break;
         }
-        bAllocated = true;
-        if (rowsUsed < row + 1)
-          rowsUsed = row + 1;
-        break;
+        if (bAllocated)
+          break;
+      }
+    }
+  } else if (numRows > 1) {
+    // Multi-row clip/cull element found, test allocation at each pair of
+    // rows.  If location found, allocate the elements.
+    for (unsigned i = 0; i < numRows - 1; ++i) {
+      unsigned row = startRow + i;
+      // Use temp allocator with copy of rows to test locations
+      DxilSignatureAllocator clipcullAllocator(DXIL::kMaxClipOrCullDistanceElementCount, m_bUseMinPrecision);
+      clipcullAllocator.m_Registers[0] = m_Registers[row];
+      clipcullAllocator.m_Registers[1] = m_Registers[row + 1];
+      clipcullAllocator.PackGreedy(clipcullElements, 0, DXIL::kMaxClipOrCullDistanceElementCount, 0);
+      bool bFullyAllocated = true;
+      for (auto &SE : clipcullElements) {
+        bFullyAllocated &= SE->IsAllocated();
+        if (!bFullyAllocated)
+          break;
       }
-      if (bAllocated)
+      // Clear temp allocations
+      for (auto &SE : clipcullElements)
+        SE->ClearLocation();
+      if (bFullyAllocated) {
+        // Found a spot, do real allocation
+        PackGreedy(clipcullElements, row, DXIL::kMaxClipOrCullDistanceElementCount);
+#ifdef DBG
+        for (auto &SE : clipcullElements) {
+          bFullyAllocated &= SE->IsAllocated();
+          if (!bFullyAllocated)
+            break;
+        }
+        DXASSERT(bFullyAllocated, "otherwise, clip/cull allocation failed when predicted to succeed.");
+#endif
         break;
+      }
     }
   }
 
@@ -429,22 +486,23 @@ unsigned DxilSignatureAllocator::PackOptimized(std::vector<PackElement*> element
   // Allocate system generated values
   if (!sgvElements.empty()) {
     std::sort(sgvElements.begin(), sgvElements.end(), CmpElementsLess);
-    unsigned used = PackGreedy(sgvElements, startRow, numRows);
-    if (rowsUsed < used)
-      rowsUsed = used;
+    rowsUsed = std::max(rowsUsed, PackGreedy(sgvElements, startRow, numRows));
   }
 
   return rowsUsed;
 }
 
 unsigned DxilSignatureAllocator::PackPrefixStable(std::vector<PackElement*> elements, unsigned startRow, unsigned numRows) {
-  unsigned rowsUsed = startRow;
+  unsigned rowsUsed = 0;
 
   // Special handling for prefix-stable clip/cull arguments
   // - basically, do not pack with anything else to maximize chance to pack into two register limit
+  // - this is complicated by multi-row clip/cull elements, which force allocation adjacency,
+  //   but PrefixStable does not know in advance if this will be the case.
   unsigned clipcullRegUsed = 0;
-  DxilSignatureAllocator clipcullAllocator(2, m_bUseMinPrecision);
-  DummyElement clipcullTempElements[2];
+  bool clipcullIndexed = false;
+  DxilSignatureAllocator clipcullAllocator(DXIL::kMaxClipOrCullDistanceElementCount, m_bUseMinPrecision);
+  DummyElement clipcullTempElements[DXIL::kMaxClipOrCullDistanceElementCount];
 
   for (auto &SE : elements) {
     // Clear any existing allocation
@@ -457,23 +515,64 @@ unsigned DxilSignatureAllocator::PackPrefixStable(std::vector<PackElement*> elem
       case DXIL::SemanticInterpretationKind::SGV:
         break;
       case DXIL::SemanticInterpretationKind::SV:
-        if (SE->GetKind() == DXIL::SemanticKind::ClipDistance || SE->GetKind() == DXIL::SemanticKind::CullDistance) {
-          unsigned used = clipcullAllocator.PackNext(SE, 0, 2);
-          if (used) {
-            if (used > clipcullRegUsed) {
+        break;
+      case DXIL::SemanticInterpretationKind::ClipCull:
+        {
+          unsigned row, col;
+          unsigned used = clipcullAllocator.FindNext(row, col, SE, 0, DXIL::kMaxClipOrCullDistanceElementCount);
+          if (!used)
+            continue;
+          if (SE->GetRows() > 1 && !clipcullIndexed) {
+            // If two rows already allocated, they must be adjacent to
+            // pack indexed elements.
+            if (clipcullRegUsed == DXIL::kMaxClipOrCullDistanceElementCount &&
+                clipcullTempElements[0].row + 1 != clipcullTempElements[1].row)
+              continue;
+            clipcullIndexed = true;
+          }
+          // If necessary, allocate placeholder element to reserve space in signature:
+          if (used > clipcullRegUsed) {
+            auto &DE = clipcullTempElements[clipcullRegUsed];
+            DE.kind = SE->GetKind();
+            DE.interpolation = SE->GetInterpolationMode();
+            DE.interpretation = SE->GetInterpretation();
+            DE.dataBitWidth = SE->GetDataBitWidth();
+            DE.rows = 1;
+            DE.cols = 4;
+            if (clipcullIndexed) {
+              // Either allocate one 2-row placeholder element, or allocate on adjacent row.
+              if (clipcullRegUsed < 1) {
+                // Allocate one element with 2 rows
+                DE.rows = DXIL::kMaxClipOrCullDistanceElementCount;
+                rowsUsed = std::max(rowsUsed, PackNext(&DE, startRow, numRows));
+                if (!DE.IsAllocated())
+                  continue;
+                // Init second placeholder element to next row because it's used to
+                // adjust element locations starting on that row.
+                clipcullTempElements[1] = DE;
+                clipcullTempElements[1].row = DE.row + 1;
+                clipcullTempElements[1].rows = 1;
+              } else {
+                DXASSERT_NOMSG(clipcullRegUsed == 1);
+                // Make sure additional element can be placed just after other
+                // element, otherwise fail to allocate this element.
+                rowsUsed = std::max(rowsUsed, PackNext(&DE,
+                  clipcullTempElements[0].row + 1, clipcullTempElements[0].row + 2));
+                if (!DE.IsAllocated())
+                  continue;
+              }
+              clipcullRegUsed = DXIL::kMaxClipOrCullDistanceElementCount;
+            } else {
+              // allocate placeholder element, reserving new row(s)
+              rowsUsed = std::max(rowsUsed, PackNext(&DE, startRow, numRows));
+              if (!DE.IsAllocated())
+                continue;
               clipcullRegUsed = used;
-              // allocate placeholder element, reserving new row
-              clipcullTempElements[used - 1].kind = SE->GetKind();
-              clipcullTempElements[used - 1].interpolation = SE->GetInterpolationMode();
-              clipcullTempElements[used - 1].interpretation = SE->GetInterpretation();
-              clipcullTempElements[used - 1].dataBitWidth = SE->GetDataBitWidth();
-              clipcullTempElements[used - 1].rows = 1;
-              clipcullTempElements[used - 1].cols = 4;
-              rowsUsed = std::max(rowsUsed, PackNext(&clipcullTempElements[used - 1], startRow, numRows));
             }
-            // Actually place element in correct row:
-            SE->SetLocation(clipcullTempElements[used - 1].GetStartRow(), SE->GetStartCol());
           }
+          // Place element in temp allocator and adjust row for signature
+          clipcullAllocator.PlaceElement(SE, row, col);
+          SE->SetLocation(clipcullTempElements[row].GetStartRow(), col);
           continue;
         }
         break;
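
The multi-row branch of PackOptimized above tests each pair of adjacent rows with a temporary two-row allocator before committing. The sketch below illustrates that trial-then-commit idea under simplifying assumptions: `Row`, `Elem`, `PackAll`, and `PackClipCullPair` are hypothetical names, occupancy is a plain boolean per column, and the semantic conflict checks of the real allocator are left out. Unlike the diff, which reruns `PackGreedy` on the real allocator after a successful trial, this sketch simply copies the trial result back.

```cpp
#include <array>
#include <vector>

using Row = std::array<bool, 4>;      // true = column already occupied
struct Elem { unsigned rows, cols; }; // hypothetical clip/cull element shape

// Greedily place every element into `grid`; returns false if any one fails.
static bool PackAll(std::vector<Row> &grid, const std::vector<Elem> &elems) {
  for (const Elem &e : elems) {
    bool placed = false;
    for (unsigned r = 0; !placed && r + e.rows <= grid.size(); ++r) {
      for (unsigned c = 0; !placed && c + e.cols <= 4; ++c) {
        bool fits = true;
        for (unsigned i = 0; i < e.rows; ++i)
          for (unsigned j = 0; j < e.cols; ++j)
            fits = fits && !grid[r + i][c + j];
        if (!fits)
          continue;
        for (unsigned i = 0; i < e.rows; ++i)
          for (unsigned j = 0; j < e.cols; ++j)
            grid[r + i][c + j] = true;
        placed = true;
      }
    }
    if (!placed)
      return false;
  }
  return true;
}

// Try each pair of adjacent rows, starting from a copy of their current
// contents, and commit into the first pair that fits every element.
// Returns the first row of the chosen pair, or -1 if no pair works.
int PackClipCullPair(std::vector<Row> &regs, const std::vector<Elem> &elems) {
  for (unsigned row = 0; row + 1 < regs.size(); ++row) {
    std::vector<Row> trial = {regs[row], regs[row + 1]};
    if (!PackAll(trial, elems))
      continue; // trial copy is discarded, real rows stay untouched
    regs[row] = trial[0];
    regs[row + 1] = trial[1];
    return static_cast<int>(row);
  }
  return -1;
}
```

Working on a copy means a failed attempt never leaves partial placements behind in the real rows, which appears to be the same reason the diff clears the temporary allocations before doing the real allocation.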

+ 4 - 2
lib/HLSL/DxilValidation.cpp

@@ -4125,7 +4125,8 @@ static void ValidateSignatureElement(DxilSignatureElement &SE,
     }
     // Maximum rows is 1 for system values other than Target
     // with the exception of tessfactors, which are validated in CheckPatchConstantSemantic
-    if (!bIsTessfactor && SE.GetRows() > 1) {
+    // and ClipDistance/CullDistance, which have other custom constraints.
+    if (!bIsTessfactor && !bIsClipCull && SE.GetRows() > 1) {
       ValCtx.EmitSignatureError(&SE, ValidationRule::MetaSystemValueRows);
     }
   }
@@ -4307,7 +4308,8 @@ static void ValidateSignature(ValidationContext &ValCtx, const DxilSignature &S,
     case DXIL::SemanticKind::ClipDistance:
     case DXIL::SemanticKind::CullDistance:
       // Validate max 8 components across 2 rows (registers)
-      clipcullRowSet[streamId].insert(E->GetStartRow());
+      for (unsigned rowIdx = 0; rowIdx < E->GetRows(); rowIdx++)
+        clipcullRowSet[streamId].insert(E->GetStartRow() + rowIdx);
       if (clipcullRowSet[streamId].size() > 2) {
         ValCtx.EmitError(ValidationRule::MetaClipCullMaxRows);
       }
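
The validation change above counts every row an element spans, not only its start row. A minimal sketch of why that matters, assuming a reduced element type (`ClipCullElem` and `ClipCullRowsOk` are hypothetical names; the real validator works on `DxilSignatureElement` and tracks a separate row set per stream):

```cpp
#include <set>
#include <vector>

// Hypothetical reduced view of a signature element: first register row and
// the number of rows it spans (2 for an element declared as array [2]).
struct ClipCullElem { unsigned startRow, rows; };

// Returns true if the elements together touch at most maxRows registers.
bool ClipCullRowsOk(const std::vector<ClipCullElem> &elems,
                    unsigned maxRows = 2) {
  std::set<unsigned> rowSet;
  for (const ClipCullElem &e : elems)
    for (unsigned rowIdx = 0; rowIdx < e.rows; ++rowIdx)
      rowSet.insert(e.startRow + rowIdx); // count every spanned row
  return rowSet.size() <= maxRows;
}
```

With a two-row element at row 1 and a one-row element at row 3, inserting only start rows would record rows 1 and 3 and pass the check, while the per-row loop records rows 1, 2, and 3 and correctly rejects the signature.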

+ 1 - 0
lib/HLSL/HLSignatureLower.cpp

@@ -326,6 +326,7 @@ void HLSignatureLower::ProcessArgument(Function *func,
     case DXIL::SemanticInterpretationKind::Target:
     case DXIL::SemanticInterpretationKind::TessFactor:
     case DXIL::SemanticInterpretationKind::NotPacked:
+    case DXIL::SemanticInterpretationKind::ClipCull:
       // Will be replaced with load/store intrinsics in
       // GenerateDxilInputsOutputs
       break;

+ 27 - 0
tools/clang/test/CodeGenHLSL/quick-test/pack-clip-cull-opt.hlsl

@@ -0,0 +1,27 @@
+// RUN: %dxc -E main -T vs_6_0 -pack_optimized %s | FileCheck %s
+
+// CHECK:      ; Output signature:
+// CHECK:      ; Name                 Index   Mask Register SysValue  Format   Used
+// CHECK-NEXT: ; -------------------- ----- ------ -------- -------- ------- ------
+// CHECK-NEXT: ; First                    0   xyz         0     NONE   float   xyz
+// CHECK-NEXT: ; WithFirst                0      w        0     NONE   float      w
+// CHECK-NEXT: ; SV_ClipDistance          1    yz         1  CLIPDST   float    yz
+// CHECK-NEXT: ; SV_CullDistance          0      w        1  CULLDST   float      w
+// CHECK-NEXT: ; BeforeClipCull           0   x           1     NONE   float   x
+// CHECK-NEXT: ; SV_ClipDistance          0      w        2  CLIPDST   float      w
+// CHECK-NEXT: ; SV_CullDistance          1   xyz         2  CULLDST   float   xyz
+
+struct VS_OUT {
+  float3 first : First;
+  float clip0 : SV_ClipDistance0;
+  float3 cull1 : SV_CullDistance1;
+  float cull0 : SV_CullDistance0;
+  float2 clip1 : SV_ClipDistance1;
+  float withFirst : WithFirst;          // packs with First
+  float afterClipCull : BeforeClipCull; // packed before clip/cull in same row
+};
+
+
+VS_OUT main() {
+	return (VS_OUT)1.0F;
+}

+ 26 - 0
tools/clang/test/CodeGenHLSL/quick-test/pack-clip-cull-opt2.hlsl

@@ -0,0 +1,26 @@
+// RUN: %dxc -E main -T vs_6_0 -pack_optimized %s | FileCheck %s
+
+// CHECK:      ; Output signature:
+// CHECK:      ; Name                 Index   Mask Register SysValue  Format   Used
+// CHECK-NEXT: ; -------------------- ----- ------ -------- -------- ------- ------
+// CHECK-NEXT: ; First                    0   xyz         0     NONE   float   xyz
+// CHECK-NEXT: ; WithFirst                0      w        0     NONE   float      w
+// CHECK-NEXT: ; SV_CullDistance          0      w        1  CULLDST   float      w
+// CHECK-NEXT: ; SV_ClipDistance          1    yz         1  CLIPDST   float    yz
+// CHECK-NEXT: ; BeforeClipCull           0   x           1     NONE   float   x
+// CHECK-NEXT: ; SV_ClipDistance          0   x           2  CLIPDST   float   x
+// CHECK-NEXT: ; SV_ClipDistance          2    yz         2  CLIPDST   float    yz
+
+struct VS_OUT {
+  float3 first : First;
+  float cull0 : SV_CullDistance0;
+  float clip0 : SV_ClipDistance0;
+  float2 clip1[2] : SV_ClipDistance1;
+  float withFirst : WithFirst;          // packs with First
+  float afterClipCull : BeforeClipCull; // packed before clip/cull in same row
+};
+
+
+VS_OUT main() {
+	return (VS_OUT)1.0F;
+}

+ 31 - 0
tools/clang/test/CodeGenHLSL/quick-test/pack-clip-cull-opt3.hlsl

@@ -0,0 +1,31 @@
+// RUN: %dxc -E main -T vs_6_0 -pack_optimized %s | FileCheck %s
+
+// CHECK:      ; Output signature:
+// CHECK:      ; Name                 Index   Mask Register SysValue  Format   Used
+// CHECK-NEXT: ; -------------------- ----- ------ -------- -------- ------- ------
+// CHECK-NEXT: ; First                    0   xyz         0     NONE   float   xyz
+// CHECK-NEXT: ; WithFirst                0      w        0     NONE   float      w
+// CHECK-NEXT: ; SV_CullDistance          0      w        1  CULLDST   float      w
+// CHECK-NEXT: ; SV_CullDistance          1     z         1  CULLDST   float     z
+// CHECK-NEXT: ; SV_ClipDistance          1    y          1  CLIPDST   float    y
+// CHECK-NEXT: ; BeforeClipCull           0   x           1     NONE   float   x
+// CHECK-NEXT: ; SV_CullDistance          2     z         2  CULLDST   float     z
+// CHECK-NEXT: ; SV_ClipDistance          0   x           2  CLIPDST   float   x
+// CHECK-NEXT: ; SV_ClipDistance          2    y          2  CLIPDST   float    y
+// CHECK-NEXT: ; SV_ClipDistance          3      w        2  CLIPDST   float      w
+
+struct VS_OUT {
+  float3 first : First;
+  float cull0 : SV_CullDistance0;
+  float clip0 : SV_ClipDistance0;
+  float clip1[2] : SV_ClipDistance1;
+  float cull1[2] : SV_CullDistance1;
+  float clip3 : SV_ClipDistance3;
+  float withFirst : WithFirst;          // packs with First
+  float afterClipCull : BeforeClipCull; // packed before clip/cull in same row
+};
+
+
+VS_OUT main() {
+	return (VS_OUT)1.0F;
+}

+ 28 - 0
tools/clang/test/CodeGenHLSL/quick-test/pack-clip-cull.hlsl

@@ -0,0 +1,28 @@
+// RUN: %dxc -E main -T vs_6_0 -pack_prefix_stable %s | FileCheck %s
+
+// CHECK:      ; Output signature:
+// CHECK:      ; Name                 Index   Mask Register SysValue  Format   Used
+// CHECK-NEXT: ; -------------------- ----- ------ -------- -------- ------- ------
+// CHECK-NEXT: ; First                    0   xyz         0     NONE   float   xyz
+// CHECK-NEXT: ; WithFirst                0      w        0     NONE   float      w
+// CHECK-NEXT: ; SV_ClipDistance          0   x           1  CLIPDST   float   x
+// CHECK-NEXT: ; SV_CullDistance          1    yzw        1  CULLDST   float    yzw
+// CHECK-NEXT: ; SV_ClipDistance          1    yz         2  CLIPDST   float    yz
+// CHECK-NEXT: ; SV_CullDistance          0   x           2  CULLDST   float   x
+// CHECK-NEXT: ; AfterClipCull            0   x           3     NONE   float   x
+
+struct VS_OUT {
+  float3 first : First;
+  // Clip/Cull reserves row
+  float clip0 : SV_ClipDistance0;
+  float3 cull1 : SV_CullDistance1;
+  float cull0 : SV_CullDistance0;
+  float2 clip1 : SV_ClipDistance1;
+  float withFirst : WithFirst;          // packs with First
+  float afterClipCull : AfterClipCull;  // cannot pack after clip/cull in same row
+};
+
+
+VS_OUT main() {
+	return (VS_OUT)1.0F;
+}

+ 27 - 0
tools/clang/test/CodeGenHLSL/quick-test/pack-clip-cull2.hlsl

@@ -0,0 +1,27 @@
+// RUN: %dxc -E main -T vs_6_0 -pack_prefix_stable %s | FileCheck %s
+
+// CHECK:      ; Output signature:
+// CHECK:      ; Name                 Index   Mask Register SysValue  Format   Used
+// CHECK-NEXT: ; -------------------- ----- ------ -------- -------- ------- ------
+// CHECK-NEXT: ; First                    0   xyz         0     NONE   float   xyz
+// CHECK-NEXT: ; WithFirst                0      w        0     NONE   float      w
+// CHECK-NEXT: ; SV_ClipDistance          0   x           1  CLIPDST   float   x
+// CHECK-NEXT: ; SV_ClipDistance          1      w        1  CLIPDST   float      w
+// CHECK-NEXT: ; SV_CullDistance          1    yz         1  CULLDST   float    yz
+// CHECK-NEXT: ; SV_CullDistance          2    yz         2  CULLDST   float    yz
+// CHECK-NEXT: ; AfterClipCull            0   x           3     NONE   float   x
+
+struct VS_OUT {
+  float3 first : First;
+  // Clip/Cull reserves row
+  float clip0 : SV_ClipDistance0;
+  float2 cull1[2] : SV_CullDistance1;
+  float clip1 : SV_ClipDistance1;
+  float withFirst : WithFirst;          // packs with First
+  float afterClipCull : AfterClipCull;  // cannot pack after clip/cull in same row
+};
+
+
+VS_OUT main() {
+	return (VS_OUT)1.0F;
+}

+ 30 - 0
tools/clang/test/CodeGenHLSL/quick-test/pack-clip-cull3.hlsl

@@ -0,0 +1,30 @@
+// RUN: %dxc -E main -T vs_6_0 -pack_prefix_stable %s | FileCheck %s
+
+// CHECK:      ; Output signature:
+// CHECK:      ; Name                 Index   Mask Register SysValue  Format   Used
+// CHECK-NEXT: ; -------------------- ----- ------ -------- -------- ------- ------
+// CHECK-NEXT: ; First                    0   xyz         0     NONE   float   xyz
+// CHECK-NEXT: ; WithFirst                0      w        0     NONE   float      w
+// CHECK-NEXT: ; SV_ClipDistance          0   x           1  CLIPDST   float   x
+// CHECK-NEXT: ; SV_ClipDistance          1      w        1  CLIPDST   float      w
+// CHECK-NEXT: ; SV_CullDistance          1    yz         1  CULLDST   float    yz
+// CHECK-NEXT: ; SV_ClipDistance          2      w        2  CLIPDST   float      w
+// CHECK-NEXT: ; SV_CullDistance          2    yz         2  CULLDST   float    yz
+// CHECK-NEXT: ; AfterClipCull            0   x           3     NONE   float   x
+
+struct VS_OUT {
+  float3 first : First;
+  // Clip/Cull reserves row
+  float clip0 : SV_ClipDistance0;
+  float2 cull1[2] : SV_CullDistance1;   // array works
+  float clip1[2] : SV_ClipDistance1;    // second array works without conflict
+  float withFirst : WithFirst;          // packs with First
+  // AfterClipCull looks like it could pack before clip/cull elements on row 2,
+  // but prefix-stable reserves rows for clip/cull, so it will not pack there.
+  float afterClipCull : AfterClipCull;
+};
+
+
+VS_OUT main() {
+	return (VS_OUT)1.0F;
+}

+ 3 - 2
utils/hct/hctdb.py

@@ -1789,6 +1789,7 @@ class db_dxil(object):
             (6, "Target", "Special handling for SV_Target"),
             (7, "TessFactor", "Special handling for tessellation factors"),
             (8, "Shadow", "Shadow element must be added to a signature for compatibility"),
+            (8, "ClipCull", "Special packing rules for SV_ClipDistance or SV_CullDistance"),
             (9, "Invalid", ""),
             ])
         self.enums.append(SemanticInterpretationKind)
@@ -1802,8 +1803,8 @@ class db_dxil(object):
             Position,Arb,SV,NA,NA,SV,SV,Arb,Arb,SV,SV,SV,NA,SV,SV,NA,NA
             RenderTargetArrayIndex,Arb,SV,NA,NA,SV,SV,Arb,Arb,SV,SV,SV,NA,SV,SV,NA,NA
             ViewPortArrayIndex,Arb,SV,NA,NA,SV,SV,Arb,Arb,SV,SV,SV,NA,SV,SV,NA,NA
-            ClipDistance,Arb,SV,NA,NA,SV,SV,Arb,Arb,SV,SV,SV,NA,SV,SV,NA,NA
-            CullDistance,Arb,SV,NA,NA,SV,SV,Arb,Arb,SV,SV,SV,NA,SV,SV,NA,NA
+            ClipDistance,Arb,ClipCull,NA,NA,ClipCull,ClipCull,Arb,Arb,ClipCull,ClipCull,ClipCull,NA,ClipCull,ClipCull,NA,NA
+            CullDistance,Arb,ClipCull,NA,NA,ClipCull,ClipCull,Arb,Arb,ClipCull,ClipCull,ClipCull,NA,ClipCull,ClipCull,NA,NA
             OutputControlPointID,NA,NA,NA,NotInSig,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
             DomainLocation,NA,NA,NA,NA,NA,NA,NA,NotInSig,NA,NA,NA,NA,NA,NA,NA,NA
             PrimitiveID,NA,NA,NotInSig,NotInSig,NA,NA,NA,NotInSig,NA,NA,NA,Shadow,SGV,SGV,NA,NA