Store mesh payload in function props, fix Wave/Quad/Barrier validation (#2361)

- Compute mesh payload size before final object serialization
  - During CodeGen for MS based on payload parameter
  - During CollectShaderFlagsForModule for AS based on DispatchMesh call
- Store payload sizes in corresponding function properties, serializing
  these properly for HL and Dxil Modules
- Use payload sizes from function props for PSV0 data during serialization
- Validate measured and declared payload sizes, don't just fill in
  properties during validation
- Fix Wave/Quad allowed shader stages, enabling Quad* with CS-like models
- Rename payloadByteSize members to payloadSizeInBytes
- Add GetMinShaderModelAndMask overload taking CallInst for additional
  detail required to produce correct SM mask for Barrier operations
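
The diff below rewords INSTR.BARRIERMODEFORNONCS to say that only shaders other than Compute, Amplification, and Mesh are restricted to sync_uglobal. It also lists the related INSTR.BARRIERMODENOMEMORY and INSTR.BARRIERMODEUSELESSUGROUP rules. Together, these three rules can be summarized as a minimal C++ sketch.

The bit values for the _t, _uglobal, _ugroup, and _g suffixes are assumptions. The `ShaderKind` enum and `CheckBarrierMode` function are hypothetical illustrations. They are not DXC's actual validator code or API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumed bit assignments for the barrier mode operand; the mapping to the
// suffixes in the rule text (_t, _uglobal, _ugroup, _g) is illustrative only.
constexpr std::uint32_t kSyncThreadGroup = 0x1;      // _t
constexpr std::uint32_t kUAVFenceGlobal = 0x2;       // _uglobal
constexpr std::uint32_t kUAVFenceThreadGroup = 0x4;  // _ugroup
constexpr std::uint32_t kTGSMFence = 0x8;            // _g

// Hypothetical shader-stage enum for this sketch.
enum class ShaderKind { Vertex, Pixel, Compute, Amplification, Mesh };

// Compute-like stages (CS, AS, MS) may use group-scope barriers.
bool IsComputeLike(ShaderKind kind) {
  return kind == ShaderKind::Compute || kind == ShaderKind::Amplification ||
         kind == ShaderKind::Mesh;
}

// Returns the codes of every barrier rule that `mode` violates in `kind`.
std::vector<std::string> CheckBarrierMode(ShaderKind kind, std::uint32_t mode) {
  std::vector<std::string> errors;
  const std::uint32_t memoryBits =
      kUAVFenceGlobal | kUAVFenceThreadGroup | kTGSMFence;

  // Some memory fence is required; _t (thread group sync) alone is not enough.
  if ((mode & memoryBits) == 0)
    errors.push_back("INSTR.BARRIERMODENOMEMORY");

  // _ugroup is redundant when _uglobal is also specified.
  if ((mode & kUAVFenceGlobal) != 0 && (mode & kUAVFenceThreadGroup) != 0)
    errors.push_back("INSTR.BARRIERMODEUSELESSUGROUP");

  // Outside CS/AS/MS, any bit other than _uglobal is rejected.
  if (!IsComputeLike(kind) && (mode & ~kUAVFenceGlobal) != 0)
    errors.push_back("INSTR.BARRIERMODEFORNONCS");

  return errors;
}
```

The sketch collects every violated rule rather than stopping at the first one. For example, a Vertex shader barrier that uses only _t reports both INSTR.BARRIERMODENOMEMORY and INSTR.BARRIERMODEFORNONCS.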
Tex Riddell, 6 years ago
Commit 2facceae0b

+ 233 - 231
docs/DXIL.rst

@@ -2955,237 +2955,239 @@ The set of validation rules that are known to hold for a DXIL program is identif
 .. <py::lines('VALRULES-RST')>hctdb_instrhelp.get_valrules_rst()</py>
 .. VALRULES-RST:BEGIN
 
-======================================== =======================================================================================================================================================================================================================================================================================================
-Rule Code                                Description
-======================================== =======================================================================================================================================================================================================================================================================================================
-BITCODE.VALID                            TODO - Module must be bitcode-valid
-CONTAINER.PARTINVALID                    DXIL Container must not contain unknown parts
-CONTAINER.PARTMATCHES                    DXIL Container Parts must match Module
-CONTAINER.PARTMISSING                    DXIL Container requires certain parts, corresponding to module
-CONTAINER.PARTREPEATED                   DXIL Container must have only one of each part type
-CONTAINER.ROOTSIGNATUREINCOMPATIBLE      Root Signature in DXIL Container must be compatible with shader
-DECL.ATTRSTRUCT                          Attributes parameter must be struct type
-DECL.DXILFNEXTERN                        External function must be a DXIL function
-DECL.DXILNSRESERVED                      The DXIL reserved prefixes must only be used by built-in functions and types
-DECL.EXTRAARGS                           Extra arguments not allowed for shader functions
-DECL.FNATTRIBUTE                         Functions should only contain known function attributes
-DECL.FNFLATTENPARAM                      Function parameters must not use struct types
-DECL.FNISCALLED                          Functions can only be used by call instructions
-DECL.NOTUSEDEXTERNAL                     External declaration should not be used
-DECL.PARAMSTRUCT                         Callable function parameter must be struct type
-DECL.PAYLOADSTRUCT                       Payload parameter must be struct type
-DECL.RESOURCEINFNSIG                     Resources not allowed in function signatures
-DECL.SHADERMISSINGARG                    payload/params/attributes parameter is required for certain shader types
-DECL.SHADERRETURNVOID                    Shader functions must return void
-DECL.USEDEXTERNALFUNCTION                External function must be used
-DECL.USEDINTERNAL                        Internal declaration must be used
-FLOW.DEADLOOP                            Loop must have break
-FLOW.FUNCTIONCALL                        Function with parameter is not permitted
-FLOW.NORECUSION                          Recursion is not permitted
-FLOW.REDUCIBLE                           Execution flow must be reducible
-INSTR.ALLOWED                            Instructions must be of an allowed type
-INSTR.ATTRIBUTEATVERTEXNOINTERPOLATION   Attribute %0 must have nointerpolation mode in order to use GetAttributeAtVertex function.
-INSTR.BARRIERMODEFORNONCS                sync in a non-Compute Shader must only sync UAV (sync_uglobal)
-INSTR.BARRIERMODENOMEMORY                sync must include some form of memory barrier - _u (UAV) and/or _g (Thread Group Shared Memory).  Only _t (thread group sync) is optional.
-INSTR.BARRIERMODEUSELESSUGROUP           sync can't specify both _ugroup and _uglobal. If both are needed, just specify _uglobal.
-INSTR.BUFFERUPDATECOUNTERONRESHASCOUNTER BufferUpdateCounter valid only when HasCounter is true
-INSTR.BUFFERUPDATECOUNTERONUAV           BufferUpdateCounter valid only on UAV
-INSTR.CALLOLOAD                          Call to DXIL intrinsic must match overload signature
-INSTR.CANNOTPULLPOSITION                 pull-model evaluation of position disallowed
-INSTR.CBUFFERCLASSFORCBUFFERHANDLE       Expect Cbuffer for CBufferLoad handle
-INSTR.CBUFFEROUTOFBOUND                  Cbuffer access out of bound
-INSTR.CHECKACCESSFULLYMAPPED             CheckAccessFullyMapped should only used on resource status
-INSTR.COORDINATECOUNTFORRAWTYPEDBUF      raw/typed buffer don't need 2 coordinates
-INSTR.COORDINATECOUNTFORSTRUCTBUF        structured buffer require 2 coordinates
-INSTR.CREATEHANDLEIMMRANGEID             Local resource must map to global resource.
-INSTR.DXILSTRUCTUSER                     Dxil struct types should only used by ExtractValue
-INSTR.DXILSTRUCTUSEROUTOFBOUND           Index out of bound when extract value from dxil struct types
-INSTR.EVALINTERPOLATIONMODE              Interpolation mode on %0 used with eval_* instruction must be linear, linear_centroid, linear_noperspective, linear_noperspective_centroid, linear_sample or linear_noperspective_sample
-INSTR.EXTRACTVALUE                       ExtractValue should only be used on dxil struct types and cmpxchg
-INSTR.FAILTORESLOVETGSMPOINTER           TGSM pointers must originate from an unambiguous TGSM global variable.
-INSTR.HANDLENOTFROMCREATEHANDLE          Resource handle should returned by createHandle
-INSTR.IMMBIASFORSAMPLEB                  bias amount for sample_b must be in the range [%0,%1], but %2 was specified as an immediate
-INSTR.INBOUNDSACCESS                     Access to out-of-bounds memory is disallowed
-INSTR.MINPRECISIONNOTPRECISE             Instructions marked precise may not refer to minprecision values
-INSTR.MINPRECISONBITCAST                 Bitcast on minprecison types is not allowed
-INSTR.MIPLEVELFORGETDIMENSION            Use mip level on buffer when GetDimensions
-INSTR.MIPONUAVLOAD                       uav load don't support mipLevel/sampleIndex
-INSTR.MISSINGSETMESHOUTPUTCOUNTS         Missing SetMeshOutputCounts call.
-INSTR.MULTIPLEGETMESHPAYLOAD             GetMeshPayload cannot be called multiple times.
-INSTR.MULTIPLESETMESHOUTPUTCOUNTS        SetMeshOUtputCounts cannot be called multiple times.
-INSTR.NOGENERICPTRADDRSPACECAST          Address space cast between pointer types must have one part to be generic address space
-INSTR.NOIDIVBYZERO                       No signed integer division by zero
-INSTR.NOINDEFINITEACOS                   No indefinite arccosine
-INSTR.NOINDEFINITEASIN                   No indefinite arcsine
-INSTR.NOINDEFINITEDSXY                   No indefinite derivative calculation
-INSTR.NOINDEFINITELOG                    No indefinite logarithm
-INSTR.NONDOMINATINGDISPATCHMESH          Non-Dominating DispatchMesh call.
-INSTR.NONDOMINATINGSETMESHOUTPUTCOUNTS   Non-Dominating SetMeshOutputCounts call.
-INSTR.NOREADINGUNINITIALIZED             Instructions should not read uninitialized value
-INSTR.NOTONCEDISPATCHMESH                DispatchMesh must be called exactly once in an Amplification shader.
-INSTR.NOUDIVBYZERO                       No unsigned integer division by zero
-INSTR.OFFSETONUAVLOAD                    uav load don't support offset
-INSTR.OLOAD                              DXIL intrinsic overload must be valid
-INSTR.ONLYONEALLOCCONSUME                RWStructuredBuffers may increment or decrement their counters, but not both.
-INSTR.OPCODERESERVED                     Instructions must not reference reserved opcodes
-INSTR.OPCONST                            DXIL intrinsic requires an immediate constant operand
-INSTR.OPCONSTRANGE                       Constant values must be in-range for operation
-INSTR.OPERANDRANGE                       DXIL intrinsic operand must be within defined range
-INSTR.PTRBITCAST                         Pointer type bitcast must be have same size
-INSTR.RESOURCECLASSFORLOAD               load can only run on UAV/SRV resource
-INSTR.RESOURCECLASSFORSAMPLERGATHER      sample, lod and gather should be on srv resource.
-INSTR.RESOURCECLASSFORUAVSTORE           store should be on uav resource.
-INSTR.RESOURCECOORDINATEMISS             coord uninitialized
-INSTR.RESOURCECOORDINATETOOMANY          out of bound coord must be undef
-INSTR.RESOURCEKINDFORBUFFERLOADSTORE     buffer load/store only works on Raw/Typed/StructuredBuffer
-INSTR.RESOURCEKINDFORCALCLOD             lod requires resource declared as texture1D/2D/3D/Cube/CubeArray/1DArray/2DArray
-INSTR.RESOURCEKINDFORGATHER              gather requires resource declared as texture/2D/Cube/2DArray/CubeArray
-INSTR.RESOURCEKINDFORGETDIM              Invalid resource kind on GetDimensions
-INSTR.RESOURCEKINDFORSAMPLE              sample/_l/_d requires resource declared as texture1D/2D/3D/Cube/1DArray/2DArray/CubeArray
-INSTR.RESOURCEKINDFORSAMPLEC             samplec requires resource declared as texture1D/2D/Cube/1DArray/2DArray/CubeArray
-INSTR.RESOURCEKINDFORTEXTURELOAD         texture load only works on Texture1D/1DArray/2D/2DArray/3D/MS2D/MS2DArray
-INSTR.RESOURCEKINDFORTEXTURESTORE        texture store only works on Texture1D/1DArray/2D/2DArray/3D
-INSTR.RESOURCEKINDFORTRACERAY            TraceRay should only use RTAccelerationStructure
-INSTR.RESOURCEMAPTOSINGLEENTRY           Fail to map resource to resource table
-INSTR.RESOURCEOFFSETMISS                 offset uninitialized
-INSTR.RESOURCEOFFSETTOOMANY              out of bound offset must be undef
-INSTR.RESOURCEUSER                       Resource should only used by Load/GEP/Call
-INSTR.SAMPLECOMPTYPE                     sample_* instructions require resource to be declared to return UNORM, SNORM or FLOAT.
-INSTR.SAMPLEINDEXFORLOAD2DMS             load on Texture2DMS/2DMSArray require sampleIndex
-INSTR.SAMPLERMODEFORLOD                  lod instruction requires sampler declared in default mode
-INSTR.SAMPLERMODEFORSAMPLE               sample/_l/_d/_cl_s/gather instruction requires sampler declared in default mode
-INSTR.SAMPLERMODEFORSAMPLEC              sample_c_*/gather_c instructions require sampler declared in comparison mode
-INSTR.SIGNATUREOPERATIONNOTINENTRY       Dxil operation for input output signature must be in entryPoints.
-INSTR.STATUS                             Resource status should only used by CheckAccessFullyMapped
-INSTR.STRUCTBITCAST                      Bitcast on struct types is not allowed
-INSTR.TEXTUREOFFSET                      offset texture instructions must take offset which can resolve to integer literal in the range -8 to 7
-INSTR.TGSMRACECOND                       Race condition writing to shared memory detected, consider making this write conditional
-INSTR.UNDEFRESULTFORGETDIMENSION         GetDimensions used undef dimension %0 on %1
-INSTR.WRITEMASKFORTYPEDUAVSTORE          store on typed uav must write to all four components of the UAV
-INSTR.WRITEMASKMATCHVALUEFORUAVSTORE     uav store write mask must match store value mask, write mask is %0 and store value mask is %1
-META.BARYCENTRICSFLOAT3                  only 'float3' type is allowed for SV_Barycentrics.
-META.BARYCENTRICSINTERPOLATION           SV_Barycentrics cannot be used with 'nointerpolation' type
-META.BARYCENTRICSTWOPERSPECTIVES         There can only be up to two input attributes of SV_Barycentrics with different perspective interpolation mode.
-META.BRANCHFLATTEN                       Can't use branch and flatten attributes together
-META.CLIPCULLMAXCOMPONENTS               Combined elements of SV_ClipDistance and SV_CullDistance must fit in 8 components
-META.CLIPCULLMAXROWS                     Combined elements of SV_ClipDistance and SV_CullDistance must fit in two rows.
-META.CONTROLFLOWHINTNOTONCONTROLFLOW     Control flow hint only works on control flow inst
-META.DENSERESIDS                         Resource identifiers must be zero-based and dense
-META.DUPLICATESYSVALUE                   System value may only appear once in signature
-META.ENTRYFUNCTION                       entrypoint not found
-META.FLAGSUSAGE                          Flags must match usage
-META.FORCECASEONSWITCH                   Attribute forcecase only works for switch
-META.FUNCTIONANNOTATION                  Cannot find function annotation for %0
-META.GLCNOTONAPPENDCONSUME               globallycoherent cannot be used with append/consume buffers
-META.INTEGERINTERPMODE                   Interpolation mode on integer must be Constant
-META.INTERPMODEINONEROW                  Interpolation mode must be identical for all elements packed into the same row.
-META.INTERPMODEVALID                     Interpolation mode must be valid
-META.INVALIDCONTROLFLOWHINT              Invalid control flow hint
-META.KNOWN                               Named metadata should be known
-META.MAXTESSFACTOR                       Hull Shader MaxTessFactor must be [%0..%1].  %2 specified
-META.NOENTRYPROPSFORENTRY                EntryPoints must have entry properties.
-META.NOSEMANTICOVERLAP                   Semantics must not overlap
-META.REQUIRED                            TODO - Required metadata missing
-META.SEMAKINDMATCHESNAME                 Semantic name must match system value, when defined.
-META.SEMAKINDVALID                       Semantic kind must be valid
-META.SEMANTICCOMPTYPE                    %0 must be %1
-META.SEMANTICINDEXMAX                    System value semantics have a maximum valid semantic index
-META.SEMANTICLEN                         Semantic length must be at least 1 and at most 64
-META.SEMANTICSHOULDBEALLOCATED           Semantic should have a valid packing location
-META.SEMANTICSHOULDNOTBEALLOCATED        Semantic should have a packing location of -1
-META.SIGNATURECOMPTYPE                   signature %0 specifies unrecognized or invalid component type
-META.SIGNATUREDATAWIDTH                  Data width must be identical for all elements packed into the same row.
-META.SIGNATUREILLEGALCOMPONENTORDER      Component ordering for packed elements must be: arbitrary < system value < system generated value
-META.SIGNATUREINDEXCONFLICT              Only elements with compatible indexing rules may be packed together
-META.SIGNATUREOUTOFRANGE                 Signature elements must fit within maximum signature size
-META.SIGNATUREOVERLAP                    Signature elements may not overlap in packing location.
-META.STRUCTBUFALIGNMENT                  StructuredBuffer stride not aligned
-META.STRUCTBUFALIGNMENTOUTOFBOUND        StructuredBuffer stride out of bounds
-META.SYSTEMVALUEROWS                     System value may only have 1 row
-META.TARGET                              Target triple must be 'dxil-ms-dx'
-META.TESSELLATOROUTPUTPRIMITIVE          Invalid Tessellator Output Primitive specified. Must be point, line, triangleCW or triangleCCW.
-META.TESSELLATORPARTITION                Invalid Tessellator Partitioning specified. Must be integer, pow2, fractional_odd or fractional_even.
-META.TEXTURETYPE                         elements of typed buffers and textures must fit in four 32-bit quantities
-META.USED                                All metadata must be used by dxil
-META.VALIDSAMPLERMODE                    Invalid sampler mode on sampler
-META.VALUERANGE                          Metadata value must be within range
-META.WELLFORMED                          TODO - Metadata must be well-formed in operand count and types
-SM.64BITRAWBUFFERLOADSTORE               i64/f64 rawBufferLoad/Store overloads are allowed after SM 6.3
-SM.AMPLIFICATIONSHADERPAYLOADSIZE        For shader '%0', payload size is greater than %1
-SM.APPENDANDCONSUMEONSAMEUAV             BufferUpdateCounter inc and dec on a given UAV (%d) cannot both be in the same shader for shader model less than 5.1.
-SM.CBUFFERARRAYOFFSETALIGNMENT           CBuffer array offset must be aligned to 16-bytes
-SM.CBUFFERELEMENTOVERFLOW                CBuffer elements must not overflow
-SM.CBUFFEROFFSETOVERLAP                  CBuffer offsets must not overlap
-SM.CBUFFERTEMPLATETYPEMUSTBESTRUCT       D3D12 constant/texture buffer template element can only be a struct
-SM.COMPLETEPOSITION                      Not all elements of SV_Position were written
-SM.CONSTANTINTERPMODE                    Interpolation mode must be constant for MS primitive output.
-SM.COUNTERONLYONSTRUCTBUF                BufferUpdateCounter valid only on structured buffers
-SM.CSNOSIGNATURES                        Compute shaders must not have shader signatures.
-SM.DOMAINLOCATIONIDXOOB                  DomainLocation component index out of bounds for the domain.
-SM.DSINPUTCONTROLPOINTCOUNTRANGE         DS input control point count must be [0..%0].  %1 specified
-SM.DXILVERSION                           Target shader model requires specific Dxil Version
-SM.GSINSTANCECOUNTRANGE                  GS instance count must be [1..%0].  %1 specified
-SM.GSOUTPUTVERTEXCOUNTRANGE              GS output vertex count must be [0..%0].  %1 specified
-SM.GSTOTALOUTPUTVERTEXDATARANGE          Declared output vertex count (%0) multiplied by the total number of declared scalar components of output data (%1) equals %2.  This value cannot be greater than %3
-SM.GSVALIDINPUTPRIMITIVE                 GS input primitive unrecognized
-SM.GSVALIDOUTPUTPRIMITIVETOPOLOGY        GS output primitive topology unrecognized
-SM.HSINPUTCONTROLPOINTCOUNTRANGE         HS input control point count must be [0..%0].  %1 specified
-SM.HULLPASSTHRUCONTROLPOINTCOUNTMATCH    For pass thru hull shader, input control point count must match output control point count
-SM.INSIDETESSFACTORSIZEMATCHDOMAIN       InsideTessFactor rows, columns (%0, %1) invalid for domain %2.  Expected %3 rows and 1 column.
-SM.INVALIDRESOURCECOMPTYPE               Invalid resource return type
-SM.INVALIDRESOURCEKIND                   Invalid resources kind
-SM.INVALIDSAMPLERFEEDBACKTYPE            Invalid sampler feedback type
-SM.INVALIDTEXTUREKINDONUAV               Texture2DMS[Array] or TextureCube[Array] resources are not supported with UAVs
-SM.ISOLINEOUTPUTPRIMITIVEMISMATCH        Hull Shader declared with IsoLine Domain must specify output primitive point or line. Triangle_cw or triangle_ccw output are not compatible with the IsoLine Domain.
-SM.MAXMSSMSIZE                           Total Thread Group Shared Memory storage is %0, exceeded %1
-SM.MAXTGSMSIZE                           Total Thread Group Shared Memory storage is %0, exceeded %1
-SM.MAXTHEADGROUP                         Declared Thread Group Count %0 (X*Y*Z) is beyond the valid maximum of %1
-SM.MESHPSIGROWCOUNT                      For shader '%0', primitive output signatures are taking up more than %1 rows
-SM.MESHSHADERINOUTSIZE                   For shader '%0', input plus output size is greater than %1
-SM.MESHSHADERMAXPRIMITIVECOUNT           MS max primitive output count must be [0..%0].  %1 specified
-SM.MESHSHADERMAXVERTEXCOUNT              MS max vertex output count must be [0..%0].  %1 specified
-SM.MESHSHADEROUTPUTSIZE                  For shader '%0', vertex plus primitive output size is greater than %1
-SM.MESHSHADERPAYLOADSIZE                 For shader '%0', payload size is greater than %1
-SM.MESHTOTALSIGROWCOUNT                  For shader '%0', vertex and primitive output signatures are taking up more than %1 rows
-SM.MESHVSIGROWCOUNT                      For shader '%0', vertex output signatures are taking up more than %1 rows
-SM.MULTISTREAMMUSTBEPOINT                When multiple GS output streams are used they must be pointlists
-SM.NAME                                  Target shader model name must be known
-SM.NOINTERPMODE                          Interpolation mode must be undefined for VS input/PS output/patch constant.
-SM.NOPSOUTPUTIDX                         Pixel shader output registers are not indexable.
-SM.OPCODE                                Opcode must be defined in target shader model
-SM.OPCODEININVALIDFUNCTION               Invalid DXIL opcode usage like StorePatchConstant in patch constant function
-SM.OPERAND                               Operand must be defined in target shader model
-SM.OUTPUTCONTROLPOINTCOUNTRANGE          output control point count must be [0..%0].  %1 specified
-SM.OUTPUTCONTROLPOINTSTOTALSCALARS       Total number of scalars across all HS output control points must not exceed
-SM.PATCHCONSTANTONLYFORHSDS              patch constant signature only valid in HS and DS
-SM.PSCONSISTENTINTERP                    Interpolation mode for PS input position must be linear_noperspective_centroid or linear_noperspective_sample when outputting oDepthGE or oDepthLE and not running at sample frequency (which is forced by inputting SV_SampleIndex or declaring an input linear_sample or linear_noperspective_sample)
-SM.PSCOVERAGEANDINNERCOVERAGE            InnerCoverage and Coverage are mutually exclusive.
-SM.PSMULTIPLEDEPTHSEMANTIC               Pixel Shader only allows one type of depth semantic to be declared
-SM.PSOUTPUTSEMANTIC                      Pixel Shader allows output semantics to be SV_Target, SV_Depth, SV_DepthGreaterEqual, SV_DepthLessEqual, SV_Coverage or SV_StencilRef, %0 found
-SM.PSTARGETCOL0                          SV_Target packed location must start at column 0
-SM.PSTARGETINDEXMATCHESROW               SV_Target semantic index must match packed row location
-SM.RAYSHADERPAYLOADSIZE                  For shader '%0', %1 size is smaller than argument's allocation size
-SM.RAYSHADERSIGNATURES                   Ray tracing shader '%0' should not have any shader signatures
-SM.RESOURCERANGEOVERLAP                  Resource ranges must not overlap
-SM.ROVONLYINPS                           RasterizerOrdered objects are only allowed in 5.0+ pixel shaders
-SM.SAMPLECOUNTONLYON2DMS                 Only Texture2DMS/2DMSArray could has sample count
-SM.SEMANTIC                              Semantic must be defined in target shader model
-SM.STREAMINDEXRANGE                      Stream index (%0) must between 0 and %1
-SM.TESSFACTORFORDOMAIN                   Required TessFactor for domain not found declared anywhere in Patch Constant data
-SM.TESSFACTORSIZEMATCHDOMAIN             TessFactor rows, columns (%0, %1) invalid for domain %2.  Expected %3 rows and 1 column.
-SM.THREADGROUPCHANNELRANGE               Declared Thread Group %0 size %1 outside valid range [%2..%3]
-SM.TRIOUTPUTPRIMITIVEMISMATCH            Hull Shader declared with Tri Domain must specify output primitive point, triangle_cw or triangle_ccw. Line output is not compatible with the Tri domain
-SM.UNDEFINEDOUTPUT                       Not all elements of output %0 were written
-SM.VALIDDOMAIN                           Invalid Tessellator Domain specified. Must be isoline, tri or quad
-SM.VIEWIDNEEDSSLOT                       ViewID requires compatible space in pixel shader input signature
-SM.ZEROHSINPUTCONTROLPOINTWITHINPUT      When HS input control point count is 0, no input signature should exist
-TYPES.DEFINED                            Type must be defined based on DXIL primitives
-TYPES.I8                                 I8 can only used as immediate value for intrinsic
-TYPES.INTWIDTH                           Int type must be of valid width
-TYPES.NOMULTIDIM                         Only one dimension allowed for array type
-TYPES.NOVECTOR                           Vector types must not be present
-UNI.NOWAVESENSITIVEGRADIENT              Gradient operations are not affected by wave-sensitive data or control flow.
-======================================== =======================================================================================================================================================================================================================================================================================================
+========================================= =======================================================================================================================================================================================================================================================================================================
+Rule Code                                 Description
+========================================= =======================================================================================================================================================================================================================================================================================================
+BITCODE.VALID                             TODO - Module must be bitcode-valid
+CONTAINER.PARTINVALID                     DXIL Container must not contain unknown parts
+CONTAINER.PARTMATCHES                     DXIL Container Parts must match Module
+CONTAINER.PARTMISSING                     DXIL Container requires certain parts, corresponding to module
+CONTAINER.PARTREPEATED                    DXIL Container must have only one of each part type
+CONTAINER.ROOTSIGNATUREINCOMPATIBLE       Root Signature in DXIL Container must be compatible with shader
+DECL.ATTRSTRUCT                           Attributes parameter must be struct type
+DECL.DXILFNEXTERN                         External function must be a DXIL function
+DECL.DXILNSRESERVED                       The DXIL reserved prefixes must only be used by built-in functions and types
+DECL.EXTRAARGS                            Extra arguments not allowed for shader functions
+DECL.FNATTRIBUTE                          Functions should only contain known function attributes
+DECL.FNFLATTENPARAM                       Function parameters must not use struct types
+DECL.FNISCALLED                           Functions can only be used by call instructions
+DECL.NOTUSEDEXTERNAL                      External declaration should not be used
+DECL.PARAMSTRUCT                          Callable function parameter must be struct type
+DECL.PAYLOADSTRUCT                        Payload parameter must be struct type
+DECL.RESOURCEINFNSIG                      Resources not allowed in function signatures
+DECL.SHADERMISSINGARG                     payload/params/attributes parameter is required for certain shader types
+DECL.SHADERRETURNVOID                     Shader functions must return void
+DECL.USEDEXTERNALFUNCTION                 External function must be used
+DECL.USEDINTERNAL                         Internal declaration must be used
+FLOW.DEADLOOP                             Loop must have break
+FLOW.FUNCTIONCALL                         Function with parameter is not permitted
+FLOW.NORECUSION                           Recursion is not permitted
+FLOW.REDUCIBLE                            Execution flow must be reducible
+INSTR.ALLOWED                             Instructions must be of an allowed type
+INSTR.ATTRIBUTEATVERTEXNOINTERPOLATION    Attribute %0 must have nointerpolation mode in order to use GetAttributeAtVertex function.
+INSTR.BARRIERMODEFORNONCS                 sync in a non-Compute/Amplification/Mesh Shader must only sync UAV (sync_uglobal)
+INSTR.BARRIERMODENOMEMORY                 sync must include some form of memory barrier - _u (UAV) and/or _g (Thread Group Shared Memory).  Only _t (thread group sync) is optional.
+INSTR.BARRIERMODEUSELESSUGROUP            sync can't specify both _ugroup and _uglobal. If both are needed, just specify _uglobal.
+INSTR.BUFFERUPDATECOUNTERONRESHASCOUNTER  BufferUpdateCounter valid only when HasCounter is true
+INSTR.BUFFERUPDATECOUNTERONUAV            BufferUpdateCounter valid only on UAV
+INSTR.CALLOLOAD                           Call to DXIL intrinsic must match overload signature
+INSTR.CANNOTPULLPOSITION                  pull-model evaluation of position disallowed
+INSTR.CBUFFERCLASSFORCBUFFERHANDLE        Expect Cbuffer for CBufferLoad handle
+INSTR.CBUFFEROUTOFBOUND                   Cbuffer access out of bound
+INSTR.CHECKACCESSFULLYMAPPED              CheckAccessFullyMapped should only used on resource status
+INSTR.COORDINATECOUNTFORRAWTYPEDBUF       raw/typed buffer don't need 2 coordinates
+INSTR.COORDINATECOUNTFORSTRUCTBUF         structured buffer require 2 coordinates
+INSTR.CREATEHANDLEIMMRANGEID              Local resource must map to global resource.
+INSTR.DXILSTRUCTUSER                      Dxil struct types should only used by ExtractValue
+INSTR.DXILSTRUCTUSEROUTOFBOUND            Index out of bound when extract value from dxil struct types
+INSTR.EVALINTERPOLATIONMODE               Interpolation mode on %0 used with eval_* instruction must be linear, linear_centroid, linear_noperspective, linear_noperspective_centroid, linear_sample or linear_noperspective_sample
+INSTR.EXTRACTVALUE                        ExtractValue should only be used on dxil struct types and cmpxchg
+INSTR.FAILTORESLOVETGSMPOINTER            TGSM pointers must originate from an unambiguous TGSM global variable.
+INSTR.HANDLENOTFROMCREATEHANDLE           Resource handle should returned by createHandle
+INSTR.IMMBIASFORSAMPLEB                   bias amount for sample_b must be in the range [%0,%1], but %2 was specified as an immediate
+INSTR.INBOUNDSACCESS                      Access to out-of-bounds memory is disallowed
+INSTR.MINPRECISIONNOTPRECISE              Instructions marked precise may not refer to minprecision values
+INSTR.MINPRECISONBITCAST                  Bitcast on minprecison types is not allowed
+INSTR.MIPLEVELFORGETDIMENSION             Use mip level on buffer when GetDimensions
+INSTR.MIPONUAVLOAD                        uav load don't support mipLevel/sampleIndex
+INSTR.MISSINGSETMESHOUTPUTCOUNTS          Missing SetMeshOutputCounts call.
+INSTR.MULTIPLEGETMESHPAYLOAD              GetMeshPayload cannot be called multiple times.
+INSTR.MULTIPLESETMESHOUTPUTCOUNTS         SetMeshOUtputCounts cannot be called multiple times.
+INSTR.NOGENERICPTRADDRSPACECAST           Address space cast between pointer types must have one part to be generic address space
+INSTR.NOIDIVBYZERO                        No signed integer division by zero
+INSTR.NOINDEFINITEACOS                    No indefinite arccosine
+INSTR.NOINDEFINITEASIN                    No indefinite arcsine
+INSTR.NOINDEFINITEDSXY                    No indefinite derivative calculation
+INSTR.NOINDEFINITELOG                     No indefinite logarithm
+INSTR.NONDOMINATINGDISPATCHMESH           Non-Dominating DispatchMesh call.
+INSTR.NONDOMINATINGSETMESHOUTPUTCOUNTS    Non-Dominating SetMeshOutputCounts call.
+INSTR.NOREADINGUNINITIALIZED              Instructions should not read uninitialized value
+INSTR.NOTONCEDISPATCHMESH                 DispatchMesh must be called exactly once in an Amplification shader.
+INSTR.NOUDIVBYZERO                        No unsigned integer division by zero
+INSTR.OFFSETONUAVLOAD                     uav load don't support offset
+INSTR.OLOAD                               DXIL intrinsic overload must be valid
+INSTR.ONLYONEALLOCCONSUME                 RWStructuredBuffers may increment or decrement their counters, but not both.
+INSTR.OPCODERESERVED                      Instructions must not reference reserved opcodes
+INSTR.OPCONST                             DXIL intrinsic requires an immediate constant operand
+INSTR.OPCONSTRANGE                        Constant values must be in-range for operation
+INSTR.OPERANDRANGE                        DXIL intrinsic operand must be within defined range
+INSTR.PTRBITCAST                          Pointer type bitcast must have same size
+INSTR.RESOURCECLASSFORLOAD                load can only run on UAV/SRV resource
+INSTR.RESOURCECLASSFORSAMPLERGATHER       sample, lod and gather should be on srv resource.
+INSTR.RESOURCECLASSFORUAVSTORE            store should be on uav resource.
+INSTR.RESOURCECOORDINATEMISS              coord uninitialized
+INSTR.RESOURCECOORDINATETOOMANY           out of bound coord must be undef
+INSTR.RESOURCEKINDFORBUFFERLOADSTORE      buffer load/store only works on Raw/Typed/StructuredBuffer
+INSTR.RESOURCEKINDFORCALCLOD              lod requires resource declared as texture1D/2D/3D/Cube/CubeArray/1DArray/2DArray
+INSTR.RESOURCEKINDFORGATHER               gather requires resource declared as texture/2D/Cube/2DArray/CubeArray
+INSTR.RESOURCEKINDFORGETDIM               Invalid resource kind on GetDimensions
+INSTR.RESOURCEKINDFORSAMPLE               sample/_l/_d requires resource declared as texture1D/2D/3D/Cube/1DArray/2DArray/CubeArray
+INSTR.RESOURCEKINDFORSAMPLEC              samplec requires resource declared as texture1D/2D/Cube/1DArray/2DArray/CubeArray
+INSTR.RESOURCEKINDFORTEXTURELOAD          texture load only works on Texture1D/1DArray/2D/2DArray/3D/MS2D/MS2DArray
+INSTR.RESOURCEKINDFORTEXTURESTORE         texture store only works on Texture1D/1DArray/2D/2DArray/3D
+INSTR.RESOURCEKINDFORTRACERAY             TraceRay should only use RTAccelerationStructure
+INSTR.RESOURCEMAPTOSINGLEENTRY            Fail to map resource to resource table
+INSTR.RESOURCEOFFSETMISS                  offset uninitialized
+INSTR.RESOURCEOFFSETTOOMANY               out of bound offset must be undef
+INSTR.RESOURCEUSER                        Resource should only be used by Load/GEP/Call
+INSTR.SAMPLECOMPTYPE                      sample_* instructions require resource to be declared to return UNORM, SNORM or FLOAT.
+INSTR.SAMPLEINDEXFORLOAD2DMS              load on Texture2DMS/2DMSArray require sampleIndex
+INSTR.SAMPLERMODEFORLOD                   lod instruction requires sampler declared in default mode
+INSTR.SAMPLERMODEFORSAMPLE                sample/_l/_d/_cl_s/gather instruction requires sampler declared in default mode
+INSTR.SAMPLERMODEFORSAMPLEC               sample_c_*/gather_c instructions require sampler declared in comparison mode
+INSTR.SIGNATUREOPERATIONNOTINENTRY        Dxil operation for input output signature must be in entryPoints.
+INSTR.STATUS                              Resource status should only be used by CheckAccessFullyMapped
+INSTR.STRUCTBITCAST                       Bitcast on struct types is not allowed
+INSTR.TEXTUREOFFSET                       offset texture instructions must take offset which can resolve to integer literal in the range -8 to 7
+INSTR.TGSMRACECOND                        Race condition writing to shared memory detected, consider making this write conditional
+INSTR.UNDEFRESULTFORGETDIMENSION          GetDimensions used undef dimension %0 on %1
+INSTR.WRITEMASKFORTYPEDUAVSTORE           store on typed uav must write to all four components of the UAV
+INSTR.WRITEMASKMATCHVALUEFORUAVSTORE      uav store write mask must match store value mask, write mask is %0 and store value mask is %1
+META.BARYCENTRICSFLOAT3                   only 'float3' type is allowed for SV_Barycentrics.
+META.BARYCENTRICSINTERPOLATION            SV_Barycentrics cannot be used with 'nointerpolation' type
+META.BARYCENTRICSTWOPERSPECTIVES          There can only be up to two input attributes of SV_Barycentrics with different perspective interpolation mode.
+META.BRANCHFLATTEN                        Can't use branch and flatten attributes together
+META.CLIPCULLMAXCOMPONENTS                Combined elements of SV_ClipDistance and SV_CullDistance must fit in 8 components
+META.CLIPCULLMAXROWS                      Combined elements of SV_ClipDistance and SV_CullDistance must fit in two rows.
+META.CONTROLFLOWHINTNOTONCONTROLFLOW      Control flow hint only works on control flow inst
+META.DENSERESIDS                          Resource identifiers must be zero-based and dense
+META.DUPLICATESYSVALUE                    System value may only appear once in signature
+META.ENTRYFUNCTION                        entrypoint not found
+META.FLAGSUSAGE                           Flags must match usage
+META.FORCECASEONSWITCH                    Attribute forcecase only works for switch
+META.FUNCTIONANNOTATION                   Cannot find function annotation for %0
+META.GLCNOTONAPPENDCONSUME                globallycoherent cannot be used with append/consume buffers
+META.INTEGERINTERPMODE                    Interpolation mode on integer must be Constant
+META.INTERPMODEINONEROW                   Interpolation mode must be identical for all elements packed into the same row.
+META.INTERPMODEVALID                      Interpolation mode must be valid
+META.INVALIDCONTROLFLOWHINT               Invalid control flow hint
+META.KNOWN                                Named metadata should be known
+META.MAXTESSFACTOR                        Hull Shader MaxTessFactor must be [%0..%1].  %2 specified
+META.NOENTRYPROPSFORENTRY                 EntryPoints must have entry properties.
+META.NOSEMANTICOVERLAP                    Semantics must not overlap
+META.REQUIRED                             TODO - Required metadata missing
+META.SEMAKINDMATCHESNAME                  Semantic name must match system value, when defined.
+META.SEMAKINDVALID                        Semantic kind must be valid
+META.SEMANTICCOMPTYPE                     %0 must be %1
+META.SEMANTICINDEXMAX                     System value semantics have a maximum valid semantic index
+META.SEMANTICLEN                          Semantic length must be at least 1 and at most 64
+META.SEMANTICSHOULDBEALLOCATED            Semantic should have a valid packing location
+META.SEMANTICSHOULDNOTBEALLOCATED         Semantic should have a packing location of -1
+META.SIGNATURECOMPTYPE                    signature %0 specifies unrecognized or invalid component type
+META.SIGNATUREDATAWIDTH                   Data width must be identical for all elements packed into the same row.
+META.SIGNATUREILLEGALCOMPONENTORDER       Component ordering for packed elements must be: arbitrary < system value < system generated value
+META.SIGNATUREINDEXCONFLICT               Only elements with compatible indexing rules may be packed together
+META.SIGNATUREOUTOFRANGE                  Signature elements must fit within maximum signature size
+META.SIGNATUREOVERLAP                     Signature elements may not overlap in packing location.
+META.STRUCTBUFALIGNMENT                   StructuredBuffer stride not aligned
+META.STRUCTBUFALIGNMENTOUTOFBOUND         StructuredBuffer stride out of bounds
+META.SYSTEMVALUEROWS                      System value may only have 1 row
+META.TARGET                               Target triple must be 'dxil-ms-dx'
+META.TESSELLATOROUTPUTPRIMITIVE           Invalid Tessellator Output Primitive specified. Must be point, line, triangleCW or triangleCCW.
+META.TESSELLATORPARTITION                 Invalid Tessellator Partitioning specified. Must be integer, pow2, fractional_odd or fractional_even.
+META.TEXTURETYPE                          elements of typed buffers and textures must fit in four 32-bit quantities
+META.USED                                 All metadata must be used by dxil
+META.VALIDSAMPLERMODE                     Invalid sampler mode on sampler
+META.VALUERANGE                           Metadata value must be within range
+META.WELLFORMED                           TODO - Metadata must be well-formed in operand count and types
+SM.64BITRAWBUFFERLOADSTORE                i64/f64 rawBufferLoad/Store overloads are allowed after SM 6.3
+SM.AMPLIFICATIONSHADERPAYLOADSIZE         For shader '%0', payload size is greater than %1
+SM.AMPLIFICATIONSHADERPAYLOADSIZEDECLARED For shader '%0', payload size %1 is greater than declared size of %2 bytes
+SM.APPENDANDCONSUMEONSAMEUAV              BufferUpdateCounter inc and dec on a given UAV (%d) cannot both be in the same shader for shader model less than 5.1.
+SM.CBUFFERARRAYOFFSETALIGNMENT            CBuffer array offset must be aligned to 16-bytes
+SM.CBUFFERELEMENTOVERFLOW                 CBuffer elements must not overflow
+SM.CBUFFEROFFSETOVERLAP                   CBuffer offsets must not overlap
+SM.CBUFFERTEMPLATETYPEMUSTBESTRUCT        D3D12 constant/texture buffer template element can only be a struct
+SM.COMPLETEPOSITION                       Not all elements of SV_Position were written
+SM.CONSTANTINTERPMODE                     Interpolation mode must be constant for MS primitive output.
+SM.COUNTERONLYONSTRUCTBUF                 BufferUpdateCounter valid only on structured buffers
+SM.CSNOSIGNATURES                         Compute shaders must not have shader signatures.
+SM.DOMAINLOCATIONIDXOOB                   DomainLocation component index out of bounds for the domain.
+SM.DSINPUTCONTROLPOINTCOUNTRANGE          DS input control point count must be [0..%0].  %1 specified
+SM.DXILVERSION                            Target shader model requires specific Dxil Version
+SM.GSINSTANCECOUNTRANGE                   GS instance count must be [1..%0].  %1 specified
+SM.GSOUTPUTVERTEXCOUNTRANGE               GS output vertex count must be [0..%0].  %1 specified
+SM.GSTOTALOUTPUTVERTEXDATARANGE           Declared output vertex count (%0) multiplied by the total number of declared scalar components of output data (%1) equals %2.  This value cannot be greater than %3
+SM.GSVALIDINPUTPRIMITIVE                  GS input primitive unrecognized
+SM.GSVALIDOUTPUTPRIMITIVETOPOLOGY         GS output primitive topology unrecognized
+SM.HSINPUTCONTROLPOINTCOUNTRANGE          HS input control point count must be [0..%0].  %1 specified
+SM.HULLPASSTHRUCONTROLPOINTCOUNTMATCH     For pass thru hull shader, input control point count must match output control point count
+SM.INSIDETESSFACTORSIZEMATCHDOMAIN        InsideTessFactor rows, columns (%0, %1) invalid for domain %2.  Expected %3 rows and 1 column.
+SM.INVALIDRESOURCECOMPTYPE                Invalid resource return type
+SM.INVALIDRESOURCEKIND                    Invalid resources kind
+SM.INVALIDSAMPLERFEEDBACKTYPE             Invalid sampler feedback type
+SM.INVALIDTEXTUREKINDONUAV                Texture2DMS[Array] or TextureCube[Array] resources are not supported with UAVs
+SM.ISOLINEOUTPUTPRIMITIVEMISMATCH         Hull Shader declared with IsoLine Domain must specify output primitive point or line. Triangle_cw or triangle_ccw output are not compatible with the IsoLine Domain.
+SM.MAXMSSMSIZE                            Total Thread Group Shared Memory storage is %0, exceeded %1
+SM.MAXTGSMSIZE                            Total Thread Group Shared Memory storage is %0, exceeded %1
+SM.MAXTHEADGROUP                          Declared Thread Group Count %0 (X*Y*Z) is beyond the valid maximum of %1
+SM.MESHPSIGROWCOUNT                       For shader '%0', primitive output signatures are taking up more than %1 rows
+SM.MESHSHADERINOUTSIZE                    For shader '%0', input plus output size is greater than %1
+SM.MESHSHADERMAXPRIMITIVECOUNT            MS max primitive output count must be [0..%0].  %1 specified
+SM.MESHSHADERMAXVERTEXCOUNT               MS max vertex output count must be [0..%0].  %1 specified
+SM.MESHSHADEROUTPUTSIZE                   For shader '%0', vertex plus primitive output size is greater than %1
+SM.MESHSHADERPAYLOADSIZE                  For shader '%0', payload size is greater than %1
+SM.MESHSHADERPAYLOADSIZEDECLARED          For shader '%0', payload size %1 is greater than declared size of %2 bytes
+SM.MESHTOTALSIGROWCOUNT                   For shader '%0', vertex and primitive output signatures are taking up more than %1 rows
+SM.MESHVSIGROWCOUNT                       For shader '%0', vertex output signatures are taking up more than %1 rows
+SM.MULTISTREAMMUSTBEPOINT                 When multiple GS output streams are used they must be pointlists
+SM.NAME                                   Target shader model name must be known
+SM.NOINTERPMODE                           Interpolation mode must be undefined for VS input/PS output/patch constant.
+SM.NOPSOUTPUTIDX                          Pixel shader output registers are not indexable.
+SM.OPCODE                                 Opcode must be defined in target shader model
+SM.OPCODEININVALIDFUNCTION                Invalid DXIL opcode usage like StorePatchConstant in patch constant function
+SM.OPERAND                                Operand must be defined in target shader model
+SM.OUTPUTCONTROLPOINTCOUNTRANGE           output control point count must be [0..%0].  %1 specified
+SM.OUTPUTCONTROLPOINTSTOTALSCALARS        Total number of scalars across all HS output control points must not exceed
+SM.PATCHCONSTANTONLYFORHSDS               patch constant signature only valid in HS and DS
+SM.PSCONSISTENTINTERP                     Interpolation mode for PS input position must be linear_noperspective_centroid or linear_noperspective_sample when outputting oDepthGE or oDepthLE and not running at sample frequency (which is forced by inputting SV_SampleIndex or declaring an input linear_sample or linear_noperspective_sample)
+SM.PSCOVERAGEANDINNERCOVERAGE             InnerCoverage and Coverage are mutually exclusive.
+SM.PSMULTIPLEDEPTHSEMANTIC                Pixel Shader only allows one type of depth semantic to be declared
+SM.PSOUTPUTSEMANTIC                       Pixel Shader allows output semantics to be SV_Target, SV_Depth, SV_DepthGreaterEqual, SV_DepthLessEqual, SV_Coverage or SV_StencilRef, %0 found
+SM.PSTARGETCOL0                           SV_Target packed location must start at column 0
+SM.PSTARGETINDEXMATCHESROW                SV_Target semantic index must match packed row location
+SM.RAYSHADERPAYLOADSIZE                   For shader '%0', %1 size is smaller than argument's allocation size
+SM.RAYSHADERSIGNATURES                    Ray tracing shader '%0' should not have any shader signatures
+SM.RESOURCERANGEOVERLAP                   Resource ranges must not overlap
+SM.ROVONLYINPS                            RasterizerOrdered objects are only allowed in 5.0+ pixel shaders
+SM.SAMPLECOUNTONLYON2DMS                  Only Texture2DMS/2DMSArray can have a sample count
+SM.SEMANTIC                               Semantic must be defined in target shader model
+SM.STREAMINDEXRANGE                       Stream index (%0) must be between 0 and %1
+SM.TESSFACTORFORDOMAIN                    Required TessFactor for domain not found declared anywhere in Patch Constant data
+SM.TESSFACTORSIZEMATCHDOMAIN              TessFactor rows, columns (%0, %1) invalid for domain %2.  Expected %3 rows and 1 column.
+SM.THREADGROUPCHANNELRANGE                Declared Thread Group %0 size %1 outside valid range [%2..%3]
+SM.TRIOUTPUTPRIMITIVEMISMATCH             Hull Shader declared with Tri Domain must specify output primitive point, triangle_cw or triangle_ccw. Line output is not compatible with the Tri domain
+SM.UNDEFINEDOUTPUT                        Not all elements of output %0 were written
+SM.VALIDDOMAIN                            Invalid Tessellator Domain specified. Must be isoline, tri or quad
+SM.VIEWIDNEEDSSLOT                        ViewID requires compatible space in pixel shader input signature
+SM.ZEROHSINPUTCONTROLPOINTWITHINPUT       When HS input control point count is 0, no input signature should exist
+TYPES.DEFINED                             Type must be defined based on DXIL primitives
+TYPES.I8                                  I8 can only be used as an immediate value for an intrinsic
+TYPES.INTWIDTH                            Int type must be of valid width
+TYPES.NOMULTIDIM                          Only one dimension allowed for array type
+TYPES.NOVECTOR                            Vector types must not be present
+UNI.NOWAVESENSITIVEGRADIENT               Gradient operations are not affected by wave-sensitive data or control flow.
+========================================= =======================================================================================================================================================================================================================================================================================================
 
 .. VALRULES-RST:END
 

+ 8 - 4
include/dxc/DXIL/DxilConstants.h

@@ -482,6 +482,10 @@ namespace DXIL {
     InnerCoverage = 92, // returns underestimated coverage input from conservative rasterization in a pixel shader
     SampleIndex = 90, // returns the sample index in a sample-frequency pixel shader
   
+    // Quad Wave Ops
+    QuadOp = 123, // returns the result of a quad-level operation
+    QuadReadLaneAt = 122, // reads from a lane in the quad
+  
     // Quaternary
     Bfi = 53, // Given a bit range from the LSB of a number, places that number of bits in another number at any offset
   
@@ -618,8 +622,6 @@ namespace DXIL {
     FirstbitHi = 33, // Returns the location of the first set bit starting from the highest order bit and working downward.
   
     // Wave
-    QuadOp = 123, // returns the result of a quad-level operation
-    QuadReadLaneAt = 122, // reads from a lane in the quad
     WaveActiveAllEqual = 115, // returns 1 if all the lanes have the same value
     WaveActiveBallot = 116, // returns a struct with a bit set for each lane where the condition is true
     WaveActiveBit = 120, // returns the result of the operation across all lanes
@@ -770,6 +772,10 @@ namespace DXIL {
     SampleIndex,
     Unary,
   
+    // Quad Wave Ops
+    QuadOp,
+    QuadReadLaneAt,
+  
     // Quaternary
     Quaternary,
   
@@ -865,8 +871,6 @@ namespace DXIL {
     UnaryBits,
   
     // Wave
-    QuadOp,
-    QuadReadLaneAt,
     WaveActiveAllEqual,
     WaveActiveBallot,
     WaveActiveBit,

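Moving QuadOp and QuadReadLaneAt out of the Wave category into their own "Quad Wave Ops" category presumably lets the generated tables assign quad operations a different set of allowed stages than the other wave operations. A minimal sketch of such a per-category stage mask, following the commit message (quad ops in pixel shaders plus CS-like models). The StageBit values and IsQuadOpAllowed are hypothetical illustrations, not DXC's encoding, and stages such as library and ray tracing are omitted for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stage bits for illustration only; DXC uses its own encoding.
enum StageBit : uint32_t {
  kPixel = 1u << 0,
  kVertex = 1u << 1,
  kGeometry = 1u << 2,
  kHull = 1u << 3,
  kDomain = 1u << 4,
  kCompute = 1u << 5,
  kAmplification = 1u << 6,
  kMesh = 1u << 7,
};

// "CS-like" models: compute plus amplification and mesh, which share the
// compute thread-group execution model.
constexpr uint32_t kCSLike = kCompute | kAmplification | kMesh;

// Assumed rule per the commit message: Quad* ops allowed in pixel and CS-like stages.
constexpr uint32_t kQuadOpStages = kPixel | kCSLike;

inline bool IsQuadOpAllowed(uint32_t stage) {
  return (kQuadOpStages & stage) != 0;
}
```

Keeping the quad ops in a separate category means their mask can differ from the wave ops' mask without per-opcode special cases.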
+ 2 - 4
include/dxc/DXIL/DxilFunctionProps.h

@@ -73,14 +73,12 @@ struct DxilFunctionProps {
       unsigned maxVertexCount;
       unsigned maxPrimitiveCount;
       DXIL::MeshOutputTopology outputTopology;
-      // The following doesn't go into metadata
-      unsigned payloadByteSize;
+      unsigned payloadSizeInBytes;
     } MS;
     // Amplification shader.
     struct {
       unsigned numThreads[3];
-      // The following doesn't go into metadata
-      unsigned payloadByteSize;
+      unsigned payloadSizeInBytes;
     } AS;
   } ShaderProps;
   DXIL::ShaderKind shaderKind;

+ 10 - 6
include/dxc/DXIL/DxilMetadataHelper.h

@@ -256,15 +256,17 @@ public:
   static const unsigned kDxilHSStateMaxTessellationFactor     = 6;
 
   // MSState.
-  static const unsigned kDxilMSStateNumFields = 4;
+  static const unsigned kDxilMSStateNumFields = 5;
   static const unsigned kDxilMSStateNumThreads = 0;
   static const unsigned kDxilMSStateMaxVertexCount = 1;
   static const unsigned kDxilMSStateMaxPrimitiveCount = 2;
   static const unsigned kDxilMSStateOutputTopology = 3;
+  static const unsigned kDxilMSStatePayloadSizeInBytes = 4;
 
   // ASState.
-  static const unsigned kDxilASStateNumFields = 1;
+  static const unsigned kDxilASStateNumFields = 2;
   static const unsigned kDxilASStateNumThreads = 0;
+  static const unsigned kDxilASStatePayloadSizeInBytes = 1;
 
 public:
   /// Use this class to manipulate metadata of DXIL or high-level DX IR specific fields in the record.
@@ -434,15 +436,17 @@ private:
   llvm::MDTuple *EmitDxilMSState(const unsigned *NumThreads,
                                  unsigned MaxVertexCount,
                                  unsigned MaxPrimitiveCount,
-                                 DXIL::MeshOutputTopology OutputTopology);
+                                 DXIL::MeshOutputTopology OutputTopology,
+                                 unsigned payloadSizeInBytes);
   void LoadDxilMSState(const llvm::MDOperand &MDO,
                        unsigned *NumThreads,
                        unsigned &MaxVertexCount,
                        unsigned &MaxPrimitiveCount,
-                       DXIL::MeshOutputTopology &OutputTopology);
+                       DXIL::MeshOutputTopology &OutputTopology,
+                       unsigned &payloadSizeInBytes);
 
-  llvm::MDTuple *EmitDxilASState(const unsigned *NumThreads);
-  void LoadDxilASState(const llvm::MDOperand &MDO, unsigned *NumThreads);
+  llvm::MDTuple *EmitDxilASState(const unsigned *NumThreads, unsigned payloadSizeInBytes);
+  void LoadDxilASState(const llvm::MDOperand &MDO, unsigned *NumThreads, unsigned &payloadSizeInBytes);
 public:
   // Utility functions.
   static bool IsKnownNamedMetaData(const llvm::NamedMDNode &Node);

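With kDxilMSStateNumFields raised to 5, the mesh shader state tuple gains a trailing payload-size operand at kDxilMSStatePayloadSizeInBytes. Likewise, the amplification shader state gains one at index 1. A minimal round-trip sketch of the new MS layout, assuming a plain std::vector stand-in for llvm::MDTuple. MSProps, EmitMSState, and LoadMSState are hypothetical names for illustration, and the size check is this sketch's own choice rather than a claim about DXC's loader.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Field indices copied from the kDxilMSState* constants in this change.
constexpr unsigned kMSStateNumFields = 5;
constexpr unsigned kMSStateNumThreads = 0;
constexpr unsigned kMSStateMaxVertexCount = 1;
constexpr unsigned kMSStateMaxPrimitiveCount = 2;
constexpr unsigned kMSStateOutputTopology = 3;
constexpr unsigned kMSStatePayloadSizeInBytes = 4;

// Stand-in for MDTuple operands: a scalar operand holds one value, and the
// NumThreads operand holds a nested three-element tuple.
using Operand = std::vector<uint32_t>;
using Tuple = std::vector<Operand>;

// Hypothetical mirror of DxilFunctionProps::ShaderProps.MS.
struct MSProps {
  uint32_t numThreads[3];
  uint32_t maxVertexCount;
  uint32_t maxPrimitiveCount;
  uint32_t outputTopology;
  uint32_t payloadSizeInBytes;
};

Tuple EmitMSState(const MSProps &p) {
  Tuple t(kMSStateNumFields);
  t[kMSStateNumThreads] = {p.numThreads[0], p.numThreads[1], p.numThreads[2]};
  t[kMSStateMaxVertexCount] = {p.maxVertexCount};
  t[kMSStateMaxPrimitiveCount] = {p.maxPrimitiveCount};
  t[kMSStateOutputTopology] = {p.outputTopology};
  t[kMSStatePayloadSizeInBytes] = {p.payloadSizeInBytes};
  return t;
}

MSProps LoadMSState(const Tuple &t) {
  // This sketch rejects any tuple whose shape does not match the new layout,
  // including a four-field tuple written before the payload field existed.
  if (t.size() != kMSStateNumFields || t[kMSStateNumThreads].size() != 3)
    throw std::runtime_error("incorrect DXIL metadata");
  MSProps p{};
  for (unsigned i = 0; i < 3; ++i)
    p.numThreads[i] = t[kMSStateNumThreads][i];
  p.maxVertexCount = t[kMSStateMaxVertexCount].at(0);
  p.maxPrimitiveCount = t[kMSStateMaxPrimitiveCount].at(0);
  p.outputTopology = t[kMSStateOutputTopology].at(0);
  p.payloadSizeInBytes = t[kMSStatePayloadSizeInBytes].at(0);
  return p;
}
```

Appending the new field at the end keeps every existing index unchanged, so only the field count and the two emit and load routines need to change.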
+ 2 - 2
include/dxc/DXIL/DxilModule.h

@@ -281,8 +281,8 @@ public:
   void SetMaxOutputPrimitives(unsigned NumOPs);
   DXIL::MeshOutputTopology GetMeshOutputTopology() const;
   void SetMeshOutputTopology(DXIL::MeshOutputTopology MeshOutputTopology);
-  unsigned GetPayloadByteSize() const;
-  void SetPayloadByteSize(unsigned Size);
+  unsigned GetPayloadSizeInBytes() const;
+  void SetPayloadSizeInBytes(unsigned Size);
 
   // AutoBindingSpace also enables automatic binding for libraries if set.
   // UINT_MAX == unset

+ 4 - 0
include/dxc/DXIL/DxilOperations.h

@@ -20,6 +20,7 @@ class Function;
 class Constant;
 class Value;
 class Instruction;
+class CallInst;
 }
 #include "llvm/IR/Attributes.h"
 #include "llvm/ADT/StringRef.h"
@@ -106,6 +107,9 @@ public:
   static void GetMinShaderModelAndMask(OpCode C, bool bWithTranslation,
                                        unsigned &major, unsigned &minor,
                                        unsigned &mask);
+  static void GetMinShaderModelAndMask(const llvm::CallInst *CI, bool bWithTranslation,
+                                       unsigned &major, unsigned &minor,
+                                       unsigned &mask);
 
 private:
   // Per-module properties.

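The new GetMinShaderModelAndMask overload takes the call instruction instead of only the opcode. A minimal sketch of why that extra detail matters for Barrier: under the InstrBarrierModeForNonCS rule, stages other than compute, amplification, and mesh may only use a global UAV fence (sync_uglobal), so the allowed stages depend on the mode operand of the specific call. The flag values are modeled on DXIL::BarrierMode and should be treated as assumptions, and the two-bucket stage mask and BarrierStageMask are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Barrier flags modeled on DXIL::BarrierMode (values assumed for illustration).
constexpr uint32_t kSyncThreadGroup = 0x1;
constexpr uint32_t kTGSMFence = 0x2;
constexpr uint32_t kUAVFenceGlobal = 0x4;
constexpr uint32_t kUAVFenceThreadGroup = 0x8;

// Hypothetical two-bucket stage mask.
constexpr uint32_t kComputeLike = 0x1;  // compute, amplification, mesh
constexpr uint32_t kOtherStages = 0x2;  // pixel, vertex, and so on
constexpr uint32_t kAllStages = kComputeLike | kOtherStages;

// The opcode alone cannot decide this; the call's mode operand is required.
uint32_t BarrierStageMask(uint32_t barrierMode) {
  // Only a pure global UAV fence is valid outside thread-group stages.
  if (barrierMode == kUAVFenceGlobal)
    return kAllStages;
  // Group sync, TGSM fences, or group-scoped UAV fences need a thread group.
  return kComputeLike;
}
```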
+ 3 - 1
include/dxc/HLSL/DxilValidation.h

@@ -60,7 +60,7 @@ enum class ValidationRule : unsigned {
   // Instruction
   InstrAllowed, // Instructions must be of an allowed type
   InstrAttributeAtVertexNoInterpolation, // Attribute %0 must have nointerpolation mode in order to use GetAttributeAtVertex function.
-  InstrBarrierModeForNonCS, // sync in a non-Compute Shader must only sync UAV (sync_uglobal)
+  InstrBarrierModeForNonCS, // sync in a non-Compute/Amplification/Mesh Shader must only sync UAV (sync_uglobal)
   InstrBarrierModeNoMemory, // sync must include some form of memory barrier - _u (UAV) and/or _g (Thread Group Shared Memory).  Only _t (thread group sync) is optional. 
   InstrBarrierModeUselessUGroup, // sync can't specify both _ugroup and _uglobal. If both are needed, just specify _uglobal.
   InstrBufferUpdateCounterOnResHasCounter, // BufferUpdateCounter valid only when HasCounter is true
@@ -197,6 +197,7 @@ enum class ValidationRule : unsigned {
   // Shader model
   Sm64bitRawBufferLoadStore, // i64/f64 rawBufferLoad/Store overloads are allowed after SM 6.3
   SmAmplificationShaderPayloadSize, // For shader '%0', payload size is greater than %1
+  SmAmplificationShaderPayloadSizeDeclared, // For shader '%0', payload size %1 is greater than declared size of %2 bytes
   SmAppendAndConsumeOnSameUAV, // BufferUpdateCounter inc and dec on a given UAV (%d) cannot both be in the same shader for shader model less than 5.1.
   SmCBufferArrayOffsetAlignment, // CBuffer array offset must be aligned to 16-bytes
   SmCBufferElementOverflow, // CBuffer elements must not overflow
@@ -231,6 +232,7 @@ enum class ValidationRule : unsigned {
   SmMeshShaderMaxVertexCount, // MS max vertex output count must be [0..%0].  %1 specified
   SmMeshShaderOutputSize, // For shader '%0', vertex plus primitive output size is greater than %1
   SmMeshShaderPayloadSize, // For shader '%0', payload size is greater than %1
+  SmMeshShaderPayloadSizeDeclared, // For shader '%0', payload size %1 is greater than declared size of %2 bytes
   SmMeshTotalSigRowCount, // For shader '%0', vertex and primitive output signatures are taking up more than %1 rows
   SmMeshVSigRowCount, // For shader '%0', vertex output signatures are taking up more than %1 rows
   SmMultiStreamMustBePoint, // When multiple GS output streams are used they must be pointlists

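The two new *PayloadSizeDeclared rules compare the payload size the validator measures against the size recorded in the function properties, instead of only filling the properties in. A minimal sketch of that check, with messages built from the rule descriptions. The 16 KB limit and the ValidatePayloadSize helper are assumptions for illustration, not the validator's code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Assumed upper bound on the AS-to-MS payload size (16 KB); illustrative only.
constexpr uint32_t kMaxPayloadBytes = 16 * 1024;

// Returns diagnostics in the spirit of SmMeshShaderPayloadSize and
// SmMeshShaderPayloadSizeDeclared for one shader entry point.
std::vector<std::string> ValidatePayloadSize(const std::string &shader,
                                             uint32_t measured,
                                             uint32_t declared) {
  std::vector<std::string> errors;
  if (measured > kMaxPayloadBytes)
    errors.push_back("For shader '" + shader + "', payload size is greater than " +
                     std::to_string(kMaxPayloadBytes));
  if (measured > declared)
    errors.push_back("For shader '" + shader + "', payload size " +
                     std::to_string(measured) + " is greater than declared size of " +
                     std::to_string(declared) + " bytes");
  return errors;
}
```

A declared size larger than the measured one passes in this sketch. Only the case where the shader would use more bytes than were declared is reported.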
+ 23 - 10
lib/DXIL/DxilMetadataHelper.cpp

@@ -1108,6 +1108,8 @@ const Function *DxilMDHelper::LoadDxilFunctionProps(const MDTuple *pProps,
       ConstMDToUint32(pProps->getOperand(idx++));
     props->ShaderProps.MS.outputTopology =
       (DXIL::MeshOutputTopology)ConstMDToUint32(pProps->getOperand(idx++));
+    props->ShaderProps.MS.payloadSizeInBytes =
+      ConstMDToUint32(pProps->getOperand(idx++));
     break;
   case DXIL::ShaderKind::Amplification:
     props->ShaderProps.AS.numThreads[0] =
@@ -1116,6 +1118,8 @@ const Function *DxilMDHelper::LoadDxilFunctionProps(const MDTuple *pProps,
       ConstMDToUint32(pProps->getOperand(idx++));
     props->ShaderProps.AS.numThreads[2] =
       ConstMDToUint32(pProps->getOperand(idx++));
+    props->ShaderProps.AS.payloadSizeInBytes =
+      ConstMDToUint32(pProps->getOperand(idx++));
     break;
   default:
     break;
@@ -1222,13 +1226,14 @@ MDTuple *DxilMDHelper::EmitDxilEntryProperties(uint64_t rawShaderFlag,
     MDTuple *pMDTuple = EmitDxilMSState(MS.numThreads,
                                         MS.maxVertexCount,
                                         MS.maxPrimitiveCount,
-                                        MS.outputTopology);
+                                        MS.outputTopology,
+                                        MS.payloadSizeInBytes);
     MDVals.emplace_back(pMDTuple);
   } break;
   case DXIL::ShaderKind::Amplification: {
     auto &AS = props.ShaderProps.AS;
     MDVals.emplace_back(Uint32ToConstMD(DxilMDHelper::kDxilASStateTag));
-    MDTuple *pMDTuple = EmitDxilASState(AS.numThreads);
+    MDTuple *pMDTuple = EmitDxilASState(AS.numThreads, AS.payloadSizeInBytes);
     MDVals.emplace_back(pMDTuple);
   } break;
   default:
@@ -1351,12 +1356,13 @@ void DxilMDHelper::LoadDxilEntryProperties(const MDOperand &MDO,
       DXASSERT(props.IsMS(), "else invalid shader kind");
       auto &MS = props.ShaderProps.MS;
       LoadDxilMSState(MDO, MS.numThreads, MS.maxVertexCount,
-                      MS.maxPrimitiveCount, MS.outputTopology);
+                      MS.maxPrimitiveCount, MS.outputTopology,
+                      MS.payloadSizeInBytes);
     } break;
     case DxilMDHelper::kDxilASStateTag: {
       DXASSERT(props.IsAS(), "else invalid shader kind");
       auto &AS = props.ShaderProps.AS;
-      LoadDxilASState(MDO, AS.numThreads);
+      LoadDxilASState(MDO, AS.numThreads, AS.payloadSizeInBytes);
     } break;
     default:
       DXASSERT(false, "Unknown extended shader properties tag");
@@ -1432,13 +1438,14 @@ DxilMDHelper::EmitDxilFunctionProps(const hlsl::DxilFunctionProps *props,
     MDVals[valIdx++] = Uint32ToConstMD(props->ShaderProps.MS.numThreads[2]);
     MDVals[valIdx++] = Uint32ToConstMD(props->ShaderProps.MS.maxVertexCount);
     MDVals[valIdx++] = Uint32ToConstMD(props->ShaderProps.MS.maxPrimitiveCount);
-    MDVals[valIdx++] =
-        Uint8ToConstMD((uint8_t)props->ShaderProps.MS.outputTopology);
+    MDVals[valIdx++] = Uint8ToConstMD((uint8_t)props->ShaderProps.MS.outputTopology);
+    MDVals[valIdx++] = Uint32ToConstMD(props->ShaderProps.MS.payloadSizeInBytes);
     break;
   case DXIL::ShaderKind::Amplification:
     MDVals[valIdx++] = Uint32ToConstMD(props->ShaderProps.AS.numThreads[0]);
     MDVals[valIdx++] = Uint32ToConstMD(props->ShaderProps.AS.numThreads[1]);
     MDVals[valIdx++] = Uint32ToConstMD(props->ShaderProps.AS.numThreads[2]);
+    MDVals[valIdx++] = Uint32ToConstMD(props->ShaderProps.AS.payloadSizeInBytes);
     break;
   default:
     break;
@@ -1923,7 +1930,8 @@ void DxilMDHelper::LoadDxilHSState(const MDOperand &MDO,
 MDTuple *DxilMDHelper::EmitDxilMSState(const unsigned *NumThreads,
                                        unsigned MaxVertexCount,
                                        unsigned MaxPrimitiveCount,
-                                       DXIL::MeshOutputTopology OutputTopology) {
+                                       DXIL::MeshOutputTopology OutputTopology,
+                                       unsigned payloadSizeInBytes) {
   Metadata *MDVals[kDxilMSStateNumFields];
   vector<Metadata *> NumThreadVals;
 
@@ -1934,6 +1942,7 @@ MDTuple *DxilMDHelper::EmitDxilMSState(const unsigned *NumThreads,
   MDVals[kDxilMSStateMaxVertexCount] = Uint32ToConstMD(MaxVertexCount);
   MDVals[kDxilMSStateMaxPrimitiveCount] = Uint32ToConstMD(MaxPrimitiveCount);
   MDVals[kDxilMSStateOutputTopology] = Uint32ToConstMD((unsigned)OutputTopology);
+  MDVals[kDxilMSStatePayloadSizeInBytes] = Uint32ToConstMD(payloadSizeInBytes);
 
   return MDNode::get(m_Ctx, MDVals);
 }
@@ -1942,7 +1951,8 @@ void DxilMDHelper::LoadDxilMSState(const MDOperand &MDO,
                                    unsigned *NumThreads,
                                    unsigned &MaxVertexCount,
                                    unsigned &MaxPrimitiveCount,
-                                   DXIL::MeshOutputTopology &OutputTopology) {
+                                   DXIL::MeshOutputTopology &OutputTopology,
+                                   unsigned &payloadSizeInBytes) {
   IFTBOOL(MDO.get() != nullptr, DXC_E_INCORRECT_DXIL_METADATA);
   const MDTuple *pTupleMD = dyn_cast<MDTuple>(MDO.get());
   IFTBOOL(pTupleMD != nullptr, DXC_E_INCORRECT_DXIL_METADATA);
@@ -1955,9 +1965,10 @@ void DxilMDHelper::LoadDxilMSState(const MDOperand &MDO,
   MaxVertexCount = ConstMDToUint32(pTupleMD->getOperand(kDxilMSStateMaxVertexCount));
   MaxPrimitiveCount = ConstMDToUint32(pTupleMD->getOperand(kDxilMSStateMaxPrimitiveCount));
   OutputTopology = (DXIL::MeshOutputTopology)ConstMDToUint32(pTupleMD->getOperand(kDxilMSStateOutputTopology));
+  payloadSizeInBytes = ConstMDToUint32(pTupleMD->getOperand(kDxilMSStatePayloadSizeInBytes));
 }
 
-MDTuple *DxilMDHelper::EmitDxilASState(const unsigned *NumThreads) {
+MDTuple *DxilMDHelper::EmitDxilASState(const unsigned *NumThreads, unsigned payloadSizeInBytes) {
   Metadata *MDVals[kDxilASStateNumFields];
   vector<Metadata *> NumThreadVals;
 
@@ -1965,11 +1976,12 @@ MDTuple *DxilMDHelper::EmitDxilASState(const unsigned *NumThreads) {
   NumThreadVals.emplace_back(Uint32ToConstMD(NumThreads[1]));
   NumThreadVals.emplace_back(Uint32ToConstMD(NumThreads[2]));
   MDVals[kDxilASStateNumThreads] = MDNode::get(m_Ctx, NumThreadVals);
+  MDVals[kDxilASStatePayloadSizeInBytes] = Uint32ToConstMD(payloadSizeInBytes);
 
   return MDNode::get(m_Ctx, MDVals);
 }
 
-void DxilMDHelper::LoadDxilASState(const MDOperand &MDO, unsigned *NumThreads) {
+void DxilMDHelper::LoadDxilASState(const MDOperand &MDO, unsigned *NumThreads, unsigned &payloadSizeInBytes) {
   IFTBOOL(MDO.get() != nullptr, DXC_E_INCORRECT_DXIL_METADATA);
   const MDTuple *pTupleMD = dyn_cast<MDTuple>(MDO.get());
   IFTBOOL(pTupleMD != nullptr, DXC_E_INCORRECT_DXIL_METADATA);
@@ -1979,6 +1991,7 @@ void DxilMDHelper::LoadDxilASState(const MDOperand &MDO, unsigned *NumThreads) {
   NumThreads[0] = ConstMDToUint32(pNode->getOperand(0));
   NumThreads[1] = ConstMDToUint32(pNode->getOperand(1));
   NumThreads[2] = ConstMDToUint32(pNode->getOperand(2));
+  payloadSizeInBytes = ConstMDToUint32(pTupleMD->getOperand(kDxilASStatePayloadSizeInBytes));
 }
 
 //

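The CollectShaderFlagsForModule change below records the amplification shader payload size with DataLayout::getTypeAllocSize. As a rough illustration of what that number includes (inter-field padding plus tail padding up to the struct's alignment), here is a simplified model for a struct of scalar fields. Field, AlignTo, and StructAllocSize are hypothetical, and the per-field sizes and alignments are inputs here rather than values read from a real DXIL data layout string. This is not LLVM's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One scalar field: its size and ABI alignment in bytes (assumed inputs).
struct Field {
  uint64_t size;
  uint64_t align;
};

uint64_t AlignTo(uint64_t value, uint64_t align) {
  return (value + align - 1) / align * align;
}

// Simplified model of DataLayout::getTypeAllocSize for a non-packed struct.
uint64_t StructAllocSize(const std::vector<Field> &fields) {
  uint64_t offset = 0;
  uint64_t structAlign = 1;
  for (const Field &f : fields) {
    offset = AlignTo(offset, f.align);  // pad up to the field's alignment
    offset += f.size;
    if (f.align > structAlign)
      structAlign = f.align;
  }
  // Alloc size includes tail padding so arrays of the struct stay aligned.
  return AlignTo(offset, structAlign);
}
```

For example, a float3 followed by an 8-byte field lays out as 12 bytes, 4 bytes of padding, and then the 8-byte field, for 24 bytes in total.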
+ 28 - 6
lib/DXIL/DxilModule.cpp

@@ -17,6 +17,7 @@
 #include "dxc/Support/WinAdapter.h"
 #include "dxc/DXIL/DxilEntryProps.h"
 #include "dxc/DXIL/DxilSubobject.h"
+#include "dxc/DXIL/DxilInstructions.h"
 
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/Function.h"
@@ -335,6 +336,27 @@ void DxilModule::CollectShaderFlagsForModule(ShaderFlags &Flags) {
 
 void DxilModule::CollectShaderFlagsForModule() {
   CollectShaderFlagsForModule(m_ShaderFlags);
+
+  // This is also where we record the size of the mesh payload for amplification shader output
+  for (Function &F : GetModule()->functions()) {
+    if (HasDxilEntryProps(&F)) {
+      DxilFunctionProps &props = GetDxilFunctionProps(&F);
+      if (props.shaderKind == DXIL::ShaderKind::Amplification) {
+        if (props.ShaderProps.AS.payloadSizeInBytes != 0)
+          continue;
+        for (const BasicBlock &BB : F.getBasicBlockList()) {
+          for (const Instruction &I : BB.getInstList()) {
+            const DxilInst_DispatchMesh dispatch(const_cast<Instruction*>(&I));
+            if (dispatch) {
+              Type *payloadTy = dispatch.get_payload()->getType();
+              const DataLayout &DL = m_pModule->getDataLayout();
+              props.ShaderProps.AS.payloadSizeInBytes = DL.getTypeAllocSize(payloadTy);
+            }
+          }
+        }
+      }
+    }
+  }
 }
 
 void DxilModule::SetNumThreads(unsigned x, unsigned y, unsigned z) {
@@ -682,20 +704,20 @@ void DxilModule::SetMeshOutputTopology(DXIL::MeshOutputTopology MeshOutputTopolo
   props.ShaderProps.MS.outputTopology = MeshOutputTopology;
 }
 
-unsigned DxilModule::GetPayloadByteSize() const {
+unsigned DxilModule::GetPayloadSizeInBytes() const {
   if (m_pSM->IsMS())
   {
     DXASSERT(m_DxilEntryPropsMap.size() == 1, "should have one entry prop");
     DxilFunctionProps &props = m_DxilEntryPropsMap.begin()->second->props;
     DXASSERT(props.IsMS(), "Must be MS profile");
-    return props.ShaderProps.MS.payloadByteSize;      
+    return props.ShaderProps.MS.payloadSizeInBytes;
   }
   else if(m_pSM->IsAS())
   {
     DXASSERT(m_DxilEntryPropsMap.size() == 1, "should have one entry prop");
     DxilFunctionProps &props = m_DxilEntryPropsMap.begin()->second->props;
     DXASSERT(props.IsAS(), "Must be AS profile");
-    return props.ShaderProps.AS.payloadByteSize;
+    return props.ShaderProps.AS.payloadSizeInBytes;
   }
   else
   {
@@ -703,20 +725,20 @@ unsigned DxilModule::GetPayloadByteSize() const {
   }
 }
 
-void DxilModule::SetPayloadByteSize(unsigned Size) {
+void DxilModule::SetPayloadSizeInBytes(unsigned Size) {
   DXASSERT(m_DxilEntryPropsMap.size() == 1 && (m_pSM->IsMS() || m_pSM->IsAS()),
            "only works for MS or AS profile");
   if (m_pSM->IsMS())
   {
     DxilFunctionProps &props = m_DxilEntryPropsMap.begin()->second->props;
     DXASSERT(props.IsMS(), "Must be MS profile");
-    props.ShaderProps.MS.payloadByteSize = Size;
+    props.ShaderProps.MS.payloadSizeInBytes = Size;
   } 
   else if (m_pSM->IsAS())
   {
     DxilFunctionProps &props = m_DxilEntryPropsMap.begin()->second->props;
     DXASSERT(props.IsAS(), "Must be AS profile");
-    props.ShaderProps.AS.payloadByteSize = Size;
+    props.ShaderProps.AS.payloadSizeInBytes = Size;
   }
 }
 

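`CollectShaderFlagsForModule` records the amplification payload size as `DL.getTypeAllocSize(...)` of the payload type, so the recorded value includes alignment padding. The sketch below shows how an LLVM-style alloc size is derived for a non-packed struct: each field starts at an offset aligned to its own ABI alignment, and the total is rounded up to the largest field alignment. Field sizes and alignments come from the module's data layout string. The sketch therefore takes them as `(size, alignment)` inputs rather than assuming DXIL's layout. `AlignTo` and `StructAllocSize` are hypothetical helpers, not dxc or LLVM functions.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Rounds `value` up to the next multiple of `align` (a power of two).
uint64_t AlignTo(uint64_t value, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 &&
         "alignment must be a power of two");
  return (value + align - 1) & ~(align - 1);
}

// Alloc size of a non-packed struct whose fields are (size, ABI alignment)
// pairs, in bytes.
uint64_t StructAllocSize(
    const std::vector<std::pair<uint64_t, uint64_t>> &fields) {
  uint64_t offset = 0;
  uint64_t maxAlign = 1;
  for (const auto &[size, align] : fields) {
    offset = AlignTo(offset, align) + size;  // place field at aligned offset
    if (align > maxAlign)
      maxAlign = align;
  }
  // Pad the tail so consecutive array elements stay aligned.
  return AlignTo(offset, maxAlign);
}
```

For example, three 4-byte fields give 12 bytes. A 4-byte field followed by an 8-byte-aligned 8-byte field gives 16 bytes, because 4 bytes of padding are inserted before the second field.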
+ 53 - 8
lib/DXIL/DxilOperations.cpp

@@ -12,6 +12,7 @@
 #include "dxc/DXIL/DxilOperations.h"
 #include "dxc/Support/Global.h"
 #include "dxc/DXIL/DxilModule.h"
+#include "dxc/DXIL/DxilInstructions.h"
 
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/ADT/ArrayRef.h"
@@ -233,6 +234,8 @@ const OP::OpCodeProperty OP::m_OpCodeProps[(unsigned)OP::OpCode::NumOpCodes] = {
   {  OC::WaveActiveOp,            "WaveActiveOp",             OCC::WaveActiveOp,             "waveActiveOp",              { false,  true,  true,  true,  true,  true,  true,  true,  true, false, false}, Attribute::None,     },
   {  OC::WaveActiveBit,           "WaveActiveBit",            OCC::WaveActiveBit,            "waveActiveBit",             { false, false, false, false, false,  true,  true,  true,  true, false, false}, Attribute::None,     },
   {  OC::WavePrefixOp,            "WavePrefixOp",             OCC::WavePrefixOp,             "wavePrefixOp",              { false,  true,  true,  true, false,  true,  true,  true,  true, false, false}, Attribute::None,     },
+
+  // Quad Wave Ops                                                                                                           void,     h,     f,     d,    i1,    i8,   i16,   i32,   i64,   udt,   obj ,  function attribute
   {  OC::QuadReadLaneAt,          "QuadReadLaneAt",           OCC::QuadReadLaneAt,           "quadReadLaneAt",            { false,  true,  true,  true,  true,  true,  true,  true,  true, false, false}, Attribute::None,     },
   {  OC::QuadOp,                  "QuadOp",                   OCC::QuadOp,                   "quadOp",                    { false,  true,  true,  true, false,  true,  true,  true,  true, false, false}, Attribute::None,     },
 
@@ -581,6 +584,7 @@ bool OP::IsDxilOpGradient(OpCode C) {
   // OPCODE-GRADIENT:END
 }
 
+#define SFLAG(stage) ((unsigned)1 << (unsigned)DXIL::ShaderKind::stage)
 void OP::GetMinShaderModelAndMask(OpCode C, bool bWithTranslation,
                                   unsigned &major, unsigned &minor,
                                   unsigned &mask) {
@@ -588,7 +592,6 @@ void OP::GetMinShaderModelAndMask(OpCode C, bool bWithTranslation,
   // Default is 6.0, all stages
   major = 6;  minor = 0;
   mask = ((unsigned)1 << (unsigned)DXIL::ShaderKind::Invalid) - 1;
-#define SFLAG(stage) ((unsigned)1 << (unsigned)DXIL::ShaderKind::stage)
   /* <py::lines('OPCODE-SMMASK')>hctdb_instrhelp.get_min_sm_and_mask_text()</py>*/
   // OPCODE-SMMASK:BEGIN
   // Instructions: ThreadId=93, GroupId=94, ThreadIdInGroup=95,
@@ -623,6 +626,20 @@ void OP::GetMinShaderModelAndMask(OpCode C, bool bWithTranslation,
     mask = SFLAG(Hull);
     return;
   }
+  // Instructions: QuadReadLaneAt=122, QuadOp=123
+  if ((122 <= op && op <= 123)) {
+    mask = SFLAG(Library) | SFLAG(Compute) | SFLAG(Amplification) | SFLAG(Mesh) | SFLAG(Pixel);
+    return;
+  }
+  // Instructions: WaveIsFirstLane=110, WaveGetLaneIndex=111,
+  // WaveGetLaneCount=112, WaveAnyTrue=113, WaveAllTrue=114,
+  // WaveActiveAllEqual=115, WaveActiveBallot=116, WaveReadLaneAt=117,
+  // WaveReadLaneFirst=118, WaveActiveOp=119, WaveActiveBit=120,
+  // WavePrefixOp=121, WaveAllBitCount=135, WavePrefixBitCount=136
+  if ((110 <= op && op <= 121) || (135 <= op && op <= 136)) {
+    mask = SFLAG(Library) | SFLAG(Compute) | SFLAG(Amplification) | SFLAG(Mesh) | SFLAG(Pixel) | SFLAG(Vertex) | SFLAG(Hull) | SFLAG(Domain) | SFLAG(Geometry);
+    return;
+  }
   // Instructions: Sample=60, SampleBias=61, SampleCmp=64, CalculateLOD=81,
   // DerivCoarseX=83, DerivCoarseY=84, DerivFineX=85, DerivFineY=86
   if ((60 <= op && op <= 61) || op == 64 || op == 81 || (83 <= op && op <= 86)) {
@@ -717,11 +734,9 @@ void OP::GetMinShaderModelAndMask(OpCode C, bool bWithTranslation,
     major = 6;  minor = 4;
     return;
   }
-  // Instructions: WaveMatch=165, WaveMultiPrefixOp=166,
-  // WaveMultiPrefixBitCount=167, WriteSamplerFeedbackLevel=176,
-  // WriteSamplerFeedbackGrad=177, AllocateRayQuery=178,
-  // RayQuery_TraceRayInline=179, RayQuery_Proceed=180, RayQuery_Abort=181,
-  // RayQuery_CommitNonOpaqueTriangleHit=182,
+  // Instructions: WriteSamplerFeedbackLevel=176, WriteSamplerFeedbackGrad=177,
+  // AllocateRayQuery=178, RayQuery_TraceRayInline=179, RayQuery_Proceed=180,
+  // RayQuery_Abort=181, RayQuery_CommitNonOpaqueTriangleHit=182,
   // RayQuery_CommitProceduralPrimitiveHit=183, RayQuery_CommittedStatus=184,
   // RayQuery_CandidateType=185, RayQuery_CandidateObjectToWorld3x4=186,
   // RayQuery_CandidateWorldToObject3x4=187,
@@ -742,7 +757,7 @@ void OP::GetMinShaderModelAndMask(OpCode C, bool bWithTranslation,
   // RayQuery_CommittedGeometryIndex=209, RayQuery_CommittedPrimitiveIndex=210,
   // RayQuery_CommittedObjectRayOrigin=211,
   // RayQuery_CommittedObjectRayDirection=212
-  if ((165 <= op && op <= 167) || (176 <= op && op <= 212)) {
+  if ((176 <= op && op <= 212)) {
     major = 6;  minor = 5;
     return;
   }
@@ -752,6 +767,13 @@ void OP::GetMinShaderModelAndMask(OpCode C, bool bWithTranslation,
     mask = SFLAG(Amplification);
     return;
   }
+  // Instructions: WaveMatch=165, WaveMultiPrefixOp=166,
+  // WaveMultiPrefixBitCount=167
+  if ((165 <= op && op <= 167)) {
+    major = 6;  minor = 5;
+    mask = SFLAG(Library) | SFLAG(Compute) | SFLAG(Amplification) | SFLAG(Mesh) | SFLAG(Pixel) | SFLAG(Vertex) | SFLAG(Hull) | SFLAG(Domain) | SFLAG(Geometry);
+    return;
+  }
   // Instructions: GeometryIndex=213
   if (op == 213) {
     major = 6;  minor = 5;
@@ -772,9 +794,30 @@ void OP::GetMinShaderModelAndMask(OpCode C, bool bWithTranslation,
     return;
   }
   // OPCODE-SMMASK:END
-#undef SFLAG
 }
 
+void OP::GetMinShaderModelAndMask(const llvm::CallInst *CI, bool bWithTranslation,
+                                  unsigned &major, unsigned &minor,
+                                  unsigned &mask) {
+  OpCode opcode = OP::GetDxilOpFuncCallInst(CI);
+  GetMinShaderModelAndMask(opcode, bWithTranslation, major, minor, mask);
+
+  // Additional rules are applied manually here.
+
+  // Barrier with mode != UAVFenceGlobal requires compute, amplification, or mesh
+  // Instructions: Barrier=80
+  if (opcode == DXIL::OpCode::Barrier) {
+    DxilInst_Barrier barrier(const_cast<CallInst*>(CI));
+    unsigned mode = barrier.get_barrierMode_val();
+    if (mode != (unsigned)DXIL::BarrierMode::UAVFenceGlobal) {
+      mask = SFLAG(Library) | SFLAG(Compute) | SFLAG(Amplification) | SFLAG(Mesh);
+    }
+    return;
+  }
+}
+#undef SFLAG
+
+
 static Type *GetOrCreateStructType(LLVMContext &Ctx, ArrayRef<Type*> types, StringRef Name, Module *pModule) {
   if (StructType *ST = pModule->getTypeByName(Name)) {
     // TODO: validate the exist type match types if needed.
@@ -1083,6 +1126,8 @@ Function *OP::GetOpFunc(OpCode opCode, Type *pOverloadType) {
   case OpCode::WaveActiveOp:           A(pETy);     A(pI32); A(pETy); A(pI8);  A(pI8);  break;
   case OpCode::WaveActiveBit:          A(pETy);     A(pI32); A(pETy); A(pI8);  break;
   case OpCode::WavePrefixOp:           A(pETy);     A(pI32); A(pETy); A(pI8);  A(pI8);  break;
+
+    // Quad Wave Ops
   case OpCode::QuadReadLaneAt:         A(pETy);     A(pI32); A(pETy); A(pI32); break;
   case OpCode::QuadOp:                 A(pETy);     A(pI32); A(pETy); A(pI8);  break;
 

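The new `OPCODE-SMMASK` entries give Wave and Quad operations different stage sets, and the `CallInst` overload narrows the mask for a `Barrier` whose mode is not `UAVFenceGlobal`. The sketch below rebuilds those masks with the same bit construction as `SFLAG`. It shows that a function's compatible stages are the intersection of the masks of every operation it calls. `Kind` is a local stand-in for `DXIL::ShaderKind`, and its enumerator order is an assumption; only the fact that each stage gets its own bit matters here.

```cpp
#include <cassert>
#include <initializer_list>

// Local stand-in for DXIL::ShaderKind (order assumed, one bit per stage).
enum class Kind : unsigned {
  Pixel = 0, Vertex, Geometry, Hull, Domain, Compute, Library,
  RayGeneration, Intersection, AnyHit, ClosestHit, Miss, Callable,
  Mesh, Amplification, Invalid
};

// Same construction as the SFLAG macro.
constexpr unsigned Flag(Kind k) { return 1u << static_cast<unsigned>(k); }

// "Default is 6.0, all stages": every bit below Invalid.
constexpr unsigned kAllStages = Flag(Kind::Invalid) - 1;

// Stage sets taken from the new OPCODE-SMMASK entries.
constexpr unsigned kWaveStages =
    Flag(Kind::Library) | Flag(Kind::Compute) | Flag(Kind::Amplification) |
    Flag(Kind::Mesh) | Flag(Kind::Pixel) | Flag(Kind::Vertex) |
    Flag(Kind::Hull) | Flag(Kind::Domain) | Flag(Kind::Geometry);
constexpr unsigned kQuadStages =
    Flag(Kind::Library) | Flag(Kind::Compute) | Flag(Kind::Amplification) |
    Flag(Kind::Mesh) | Flag(Kind::Pixel);
// Override for Barrier when the mode is not UAVFenceGlobal.
constexpr unsigned kGroupBarrierStages =
    Flag(Kind::Library) | Flag(Kind::Compute) | Flag(Kind::Amplification) |
    Flag(Kind::Mesh);

static_assert((kQuadStages & ~kWaveStages) == 0,
              "Quad stages are a subset of Wave stages");

bool IsAllowed(unsigned mask, Kind k) {
  assert(k != Kind::Invalid && "Invalid is a sentinel, not a stage");
  return (mask & Flag(k)) != 0;
}

// Compatible stages of a function = intersection of its callees' masks
// (cf. `info.mask &= mask` in UpdateFunctionToShaderCompat).
unsigned Intersect(std::initializer_list<unsigned> masks) {
  unsigned result = kAllStages;
  for (unsigned m : masks)
    result &= m;
  return result;
}
```

A library function that calls both a Quad operation and a group-scope `Barrier` ends up compatible only with library, compute, amplification, and mesh. Pixel is dropped by the barrier, and vertex is dropped by the quad operation.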
+ 6 - 54
lib/DxilContainer/DxilContainerAssembler.cpp

@@ -623,58 +623,11 @@ public:
         }
       }
       pInfo->MS.GroupSharedBytesUsed = totalByteSize;
-
-      const Function *entryFunc = m_Module.GetEntryFunction();
-      unsigned payloadByteSize = 0;
-      for (auto b = entryFunc->begin(), bend = entryFunc->end(); b != bend; ++b) {
-        auto i = b->begin(), iend = b->end();
-        for (; i != iend; ++i) {
-          const Instruction &I = *i;
-
-          // Calls to external functions.
-          const CallInst *CI = dyn_cast<CallInst>(&I);
-          if (CI) {
-            if (hlsl::OP::IsDxilOpFuncCallInst(CI,DXIL::OpCode::GetMeshPayload)) {
-              PointerType *payloadPTy = cast<PointerType>(CI->getType());
-              Type *payloadTy = payloadPTy->getPointerElementType();
-              payloadByteSize = DL.getTypeAllocSize(payloadTy);
-              break;
-            }
-          }
-        }
-        if (i != iend)
-          break;
-      }
-      pInfo->MS.PayloadSizeInBytes = payloadByteSize;
+      pInfo->MS.PayloadSizeInBytes = m_Module.GetPayloadSizeInBytes();
       break;
     }
     case ShaderModel::Kind::Amplification: {
-      const Function *entryFunc = m_Module.GetEntryFunction();
-      unsigned payloadByteSize = 0;
-      Module *mod = m_Module.GetModule();
-      const DataLayout &DL = mod->getDataLayout();
-      for (auto b = entryFunc->begin(), bend = entryFunc->end(); b != bend;
-           ++b) {
-        auto i = b->begin(), iend = b->end();
-        for (; i != iend; ++i) {
-          const Instruction &I = *i;
-
-          // Calls to external functions.
-          const CallInst *CI = dyn_cast<CallInst>(&I);
-          if (CI) {
-            if (hlsl::OP::IsDxilOpFuncCallInst(CI,DXIL::OpCode::DispatchMesh)) {
-              DxilInst_DispatchMesh dispatchMeshCall(const_cast<CallInst*>(CI));
-              Value *operandVal = dispatchMeshCall.get_payload();
-              Type *payloadTy = operandVal->getType();
-              payloadByteSize = DL.getTypeAllocSize(payloadTy);
-              break;
-            }
-          }
-        }
-        if (i != iend)
-          break;
-      }
-      pInfo->AS.PayloadSizeInBytes = payloadByteSize;
+      pInfo->AS.PayloadSizeInBytes = m_Module.GetPayloadSizeInBytes();
       break;
     }
     }
@@ -1060,19 +1013,18 @@ private:
 
   void UpdateFunctionToShaderCompat(const llvm::Function* dxilFunc) {
     for (const auto &user : dxilFunc->users()) {
-      if (const llvm::Instruction *I = dyn_cast<const llvm::Instruction>(user)) {
+      if (const llvm::CallInst *CI = dyn_cast<const llvm::CallInst>(user)) {
         // Find calling function
-        const llvm::Function *F = cast<const llvm::Function>(I->getParent()->getParent());
+        const llvm::Function *F = cast<const llvm::Function>(CI->getParent()->getParent());
         // Insert or lookup info
         ShaderCompatInfo &info = m_FuncToShaderCompat[F];
-        OpCode opcode = OP::GetDxilOpFuncCallInst(I);
         unsigned major, minor, mask;
         // bWithTranslation = true for library modules
-        OP::GetMinShaderModelAndMask(opcode, /*bWithTranslation*/true, major, minor, mask);
+        OP::GetMinShaderModelAndMask(CI, /*bWithTranslation*/true, major, minor, mask);
         if (major > info.minMajor) {
           info.minMajor = major;
           info.minMinor = minor;
-        } else if (minor > info.minMinor) {
+        } else if (major == info.minMajor && minor > info.minMinor) {
           info.minMinor = minor;
         }
         info.mask &= mask;

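The change in `UpdateFunctionToShaderCompat` turns the minimum-version update into a true lexicographic maximum of `(major, minor)`. Before the fix, a lower major with a higher minor could still raise the stored minor. The sketch below contrasts the two versions. `MinVersion`, `RaisedOld`, and `Raised` are hypothetical helpers that mirror only the `minMajor`/`minMinor` fields. With every opcode at major 6 the two agree. The difference appears only when majors differ, so the check uses a hypothetical 7.0 baseline.

```cpp
#include <cassert>

// Hypothetical stand-in for the minMajor/minMinor pair in ShaderCompatInfo.
struct MinVersion {
  unsigned minMajor;
  unsigned minMinor;
};

// Pre-fix logic: compares minors even when the incoming major is lower.
MinVersion RaisedOld(MinVersion info, unsigned majorVer, unsigned minorVer) {
  if (majorVer > info.minMajor) {
    info.minMajor = majorVer;
    info.minMinor = minorVer;
  } else if (minorVer > info.minMinor) {
    info.minMinor = minorVer;
  }
  return info;
}

// Fixed logic: the minor only matters when the majors are equal.
MinVersion Raised(MinVersion info, unsigned majorVer, unsigned minorVer) {
  if (majorVer > info.minMajor) {
    info.minMajor = majorVer;
    info.minMinor = minorVer;
  } else if (majorVer == info.minMajor && minorVer > info.minMinor) {
    info.minMinor = minorVer;
  }
  return info;
}
```

With a stored 7.0 and an incoming 6.5, `RaisedOld` produces 7.5, which is not a version any operation required. `Raised` keeps 7.0.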
+ 61 - 26
lib/HLSL/DxilValidation.cpp

@@ -155,7 +155,7 @@ const char *hlsl::GetValidationRuleText(ValidationRule value) {
     case hlsl::ValidationRule::InstrSampleCompType: return "sample_* instructions require resource to be declared to return UNORM, SNORM or FLOAT.";
     case hlsl::ValidationRule::InstrBarrierModeUselessUGroup: return "sync can't specify both _ugroup and _uglobal. If both are needed, just specify _uglobal.";
     case hlsl::ValidationRule::InstrBarrierModeNoMemory: return "sync must include some form of memory barrier - _u (UAV) and/or _g (Thread Group Shared Memory).  Only _t (thread group sync) is optional. ";
-    case hlsl::ValidationRule::InstrBarrierModeForNonCS: return "sync in a non-Compute Shader must only sync UAV (sync_uglobal)";
+    case hlsl::ValidationRule::InstrBarrierModeForNonCS: return "sync in a non-Compute/Amplification/Mesh Shader must only sync UAV (sync_uglobal)";
     case hlsl::ValidationRule::InstrWriteMaskForTypedUAVStore: return "store on typed uav must write to all four components of the UAV";
     case hlsl::ValidationRule::InstrResourceKindForCalcLOD: return "lod requires resource declared as texture1D/2D/3D/Cube/CubeArray/1DArray/2DArray";
     case hlsl::ValidationRule::InstrResourceKindForSample: return "sample/_l/_d requires resource declared as texture1D/2D/3D/Cube/1DArray/2DArray/CubeArray";
@@ -264,6 +264,7 @@ const char *hlsl::GetValidationRuleText(ValidationRule value) {
     case hlsl::ValidationRule::SmMeshShaderMaxVertexCount: return "MS max vertex output count must be [0..%0].  %1 specified";
     case hlsl::ValidationRule::SmMeshShaderMaxPrimitiveCount: return "MS max primitive output count must be [0..%0].  %1 specified";
     case hlsl::ValidationRule::SmMeshShaderPayloadSize: return "For shader '%0', payload size is greater than %1";
+    case hlsl::ValidationRule::SmMeshShaderPayloadSizeDeclared: return "For shader '%0', payload size %1 is greater than declared size of %2 bytes";
     case hlsl::ValidationRule::SmMeshShaderOutputSize: return "For shader '%0', vertex plus primitive output size is greater than %1";
     case hlsl::ValidationRule::SmMeshShaderInOutSize: return "For shader '%0', input plus output size is greater than %1";
     case hlsl::ValidationRule::SmMeshVSigRowCount: return "For shader '%0', vertex output signatures are taking up more than %1 rows";
@@ -271,6 +272,7 @@ const char *hlsl::GetValidationRuleText(ValidationRule value) {
     case hlsl::ValidationRule::SmMeshTotalSigRowCount: return "For shader '%0', vertex and primitive output signatures are taking up more than %1 rows";
     case hlsl::ValidationRule::SmMaxMSSMSize: return "Total Thread Group Shared Memory storage is %0, exceeded %1";
     case hlsl::ValidationRule::SmAmplificationShaderPayloadSize: return "For shader '%0', payload size is greater than %1";
+    case hlsl::ValidationRule::SmAmplificationShaderPayloadSizeDeclared: return "For shader '%0', payload size %1 is greater than declared size of %2 bytes";
     case hlsl::ValidationRule::UniNoWaveSensitiveGradient: return "Gradient operations are not affected by wave-sensitive data or control flow.";
     case hlsl::ValidationRule::FlowReducible: return "Execution flow must be reducible";
     case hlsl::ValidationRule::FlowNoRecusion: return "Recursion is not permitted";
@@ -816,6 +818,16 @@ static bool ValidateOpcodeInProfile(DXIL::OpCode opcode,
   // Instructions: StorePatchConstant=106, OutputControlPointID=107
   if ((106 <= op && op <= 107))
     return (SK == DXIL::ShaderKind::Hull);
+  // Instructions: QuadReadLaneAt=122, QuadOp=123
+  if ((122 <= op && op <= 123))
+    return (SK == DXIL::ShaderKind::Library || SK == DXIL::ShaderKind::Compute || SK == DXIL::ShaderKind::Amplification || SK == DXIL::ShaderKind::Mesh || SK == DXIL::ShaderKind::Pixel);
+  // Instructions: WaveIsFirstLane=110, WaveGetLaneIndex=111,
+  // WaveGetLaneCount=112, WaveAnyTrue=113, WaveAllTrue=114,
+  // WaveActiveAllEqual=115, WaveActiveBallot=116, WaveReadLaneAt=117,
+  // WaveReadLaneFirst=118, WaveActiveOp=119, WaveActiveBit=120,
+  // WavePrefixOp=121, WaveAllBitCount=135, WavePrefixBitCount=136
+  if ((110 <= op && op <= 121) || (135 <= op && op <= 136))
+    return (SK == DXIL::ShaderKind::Library || SK == DXIL::ShaderKind::Compute || SK == DXIL::ShaderKind::Amplification || SK == DXIL::ShaderKind::Mesh || SK == DXIL::ShaderKind::Pixel || SK == DXIL::ShaderKind::Vertex || SK == DXIL::ShaderKind::Hull || SK == DXIL::ShaderKind::Domain || SK == DXIL::ShaderKind::Geometry);
   // Instructions: Sample=60, SampleBias=61, SampleCmp=64, CalculateLOD=81,
   // DerivCoarseX=83, DerivCoarseY=84, DerivFineX=85, DerivFineY=86
   if ((60 <= op && op <= 61) || op == 64 || op == 81 || (83 <= op && op <= 86))
@@ -874,11 +886,9 @@ static bool ValidateOpcodeInProfile(DXIL::OpCode opcode,
   // Instructions: Dot2AddHalf=162, Dot4AddI8Packed=163, Dot4AddU8Packed=164
   if ((162 <= op && op <= 164))
     return (major > 6 || (major == 6 && minor >= 4));
-  // Instructions: WaveMatch=165, WaveMultiPrefixOp=166,
-  // WaveMultiPrefixBitCount=167, WriteSamplerFeedbackLevel=176,
-  // WriteSamplerFeedbackGrad=177, AllocateRayQuery=178,
-  // RayQuery_TraceRayInline=179, RayQuery_Proceed=180, RayQuery_Abort=181,
-  // RayQuery_CommitNonOpaqueTriangleHit=182,
+  // Instructions: WriteSamplerFeedbackLevel=176, WriteSamplerFeedbackGrad=177,
+  // AllocateRayQuery=178, RayQuery_TraceRayInline=179, RayQuery_Proceed=180,
+  // RayQuery_Abort=181, RayQuery_CommitNonOpaqueTriangleHit=182,
   // RayQuery_CommitProceduralPrimitiveHit=183, RayQuery_CommittedStatus=184,
   // RayQuery_CandidateType=185, RayQuery_CandidateObjectToWorld3x4=186,
   // RayQuery_CandidateWorldToObject3x4=187,
@@ -899,12 +909,17 @@ static bool ValidateOpcodeInProfile(DXIL::OpCode opcode,
   // RayQuery_CommittedGeometryIndex=209, RayQuery_CommittedPrimitiveIndex=210,
   // RayQuery_CommittedObjectRayOrigin=211,
   // RayQuery_CommittedObjectRayDirection=212
-  if ((165 <= op && op <= 167) || (176 <= op && op <= 212))
+  if ((176 <= op && op <= 212))
     return (major > 6 || (major == 6 && minor >= 5));
   // Instructions: DispatchMesh=173
   if (op == 173)
     return (major > 6 || (major == 6 && minor >= 5))
         && (SK == DXIL::ShaderKind::Amplification);
+  // Instructions: WaveMatch=165, WaveMultiPrefixOp=166,
+  // WaveMultiPrefixBitCount=167
+  if ((165 <= op && op <= 167))
+    return (major > 6 || (major == 6 && minor >= 5))
+        && (SK == DXIL::ShaderKind::Library || SK == DXIL::ShaderKind::Compute || SK == DXIL::ShaderKind::Amplification || SK == DXIL::ShaderKind::Mesh || SK == DXIL::ShaderKind::Pixel || SK == DXIL::ShaderKind::Vertex || SK == DXIL::ShaderKind::Hull || SK == DXIL::ShaderKind::Domain || SK == DXIL::ShaderKind::Geometry);
   // Instructions: GeometryIndex=213
   if (op == 213)
     return (major > 6 || (major == 6 && minor >= 5))
@@ -2332,6 +2347,22 @@ static void ValidateDxilOperationCallInProfile(CallInst *CI,
                                                DXIL::OpCode opcode,
                                                const ShaderModel *pSM,
                                                ValidationContext &ValCtx) {
+  DXIL::ShaderKind shaderKind = pSM ? pSM->GetKind() : DXIL::ShaderKind::Invalid;
+  llvm::Function *F = CI->getParent()->getParent();
+  if (DXIL::ShaderKind::Library == shaderKind) {
+    if (ValCtx.DxilMod.HasDxilFunctionProps(F))
+      shaderKind = ValCtx.DxilMod.GetDxilFunctionProps(F).shaderKind;
+    else if (ValCtx.DxilMod.IsPatchConstantShader(F))
+      shaderKind = DXIL::ShaderKind::Hull;
+  }
+
+  // These shader kinds are treated like compute
+  bool isCSLike = shaderKind == DXIL::ShaderKind::Compute ||
+                  shaderKind == DXIL::ShaderKind::Mesh ||
+                  shaderKind == DXIL::ShaderKind::Amplification;
+  // Is called from a library function
+  bool isLibFunc = shaderKind == DXIL::ShaderKind::Library;
+
   switch (opcode) {
   // Imm input value validation.
   case DXIL::OpCode::Asin:
@@ -2448,7 +2479,7 @@ static void ValidateDxilOperationCallInProfile(CallInst *CI,
         static_cast<unsigned>(DXIL::BarrierMode::UAVFenceThreadGroup);
     unsigned barrierMode = cMode->getLimitedValue();
 
-    if (ValCtx.DxilMod.GetShaderModel()->IsCS()) {
+    if (isCSLike || isLibFunc) {
       bool bHasUGlobal = barrierMode & uglobal;
       bool bHasGroup = barrierMode & g;
       bool bHasUGroup = barrierMode & ut;
@@ -2460,22 +2491,12 @@ static void ValidateDxilOperationCallInProfile(CallInst *CI,
       if (!bHasUGlobal && !bHasGroup && !bHasUGroup) {
         ValCtx.EmitInstrError(CI, ValidationRule::InstrBarrierModeNoMemory);
       }
-    } else if (!ValCtx.isLibProfile) {
+    } else {
       if (uglobal != barrierMode) {
         ValCtx.EmitInstrError(CI, ValidationRule::InstrBarrierModeForNonCS);
       }
     }
   } break;
-  case DXIL::OpCode::QuadOp:
-    if (!pSM->IsPS() && !ValCtx.isLibProfile)
-      ValCtx.EmitFormatError(ValidationRule::SmOpcodeInInvalidFunction,
-                             {"QuadReadAcross", "Pixel Shader"});
-    break;
-  case DXIL::OpCode::QuadReadLaneAt:
-    if (!pSM->IsPS() && !ValCtx.isLibProfile)
-      ValCtx.EmitFormatError(ValidationRule::SmOpcodeInInvalidFunction,
-                             {"QuadReadLaneAt", "Pixel Shader"});
-    break;
   case DXIL::OpCode::CreateHandleForLib:
     if (!ValCtx.isLibProfile) {
       ValCtx.EmitFormatError(ValidationRule::SmOpcodeInInvalidFunction,
@@ -2846,13 +2867,19 @@ static void ValidateMsIntrinsics(Function *F,
     const DataLayout &DL = F->getParent()->getDataLayout();
     unsigned payloadSize = DL.getTypeAllocSize(payloadTy);
 
-    if (payloadSize > DXIL::kMaxMSASPayloadSize) {
+    DxilFunctionProps &prop = ValCtx.DxilMod.GetDxilFunctionProps(F);
+
+    if (payloadSize > DXIL::kMaxMSASPayloadSize ||
+        prop.ShaderProps.MS.payloadSizeInBytes > DXIL::kMaxMSASPayloadSize) {
       ValCtx.EmitFormatError(ValidationRule::SmMeshShaderPayloadSize,
         { F->getName(), std::to_string(DXIL::kMaxMSASPayloadSize) });
     }
 
-    DxilFunctionProps &prop = ValCtx.DxilMod.GetDxilFunctionProps(F);
-    prop.ShaderProps.MS.payloadByteSize = payloadSize;
+    if (prop.ShaderProps.MS.payloadSizeInBytes < payloadSize) {
+      ValCtx.EmitFormatError(ValidationRule::SmMeshShaderPayloadSizeDeclared,
+        { F->getName(), std::to_string(payloadSize),
+          std::to_string(prop.ShaderProps.MS.payloadSizeInBytes) });
+    }
   }
 }
 
@@ -2869,14 +2896,20 @@ static void ValidateAsIntrinsics(Function *F, ValidationContext &ValCtx, CallIns
       const DataLayout &DL = F->getParent()->getDataLayout();
       unsigned payloadSize = DL.getTypeAllocSize(payloadTy);
 
-      if (payloadSize > DXIL::kMaxMSASPayloadSize) {
+      DxilFunctionProps &prop = ValCtx.DxilMod.GetDxilFunctionProps(F);
+
+      if (payloadSize > DXIL::kMaxMSASPayloadSize ||
+          prop.ShaderProps.AS.payloadSizeInBytes > DXIL::kMaxMSASPayloadSize) {
         ValCtx.EmitFormatError(
             ValidationRule::SmAmplificationShaderPayloadSize,
             {F->getName(), std::to_string(DXIL::kMaxMSASPayloadSize)});
       }
 
-      DxilFunctionProps &prop = ValCtx.DxilMod.GetDxilFunctionProps(F);
-      prop.ShaderProps.AS.payloadByteSize = payloadSize;
+      if (prop.ShaderProps.AS.payloadSizeInBytes < payloadSize) {
+        ValCtx.EmitFormatError(ValidationRule::SmAmplificationShaderPayloadSizeDeclared,
+          { F->getName(), std::to_string(payloadSize),
+            std::to_string(prop.ShaderProps.AS.payloadSizeInBytes) });
+      }
     }
 
   }
@@ -4931,7 +4964,7 @@ static void ValidateEntrySignatures(ValidationContext &ValCtx,
         { F.getName(), std::to_string(DXIL::kMaxMSOutputTotalScalars) });
     }
 
-    unsigned totalInputOutputScalars = totalOutputScalars + props.ShaderProps.MS.payloadByteSize;
+    unsigned totalInputOutputScalars = totalOutputScalars + props.ShaderProps.MS.payloadSizeInBytes;
     if (totalInputOutputScalars > DXIL::kMaxMSInputOutputTotalScalars) {
       ValCtx.EmitFormatError(
         ValidationRule::SmMeshShaderInOutSize,
@@ -5555,6 +5588,8 @@ void GetValidationVersion(_Out_ unsigned *pMajor, _Out_ unsigned *pMinor) {
   // 1.5 adds:
   // - WaveMatch, WaveMultiPrefixOp, WaveMultiPrefixBitCount
   // - HASH container part support
+  // - Mesh and Amplification shaders
+  // - DXR 1.1 & RayQuery support
   *pMajor = 1;
   *pMinor = 5;
 }

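Barrier validation now branches on the resolved shader kind instead of `IsCS()`. Compute, mesh, amplification, and plain library functions get the general sync checks. Every other kind must use exactly `sync_uglobal`. The sketch below restates that decision as a pure function, following the rule texts for `Instr.BarrierModeUselessUGroup`, `Instr.BarrierModeNoMemory`, and `Instr.BarrierModeForNonCS`. It assumes the kind has already been resolved the way the hunk does it: a library entry point takes its own kind, and a patch-constant function is treated as Hull. The bit values of the `k*` constants are illustrative assumptions, not the real `DXIL::BarrierMode` values.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Stage { Compute, Mesh, Amplification, Library, Pixel, Vertex, Hull };

// Illustrative sync-modifier bits (assumed values).
constexpr unsigned kSyncThreadGroup = 1u << 0;      // _t
constexpr unsigned kUAVFenceGlobal = 1u << 1;       // _uglobal
constexpr unsigned kUAVFenceThreadGroup = 1u << 2;  // _ugroup
constexpr unsigned kTGSMFence = 1u << 3;            // _g

// Returns the names of the validation rules a barrier `mode` violates.
std::vector<std::string> CheckBarrierMode(Stage stage, unsigned mode) {
  std::vector<std::string> errors;
  const bool isCSLike = stage == Stage::Compute || stage == Stage::Mesh ||
                        stage == Stage::Amplification;
  const bool isLibFunc = stage == Stage::Library;
  if (isCSLike || isLibFunc) {
    const bool hasUGlobal = (mode & kUAVFenceGlobal) != 0;
    const bool hasGroup = (mode & kTGSMFence) != 0;
    const bool hasUGroup = (mode & kUAVFenceThreadGroup) != 0;
    if (hasUGlobal && hasUGroup)
      errors.push_back("Instr.BarrierModeUselessUGroup");
    if (!hasUGlobal && !hasGroup && !hasUGroup)
      errors.push_back("Instr.BarrierModeNoMemory");
  } else if (mode != kUAVFenceGlobal) {
    // Non-CS-like stages may only fence UAVs globally.
    errors.push_back("Instr.BarrierModeForNonCS");
  }
  return errors;
}
```

A mesh shader can now use `GroupMemoryBarrierWithGroupSync`-style modes (`_t` plus `_g`) without errors. The same mode in a pixel shader still reports `Instr.BarrierModeForNonCS`.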
+ 3 - 0
tools/clang/lib/CodeGen/CGHLSLMS.cpp

@@ -1834,6 +1834,9 @@ void CGMSHLSLRuntime::AddHLSLFunctionInfo(Function *F, const FunctionDecl *FD) {
         continue;
       }
       dxilInputQ = DxilParamInputQual::InPayload;
+      DataLayout DL(&this->TheModule);
+      funcProps->ShaderProps.MS.payloadSizeInBytes = DL.getTypeAllocSize(
+        F->getFunctionType()->getFunctionParamType(ArgNo)->getPointerElementType());
       hasInPayload = true;
     }
 

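CodeGen now records the declared mesh payload size from the alloc size of the `payload` parameter's pointee type. The validator no longer overwrites that value; it compares it against the size it measures itself. The sketch below restates the two resulting checks as a pure function of the two sizes, using the new rule message texts with their `%0`/`%1`/`%2` placeholders filled in. `CheckPayload` is a hypothetical helper. The limit value 16384 is an assumption standing in for `DXIL::kMaxMSASPayloadSize`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumed stand-in for DXIL::kMaxMSASPayloadSize.
constexpr unsigned kMaxPayload = 16384;

// `measured` is the size the validator computes from the IR; `declared` is
// the value stored in the function props during CodeGen.
std::vector<std::string> CheckPayload(const std::string &shader,
                                      unsigned measured, unsigned declared) {
  std::vector<std::string> errors;
  // Either size exceeding the limit is an error (Sm.*PayloadSize).
  if (measured > kMaxPayload || declared > kMaxPayload)
    errors.push_back("For shader '" + shader +
                     "', payload size is greater than " +
                     std::to_string(kMaxPayload));
  // The measured size must fit in the declared one (Sm.*PayloadSizeDeclared).
  if (declared < measured)
    errors.push_back("For shader '" + shader + "', payload size " +
                     std::to_string(measured) +
                     " is greater than declared size of " +
                     std::to_string(declared) + " bytes");
  return errors;
}
```

A module whose props were never filled in (declared size 0) now fails validation with the `...PayloadSizeDeclared` message instead of being silently patched.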
+ 12 - 13
tools/clang/unittests/HLSL/ValidationTest.cpp

@@ -186,7 +186,7 @@ public:
   TEST_METHOD(InvalidSigCompTyFail)
   TEST_METHOD(MultiStream2Fail)
   TEST_METHOD(PhiTGSMFail)
-  TEST_METHOD(QuadOpInCS)
+  TEST_METHOD(QuadOpInVS)
   TEST_METHOD(ReducibleFail)
   TEST_METHOD(SampleBiasFail)
   TEST_METHOD(SamplerKindFail)
@@ -836,17 +836,16 @@ TEST_F(ValidationTest, PhiTGSMFail) {
       "TGSM pointers must originate from an unambiguous TGSM global variable");
 }
 
-TEST_F(ValidationTest, QuadOpInCS) {
-  if (m_ver.SkipDxilVersion(1, 3)) return;
+TEST_F(ValidationTest, QuadOpInVS) {
+  if (m_ver.SkipDxilVersion(1, 5)) return;
   RewriteAssemblyCheckMsg(
       "struct PerThreadData { int "
       "input; int output; }; RWStructuredBuffer<PerThreadData> g_sb; "
-      "[numthreads(8, 12, 1)] void main(uint GI : SV_GroupIndex) "
-      "{ PerThreadData pts = g_sb[GI]; pts.output = "
-      "WaveActiveSum(pts.input); g_sb[GI] = pts; }; ",
-      "cs_6_0", {"@dx.op.waveActiveOp.i32(i32 119", "declare i32 @dx.op.waveActiveOp.i32(i32, i32, i8, i8)"},
+      "void main(uint vid : SV_VertexID)"
+      "{ g_sb[vid].output = WaveActiveSum(g_sb[vid].input); }",
+      "vs_6_0", {"@dx.op.waveActiveOp.i32(i32 119", "declare i32 @dx.op.waveActiveOp.i32(i32, i32, i8, i8)"},
       {"@dx.op.quadOp.i32(i32 123", "declare i32 @dx.op.quadOp.i32(i32, i32, i8, i8)"},
-      "'QuadReadAcross' should only be used in 'Pixel Shader'"
+      "QuadOp not valid in shader model vs_6_0"
       );
 }
 
@@ -1015,7 +1014,7 @@ TEST_F(ValidationTest, UavBarrierFail) {
       {"uav load don't support offset",
        "uav load don't support mipLevel/sampleIndex",
        "store on typed uav must write to all four components of the UAV",
-       "sync in a non-Compute Shader must only sync UAV (sync_uglobal)"});
+       "sync in a non-Compute/Amplification/Mesh Shader must only sync UAV (sync_uglobal)"});
 }
 TEST_F(ValidationTest, UndefValueFail) {
   TestCheck(L"..\\CodeGenHLSL\\UndefValue.hlsl");
@@ -3575,16 +3574,16 @@ TEST_F(ValidationTest, MeshMultipleGetMeshPayload) {
 
 TEST_F(ValidationTest, MeshOutofRangeMaxVertexCount) {
   RewriteAssemblyCheckMsg(L"..\\CodeGenHLSL\\mesh-val\\mesh.hlsl", "ms_6_5",
-                          "= !{!([0-9]+), i32 32, i32 16, i32 2}",
-                          "= !{!\\1, i32 257, i32 16, i32 2}",
+                          "= !{!([0-9]+), i32 32, i32 16, i32 2, i32 40}",
+                          "= !{!\\1, i32 257, i32 16, i32 2, i32 40}",
                           "MS max vertex output count must be \\[0..256\\].  257 specified",
                           true);
 }
 
 TEST_F(ValidationTest, MeshOutofRangeMaxPrimitiveCount) {
   RewriteAssemblyCheckMsg(L"..\\CodeGenHLSL\\mesh-val\\mesh.hlsl", "ms_6_5",
-                          "= !{!([0-9]+), i32 32, i32 16, i32 2}",
-                          "= !{!\\1, i32 32, i32 257, i32 2}",
+                          "= !{!([0-9]+), i32 32, i32 16, i32 2, i32 40}",
+                          "= !{!\\1, i32 32, i32 257, i32 2, i32 40}",
                           "MS max primitive output count must be \\[0..256\\].  257 specified",
                           true);
 }

+ 9 - 2
utils/hct/hctdb.py

@@ -314,9 +314,14 @@ class db_dxil(object):
         for i in "LegacyF32ToF16,LegacyF16ToF32".split(","):
             self.name_idx[i].category = "Legacy floating-point"
         for i in self.instr:
-            if i.name.startswith("Wave") or i.name.startswith("Quad") or i.name == "GlobalOrderedCountInc":
+            if i.name.startswith("Wave"):
                 i.category = "Wave"
                 i.is_wave = True
+                i.shader_stages = ("library", "compute", "amplification", "mesh", "pixel", "vertex", "hull", "domain", "geometry")
+            elif i.name.startswith("Quad"):
+                i.category = "Quad Wave Ops"
+                i.is_wave = True
+                i.shader_stages = ("library", "compute", "amplification", "mesh", "pixel")
             elif i.name.startswith("Bitcast"):
                 i.category = "Bitcasts with different sizes"
         for i in "ViewID,AttributeAtVertex".split(","):
@@ -2320,7 +2325,7 @@ class db_dxil(object):
         self.add_valrule("Instr.SampleCompType", "sample_* instructions require resource to be declared to return UNORM, SNORM or FLOAT.")
         self.add_valrule("Instr.BarrierModeUselessUGroup", "sync can't specify both _ugroup and _uglobal. If both are needed, just specify _uglobal.")
         self.add_valrule("Instr.BarrierModeNoMemory", "sync must include some form of memory barrier - _u (UAV) and/or _g (Thread Group Shared Memory).  Only _t (thread group sync) is optional. ")
-        self.add_valrule("Instr.BarrierModeForNonCS", "sync in a non-Compute Shader must only sync UAV (sync_uglobal)")
+        self.add_valrule("Instr.BarrierModeForNonCS", "sync in a non-Compute/Amplification/Mesh Shader must only sync UAV (sync_uglobal)")
         self.add_valrule("Instr.WriteMaskForTypedUAVStore", "store on typed uav must write to all four components of the UAV")
         self.add_valrule("Instr.ResourceKindForCalcLOD","lod requires resource declared as texture1D/2D/3D/Cube/CubeArray/1DArray/2DArray")
         self.add_valrule("Instr.ResourceKindForSample", "sample/_l/_d requires resource declared as texture1D/2D/3D/Cube/1DArray/2DArray/CubeArray")
@@ -2438,6 +2443,7 @@ class db_dxil(object):
         self.add_valrule("Sm.MeshShaderMaxVertexCount", "MS max vertex output count must be [0..%0].  %1 specified")
         self.add_valrule("Sm.MeshShaderMaxPrimitiveCount", "MS max primitive output count must be [0..%0].  %1 specified")
         self.add_valrule("Sm.MeshShaderPayloadSize", "For shader '%0', payload size is greater than %1")
+        self.add_valrule("Sm.MeshShaderPayloadSizeDeclared", "For shader '%0', payload size %1 is greater than declared size of %2 bytes")
         self.add_valrule("Sm.MeshShaderOutputSize", "For shader '%0', vertex plus primitive output size is greater than %1")
         self.add_valrule("Sm.MeshShaderInOutSize", "For shader '%0', input plus output size is greater than %1")
         self.add_valrule("Sm.MeshVSigRowCount", "For shader '%0', vertex output signatures are taking up more than %1 rows")
@@ -2445,6 +2451,7 @@ class db_dxil(object):
         self.add_valrule("Sm.MeshTotalSigRowCount", "For shader '%0', vertex and primitive output signatures are taking up more than %1 rows")
         self.add_valrule("Sm.MaxMSSMSize", "Total Thread Group Shared Memory storage is %0, exceeded %1")
         self.add_valrule("Sm.AmplificationShaderPayloadSize", "For shader '%0', payload size is greater than %1")
+        self.add_valrule("Sm.AmplificationShaderPayloadSizeDeclared", "For shader '%0', payload size %1 is greater than declared size of %2 bytes")
 
         # fxc relaxed check of gradient check.
         #self.add_valrule("Uni.NoUniInDiv", "TODO - No instruction requiring uniform execution can be present in divergent block")