=============================
DirectX Intermediate Language
=============================

.. contents::
   :local:
   :depth: 2

Introduction
============

This document presents the design of the DirectX Intermediate Language (DXIL) for GPU shaders. DXIL is intended to support a direct mapping of the HLSL programming language into Low-Level Virtual Machine Intermediate Representation (LLVM IR), suitable for consumption in GPU drivers. This version of the specification is based on LLVM 3.7 in the use of metadata syntax.

Prior to being converted into the low-level DXIL IR, a higher level IR is generated by codegen, which is then transformed into DXIL by the optimizer. This lowers high-level constructs, such as user-defined types, multi-dimensional arrays, matrices, and vectors, into simpler abstractions more suitable for fast JIT-ing in the driver compilers. DXIL is derived from LLVM IR.

LLVM is quickly becoming a de facto standard in modern compilation technology. The LLVM framework offers several distinct features, such as a vibrant ecosystem, complete compilation framework, modular design, and reasonable documentation. We can leverage these to achieve two important objectives.

First, unification of the shader compilation tool chain. DXIL is a contract between IR producers, such as compilers for HLSL and other domain-specific languages, and IR consumers, such as IHV driver JIT compilers or the offline XBOX shader compiler. In addition, the design provides for conversion of the current HLSL IL, called DXBC IL in this document, to DXIL.

Second, leveraging the LLVM ecosystem. Microsoft will publicly document DXIL to attract domain language implementers and spur innovation. Using LLVM-based IR offers reduced entry costs for small teams, simply because small teams are likely to use LLVM and Clang as their main compilation framework. We will provide a DXIL verifier to check the consistency of generated DXIL.

The following diagram shows how some of these components tie together::


        HLSL    Other shading langs    DSL             DXBC IL
         +                    +         +                 +
         |                    |         |                 |
         v                    v         v                 v
       Clang                Clang  Other Tools        dxbc2dxil
         +                    +         +                 +
         |                    |         |                 |
         v                    v         v                 |
  +------+--------------------+---------+                 |
  |            High level IR            |                 |
  +-------------------------------------+                 |
                     |                                    |
                     |                                    |
                     v                                    |
                 Optimizer <-----+ Linker                 |
                     +           ^    +                   |
                     |           |    |                   |
                     |           |    |                   |
        +------------v-----------+----v-------------------v-------+
        |                   Low level IR (DXIL)                   |
        +------------+----------------------+---------------------+
                     |                      |
                     v                      v
              Driver Compiler            Verifier

The *dxbc2dxil* element in the diagram is a component that converts existing DXBC shader byte code into DXIL. The *Optimizer* element is a component that consumes the high level IR, verifies it is valid, optimizes it, and produces a valid DXIL form. The *Verifier* element is a public component that verifies and signs DXIL. The *Linker* is a component that combines precompiled DXIL libraries with the entry function to produce a valid shader.

DXIL does not support the following HLSL features that were present in prior implementations:

* Shader models 9 and below. Microsoft may implement 10level9 shader models via DXIL capability tiers.
* Effects.
* HLSL interfaces.
* Shader compression/decompression.
* Partial precision. Half data type should be used instead.
* min10float type. Half data type should be used instead.
* HLSL *uniform* parameter qualifier.
* Current fxc legacy compatibility mode for old shader models (e.g., c-register binding).
* PDB. Debug Information annotations are used instead.
* Compute shader model cs_4_0.
* DXBC label, call, fcall constructs.

The following principles are used to ease reuse with LLVM components and aid extensibility:

* DXIL uses a subset of LLVM IR constructs that makes sense for HLSL.
* No modifications to the core LLVM IR; i.e., no new instructions or fundamental types.
* Additional information is conveyed via metadata, LLVM intrinsics or external functions.
* Name prefixes: 'llvm.dx.', 'llvm.dxil.', 'dx.', and 'dxil.' are reserved.

LLVM IR has three equivalent forms: human-readable, binary (bitcode), and in-memory. DXIL is a binary format and is based on a subset of the LLVM IR bitcode format. This document uses only the human-readable form to describe DXIL.

Versioning
==========

There are three versioning mechanisms in DXIL shaders: shader model, DXIL version, and LLVM bitcode version.

At a high level, the shader model describes the target execution model and environment; DXIL provides a mechanism to express programs (including rules around expressing data types and operations); and LLVM bitcode provides a way to encode a DXIL program.

Shader Model
------------

The shader model in DXIL is similar to the DXBC shader model. A shader model specifies the execution model, the set of capabilities that shader instructions can use, and the constraints that a shader program must adhere to.

The shader model is specified as named metadata in DXIL::

  !dx.shaderModel = !{ !0 }
  !0 = !{ !"<shaderModelName>", i32 <major>, i32 <minor> }

The following values of <shaderModelName>_<major>_<minor> are supported:

========================= ===================================== ===========
Target                    Legacy Models                         DXIL Models
========================= ===================================== ===========
Vertex shader (VS)        vs_4_0, vs_4_1, vs_5_0, vs_5_1        vs_6_0
Hull shader (HS)          hs_5_0, hs_5_1                        hs_6_0
Domain shader (DS)        ds_5_0, ds_5_1                        ds_6_0
Geometry shader (GS)      gs_4_0, gs_4_1, gs_5_0, gs_5_1        gs_6_0
Pixel shader (PS)         ps_4_0, ps_4_1, ps_5_0, ps_5_1        ps_6_0
Compute shader (CS)       cs_5_0 (cs_4_0 is mapped onto cs_5_0) cs_6_0
Shader library            no support                            lib_6_1
Mesh shader (MS)          no support                            ms_6_5
Amplification shader (AS) no support                            as_6_5
========================= ===================================== ===========

The DXIL verifier ensures that DXIL conforms to the specified shader model.

For shader models prior to 6.0, only the rules applicable to the DXIL representation are valid. For example, the limit on the maximum number of resources is honored, but the limits on registers are not, because DXIL does not have a representation for registers.

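
As an illustration, and assuming the metadata form above is used exactly as written, a pixel shader targeting shader model 6.0 (ps_6_0 in the table) would carry a record along these lines. The node number is arbitrary::

  !dx.shaderModel = !{ !0 }
  !0 = !{ !"ps", i32 6, i32 0 }
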
DXIL version
------------

The primary mechanism to evolve HLSL capabilities is through shader models. However, DXIL version is reserved for additional flexibility of future extensions. There are two currently defined versions: 1.0 and 1.1.

DXIL version has major and minor versions that are specified as named metadata::

  !dx.version = !{ !0 }
  !0 = !{ i32 <major>, i32 <minor> }

DXIL version must be declared exactly once per LLVM module (translation unit) and is valid for the entire module.

DXIL will evolve in a manner that retains backward compatibility.

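
For example, a module produced for DXIL 1.0 would presumably declare its version as follows. The node number is arbitrary and only illustrative::

  !dx.version = !{ !0 }
  !0 = !{ i32 1, i32 0 }
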
DXIL 1.1 Changes
----------------

The two main features introduced in DXIL 1.1 (Shader Model 6.1) are view instancing and barycentric coordinates. Specifically, there are the following changes to the DXIL representation:

* New Intrinsics - AttributeAtVertex_, ViewID
* New System Generated Value - SV_Barycentrics
* New Container Part - ILDN

DXIL 1.2 Changes
----------------

* RawBufferLoad and RawBufferStore DXIL operations for ByteAddressBuffer and StructuredBuffer
* Denorm mode as a function attribute for float32: "fp32-denorm-mode"=<value>

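
A sketch of how the denorm mode attribute might appear on a function. The value "ftz" (flush to zero) and the function name @scale are assumptions chosen for illustration; this section does not list the permitted values::

  define float @scale(float %x, float %k) #0 {
    %r = fmul fast float %x, %k
    ret float %r
  }

  ; "ftz" is an assumed value, not taken from this section
  attributes #0 = { "fp32-denorm-mode"="ftz" }
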
LLVM Bitcode version
--------------------

The current version of DXIL is based on LLVM bitcode v3.7. This encoding is necessarily implied by something outside the DXIL module.

General Issues
==============

An important goal is to enable HLSL to be closer to a strict subset of C/C++. This has implications for DXIL design and future hardware feature requests outlined below.

Terminology
-----------

Resource refers to one of the following:

* SRV - shader resource view (read-only)
* UAV - unordered access view (read-write)
* CBV - constant buffer view (read-only)
* Sampler

Intrinsics typically refer to operations missing in the core LLVM IR. DXIL represents HLSL built-in functions (also called intrinsics) not as LLVM intrinsics, but rather as external function calls.

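
As a sketch of this convention, an HLSL built-in such as sin could appear as a call to an externally declared function in the reserved ``dx.`` namespace rather than as an ``llvm.*`` intrinsic. The function name and the opcode constant below are assumptions for illustration, not taken from this section::

  ; external declaration: no body, resolved by the driver compiler
  declare float @dx.op.unary.f32(i32, float)

  define float @example(float %x) {
    ; the first argument is an assumed opcode selecting the operation
    %r = call float @dx.op.unary.f32(i32 13, float %x)
    ret float %r
  }
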
DXIL abstraction level
----------------------

DXIL has a level of abstraction similar to a 'scalarized' DXBC. DXIL is a lower level IR amenable to fast and robust JIT-ing in driver compilers.

In particular, the following passes are performed to lower the HLSL abstractions down to DXIL:

* optimize function parameter copies
* inline functions
* allocate and transform shader signatures
* lower matrices, optimizing intermediate storage
* linearize multi-dimensional arrays and user-defined type accesses
* scalarize vectors

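
To illustrate the last step, a two-component HLSL addition such as ``float2 c = a + b`` would, after scalarization, be expressed with one scalar instruction per component. The value names are hypothetical::

  %c.x = fadd fast float %a.x, %b.x
  %c.y = fadd fast float %a.y, %b.y
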
Scalar IR
---------

DXIL operations work with scalar quantities. Several scalar quantities may be grouped together in a struct to represent several return values, which is used for memory operations, e.g., load/store, sample, etc., that benefit from access coalescing.

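
A sketch of this grouping, assuming a hypothetical external function @load4 that returns four floats in a struct. The individual scalars are recovered with extractvalue before any arithmetic is done on them::

  ; hypothetical grouping type and load function
  %ret4 = type { float, float, float, float }

  declare %ret4 @load4(i32)

  define float @sum_xy(i32 %index) {
    %r = call %ret4 @load4(i32 %index)
    %x = extractvalue %ret4 %r, 0
    %y = extractvalue %ret4 %r, 1
    %s = fadd fast float %x, %y
    ret float %s
  }
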
Metadata, resource declarations, and debugging info may contain vectors to more closely convey source code shape to tools and debuggers.

Future versions of IR may contain vectors or grouping hints for less-than-32-bit quantities, such as half and i16.

Memory accesses
---------------

DXIL conceptually aligns with DXBC in how different memory types are accessed. Out-of-bounds behavior and various restrictions are preserved.

Indexable thread-local and groupshared variables are represented as variables and accessed via LLVM C-like pointers.

Swizzled resources, such as textures, have opaque memory layouts from a DXIL point of view. Accesses to these resources are done via intrinsics.

There are two layouts for constant buffer memory: (1) legacy, matching DXBC's layout, and (2) linear layout. SM6 DXIL uses intrinsics to read cbuffer for either layout.

Shader signatures require packing and are located in a special type of memory that cannot be viewed as linear. Accesses to signature values are done via special intrinsics in DXIL. If a signature parameter needs to be passed to a function, a copy is created first in thread-local memory and the copy is passed to the function.

Typed buffers represent memory with in-flight data conversion. Typed buffer load/store/atomics are done via special functions in DXIL with element-granularity indexing.

The following pointer types are supported:

* Non-indexable thread-local variables.
* Indexable thread-local variables (DXBC x-registers).
* Groupshared variables (DXBC g-registers).
* Device memory pointer.
* Constant-buffer-like memory pointer.

The type of a DXIL pointer is differentiated by the LLVM addrspace construct. The HLSL compiler will make the best effort to infer the exact pointer addrspace such that a driver compiler can issue the most efficient instruction.

A pointer can come into being in a number of ways:

* Global Variables.
* AllocaInst.
* Synthesized as a result of some pointer arithmetic.

DXIL uses 32-bit pointers in its representation.

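
The sketch below shows the three ways a pointer can arise, using a groupshared array and a thread-local alloca. The use of addrspace(3) for groupshared memory and the names @tile and @read are assumptions; this section does not assign address space numbers::

  ; pointer from a global variable (address space number is assumed)
  @tile = addrspace(3) global [64 x float] undef, align 4

  define float @read(i32 %i) {
    ; pointer from AllocaInst (thread-local)
    %tmp = alloca float, align 4
    ; pointer synthesized by pointer arithmetic on the global
    %p = getelementptr [64 x float], [64 x float] addrspace(3)* @tile, i32 0, i32 %i
    %v = load float, float addrspace(3)* %p, align 4
    store float %v, float* %tmp, align 4
    %r = load float, float* %tmp, align 4
    ret float %r
  }
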
Out-of-bounds behavior
----------------------

Indexable thread-local accesses are done via LLVM pointers and have C-like OOB semantics.

Groupshared accesses are also done via LLVM pointers. The origin of a groupshared pointer must be a single TGSM allocation.

If a groupshared pointer is produced by an inbounds GEP instruction, it must not go out of bounds. The behavior of an OOB access through an inbounds pointer is undefined.

For a groupshared pointer produced by a regular GEP, OOB accesses have the same behavior as in DXBC: loads return 0 and stores are silently dropped.

Resource accesses keep the same out-of-bounds behavior as DXBC: loads return 0 for OOB accesses and OOB stores are silently dropped.

OOB pointer accesses in SM6.0 and later have undefined (C-like) behavior. LLVM memory optimization passes can be used to optimize such accesses. Where out-of-bounds behavior is desired, intrinsic functions are used to access memory.

Memory access granularity
-------------------------

Intrinsic and resource accesses may imply a wider access than requested by an instruction. DXIL defines memory accesses for i1, i16, i32, i64, f16, f32, f64 on thread-local memory, and i32, f32, f64 for memory I/O (that is, groupshared memory and memory accessed via resources such as CBs, UAVs and SRVs).

Number of virtual values
------------------------

There is no limit on the number of virtual values in DXIL. The IR is guaranteed to be in SSA form. For optimized shaders, the optimizer will run the -mem2reg LLVM pass as well as perform other memory-to-register promotions if profitable.

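
For example, a local variable that codegen first places in memory (the first function) would typically be promoted by mem2reg so that only SSA values remain (the second function). The function names are hypothetical::

  define float @before(float %x) {
    %slot = alloca float, align 4
    store float %x, float* %slot, align 4
    %v = load float, float* %slot, align 4
    %r = fmul fast float %v, %v
    ret float %r
  }

  define float @after(float %x) {
    ; the alloca, store, and load are gone; %x is used directly
    %r = fmul fast float %x, %x
    ret float %r
  }
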
Control-flow restrictions
-------------------------

The DXIL control-flow graph must be reducible, as checked by the T1-T2 test. DXIL does not preserve the structured control flow of DXBC. Preserving the structured control-flow property would impose a significant burden on third-party tools optimizing to DXIL via LLVM, reducing the appeal of DXIL.

DXIL allows fall-through for switch label blocks. This is a difference from DXBC, in which fall-through is prohibited.

DXIL will not support the DXBC label and call instructions; LLVM functions can be used instead (see below). The primary uses for these are (1) HLSL interfaces, which are not supported, and (2) outlining of case-bodies in a switch statement annotated with [call], which is not a scenario of interest.

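
For illustration, the following hypothetical function has a control-flow graph that is not reducible and so would not satisfy the rule above. The cycle between %a and %b can be entered at either block, so neither T1 (removing a self-loop) nor T2 (merging a block into its unique predecessor) can collapse it::

  define void @irreducible(i1 %c, i1 %d) {
  entry:
    ; two distinct entries into the a <-> b cycle
    br i1 %c, label %a, label %b
  a:
    br label %b
  b:
    br i1 %d, label %a, label %exit
  exit:
    ret void
  }
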
Functions
---------

Instead of DXBC labels/calls, DXIL supports functions and call instructions. Recursion is not allowed; the DXIL validator enforces this.

The functions are regular LLVM functions. Parameters can be passed by value or by reference. Functions facilitate separate compilation of big, complex shaders. However, driver compilers are free to inline functions as they see fit.

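
A minimal sketch of a helper function and two calls to it, with one parameter passed by reference (as a pointer) and one by value. All names are hypothetical::

  define void @accumulate(float* %acc, float %x) {
    %old = load float, float* %acc, align 4
    %new = fadd fast float %old, %x
    store float %new, float* %acc, align 4
    ret void
  }

  define float @main_body(float %a, float %b) {
    %sum = alloca float, align 4
    store float 0.000000e+00, float* %sum, align 4
    call void @accumulate(float* %sum, float %a)
    call void @accumulate(float* %sum, float %b)
    %r = load float, float* %sum, align 4
    ret float %r
  }
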
Identifiers
-----------

DXIL identifiers must conform to LLVM IR identifier rules.

Identifier mangling rules are the ones used by Clang 3.7 with the HLSL target.

The following identifier prefixes are reserved:

* ``dx.*``, ``dxil.*``
* ``llvm.dx.*``, ``llvm.dxil.*``

Address Width
-------------

DXIL will use only 32-bit addresses for pointers. Byte offsets are also 32-bit.

Shader restrictions
-------------------

There is no support for the following in DXIL:

* recursion
* exceptions
* indirect function calls and dynamic dispatch

Entry points
------------

The dx.entryPoints metadata specifies a list of entry point records, one for each entry point. Libraries could specify more than one entry point per module but currently exist outside the DXIL specification; the other shader models must specify exactly one entry point.

For example::

  define void @"\01?myfunc1@@YAXXZ"() #0 { ... }
  define float @"\01?myfunc2@@YAMXZ"() #0 { ... }

  !dx.entryPoints = !{ !1, !2 }

  !1 = !{ void ()* @"\01?myfunc1@@YAXXZ", !"myfunc1", !3, null, null }
  !2 = !{ float ()* @"\01?myfunc2@@YAMXZ", !"myfunc2", !5, !6, !7 }

Each entry point metadata record specifies:

* reference to the entry point function global symbol
* unmangled name
* list of signatures
* list of resources
* list of tag-value pairs of shader capabilities and other properties

A 'null' value specifies absence of a particular node.

Shader capabilities are properties that are additional to properties dictated by shader model. The list is organized as pairs of i32 tag, followed immediately by the value itself.

Hull shader representation
--------------------------

The hull shader is represented as two functions, related via metadata: (1) control point phase function, which is the entry point of the hull shader, and (2) patch constant phase function.

For example::

  !dx.entryPoints = !{ !1 }
  !1 = !{ void ()* @"ControlPointFunc", ..., !2 } ; shader entry record
  !2 = !{ !"HS", !3 }
  !3 = !{ void ()* @"PatchConstFunc", ... }       ; additional hull shader state

The patch constant function represents the original HLSL computation, and is not separated into fork and join phases, as is the case in DXBC. The driver compiler may perform such separation if this is profitable for the target GPU.

In DXBC-to-DXIL conversion, the original patch constant function cannot be recovered. Instead, the instructions of each fork and join phase are 'wrapped' by a loop that runs for the corresponding phase instance count. Thus, the fork/join instance ID becomes the loop induction variable. The LoadPatchConstant intrinsic (see below) represents a load from a DXBC vpc register.

The following table summarizes the names of intrinsic functions to load inputs and store outputs of hull and domain shaders. CP stands for Control Point and PC for Patch Constant.

=================== ==================== ====================== ======================
Operation           Control Point (Hull) Patch Constant         Domain
=================== ==================== ====================== ======================
Store Input CP
Load Input CP       LoadInput            LoadInput
Store Output CP     StoreOutput
Load Output CP                           LoadOutputControlPoint LoadInput
Store PC                                 StorePatchConstant
Load PC                                  LoadPatchConstant      LoadPatchConstant
Store Output Vertex                                             StoreOutput
=================== ==================== ====================== ======================

The LoadPatchConstant function in the PC stage is generated only by the DXBC-to-DXIL converter, to access DXBC vpc registers. The HLSL compiler produces IR that references LLVM IR values directly.

Type System
===========

Most LLVM type system constructs are legal in DXIL.

Primitive Types
---------------

The following types are supported:

* void
* metadata
* i1, i8, i16, i32, i64
* half, float, double

SM6.0 assumes native hardware support for i32 and float types.

i8 is supported only in a few intrinsics to signify masks, enumeration constant values, or in metadata. It's not supported for memory access or computation by the shader.

HLSL min12int, min16int and min16uint data types are mapped to i16.

half and i16 are treated as the corresponding DXBC min-precision types (min16float, min16int/min16uint) in SM6.0.

The HLSL compiler optimizer treats half, i16 and i8 data as data types natively supported by the hardware; i.e., saturation, range clipping, INF/NaN are done according to the IEEE standard. Such semantics allow the optimizer to reuse LLVM optimization passes.

Hardware support for doubles is optional and is guarded by the RequiresHardwareDouble CAP bit.

Hardware support for i64 is optional and is guarded by a CAP bit.

Vectors
-------

HLSL vectors are scalarized. They do not participate in computation; however, they may be present in declarations to convey original variable layout to tools, debuggers, and reflection.

Future DXIL may add support for <2 x half> and <2 x i16> vectors or hints for packing related half and i16 quantities.

Matrices
--------

Matrices are lowered to vectors, and are not referenced by instructions. They may be present in declarations to convey original variable layout to tools, debuggers, and reflection.

Arrays
------

Instructions may reference only 1D arrays of primitive types. However, complex arrays, e.g., multidimensional arrays or arrays of user-defined types, may be present to convey original variable layout to tools, debuggers, and reflection.

  257. User-defined types
  258. ------------------
  259. Original HLSL UDTs are lowered and are not referenced by instructions. However, they may be present in declarations to convey original variable layout to tools, debuggers, and reflection. Some resource operations return 'grouping' UDTs that group several return values; such UDTs are immediately 'decomposed' into components that are then consumed by other instructions.
  260. Type conversions
  261. ----------------
  262. Explicit conversions between types are supported via LLVM instructions.
  263. Precise qualifier
  264. -----------------
  265. By default, all floating-point HLSL operations are considered 'fast' or non-precise. HLSL and driver compilers are allowed to refactor such operations. Non-precise LLVM instructions: fadd, fsub, fmul, fdiv, frem, fcmp are marked with 'fast' math flags.
  266. HLSL precise type qualifier requires that all operations contributing to the value be IEEE compliant with respect to optimizations. The /Gis compiler switch implicitly declares all variables and values as precise.
  267. Precise behavior is represented in LLVM instructions: fadd, fsub, fmul, fdiv, frem, fcmp by not having 'fast' math flags set. Each relevant call instruction that contributes to computation of a precise value is annotated with dx.precise metadata that indicates that it is illegal for the driver compiler to perform IEEE-unsafe optimizations.
  268. .. _type-annotations:
  269. Type annotations
  270. ----------------
User-defined types are annotated in DXIL to 'attach' additional properties to structure fields. For example, DXIL may contain type annotations of structures and functions for reflection purposes::
namespace MyNameSpace {
  struct MyType {
    float field1;
    int2 field2;
  };
}
float main(float col : COLOR) : SV_Target {
  .....
}
  281. !dx.typeAnnotations = !{!3, !7}
  282. !3 = !{i32 0, %"struct.MyNameSpace::MyType" undef, !4}
  283. !4 = !{i32 12, !5, !6}
  284. !5 = !{i32 6, !"field1", i32 3, i32 0, i32 7, i32 9}
  285. !6 = !{i32 6, !"field2", i32 3, i32 4, i32 7, i32 4}
  286. !7 = !{i32 1, void (float, float*)* @"main", !8}
  287. !8 = !{!9, !11, !14}
  288. !9 = !{i32 0, !10, !10}
  289. !10 = !{}
  290. !11 = !{i32 0, !12, !13}
  291. !12 = !{i32 4, !"COLOR", i32 7, i32 9}
  292. !13 = !{i32 0}
  293. !14 = !{i32 1, !15, !13}
  294. !15 = !{i32 4, !"SV_Target", i32 7, i32 9}
  295. !16 = !{null, !"lib.no::entry", null, null, null}
  296. The type/field annotation metadata hierarchy recursively mimics LLVM type hierarchy.
  297. dx.typeAnnotations is a metadata of type annotation nodes, where each node represents type annotation of a certain type::
  298. !dx.typeAnnotations = !{!3, !7}
  299. For each **type annotation** node, the first value represents the type of the annotation::
  300. !3 = !{i32 0, %"struct.MyNameSpace::MyType" undef, !4}
  301. !7 = !{i32 1, void (float, float*)* @"main", !8}
=== =====================================================================
Idx Type
=== =====================================================================
0   Structure Annotation
1   Function Annotation
=== =====================================================================
  308. The second value represents the name, the third is a corresponding type metadata node.
  309. **Structure Annotation** starts with the size of the structure in bytes, followed by the list of field annotations::
  310. !4 = !{i32 12, !5, !6}
  311. !5 = !{i32 6, !"field1", i32 3, i32 0, i32 7, i32 9}
  312. !6 = !{i32 6, !"field2", i32 3, i32 4, i32 7, i32 4}
**Field Annotation** is a series of pairs, each consisting of a tag number followed by its value. The tags are defined as follows:
=== =====================================================================
Idx Type
=== =====================================================================
0   SNorm
1   UNorm
2   Matrix
3   Buffer Offset
4   Semantic String
5   Interpolation Mode
6   Field Name
7   Component Type
8   Precise
=== =====================================================================
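The tag-value layout above can be decoded mechanically. The following Python sketch is illustrative only: the names ``FIELD_ANNOTATION_TAGS`` and ``decode_field_annotation`` are hypothetical, and the operands are assumed to have been transcribed by hand from a metadata node into a flat Python list rather than parsed from LLVM IR.

```python
# Tag numbers copied from the Field Annotation table above.
FIELD_ANNOTATION_TAGS = {
    0: "SNorm",
    1: "UNorm",
    2: "Matrix",
    3: "Buffer Offset",
    4: "Semantic String",
    5: "Interpolation Mode",
    6: "Field Name",
    7: "Component Type",
    8: "Precise",
}


def decode_field_annotation(operands):
    """Turn a flat [tag, value, tag, value, ...] list into a dict keyed by tag name.

    Hypothetical helper: 'operands' is a hand-transcribed Python list,
    not an LLVM metadata object.
    """
    if len(operands) % 2 != 0:
        raise ValueError("a field annotation must be a list of tag-value pairs")
    decoded = {}
    for tag, value in zip(operands[0::2], operands[1::2]):
        decoded[FIELD_ANNOTATION_TAGS[tag]] = value
    return decoded


# Operands of !5 = !{i32 6, !"field1", i32 3, i32 0, i32 7, i32 9}
field1 = decode_field_annotation([6, "field1", 3, 0, 7, 9])
print(field1)
```

For the ``!5`` node this yields the field name ``field1``, buffer offset 0, and component type 9.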
  327. **Function Annotation** is a series of parameter annotations::
  328. !7 = !{i32 1, void (float, float*)* @"main", !8}
  329. !8 = !{!9, !11, !14}
  330. Each **Parameter Annotation** contains Input/Output type, field annotation, and semantic index::
  331. !9 = !{i32 0, !10, !10}
  332. !10 = !{}
  333. !11 = !{i32 0, !12, !13}
  334. !12 = !{i32 4, !"COLOR", i32 7, i32 9}
  335. !13 = !{i32 0}
  336. !14 = !{i32 1, !15, !13}
  337. !15 = !{i32 4, !"SV_Target", i32 7, i32 9}
  338. Shader Properties and Capabilities
  339. ==================================
  340. Additional shader properties are specified via tag-value pair list, which is the last element in the entry function description record.
  341. Shader Flags
  342. ------------
Shaders have additional flags that convey their capabilities via a tag-value pair with tag kDxilShaderFlagsTag (0), followed by an i64 bitmask integer. The bits have the following meaning:
=== =====================================================================
Bit Description
=== =====================================================================
0   Disable shader optimizations
1   Disable math refactoring
2   Shader uses doubles
3   Force early depth stencil
4   Enable raw and structured buffers
5   Shader uses min-precision, expressed as half and i16
6   Shader uses double extension intrinsics
7   Shader uses MSAD
8   All resources must be bound for the duration of shader execution
9   Enable view port and RT array index from any stage feeding rasterizer
10  Shader uses inner coverage
11  Shader uses stencil
12  Shader uses intrinsics that access tiled resources
13  Shader uses relaxed typed UAV load formats
14  Shader uses Level9 comparison filtering
15  Shader uses up to 64 UAVs
16  Shader uses UAVs
17  Shader uses CS4 raw and structured buffers
18  Shader uses Rasterizer Ordered Views
19  Shader uses wave intrinsics
20  Shader uses int64 instructions
=== =====================================================================
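A tool that inspects the flags value can test each bit position against the table above. The sketch below is a minimal illustration; ``SHADER_FLAG_BITS`` and ``decode_shader_flags`` are hypothetical names, and the example mask is made up rather than taken from a real shader.

```python
# Bit descriptions copied from the Shader Flags table above, indexed by bit position.
SHADER_FLAG_BITS = [
    "Disable shader optimizations",
    "Disable math refactoring",
    "Shader uses doubles",
    "Force early depth stencil",
    "Enable raw and structured buffers",
    "Shader uses min-precision, expressed as half and i16",
    "Shader uses double extension intrinsics",
    "Shader uses MSAD",
    "All resources must be bound for the duration of shader execution",
    "Enable view port and RT array index from any stage feeding rasterizer",
    "Shader uses inner coverage",
    "Shader uses stencil",
    "Shader uses intrinsics that access tiled resources",
    "Shader uses relaxed typed UAV load formats",
    "Shader uses Level9 comparison filtering",
    "Shader uses up to 64 UAVs",
    "Shader uses UAVs",
    "Shader uses CS4 raw and structured buffers",
    "Shader uses Rasterizer Ordered Views",
    "Shader uses wave intrinsics",
    "Shader uses int64 instructions",
]


def decode_shader_flags(mask):
    """Return the descriptions of all documented bits set in an i64 flags mask."""
    return [
        description
        for bit, description in enumerate(SHADER_FLAG_BITS)
        if mask & (1 << bit)
    ]


# Made-up mask with bits 2 and 19 set.
example_mask = (1 << 2) | (1 << 19)
print(decode_shader_flags(example_mask))
```

Bits beyond 20 are ignored by this sketch, since the table does not describe them.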
  369. Geometry Shader
  370. ---------------
  371. Geometry shader properties are specified via tag-value pair with tag kDxilGSStateTag (1), followed by a list of GS properties. The format of this list is the following.
=== ==== ===============================================================
Idx Type Description
=== ==== ===============================================================
0   i32  Input primitive (InputPrimitive enum value).
1   i32  Max vertex count.
2   i32  Primitive topology for stream 0 (PrimitiveTopology enum value).
3   i32  Primitive topology for stream 1 (PrimitiveTopology enum value).
4   i32  Primitive topology for stream 2 (PrimitiveTopology enum value).
5   i32  Primitive topology for stream 3 (PrimitiveTopology enum value).
=== ==== ===============================================================
  382. Domain Shader
  383. -------------
  384. Domain shader properties are specified via tag-value pair with tag kDxilDSStateTag (2), followed by a list of DS properties. The format of this list is the following.
=== ==== ===============================================================
Idx Type Description
=== ==== ===============================================================
0   i32  Tessellator domain (TessellatorDomain enum value).
1   i32  Input control point count.
=== ==== ===============================================================
  391. Hull Shader
  392. -----------
  393. Hull shader properties are specified via tag-value pair with tag kDxilHSStateTag (3), followed by a list of HS properties. The format of this list is the following.
=== ======= =====================================================================
Idx Type    Description
=== ======= =====================================================================
0   MDValue Patch constant function (global symbol).
1   i32     Input control point count.
2   i32     Output control point count.
3   i32     Tessellator domain (TessellatorDomain enum value).
4   i32     Tessellator partitioning (TessellatorPartitioning enum value).
5   i32     Tessellator output primitive (TessellatorOutputPrimitive enum value).
6   float   Max tessellation factor.
=== ======= =====================================================================
  405. Compute Shader
  406. --------------
  407. Compute shader has the following tag-value properties.
===================== ======================== =============================================
Tag                   Value                    Description
===================== ======================== =============================================
kDxilNumThreadsTag(4) MD list: (i32, i32, i32) Number of threads (X,Y,Z) for compute shader.
kDxilWaveSizeTag      MD list: (i32)           Wave size the shader is compatible with (optional).
===================== ======================== =============================================
  414. Shader Parameters and Signatures
  415. ================================
  416. This section formalizes how HLSL shader input and output parameters are expressed in DXIL.
  417. HLSL signatures and semantics
  418. -----------------------------
  419. Formal parameters of a shader entry function in HLSL specify how the shader interacts with the graphics pipeline. Input parameters, referred to as an input signature, specify values received by the shader. Output parameters, referred to as an output signature, specify values produced by the shader. The shader compiler maps HLSL input and output signatures into DXIL specifications that conform to hardware constraints outlined in the Direct3D Functional Specification. DXIL specifications are also called signatures.
  420. Signature mapping is a complex process, as there are many constraints. All signature parameters must fit into a finite space of N 4x32-bit registers. For efficiency reasons, parameters are packed together in a way that does not violate specification constraints. The process is called signature packing. Most signatures are tightly packed; however, the VS input signature is not packed, as the values are coming from the Input Assembler (IA) stage rather than the graphics pipeline. Alternately, the PS output signature is allocated to align the SV_Target semantic index with the output register index.
Each HLSL signature parameter is defined via C-like type, interpolation mode, and semantic name and index. The type defines parameter shape, which may be quite complex. Interpolation mode adds to the packing constraints, namely that parameters packed together must have compatible interpolation modes. Semantics are extra names associated with parameters for the following purposes: (1) to specify whether a parameter is a special System Value (SV) or not, (2) to link parameters to IA or StreamOut API streams, and (3) to aid debugging. Semantic index is used to disambiguate parameters that use the same semantic name, or span multiple rows of the register space.
  422. SV semantics add specific meanings and constraints to associated parameters. A parameter may be supplied by the hardware, and is then known as a System Generated Value (SGV). Alternatively, a parameter may be interpreted by the hardware and is then known as System Interpreted Value (SIV). SGVs and SIVs are pipeline-stage dependent; moreover, some participate in signature packing and some do not. Non-SV semantics always participate in signature packing.
Most System Generated Values (SGVs) are loaded using special DXIL intrinsic functions, rather than by loading the input from a signature. These values usually are not present in the signature at all; their presence may be detected by the declaration and use of the special intrinsic function itself. There are two notable exceptions. In the first case, the values are present in the signature and are loaded from it instead of through a special intrinsic, because they must be part of the packed signature potentially passed from the prior stage, which allows the prior stage to override them; examples are SV_PrimitiveID and SV_IsFrontFace, which may be written in the Geometry Shader. In the second case, the values identify signature elements that still contribute to the DXBC signature for informational purposes, but are read only through the special intrinsic function; examples are SV_PrimitiveID for GS input and SampleIndex for PS input.
  424. The classification of behavior for various system values in various signature locations is described in a table organized by SemanticKind and SigPointKind. The SigPointKind is a new classification that uniquely identifies each set of parameters that may be input or output for each entry point. For each combination of SemanticKind and SigPointKind, there is a SemanticInterpretationKind that defines the class of treatment for that location.
  425. Each SigPointKind also has a corresponding element allocation (or packing) behavior called PackingKind. Some SigPointKinds do not result in a signature at all, which corresponds to the packing kind of PackingKind::None.
  426. Signature Points are enumerated as follows in the SigPointKind
  427. .. <py>import hctdb_instrhelp</py>
  428. .. <py::lines('SIGPOINT-RST')>hctdb_instrhelp.get_sigpoint_rst()</py>
  429. .. SIGPOINT-RST:BEGIN
== ======== ======= ============= ============== ================ ============================================================================
ID SigPoint Related ShaderKind    PackingKind    SignatureKind    Description
== ======== ======= ============= ============== ================ ============================================================================
0  VSIn     Invalid Vertex        InputAssembler Input            Ordinary Vertex Shader input from Input Assembler
1  VSOut    Invalid Vertex        Vertex         Output           Ordinary Vertex Shader output that may feed Rasterizer
2  PCIn     HSCPIn  Hull          None           Invalid          Patch Constant function non-patch inputs
3  HSIn     HSCPIn  Hull          None           Invalid          Hull Shader function non-patch inputs
4  HSCPIn   Invalid Hull          Vertex         Input            Hull Shader patch inputs - Control Points
5  HSCPOut  Invalid Hull          Vertex         Output           Hull Shader function output - Control Point
6  PCOut    Invalid Hull          PatchConstant  PatchConstOrPrim Patch Constant function output - Patch Constant data passed to Domain Shader
7  DSIn     Invalid Domain        PatchConstant  PatchConstOrPrim Domain Shader regular input - Patch Constant data plus system values
8  DSCPIn   Invalid Domain        Vertex         Input            Domain Shader patch input - Control Points
9  DSOut    Invalid Domain        Vertex         Output           Domain Shader output - vertex data that may feed Rasterizer
10 GSVIn    Invalid Geometry      Vertex         Input            Geometry Shader vertex input - qualified with primitive type
11 GSIn     GSVIn   Geometry      None           Invalid          Geometry Shader non-vertex inputs (system values)
12 GSOut    Invalid Geometry      Vertex         Output           Geometry Shader output - vertex data that may feed Rasterizer
13 PSIn     Invalid Pixel         Vertex         Input            Pixel Shader input
14 PSOut    Invalid Pixel         Target         Output           Pixel Shader output
15 CSIn     Invalid Compute       None           Invalid          Compute Shader input
16 MSIn     Invalid Mesh          None           Invalid          Mesh Shader input
17 MSOut    Invalid Mesh          Vertex         Output           Mesh Shader vertices output
18 MSPOut   Invalid Mesh          Vertex         PatchConstOrPrim Mesh Shader primitives output
19 ASIn     Invalid Amplification None           Invalid          Amplification Shader input
== ======== ======= ============= ============== ================ ============================================================================
  454. .. SIGPOINT-RST:END
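The rule that a PackingKind of None means "no signature" can be checked directly against the SigPointKind table. The Python sketch below transcribes three of the table's columns into a dictionary; ``SIG_POINTS``, ``has_signature`` and ``sig_points_for`` are hypothetical names introduced for illustration and are not part of any DXIL tooling API.

```python
# (ShaderKind, PackingKind, SignatureKind) per SigPointKind,
# transcribed from the SigPointKind table above.
SIG_POINTS = {
    "VSIn": ("Vertex", "InputAssembler", "Input"),
    "VSOut": ("Vertex", "Vertex", "Output"),
    "PCIn": ("Hull", "None", "Invalid"),
    "HSIn": ("Hull", "None", "Invalid"),
    "HSCPIn": ("Hull", "Vertex", "Input"),
    "HSCPOut": ("Hull", "Vertex", "Output"),
    "PCOut": ("Hull", "PatchConstant", "PatchConstOrPrim"),
    "DSIn": ("Domain", "PatchConstant", "PatchConstOrPrim"),
    "DSCPIn": ("Domain", "Vertex", "Input"),
    "DSOut": ("Domain", "Vertex", "Output"),
    "GSVIn": ("Geometry", "Vertex", "Input"),
    "GSIn": ("Geometry", "None", "Invalid"),
    "GSOut": ("Geometry", "Vertex", "Output"),
    "PSIn": ("Pixel", "Vertex", "Input"),
    "PSOut": ("Pixel", "Target", "Output"),
    "CSIn": ("Compute", "None", "Invalid"),
    "MSIn": ("Mesh", "None", "Invalid"),
    "MSOut": ("Mesh", "Vertex", "Output"),
    "MSPOut": ("Mesh", "Vertex", "PatchConstOrPrim"),
    "ASIn": ("Amplification", "None", "Invalid"),
}


def has_signature(sig_point):
    """A SigPointKind produces a signature unless its PackingKind is None."""
    return SIG_POINTS[sig_point][1] != "None"


def sig_points_for(shader_kind):
    """List the SigPointKinds that belong to one ShaderKind, in table order."""
    return [name for name, (kind, _, _) in SIG_POINTS.items() if kind == shader_kind]


print([name for name in SIG_POINTS if not has_signature(name)])
print(sig_points_for("Hull"))
```

In this transcription, every SigPointKind whose PackingKind is None also has a SignatureKind of Invalid.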
  455. Semantic Interpretations are as follows (SemanticInterpretationKind)
  456. .. <py>import hctdb_instrhelp</py>
  457. .. <py::lines('SEMINT-RST')>hctdb_instrhelp.get_sem_interpretation_enum_rst()</py>
  458. .. SEMINT-RST:BEGIN
== ========== =============================================================
ID Name       Description
== ========== =============================================================
0  NA         Not Available
1  SV         Normal System Value
2  SGV        System Generated Value (sorted last)
3  Arb        Treated as Arbitrary
4  NotInSig   Not included in signature (intrinsic access)
5  NotPacked  Included in signature, but does not contribute to packing
6  Target     Special handling for SV_Target
7  TessFactor Special handling for tessellation factors
8  Shadow     Shadow element must be added to a signature for compatibility
9  ClipCull   Special packing rules for SV_ClipDistance or SV_CullDistance
== ========== =============================================================
  473. .. SEMINT-RST:END
  474. Semantic Interpretations for each SemanticKind at each SigPointKind are as follows
  475. .. <py>import hctdb_instrhelp</py>
  476. .. <py::lines('SEMINT-TABLE-RST')>hctdb_instrhelp.get_sem_interpretation_table_rst()</py>
  477. .. SEMINT-TABLE-RST:BEGIN
  478. ====================== ============ ======== ============ ============ ======== ======== ========== ============ ======== ======== ======== ============ ======== ============= ============= ======== ======== ======== ========= ========
  479. Semantic VSIn VSOut PCIn HSIn HSCPIn HSCPOut PCOut DSIn DSCPIn DSOut GSVIn GSIn GSOut PSIn PSOut CSIn MSIn MSOut MSPOut ASIn
  480. ====================== ============ ======== ============ ============ ======== ======== ========== ============ ======== ======== ======== ============ ======== ============= ============= ======== ======== ======== ========= ========
  481. Arbitrary Arb Arb NA NA Arb Arb Arb Arb Arb Arb Arb NA Arb Arb NA NA NA Arb Arb NA
  482. VertexID SV NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
  483. InstanceID SV Arb NA NA Arb Arb NA NA Arb Arb Arb NA Arb Arb NA NA NA NA NA NA
  484. Position Arb SV NA NA SV SV Arb Arb SV SV SV NA SV SV NA NA NA SV NA NA
  485. RenderTargetArrayIndex Arb SV NA NA SV SV Arb Arb SV SV SV NA SV SV NA NA NA NA SV NA
  486. ViewPortArrayIndex Arb SV NA NA SV SV Arb Arb SV SV SV NA SV SV NA NA NA NA SV NA
  487. ClipDistance Arb ClipCull NA NA ClipCull ClipCull Arb Arb ClipCull ClipCull ClipCull NA ClipCull ClipCull NA NA NA ClipCull NA NA
  488. CullDistance Arb ClipCull NA NA ClipCull ClipCull Arb Arb ClipCull ClipCull ClipCull NA ClipCull ClipCull NA NA NA ClipCull NA NA
  489. OutputControlPointID NA NA NA NotInSig NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
  490. DomainLocation NA NA NA NA NA NA NA NotInSig NA NA NA NA NA NA NA NA NA NA NA NA
  491. PrimitiveID NA NA NotInSig NotInSig NA NA NA NotInSig NA NA NA Shadow SGV SGV NA NA NA NA SV NA
  492. GSInstanceID NA NA NA NA NA NA NA NA NA NA NA NotInSig NA NA NA NA NA NA NA NA
  493. SampleIndex NA NA NA NA NA NA NA NA NA NA NA NA NA Shadow _41 NA NA NA NA NA NA
  494. IsFrontFace NA NA NA NA NA NA NA NA NA NA NA NA SGV SGV NA NA NA NA NA NA
  495. Coverage NA NA NA NA NA NA NA NA NA NA NA NA NA NotInSig _50 NotPacked _41 NA NA NA NA NA
  496. InnerCoverage NA NA NA NA NA NA NA NA NA NA NA NA NA NotInSig _50 NA NA NA NA NA NA
  497. Target NA NA NA NA NA NA NA NA NA NA NA NA NA NA Target NA NA NA NA NA
  498. Depth NA NA NA NA NA NA NA NA NA NA NA NA NA NA NotPacked NA NA NA NA NA
  499. DepthLessEqual NA NA NA NA NA NA NA NA NA NA NA NA NA NA NotPacked _50 NA NA NA NA NA
  500. DepthGreaterEqual NA NA NA NA NA NA NA NA NA NA NA NA NA NA NotPacked _50 NA NA NA NA NA
  501. StencilRef NA NA NA NA NA NA NA NA NA NA NA NA NA NA NotPacked _50 NA NA NA NA NA
  502. DispatchThreadID NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NotInSig NotInSig NA NA NotInSig
  503. GroupID NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NotInSig NotInSig NA NA NotInSig
  504. GroupIndex NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NotInSig NotInSig NA NA NotInSig
  505. GroupThreadID NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NotInSig NotInSig NA NA NotInSig
  506. TessFactor NA NA NA NA NA NA TessFactor TessFactor NA NA NA NA NA NA NA NA NA NA NA NA
  507. InsideTessFactor NA NA NA NA NA NA TessFactor TessFactor NA NA NA NA NA NA NA NA NA NA NA NA
  508. ViewID NotInSig _61 NA NotInSig _61 NotInSig _61 NA NA NA NotInSig _61 NA NA NA NotInSig _61 NA NotInSig _61 NA NA NotInSig NA NA NA
  509. Barycentrics NA NA NA NA NA NA NA NA NA NA NA NA NA NotPacked _61 NA NA NA NA NA NA
  510. ShadingRate NA SV _64 NA NA SV _64 SV _64 NA NA SV _64 SV _64 SV _64 NA SV _64 SV _64 NA NA NA NA SV NA
  511. CullPrimitive NA NA NA NA NA NA NA NA NA NA NA NA NA NotInSig NA NA NA NA NotPacked NA
  512. ====================== ============ ======== ============ ============ ======== ======== ========== ============ ======== ======== ======== ============ ======== ============= ============= ======== ======== ======== ========= ========
  513. .. SEMINT-TABLE-RST:END
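Several cells in the table above carry a suffix such as ``_41``, ``_50``, ``_61`` or ``_64``. This section does not define the suffix. The Python sketch below assumes that it names the minimum shader model (major digit, then minor digits) at which the interpretation applies, and that the cell reads as NA for older shader models. Both the assumption and the helper names ``parse_interpretation`` and ``effective_interpretation`` are illustrative, not taken from this specification.

```python
def parse_interpretation(cell):
    """Split a table cell into (interpretation name, minimum shader model or None).

    ASSUMPTION: a suffix like "_50" is read as shader model (5, 0).
    """
    parts = cell.split()
    name = parts[0]
    if len(parts) == 1:
        return name, None
    digits = parts[1].lstrip("_")          # "_61" -> "61"
    return name, (int(digits[0]), int(digits[1:]))


def effective_interpretation(cell, shader_model):
    """Return the interpretation for a (major, minor) shader model.

    ASSUMPTION: below the minimum shader model the cell behaves as "NA".
    """
    name, minimum = parse_interpretation(cell)
    if minimum is not None and shader_model < minimum:
        return "NA"
    return name


# Cells copied from the table: ViewID at VSIn, Coverage at PSOut.
print(effective_interpretation("NotInSig _61", (6, 0)))
print(effective_interpretation("NotPacked _41", (6, 0)))
```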
  514. Below is a vertex shader example that is used for illustration throughout this section::
struct Foo {
  float a;
  float b[2];
};
struct VSIn {
  uint vid : SV_VertexID;
  float3 pos : Position;
  Foo foo[3] : SemIn1;
  float f : SemIn10;
};
struct VSOut
{
  float f : SemOut1;
  Foo foo[3] : SemOut2;
  float4 pos : SV_Position;
};
void main(in VSIn In, // input signature
          out VSOut Out) // output signature
{
  ...
}
Signature packing must be efficient. It should use as few registers as possible, and the packing algorithm should run in reasonable time. The complication is that the problem is NP-complete, so the algorithm must resort to a heuristic.
  537. While the details of the packing algorithm are not important at the moment, it is important to outline some concepts related to how a packed signature is represented in DXIL. Packing is further complicated by the complexity of parameter shapes induced by the C/C++ type system. In the example above, fields of Out.foo array field are actually arrays themselves, strided in memory. Allocating such strided shapes efficiently is hard. To simplify packing, the first step is to break user-defined (struct) parameters into constituent components and to make strided arrays contiguous. This preparation step enables the algorithm to operate on dense rectangular shapes, which we call signature elements. The output signature in the example above has the following elements: float Out_f, float Out_foo_a[3], float Out_foo_b[2][3], and float4 pos. Each element is characterized by the number of rows and columns. These are 1x1, 3x1, 6x1, and 1x4, respectively. The packing algorithm reduces to fitting these elements into Nx4 register space, satisfying all packing-compatibility constraints.
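The preparation step can be illustrated with a short Python sketch that breaks a nested parameter into dense rectangular elements. The tuple-based type description and the ``flatten`` helper are hypothetical constructs made up for this illustration; they are not how the compiler represents types.

```python
# Hypothetical type descriptions:
#   ("vector", cols)            scalar (cols == 1) or vector
#   ("array", count, element)   array of 'count' elements
#   ("struct", [(name, type)])  user-defined struct
FLOAT = ("vector", 1)
FLOAT4 = ("vector", 4)
FOO = ("struct", [("a", FLOAT), ("b", ("array", 2, FLOAT))])
VS_OUT = ("struct", [("f", FLOAT), ("foo", ("array", 3, FOO)), ("pos", FLOAT4)])


def flatten(name, type_desc, rows=1):
    """Yield (element name, rows, cols) for every leaf field.

    Enclosing array sizes multiply into the row count, which makes
    strided arrays such as foo[i].b[j] contiguous.
    """
    kind = type_desc[0]
    if kind == "vector":
        yield (name, rows, type_desc[1])
    elif kind == "array":
        yield from flatten(name, type_desc[2], rows * type_desc[1])
    elif kind == "struct":
        for field_name, field_type in type_desc[1]:
            yield from flatten(f"{name}_{field_name}", field_type, rows)
    else:
        raise ValueError(f"unknown type kind: {kind}")


for element in flatten("Out", VS_OUT):
    print(element)
```

For the example output signature this produces elements of shape 1x1, 3x1, 6x1 and 1x4, matching the shapes listed above.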
  538. Signature element record
  539. ------------------------
  540. Each signature element is represented in DXIL as a metadata record.
For the example output signature above, the element records are as follows::
  542. ; element ID, semantic name, etype, sv, s.idx, interp, rows, cols, start row, col, ext. list
  543. !20 = !{i32 6, !"SemOut", i8 0, i8 0, !40, i8 2, i32 1, i8 1, i32 1, i8 2, null}
  544. !21 = !{i32 7, !"SemOut", i8 0, i8 0, !41, i8 2, i32 3, i8 1, i32 1, i8 1, null}
  545. !22 = !{i32 8, !"SemOut", i8 0, i8 0, !42, i8 2, i32 6, i8 1, i32 1, i8 0, null}
  546. !23 = !{i32 9, !"SV_Position", i8 0, i8 3, !43, i8 2, i32 1, i8 4, i32 0, i8 0, null}
  547. A record contains the following fields.
=== =============== ===============================================================================
Idx Type            Description
=== =============== ===============================================================================
0   i32             Unique signature element record ID, used to identify the element in operations.
1   String metadata Semantic name.
2   i8              ComponentType (enum value).
3   i8              SemanticKind (enum value).
4   Metadata        Metadata list that enumerates all semantic indexes of the flattened parameter.
5   i8              InterpolationMode (enum value).
6   i32             Number of element rows.
7   i8              Number of element columns.
8   i32             Starting row of element packing location.
9   i8              Starting column of element packing location.
10  Metadata        Metadata list of additional tag-value pairs; can be 'null' or empty.
=== =============== ===============================================================================
System value semantic names always start with the prefix 'SV_', and it is illegal to start a user semantic name with this prefix. Non-SV semantic names can be ignored by drivers. Debug layers may use them to help validate signature compatibility between stages.
  564. The last metadata list is used to specify additional properties and future extensions.
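The eleven record fields map naturally onto a named tuple. The Python sketch below is illustrative only: ``SignatureElement`` and its field names are hypothetical, and the record values are assumed to have been transcribed by hand from the example metadata, with the semantic index node replaced by its contents.

```python
from typing import NamedTuple, Optional


class SignatureElement(NamedTuple):
    """Hypothetical mirror of the record layout in the table above (Idx 0..10)."""
    element_id: int
    semantic_name: str
    component_type: int
    semantic_kind: int
    semantic_indexes: list
    interpolation_mode: int
    rows: int
    cols: int
    start_row: int
    start_col: int
    extended: Optional[list]

    def occupied_cells(self):
        """Return every (row, column) register cell covered by the element."""
        return [
            (row, col)
            for row in range(self.start_row, self.start_row + self.rows)
            for col in range(self.start_col, self.start_col + self.cols)
        ]


# Hand transcription of records !23 (SV_Position) and !21 (SemOut, foo.a).
position = SignatureElement(9, "SV_Position", 0, 3, [0], 2, 1, 4, 0, 0, None)
foo_a = SignatureElement(7, "SemOut", 0, 0, [2, 5, 8], 2, 3, 1, 1, 1, None)
print(position.occupied_cells())
print(foo_a.occupied_cells())
```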
  565. Signature record metadata
  566. -------------------------
A shader typically has two signatures, input and output, while hull and domain shaders have an additional patch constant signature. The signatures are composed of signature element records and are attached to the shader entry metadata. The examples below clarify metadata details.
  568. Vertex shader HLSL
  569. ~~~~~~~~~~~~~~~~~~
Here is the HLSL of the above vertex shader. The semantic index assignment is explained in the section below::
struct Foo
{
  float a;
  float b[2];
};
struct VSIn
{
  uint vid : SV_VertexID;
  float3 pos : Position;
  Foo foo[3] : SemIn1;
  // semantic index assignment:
  // foo[0].a : SemIn1
  // foo[0].b[0] : SemIn2
  // foo[0].b[1] : SemIn3
  // foo[1].a : SemIn4
  // foo[1].b[0] : SemIn5
  // foo[1].b[1] : SemIn6
  // foo[2].a : SemIn7
  // foo[2].b[0] : SemIn8
  // foo[2].b[1] : SemIn9
  float f : SemIn10;
};
struct VSOut
{
  float f : SemOut1;
  Foo foo[3] : SemOut2;
  // semantic index assignment:
  // foo[0].a : SemOut2
  // foo[0].b[0] : SemOut3
  // foo[0].b[1] : SemOut4
  // foo[1].a : SemOut5
  // foo[1].b[0] : SemOut6
  // foo[1].b[1] : SemOut7
  // foo[2].a : SemOut8
  // foo[2].b[0] : SemOut9
  // foo[2].b[1] : SemOut10
  float4 pos : SV_Position;
};
void main(in VSIn In, // input signature
          out VSOut Out) // output signature
{
  ...
}
  614. The input signature is packed to be compatible with the IA stage. A packing algorithm must assign the following starting positions to the input signature elements:
=================== ==== ======= ========= ============
Input element       Rows Columns Start row Start column
=================== ==== ======= ========= ============
uint VSIn.vid       1    1       0         0
float3 VSIn.pos     1    3       1         0
float VSIn.foo.a[3] 3    1       2         0
float VSIn.foo.b[6] 6    1       5         0
float VSIn.f        1    1       11        0
=================== ==== ======= ========= ============
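The start rows in the table above follow a simple pattern: each element begins in column 0 of the first row after the previous element. The Python sketch below reproduces the table under that reading. The rule is inferred from this one example and is not a statement of the complete IA packing rules, and ``pack_for_input_assembler`` is a hypothetical name.

```python
def pack_for_input_assembler(elements):
    """Assign (start row, start column) to (name, rows, cols) elements.

    ASSUMPTION: every element starts at column 0 of the next unused row,
    which is the pattern visible in the example table.
    """
    placements = []
    next_row = 0
    for name, rows, _cols in elements:
        placements.append((name, next_row, 0))
        next_row += rows
    return placements


vs_input = [
    ("VSIn.vid", 1, 1),
    ("VSIn.pos", 1, 3),
    ("VSIn.foo.a", 3, 1),
    ("VSIn.foo.b", 6, 1),
    ("VSIn.f", 1, 1),
]
for placement in pack_for_input_assembler(vs_input):
    print(placement)
```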
  624. A reasonable packing algorithm would assign the following starting positions to the output signature elements:
==================== ==== ======= ========= ============
Output element       Rows Columns Start row Start column
==================== ==== ======= ========= ============
float VSOut.f        1    1       1         2
float VSOut.foo.a[3] 3    1       1         1
float VSOut.foo.b[6] 6    1       1         0
float4 VSOut.pos     1    4       0         0
==================== ==== ======= ========= ============
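The output positions in the table above happen to be reproduced by a plain first-fit placement. The Python sketch below is one possible heuristic, not the algorithm used by the HLSL compiler. It ignores interpolation-mode and system-value constraints. The register file height of 32 rows and the "widest first, then tallest" ordering are assumptions made for this illustration, and the function names are hypothetical.

```python
COLUMNS = 4  # each register row has four 32-bit components


def find_spot(occupied, rows, cols):
    """Return the first (row, col) where a rows x cols block is free, or None."""
    for row0 in range(len(occupied) - rows + 1):
        for col0 in range(COLUMNS - cols + 1):
            if all(
                not occupied[r][c]
                for r in range(row0, row0 + rows)
                for c in range(col0, col0 + cols)
            ):
                return row0, col0
    return None


def first_fit_pack(elements, num_rows=32):
    """Place (name, rows, cols) elements; returns {name: (start row, start col)}.

    ASSUMPTIONS: num_rows=32 and the sort order are illustrative choices.
    """
    occupied = [[False] * COLUMNS for _ in range(num_rows)]
    placements = {}
    for name, rows, cols in sorted(elements, key=lambda e: (-e[2], -e[1])):
        spot = find_spot(occupied, rows, cols)
        if spot is None:
            raise ValueError(f"no room for element {name}")
        row0, col0 = spot
        for r in range(row0, row0 + rows):
            for c in range(col0, col0 + cols):
                occupied[r][c] = True
        placements[name] = (row0, col0)
    return placements


vs_output = [
    ("VSOut.f", 1, 1),
    ("VSOut.foo.a", 3, 1),
    ("VSOut.foo.b", 6, 1),
    ("VSOut.pos", 1, 4),
]
print(first_fit_pack(vs_output))
```

With this input the sketch uses seven register rows (0 through 6), whereas the unpacked IA-style layout of the same shapes would use eleven.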
  633. Semantic index assignment
  634. ~~~~~~~~~~~~~~~~~~~~~~~~~
  635. Semantic index assignment in DXIL is exactly the same as for DXBC. Semantic index assignment, abbreviated s.idx above, is a consecutive enumeration of all fields under the same semantic name as if the signature were packed for the IA stage. That is, given a complex signature element, e.g., VSOut's foo[3] with semantic name SemOut and starting index 2, the element is flattened into individual fields: foo[0].a, foo[0].b[0], ..., foo[2].b[1], and the fields receive consecutive semantic indexes 2, 3, ..., 10, respectively. Semantic-index pairs are used to set up the IA stage and to capture values of individual signature registers via the StreamOut API.
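The consecutive enumeration can be written as a short Python sketch. The helper ``assign_semantic_indexes`` and its argument layout are hypothetical; the sketch handles only the shape used in the example, an array of structs whose fields are scalars or one-dimensional arrays.

```python
def assign_semantic_indexes(struct_fields, array_size, start_index):
    """Enumerate semantic indexes for an array of structs.

    struct_fields: list of (field name, element count) in declaration order.
    Returns {field name: [semantic indexes]}, grouped per flattened element.
    """
    indexes = {name: [] for name, _ in struct_fields}
    next_index = start_index
    for _ in range(array_size):             # foo[0], foo[1], foo[2]
        for name, count in struct_fields:   # .a, then .b[0], .b[1]
            for _ in range(count):
                indexes[name].append(next_index)
                next_index += 1
    return indexes


# VSOut: Foo foo[3] : SemOut2, where Foo is { float a; float b[2]; }
print(assign_semantic_indexes([("a", 1), ("b", 2)], array_size=3, start_index=2))
```

The result for ``a`` is 2, 5, 8 and for ``b`` is 3, 4, 6, 7, 9, 10, which are the index lists that appear in nodes ``!41`` and ``!42`` of the DXIL metadata for this shader.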
  636. DXIL for VS signatures
  637. ~~~~~~~~~~~~~~~~~~~~~~
  638. The corresponding DXIL metadata is presented below::
  639. !dx.entryPoints = !{ !1 }
  640. !1 = !{ void @main(), !"main", !2, null, null }
  641. ; Signatures: In, Out, Patch Constant (optional)
  642. !2 = !{ !3, !4, null }
; Input signature (packed according to IA rules)
!3 = !{ !10, !11, !12, !13, !14 }
; element ID, semantic name, etype, sv, s.idx, interp, rows, cols, start row, col, ext. list
!10 = !{i32 1, !"SV_VertexID", i8 0, i8 1, !30, i8 0, i32 1, i8 1, i32 0, i8 0, null}
!11 = !{i32 2, !"Position", i8 0, i8 0, !30, i8 0, i32 1, i8 3, i32 1, i8 0, null}
!12 = !{i32 3, !"SemIn", i8 0, i8 0, !32, i8 0, i32 3, i8 1, i32 2, i8 0, null}
!13 = !{i32 4, !"SemIn", i8 0, i8 0, !33, i8 0, i32 6, i8 1, i32 5, i8 0, null}
!14 = !{i32 5, !"SemIn", i8 0, i8 0, !34, i8 0, i32 1, i8 1, i32 11, i8 0, null}
  651. ; semantic index assignment:
  652. !30 = !{ i32 0 }
  653. !32 = !{ i32 1, i32 4, i32 7 }
  654. !33 = !{ i32 2, i32 3, i32 5, i32 6, i32 8, i32 9 }
  655. !34 = !{ i32 10 }
  656. ; Output signature (tightly packed according to pipeline stage packing rules)
  657. !4 = !{ !20, !21, !22, !23 }
  658. ; element ID, semantic name, etype, sv, s.idx, interp, rows, cols, start row, col, ext. list
!20 = !{i32 6, !"SemOut", i8 0, i8 0, !40, i8 2, i32 1, i8 1, i32 1, i8 2, null}
!21 = !{i32 7, !"SemOut", i8 0, i8 0, !41, i8 2, i32 3, i8 1, i32 1, i8 1, null}
!22 = !{i32 8, !"SemOut", i8 0, i8 0, !42, i8 2, i32 6, i8 1, i32 1, i8 0, null}
!23 = !{i32 9, !"SV_Position", i8 0, i8 3, !43, i8 2, i32 1, i8 4, i32 0, i8 0, null}
  663. ; semantic index assignment:
  664. !40 = !{ i32 1 }
  665. !41 = !{ i32 2, i32 5, i32 8 }
  666. !42 = !{ i32 3, i32 4, i32 6, i32 7, i32 9, i32 10 }
  667. !43 = !{ i32 0 }
  668. Hull shader example
  669. ~~~~~~~~~~~~~~~~~~~
  670. A hull shader (HS) is defined by two entry point functions: control point (CP) function to compute control points, and patch constant (PC) function to compute patch constant data, including the tessellation factors. The inputs to both functions are the input control points for an entire patch, and therefore each element may be indexed by row and, in addition, is indexed by vertex.
  671. Here is an HS example entry point metadata and signature list::
  672. ; !105 is extended parameter list containing reference to HS State:
  673. !101 = !{ void @HSMain(), !"HSMain", !102, null, !105 }
  674. ; Signatures: In, Out, Patch Constant
  675. !102 = !{ !103, !104, !204 }
  676. The entry point record specifies: (1) CP function HSMain as the main symbol, and (2) PC function via optional metadata node !105.
  677. CP-input signature describing one input control point::
  678. !103 = !{ !110, !111 }
  679. ; element ID, semantic name, etype, sv, s.idx, interp, rows, cols, start row, col, ext. list
  680. !110= !{i32 1, !"SV_Position", i8 0, i8 3, !130, i32 0, i32 1, i8 4, i32 0, i8 0, null}
  681. !111= !{i32 2, !"array", i8 0, i8 0, !131, i32 0, i32 4, i8 3, i32 1, i8 0, null}
  682. ; semantic indexing for flattened elements:
  683. !130 = !{ i32 0 }
  684. !131 = !{ i32 0, i32 1, i32 2, i32 3 }
Note that SV_OutputControlPointID and SV_PrimitiveID input elements are SGVs loaded through special DXIL intrinsics, and are not present in the signature at all. These have a semantic interpretation of SemanticInterpretationKind::NotInSig.
  686. CP-output signature describing one output control point::
  687. !104 = !{ !120, !121 }
  688. ; element ID, semantic name, etype, sv, s.idx, interp, rows, cols, start row, col, ext. list
  689. !120= !{i32 3, !"SV_Position", i8 0, i8 3, !130, i32 0, i32 1, i8 4, i32 0, i8 0, null}
  690. !121= !{i32 4, !"array", i8 0, i8 0, !131, i32 0, i32 4, i8 3, i32 1, i8 0, null}
  691. Hull shaders require an extended parameter that defines extra state::
  692. ; extended parameter HS State
  693. !105 = !{ i32 3, !201 }
  694. ; HS State record defines patch constant function and other properties
  695. ; Patch Constant Function, in CP count, out CP count, tess domain, tess part, out prim, max tess factor
!201 = !{ void @PCMain(), i32 4, i32 4, i32 3, i32 1, i32 3, float 16.0 }
  697. PC-output signature::
  698. !204 = !{ !220, !221, !222 }
  699. ; element ID, semantic name, etype, sv, s.idx, interp, rows, cols, start row, col, ext. list
  700. !220= !{i32 3, !"SV_TessFactor", i8 0, i8 25, !130, i32 0, i32 4, i8 1, i32 0, i8 3, null}
  701. !221= !{i32 4, !"SV_InsideTessFactor", i8 0, i8 26, !231, i32 0, i32 2, i8 1, i32 4, i8 3, null}
  702. !222= !{i32 5, !"array", i8 0, i8 0, !131, i32 0, i32 4, i8 3, i32 0, i8 0, null}
  703. ; semantic indexing for flattened elements:
  704. !231 = !{ i32 0, i32 1 }
  705. Accessing signature value in operations
  706. ---------------------------------------
There are no function parameters or variables that correspond to signature elements. Instead, the loadInput and storeOutput operations are used to access signature element values. The accesses are scalar.
  708. These are the operation signatures::
  709. ; overloads: SM5.1: f16|f32|i16|i32, SM6.0: f16|f32|f64|i8|i16|i32|i64
  710. declare float @dx.op.loadInput.f32(
  711. i32, ; opcode
  712. i32, ; input ID
  713. i32, ; row (relative to start row of input ID)
  714. i8, ; column (relative to start column of input ID), constant in [0,3]
  715. i32) ; vertex index
  716. ; overloads: SM5.1: f16|f32|i16|i32, SM6.0: f16|f32|f64|i8|i16|i32|i64
  717. declare void @dx.op.storeOutput.f32(
  718. i32, ; opcode
  719. i32, ; output ID
  720. i32, ; row (relative to start row of output ID)
  721. i8, ; column (relative to start column of output ID), constant in [0,3]
  722. float) ; value to store
LoadInput and storeOutput take an input or output element ID, which is the unique ID of a signature element metadata record. The row parameter is the array element row index relative to the start of the element; the register index is obtained by adding the element's start row to the row parameter value. Similarly, the column parameter is a relative column index; the packed register component is obtained by adding the element's start component (packed col) to the column value. Several overloads exist to access elements of different primitive types. LoadInput takes an additional vertex index parameter that represents the vertex index for DS CP-inputs and GS inputs; the vertex index must be undef in other cases.
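The register arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not part of DXIL: the ``SigElement`` class and ``resolve_access`` helper are hypothetical names, and the sample values are taken from element !111 of the hull shader example (4 rows, 3 columns, start row 1, start column 0).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SigElement:
    """Subset of a signature element record (field names are illustrative)."""
    element_id: int
    rows: int
    cols: int
    start_row: int
    start_col: int


def resolve_access(elem, row, col):
    """Return (register index, register component) for a relative (row, col)."""
    if not 0 <= row < elem.rows:
        raise ValueError("row is outside the element")
    if not 0 <= col < elem.cols:
        raise ValueError("column is outside the element")
    # Both indices are relative to the element's packed start location.
    return elem.start_row + row, elem.start_col + col


# Element !111 from the hull shader example.
array_elem = SigElement(element_id=2, rows=4, cols=3, start_row=1, start_col=0)
print(resolve_access(array_elem, row=2, col=1))  # (3, 1)
```

Because the operands are relative, moving the element to a different start row changes only the result of the addition, not the row and column operands in the shader.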
  724. Signature packing
  725. -----------------
Signature elements must be packed into a space of N 4x32-bit registers according to runtime constraints. DXIL contains packed signatures. The packing algorithm is more aggressive than that for DX11. However, DXIL packing is only a suggestion to the driver implementation. Driver compilers can rearrange signature elements as they see fit, while preserving compatibility of connected pipeline stages. DXIL is designed in such a way that it is easy to 'relocate' signature elements: loadInput/storeOutput row and column indices do not need to change, since they are relative to the start row/column for each element.
  727. Signature packing types
  728. ~~~~~~~~~~~~~~~~~~~~~~~
  729. Two pipeline stages can connect in four different ways, resulting in four packing types.

1. Input Assembly: VS input only

   * Elements all map to unique registers; they may not be packed together.
   * Interpolation mode is not used.

2. Connects to Rasterizer: VS output, HS CP-input/output and PC-input, DS CP-input/output, GS input/output, PS input

   * Elements can be packed according to constraints.
   * Interpolation mode is used and must be consistent between connecting signatures.
   * While HS CP-output and DS CP-input signatures do not go through the rasterizer, they are still treated as such. The reason is the pass-through HS case, in which HS CP-input and HS CP-output must have identical packing for efficiency.

3. Patch Constant: HS PC-output, DS PC-input

   * SV_TessFactor and SV_InsideTessFactor are the only SVs relevant here, and this is the only location where they are legal. These have special packing considerations.
   * Interpolation mode is not used.

4. Pixel Shader Output: PS output only

   * Only SV_Target maps to output register space.
   * No packing is performed; semantic index corresponds to render target index.

  743. Packing constraints
  744. ~~~~~~~~~~~~~~~~~~~
  745. The packing algorithm is stricter and more aggressive in DXIL than in DXBC, although still compatible. In particular, array signature elements are not broken up into scalars, even if each array access can be disambiguated to a literal index. DXIL and DXBC signature packing are not identical, so linking them together into a single pipeline is not supported across compiler generations.
  746. The row dimension of a signature element represents an index range. If constraints permit, two adjacent or overlapping index ranges are coalesced into a single index range.
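As a rough illustration of the coalescing step, the following Python sketch merges row ranges that overlap or touch. It is an assumption-laden simplification: ranges are modeled as half-open ``(start, end)`` pairs, the helper name ``coalesce_ranges`` is hypothetical, and the "if constraints permit" condition (including the TessFactor rules listed below) is deliberately ignored.

```python
def coalesce_ranges(ranges):
    """Merge half-open [start, end) row ranges that overlap or are adjacent."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(coalesce_ranges([(0, 2), (2, 5), (7, 9), (8, 10)]))  # [(0, 5), (7, 10)]
```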
  747. Packing constraints are as follows:

1. A register must have only one interpolation mode for all 4 components.
2. Register components containing SVs must be to the right of components containing non-SVs.
3. SV_ClipDistance and SV_CullDistance have additional constraints:

   a. May be packed together
   b. Must occupy a maximum of 2 registers (8 components)
   c. SV_ClipDistance must have linear interpolation mode

4. Registers containing SVs may not be within an index range, with the exception of Tessellation Factors (TessFactors).
5. If an index range R1 overlaps with a TessFactor index range R2, R1 must be contained within R2. As a consequence, outside and inside TessFactors occupy disjoint index ranges when packed.
6. Non-TessFactor index ranges are combined into a larger range, if they overlap.
7. SGVs must be packed after all non-SGVs have been packed. If there are several SGVs, they are packed in the order of HLSL declaration.

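Constraints 1 and 2 lend themselves to a simple per-register check. The Python sketch below is only an illustration under stated assumptions: placements are plain dictionaries with hypothetical keys, every element occupies a single row, and the remaining constraints (clip/cull distance, index ranges, SGV ordering) are not modeled.

```python
from collections import defaultdict


def check_register_constraints(placements):
    """Return violations of constraints 1 and 2 for single-row elements.

    Each placement is a dict with keys 'row', 'col', 'cols', 'interp', 'is_sv'.
    """
    by_row = defaultdict(list)
    for p in placements:
        by_row[p["row"]].append(p)

    errors = []
    for row, elems in sorted(by_row.items()):
        # Constraint 1: one interpolation mode per register.
        if len({e["interp"] for e in elems}) > 1:
            errors.append(f"register {row}: mixed interpolation modes")
        # Constraint 2: SV components sit to the right of non-SV components.
        sv_starts = [e["col"] for e in elems if e["is_sv"]]
        non_sv_ends = [e["col"] + e["cols"] - 1 for e in elems if not e["is_sv"]]
        if sv_starts and non_sv_ends and min(sv_starts) <= max(non_sv_ends):
            errors.append(f"register {row}: SV is not to the right of non-SVs")
    return errors


good = [
    {"row": 0, "col": 0, "cols": 2, "interp": 2, "is_sv": False},
    {"row": 0, "col": 2, "cols": 1, "interp": 2, "is_sv": True},
]
print(check_register_constraints(good))  # []
```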
  758. Packing for SGVs
  759. ~~~~~~~~~~~~~~~~
Non-SGV portions of two connecting signatures must match; however, SGV portions don't have to. An example would be a PS declaring SV_PrimitiveID as an input. If VS connects to PS, PS's SV_PrimitiveID value is synthesized by hardware; moreover, it is illegal to output SV_PrimitiveID from a VS. If GS connects to PS, GS may declare SV_PrimitiveID as its output.
  761. Unfortunately, SGV specification creates a complication for separate compilation of connecting shaders. For example, GS outputs SV_PrimitiveID, and PS inputs SV_IsFrontFace and SV_PrimitiveID in this order. The positions of SV_PrimitiveID are incompatible in GS and PS signatures. Not much can be done about this ambiguity in SM5.0 and earlier; the programmers will have to rely on SDKLayers to catch potential mismatch.
  762. SM5.1 and later shaders work on D3D12+ runtime that uses PSO objects to describe pipeline state. Therefore, a driver compiler has access to both connecting shaders during compilation, even though the HLSL compiler does not. The driver compiler can resolve SGV ambiguity in signatures easily. For SM5.1 and later, the HLSL compiler will ensure that declared SGVs fit into packed signature; however, it will set SGV's start row-column location to (-1, 0) such that the driver compiler must resolve SGV placement during PSO compilation.
  763. Shader Resources
  764. ================
  765. All global resources referenced by entry points of an LLVM module are described via named metadata dx.resources, which consists of four metadata lists of resource records::
  766. !dx.resources = !{ !1, !2, !3, !4 }
  767. Resource lists are as follows.

=== ======== ==============================
Idx Type     Description
=== ======== ==============================
0   Metadata SRVs - shader resource views.
1   Metadata UAVs - unordered access views.
2   Metadata CBVs - constant buffer views.
3   Metadata Samplers.
=== ======== ==============================

  776. Metadata resource records
  777. -------------------------
  778. Each resource list contains resource records. Each resource record contains fields that are common for each resource type, followed by fields specific to each resource type, followed by a metadata list of tag/value pairs, which can be used to specify additional properties or future extensions and may be null or empty.
  779. Common fields:

=== =============== ==========================================================================================
Idx Type            Description
=== =============== ==========================================================================================
0   i32             Unique resource record ID, used to identify the resource record in createHandle operation.
1   Pointer         Pointer to a global constant symbol with the original shape of resource and element type.
2   Metadata string Name of resource variable.
3   i32             Bind space ID of the root signature range that corresponds to this resource.
4   i32             Bind lower bound of the root signature range that corresponds to this resource.
5   i32             Range size of the root signature range that corresponds to this resource.
=== =============== ==========================================================================================

  790. When the shader has reflection information, the name is the original, unmangled HLSL name. If reflection is stripped, the name is empty string.
  791. SRV-specific fields:

=== =============== ==========================================================================================
Idx Type            Description
=== =============== ==========================================================================================
6   i32             SRV resource shape (enum value).
7   i32             SRV sample count.
8   Metadata        Metadata list of additional tag-value pairs.
=== =============== ==========================================================================================

  799. SRV-specific tag/value pairs:

=== === ==== =================================================== ============================================
Idx Tag Type Resource Type                                       Description
=== === ==== =================================================== ============================================
0   0   i32  Any resource, except RawBuffer and StructuredBuffer Element type.
1   1   i32  StructuredBuffer                                    Element stride of StructuredBuffer, in bytes.
=== === ==== =================================================== ============================================

The symbol names for the tags are kDxilTypedBufferElementTypeTag (0) and kDxilStructuredBufferElementStrideTag (1).
  807. UAV-specific fields:

=== =============== ==========================================================================================
Idx Type            Description
=== =============== ==========================================================================================
6   i32             UAV resource shape (enum value).
7   i1              1 - globally-coherent UAV; 0 - otherwise.
8   i1              1 - UAV has counter; 0 - otherwise.
9   i1              1 - UAV is ROV (rasterizer ordered view); 0 - otherwise.
10  Metadata        Metadata list of additional tag-value pairs.
=== =============== ==========================================================================================

  817. UAV-specific tag/value pairs:

=== === ==== ====================================================== ============================================
Idx Tag Type Resource Type                                          Description
=== === ==== ====================================================== ============================================
0   0   i32  RW resource, except RWRawBuffer and RWStructuredBuffer Element type.
1   1   i32  RWStructuredBuffer                                     Element stride of RWStructuredBuffer, in bytes.
=== === ==== ====================================================== ============================================

The symbol names for the tags are kDxilTypedBufferElementTypeTag (0) and kDxilStructuredBufferElementStrideTag (1).
  825. CBV-specific fields:

=== =============== ==========================================================================================
Idx Type            Description
=== =============== ==========================================================================================
6   i32             Constant buffer size in bytes.
7   Metadata        Metadata list of additional tag-value pairs.
=== =============== ==========================================================================================

Sampler-specific fields:

=== =============== ==========================================================================================
Idx Type            Description
=== =============== ==========================================================================================
6   i32             Sampler type (enum value).
7   Metadata        Metadata list of additional tag-value pairs.
=== =============== ==========================================================================================

  839. The following example demonstrates SRV metadata::
  840. ; Original HLSL
  841. ; Texture2D<float4> MyTexture2D : register(t0, space0);
  842. ; StructuredBuffer<NS1::MyType1> MyBuffer[2][3] : register(t1, space0);
  843. !1 = !{ !2, !3 }
  844. ; Scalar resource: Texture2D<float4> MyTexture2D.
  845. %dx.types.ResElem.v4f32 = type { <4 x float> }
  846. @MyTexture2D = external addrspace(1) constant %dx.types.ResElem.v4f32, align 16
  847. !2 = !{ i32 0, %dx.types.ResElem.v4f32 addrspace(1)* @MyTexture2D, !"MyTexture2D",
  848. i32 0, i32 0, i32 1, i32 2, i32 0, null }
  849. ; Array resource: StructuredBuffer<MyType1> MyBuffer[2][3].
  850. %struct.NS1.MyType1 = type { float, <2 x i32> }
  851. %dx.types.ResElem.NS1.MyType1 = type { %struct.NS1.MyType1 }
@MyBuffer = external addrspace(1) constant [2 x [3 x %dx.types.ResElem.NS1.MyType1]], align 16
  853. !3 = !{ i32 1, [2 x [3 x %dx.types.ResElem.NS1.MyType1]] addrspace(1)* @MyBuffer, !"MyBuffer",
  854. i32 0, i32 1, i32 6, i32 11, i32 0, null }
  855. The type name of the variable is constructed by appending the element name (primitive, vector or UDT name) to dx.types.ResElem prefix. The type configuration of the resource range variable conveys (1) resource range shape and (2) resource element type.
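To make the record layout concrete, here is a small Python sketch that names the positional fields of a resource record according to the field tables in this section. The dictionary keys and the ``decode_record`` helper are hypothetical, chosen only for illustration; the sample values mirror the MyTexture2D SRV record from the example, with the global symbol reduced to a string.

```python
COMMON_FIELDS = ("id", "symbol", "name", "space", "lower_bound", "range_size")

# Resource-specific fields that follow the six common fields.
SPECIFIC_FIELDS = {
    "SRV": ("shape", "sample_count"),
    "UAV": ("shape", "globally_coherent", "has_counter", "is_rov"),
    "CBV": ("size_in_bytes",),
    "Sampler": ("sampler_type",),
}


def decode_record(resource_class, record):
    """Map a positional resource record onto named fields."""
    names = COMMON_FIELDS + SPECIFIC_FIELDS[resource_class] + ("extended",)
    if len(record) != len(names):
        raise ValueError(f"{resource_class} record needs {len(names)} fields")
    return dict(zip(names, record))


srv = decode_record("SRV", (0, "@MyTexture2D", "MyTexture2D", 0, 0, 1, 2, 0, None))
print(srv["name"], srv["shape"], srv["range_size"])  # MyTexture2D 2 1
```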
  856. Reflection information
  857. ----------------------
  858. Resource reflection data is conveyed via the resource's metadata record and global, external variable. The metadata record contains the original HLSL name, root signature range information, and the reference to the global resource variable declaration. The resource variable declaration conveys resource range shape, resource type and resource element type.
  859. The following disassembly provides an example::
  860. ; Scalar resource: Texture2D<float4> MyTexture2D.
  861. %dx.types.ResElem.v4f32 = type { <4 x float> }
  862. @MyTexture2D = external addrspace(1) constant %dx.types.ResElem.v4f32, align 16
  863. !0 = !{ i32 0, %dx.types.ResElem.v4f32 addrspace(1)* @MyTexture2D, !"MyTexture2D",
  864. i32 0, i32 3, i32 1, i32 2, i32 0, null }
  865. ; struct MyType2 { float4 field1; int2 field2; };
  866. ; Constant buffer: ConstantBuffer<MyType2> MyCBuffer1[][3] : register(b5, space7)
  867. %struct.MyType2 = type { <4 x float>, <2 x i32> }
  868. ; Type reflection information (optional)
  869. !struct.MyType2 = !{ !1, !2 }
  870. !1 = !{ !"field1", null }
  871. !2 = !{ !"field2", null }
%dx.types.ResElem.MyType2 = type { %struct.MyType2 }
@MyCBuffer1 = external addrspace(1) constant [0 x [3 x %dx.types.ResElem.MyType2]], align 16
!3 = !{ i32 0, [0 x [3 x %dx.types.ResElem.MyType2]] addrspace(1)* @MyCBuffer1, !"MyCBuffer1",
  875. i32 7, i32 5, i32 -1, null }
  876. The reflection information can be removed from DXIL by obfuscating the resource HLSL name and resource variable name as well as removing reflection type annotations, if any.
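The example records in this section carry a range size of 6 for ``MyBuffer[2][3]`` and -1 for the unbounded ``MyCBuffer1[][3]``. The Python sketch below reproduces those values under the assumption that the range size is the product of the array dimensions and that an unbounded dimension appears as 0 in the LLVM array type; the ``range_size`` helper is hypothetical.

```python
from math import prod

UNBOUNDED = -1  # value stored in the record for an unbounded range


def range_size(shape):
    """shape: array dimensions of the resource variable; () for a scalar."""
    if any(dim == 0 for dim in shape):
        return UNBOUNDED
    return prod(shape)


print(range_size(()), range_size((2, 3)), range_size((0, 3)))  # 1 6 -1
```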
  877. Structure of resource operation
  878. -------------------------------
  879. Operations involving shader resources and samplers are expressed via external function calls.
  880. Below is an example for the sample method::
  881. %dx.types.ResRet.f32 = type { float, float, float, float, i32 }
  882. declare %dx.types.ResRet.f32 @dx.op.sample.f32(
  883. i32, ; opcode
  884. %dx.types.ResHandle, ; texture handle
  885. %dx.types.SamplerHandle, ; sampler handle
  886. float, ; coordinate c0
  887. float, ; coordinate c1
  888. float, ; coordinate c2
  889. float, ; coordinate c3
  890. i32, ; offset o0
  891. i32, ; offset o1
  892. i32, ; offset o2
  893. float) ; clamp
The method always returns five scalar values that are aggregated in the dx.types.ResRet.f32 type and extracted into scalars via LLVM's extractvalue right after the call. The first four elements are sample values and the last field is the status of the operation for tiled resources. Some return values may be unused, which is easily determined from the SSA form. The driver compiler is free to specialize the sample instruction to the most efficient form depending on which return values are used in computation.
  895. If applicable, each intrinsic is overloaded on return type, e.g.::
  896. %dx.types.ResRet.f32 = type { float, float, float, float, i32 }
  897. %dx.types.ResRet.f16 = type { half, half, half, half, i32 }
  898. declare %dx.types.ResRet.f32 @dx.op.sample.f32(...)
  899. declare %dx.types.ResRet.f16 @dx.op.sample.f16(...)
  900. Wherever applicable, the return type indicates the "precision" at which the operation is executed. For example, sample intrinsic that returns half data is allowed to be executed at half precision, assuming hardware supports this; however, if the return type is float, the sample operation must be executed in float precision. If lower-precision is not supported by hardware, it is allowed to execute a higher-precision variant of the operation.
  901. The opcode parameter uniquely identifies the sample operation. More details can be found in the Instructions section. The value of opcode is the same for all overloads of an operation.
  902. Some resource operations are "polymorphic" with respect to resource types, e.g., dx.op.sample.f32 operates on several resource types: Texture1D[Array], Texture2D[Array], Texture3D, TextureCUBE[Array].
  903. Each resource/sampler is represented by a pair of i32 values. The first value is a unique (virtual) resource range ID, which corresponds to HLSL declaration of a resource/sampler. Range ID must be a constant for SM5.1 and below. The second integer is a 0-based index within the range. The index must be constant for SM5.0 and below.
  904. Both indices can be dynamic for SM6 and later to provide flexibility in usage of resources/samplers in control flow, e.g.::
  905. Texture2D<float4> a[8], b[8];
  906. ...
  907. Texture2D<float4> c;
  908. if(cond) // arbitrary expression
  909. c = a[idx1];
  910. else
  911. c = b[idx2];
  912. ... = c.Sample(...);
  913. Resources/samplers used in such a way must reside in descriptor tables (cannot be root descriptors); this will be validated during shader and root signature setup.
  914. The DXIL verifier will ensure that all leaf-ranges (a and b above) of such a resource/sampler live-range have the same resource/sampler type and element type. If applicable, this constraint may be relaxed in the future. In particular, it is logical from HLSL programmer point of view to issue loads on compatible resource types, e.g., Texture2D, RWTexture2D, ROVTexture2D::
  915. Texture2D<float4> a[8];
  916. RWTexture2D<float4> b[6];
  917. ...
  918. Texture2D<float4> c;
  919. if(cond) // arbitrary expression
  920. c = a[idx1];
  921. else
  922. c = b[idx2];
  923. ... = c.Load(...);
LLVM's undef value is used for unused input parameters. For example, coordinates c2 and c3 in a dx.op.sample.f32 call for Texture2D are undef, as only two coordinates c0 and c1 are required.
  925. If the clamp parameter is unused, its default value is 0.0f.
  926. Resource operations are not overloaded on input parameter types. For example, dx.op.sample.f32 operation does not have an overload where coordinates have half, rather than float, data type. Instead, the precision of input arguments can be inferred from the IR via a straightforward lookup along an SSA edge, e.g.::
  927. %c0 = fpext half %0 to float
  928. %res = call %dx.types.ResRet.f32 @dx.op.sample.f32(..., %c0, ...)
  929. SSA form makes it easy to infer that value %0 of type half got promoted to float. The driver compiler can tailor the instruction to the most efficient form for the target hardware.
  930. Resource operations
  931. -------------------
This section lists resource access operations. The specification is given for the float return type, if applicable. The list of all overloads can be found in the appendix on intrinsic operations.
  933. Some general rules to interpret resource operations:
  934. * The number of active (meaningful) return components is determined by resource element type. Other return values must be unused; validator ensures this.
  935. * GPU instruction needs status only if the status return value is used in the program, which is determined through SSA.
  936. * Overload suffixes are specified for each resource operation.
  937. * Type of resource determines which inputs must be defined. Unused inputs are passed typed LLVM 'undef' values. This is checked by the DXIL validator.
  938. * Offset input parameters are i8 constants in [-8,+7] range; default offset is 0.
  939. Resource operation return types
  940. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  941. Many resource operations return several scalar values as well as status for tiled resource access. The return values are grouped into a helper structure type, as this is LLVM's way to return several values from the operation. After an operation, helper types are immediately decomposed into scalars, which are used in further computation.
  942. The defined helper types are listed below::
  943. %dx.types.ResRet.i8 = type { i8, i8, i8, i8, i32 }
  944. %dx.types.ResRet.i16 = type { i16, i16, i16, i16, i32 }
  945. %dx.types.ResRet.i32 = type { i32, i32, i32, i32, i32 }
  946. %dx.types.ResRet.i64 = type { i64, i64, i64, i64, i32 }
  947. %dx.types.ResRet.f16 = type { half, half, half, half, i32 }
  948. %dx.types.ResRet.f32 = type { float, float, float, float, i32 }
  949. %dx.types.ResRet.f64 = type { double, double, double, double, i32 }
  950. %dx.types.Dimensions = type { i32, i32, i32, i32 }
  951. %dx.types.SamplePos = type { float, float }
  952. Resource handles
  953. ~~~~~~~~~~~~~~~~
  954. Resources are identified via handles passed to resource operations. Handles are represented via opaque type::
  955. %dx.types.Handle = type { i8 * }
  956. The handles are created out of resource range ID and index into the range::
  957. declare %dx.types.Handle @dx.op.createHandle(
  958. i32, ; opcode
  959. i8, ; resource class: SRV=0, UAV=1, CBV=2, Sampler=3
  960. i32, ; resource range ID (constant)
  961. i32, ; index into the range
  962. i1) ; non-uniform resource index: false or true
  963. Resource class is a constant that indicates which metadata list (SRV, UAV, CBV, Sampler) to use for property queries.
  964. Resource range ID is an i32 constant, which is the position of the metadata record in the corresponding metadata list. Range IDs start with 0 and are contiguous within each list.
  965. Index is an i32 value that may be a constant or a value computed by the shader.
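The way createHandle operands select a metadata record can be sketched as a two-step lookup. This Python fragment is illustrative only: records are reduced to dictionaries with hypothetical ``id`` and ``name`` keys, and ``lookup_record`` is not a real API.

```python
RESOURCE_CLASSES = ("SRV", "UAV", "CBV", "Sampler")  # class values 0..3


def lookup_record(dx_resources, resource_class, range_id):
    """Select a record from the four dx.resources lists.

    dx_resources holds the SRV, UAV, CBV and Sampler lists in that order.
    """
    records = dx_resources[resource_class]
    if not 0 <= range_id < len(records):
        raise IndexError("range ID is not present in the selected list")
    record = records[range_id]
    # Range IDs are contiguous and start at 0, so position and ID agree.
    if record["id"] != range_id:
        raise ValueError("record ID does not match its position")
    return record


resources = [
    [{"id": 0, "name": "MyTexture2D"}, {"id": 1, "name": "MyBuffer"}],  # SRVs
    [],                                                                 # UAVs
    [{"id": 0, "name": "MyCBuffer1"}],                                  # CBVs
    [],                                                                 # Samplers
]
print(RESOURCE_CLASSES[0], lookup_record(resources, 0, 1)["name"])  # SRV MyBuffer
```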
  966. CBufferLoadLegacy
  967. ~~~~~~~~~~~~~~~~~
  968. The following signature shows the operation syntax::
  969. ; overloads: SM5.1: f32|i32|f64, future SM: possibly deprecated
  970. %dx.types.CBufRet.f32 = type { float, float, float, float }
  971. declare %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(
  972. i32, ; opcode
  973. %dx.types.Handle, ; resource handle
  974. i32) ; 0-based row index (row = 16-byte DXBC register)
  975. Valid resource types: ConstantBuffer. Valid shader model: SM5.1 and earlier.
  976. The operation loads four 32-bit values from a constant buffer, which has legacy, 16-byte layout. Values are extracted via "extractvalue" instruction; unused values may be optimized away by the driver compiler. The operation respects SM5.1 and earlier OOB behavior for cbuffers.
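For readers who think in byte offsets, the following Python sketch shows how an offset maps onto the row operand and the returned component. It assumes 32-bit values aligned to 4 bytes; the ``legacy_location`` helper is a hypothetical name.

```python
ROW_BYTES = 16        # one legacy row is a 16-byte register
COMPONENT_BYTES = 4   # each returned value is 32 bits wide


def legacy_location(byte_offset):
    """Return (row index, component index) for a 4-byte-aligned offset."""
    if byte_offset < 0 or byte_offset % COMPONENT_BYTES:
        raise ValueError("offset must be non-negative and 4-byte aligned")
    row, within_row = divmod(byte_offset, ROW_BYTES)
    return row, within_row // COMPONENT_BYTES


print(legacy_location(36))  # (2, 1)
```

In this example, the value at byte offset 36 would be read with row index 2 and taken from the second field of the returned structure.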
  977. CBufferLoad
  978. ~~~~~~~~~~~
  979. The following signature shows the operation syntax::
  980. ; overloads: SM5.1: f32|i32|f64, SM6.0: f16|f32|f64|i16|i32|i64
  981. declare float @dx.op.cbufferLoad.f32(
  982. i32, ; opcode
  983. %dx.types.Handle, ; resource handle
  984. i32, ; byte offset from the start of the buffer memory
  985. i32) ; read alignment
  986. Valid resource types: ConstantBuffer.
  987. The operation loads a value from a constant buffer, which has linear layout, using 1D index: byte offset from the beginning of the buffer memory. The operation respects SM5.1 and earlier OOB behavior for cbuffers.
  988. Read alignment is a constant value identifying what the byte offset alignment is. If the actual byte offset does not have this alignment, the results of this operation are undefined.
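The alignment requirement amounts to a divisibility check, sketched here in Python. The helper name is hypothetical, and a producer would evaluate this on offsets it can prove at compile time.

```python
def check_read_alignment(byte_offset, alignment):
    """Return True when byte_offset honors the declared read alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return byte_offset % alignment == 0


print(check_read_alignment(32, 16), check_read_alignment(36, 16))  # True False
```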
  989. GetDimensions
  990. ~~~~~~~~~~~~~
  991. The following signature shows the operation syntax::
  992. declare %dx.types.Dimensions @dx.op.getDimensions(
  993. i32, ; opcode
  994. %dx.types.Handle, ; resource handle
  995. i32) ; MIP level
  996. This table describes the return component meanings for each resource type { c0, c1, c2, c3 }.

==================== ===== ========== ========== ==========
Valid resource types c0    c1         c2         c3
==================== ===== ========== ========== ==========
[RW]Texture1D        width undef      undef      MIP levels
[RW]Texture1DArray   width array size undef      MIP levels
[RW]Texture2D        width height     undef      MIP levels
[RW]Texture2DArray   width height     array size MIP levels
[RW]Texture3D        width height     depth      MIP levels
[RW]Texture2DMS      width height     undef      samples
[RW]Texture2DMSArray width height     array size samples
TextureCUBE          width height     undef      MIP levels
TextureCUBEArray     width height     array size MIP levels
[RW]TypedBuffer      width undef      undef      undef
[RW]RawBuffer        width undef      undef      undef
[RW]StructuredBuffer width undef      undef      undef
==================== ===== ========== ========== ==========

  1013. MIP levels is always undef for RW resources. Undef means the component will not be used. The validator will verify this.
  1014. There is no GetDimensions that returns float values.
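The table can be read as a per-resource mapping from return components to names. The Python sketch below encodes a few rows that way; the dictionary, the ``decode_dimensions`` helper, and the field names are illustrative assumptions, and ``None`` stands for an undef component.

```python
# (c0, c1, c2, c3) meanings for a few resource types; None marks undef.
DIMENSION_FIELDS = {
    "Texture1D": ("width", None, None, "mip_levels"),
    "Texture2D": ("width", "height", None, "mip_levels"),
    "Texture2DArray": ("width", "height", "array_size", "mip_levels"),
    "Texture3D": ("width", "height", "depth", "mip_levels"),
    "Texture2DMS": ("width", "height", None, "samples"),
    "TypedBuffer": ("width", None, None, None),
}


def decode_dimensions(resource_type, values, is_rw=False):
    """Keep only the components that are meaningful for the resource type."""
    result = {}
    for name, value in zip(DIMENSION_FIELDS[resource_type], values):
        if name is None:
            continue  # undef component, must not be used
        if is_rw and name == "mip_levels":
            continue  # MIP levels is always undef for RW resources
        result[name] = value
    return result


print(decode_dimensions("Texture2DArray", (256, 128, 6, 9)))
# {'width': 256, 'height': 128, 'array_size': 6, 'mip_levels': 9}
```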
  1015. Sample
  1016. ~~~~~~
  1017. The following signature shows the operation syntax::
  1018. ; overloads: SM5.1: f32, SM6.0: f16|f32
  1019. declare %dx.types.ResRet.f32 @dx.op.sample.f32(
  1020. i32, ; opcode
  1021. %dx.types.Handle, ; texture handle
  1022. %dx.types.Handle, ; sampler handle
  1023. float, ; coordinate c0
  1024. float, ; coordinate c1
  1025. float, ; coordinate c2
  1026. float, ; coordinate c3
  1027. i32, ; offset o0
  1028. i32, ; offset o1
  1029. i32, ; offset o2
  1030. float) ; clamp

=================== ================================ ===================
Valid resource type # of active coordinates          # of active offsets
=================== ================================ ===================
Texture1D           1 (c0)                           1 (o0)
Texture1DArray      2 (c0, c1 = array slice)         1 (o0)
Texture2D           2 (c0, c1)                       2 (o0, o1)
Texture2DArray      3 (c0, c1, c2 = array slice)     2 (o0, o1)
Texture3D           3 (c0, c1, c2)                   3 (o0, o1, o2)
TextureCUBE         3 (c0, c1, c2)                   3 (o0, o1, o2)
TextureCUBEArray    4 (c0, c1, c2, c3 = array slice) 3 (o0, o1, o2)
=================== ================================ ===================

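Combining this table with the earlier rules (undef for unused inputs, default offset 0, default clamp 0.0), a producer could assemble the trailing operands of a sample call roughly as follows. This Python sketch is an interpretation, not a definitive implementation: the helper name is hypothetical, operands are modeled as plain Python values, and the opcode and handle operands are omitted.

```python
# (active coordinates, active offsets) per resource type, from the table above.
SAMPLE_ACTIVE = {
    "Texture1D": (1, 1),
    "Texture1DArray": (2, 1),
    "Texture2D": (2, 2),
    "Texture2DArray": (3, 2),
    "Texture3D": (3, 3),
    "TextureCUBE": (3, 3),
    "TextureCUBEArray": (4, 3),
}
UNDEF = "undef"


def sample_operands(resource_type, coords, offsets=(), clamp=0.0):
    """Return [c0..c3, o0..o2, clamp] with unused slots set to undef."""
    n_coords, n_offsets = SAMPLE_ACTIVE[resource_type]
    if len(coords) != n_coords:
        raise ValueError(f"{resource_type} needs {n_coords} coordinates")
    if len(offsets) > n_offsets:
        raise ValueError(f"{resource_type} allows at most {n_offsets} offsets")
    # Active offsets that are not supplied fall back to the default of 0.
    active_offsets = list(offsets) + [0] * (n_offsets - len(offsets))
    return (
        list(coords) + [UNDEF] * (4 - n_coords)
        + active_offsets + [UNDEF] * (3 - n_offsets)
        + [clamp]
    )


print(sample_operands("Texture2D", ["%u", "%v"]))
# ['%u', '%v', 'undef', 'undef', 0, 0, 'undef', 0.0]
```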
  1042. SampleBias
  1043. ~~~~~~~~~~
  1044. The following signature shows the operation syntax::
  1045. ; overloads: SM5.1: f32, SM6.0: f16|f32
  1046. declare %dx.types.ResRet.f32 @dx.op.sampleBias.f32(
  1047. i32, ; opcode
  1048. %dx.types.Handle, ; texture handle
  1049. %dx.types.Handle, ; sampler handle
  1050. float, ; coordinate c0
  1051. float, ; coordinate c1
  1052. float, ; coordinate c2
  1053. float, ; coordinate c3
  1054. i32, ; offset o0
  1055. i32, ; offset o1
  1056. i32, ; offset o2
  1057. float, ; bias: in [-16.f,15.99f]
  1058. float) ; clamp
  1059. Valid resource types and active components/offsets are the same as for the sample operation.
  1060. SampleLevel
  1061. ~~~~~~~~~~~
  1062. The following signature shows the operation syntax::
  1063. ; overloads: SM5.1: f32, SM6.0: f16|f32
  1064. declare %dx.types.ResRet.f32 @dx.op.sampleLevel.f32(
  1065. i32, ; opcode
  1066. %dx.types.Handle, ; texture handle
  1067. %dx.types.Handle, ; sampler handle
  1068. float, ; coordinate c0
  1069. float, ; coordinate c1
  1070. float, ; coordinate c2
  1071. float, ; coordinate c3
  1072. i32, ; offset o0
  1073. i32, ; offset o1
  1074. i32, ; offset o2
  1075. float) ; LOD
  1076. Valid resource types and active components/offsets are the same as for the sample operation.
  1077. SampleGrad
  1078. ~~~~~~~~~~
  1079. The following signature shows the operation syntax::
  1080. ; overloads: SM5.1: f32, SM6.0: f16|f32
  1081. declare %dx.types.ResRet.f32 @dx.op.sampleGrad.f32(
  1082. i32, ; opcode
  1083. %dx.types.Handle, ; texture handle
  1084. %dx.types.Handle, ; sampler handle
  1085. float, ; coordinate c0
  1086. float, ; coordinate c1
  1087. float, ; coordinate c2
  1088. float, ; coordinate c3
  1089. i32, ; offset o0
  1090. i32, ; offset o1
  1091. i32, ; offset o2
  1092. float, ; ddx0
  1093. float, ; ddx1
  1094. float, ; ddx2
  1095. float, ; ddy0
  1096. float, ; ddy1
  1097. float, ; ddy2
  1098. float) ; clamp
  1099. Valid resource types and active components and offsets are the same as for the sample operation. Valid active ddx and ddy are the same as offsets.
  1100. SampleCmp
  1101. ~~~~~~~~~
  1102. The following signature shows the operation syntax::
  1103. ; overloads: SM5.1: f32, SM6.0: f16|f32
  1104. declare %dx.types.ResRet.f32 @dx.op.sampleCmp.f32(
  1105. i32, ; opcode
  1106. %dx.types.Handle, ; texture handle
  1107. %dx.types.Handle, ; sampler handle
  1108. float, ; coordinate c0
  1109. float, ; coordinate c1
  1110. float, ; coordinate c2
  1111. float, ; coordinate c3
  1112. i32, ; offset o0
  1113. i32, ; offset o1
  1114. i32, ; offset o2
  1115. float, ; compare value
  1116. float) ; clamp

=================== ================================ ===================
Valid resource type # of active coordinates          # of active offsets
=================== ================================ ===================
Texture1D           1 (c0)                           1 (o0)
Texture1DArray      2 (c0, c1 = array slice)         1 (o0)
Texture2D           2 (c0, c1)                       2 (o0, o1)
Texture2DArray      3 (c0, c1, c2 = array slice)     2 (o0, o1)
TextureCUBE         3 (c0, c1, c2)                   3 (o0, o1, o2)
TextureCUBEArray    4 (c0, c1, c2, c3 = array slice) 3 (o0, o1, o2)
=================== ================================ ===================

  1127. SampleCmpLevelZero
  1128. ~~~~~~~~~~~~~~~~~~
  1129. The following signature shows the operation syntax::
  1130. ; overloads: SM5.1: f32, SM6.0: f16|f32
  1131. declare %dx.types.ResRet.f32 @dx.op.sampleCmpLevelZero.f32(
  1132. i32, ; opcode
  1133. %dx.types.Handle, ; texture handle
  1134. %dx.types.Handle, ; sampler handle
  1135. float, ; coordinate c0
  1136. float, ; coordinate c1
  1137. float, ; coordinate c2
  1138. float, ; coordinate c3
  1139. i32, ; offset o0
  1140. i32, ; offset o1
  1141. i32, ; offset o2
  1142. float) ; compare value
  1143. Valid resource types and active components/offsets are the same as for the sampleCmp operation.
  1144. TextureLoad
  1145. ~~~~~~~~~~~
  1146. The following signature shows the operation syntax::
  1147. ; overloads: SM5.1: f32|i32, SM6.0: f16|f32|i16|i32
  1148. declare %dx.types.ResRet.f32 @dx.op.textureLoad.f32(
  1149. i32, ; opcode
  1150. %dx.types.Handle, ; texture handle
  1151. i32, ; MIP level; sample for Texture2DMS
  1152. i32, ; coordinate c0
  1153. i32, ; coordinate c1
  1154. i32, ; coordinate c2
  1155. i32, ; offset o0
  1156. i32, ; offset o1
  1157. i32) ; offset o2
  1158. =================== ========= ============================ ===================
  1159. Valid resource type MIP level # of active coordinates # of active offsets
  1160. =================== ========= ============================ ===================
  1161. Texture1D yes 1 (c0) 1 (o0)
  1162. RWTexture1D undef 1 (c0) undef
  1163. Texture1DArray yes 2 (c0, c1 = array slice) 1 (o0)
  1164. RWTexture1DArray undef 2 (c0, c1 = array slice) undef
  1165. Texture2D yes 2 (c0, c1) 2 (o0, o1)
  1166. RWTexture2D undef 2 (c0, c1) undef
  1167. Texture2DArray yes 3 (c0, c1, c2 = array slice) 2 (o0, o1)
  1168. RWTexture2DArray undef 3 (c0, c1, c2 = array slice) undef
  1169. Texture3D yes 3 (c0, c1, c2) 3 (o0, o1, o2)
  1170. RWTexture3D undef 3 (c0, c1, c2) undef
  1171. =================== ========= ============================ ===================
  1172. For Texture2DMS:
  1173. =================== ============ =================================
  1174. Valid resource type Sample index # of active coordinate components
  1175. =================== ============ =================================
  1176. Texture2DMS yes 2 (c0, c1)
  1177. Texture2DMSArray yes 3 (c0, c1, c2 = array slice)
  1178. =================== ============ =================================
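As an illustration of the first TextureLoad table above, the following Python sketch packs the operands that follow the texture handle. It is a minimal model rather than part of DXIL: the names ``ACTIVE_OPERANDS``, ``UNDEF`` and ``pack_texture_load_operands`` are hypothetical, the string ``"undef"`` stands in for an LLVM undef operand, and filling unused coordinate slots with undef is an assumption. Texture2DMS types are left out.

```python
# Hypothetical encoding of the first TextureLoad table:
# resource type -> (MIP level is used, # of active coordinates, # of active offsets).
ACTIVE_OPERANDS = {
    "Texture1D":        (True, 1, 1),
    "RWTexture1D":      (False, 1, 0),
    "Texture1DArray":   (True, 2, 1),
    "RWTexture1DArray": (False, 2, 0),
    "Texture2D":        (True, 2, 2),
    "RWTexture2D":      (False, 2, 0),
    "Texture2DArray":   (True, 3, 2),
    "RWTexture2DArray": (False, 3, 0),
    "Texture3D":        (True, 3, 3),
    "RWTexture3D":      (False, 3, 0),
}

UNDEF = "undef"  # placeholder for an LLVM undef operand (assumption of this sketch)


def pack_texture_load_operands(resource_type, mip_level, coords, offsets=()):
    """Return [MIP level, c0, c1, c2, o0, o1, o2] with inactive slots set to UNDEF."""
    uses_mip, n_coords, n_offsets = ACTIVE_OPERANDS[resource_type]
    if len(coords) != n_coords:
        raise ValueError(f"{resource_type} expects {n_coords} coordinate(s)")
    if len(offsets) != n_offsets:
        raise ValueError(f"{resource_type} expects {n_offsets} offset(s)")
    mip = mip_level if uses_mip else UNDEF
    padded_coords = list(coords) + [UNDEF] * (3 - n_coords)
    padded_offsets = list(offsets) + [UNDEF] * (3 - n_offsets)
    return [mip] + padded_coords + padded_offsets
```

For example, a Texture2D load passes two coordinates and two offsets, while an RWTexture3D load passes three coordinates and leaves the MIP level and all offsets undefined.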
  1179. TextureStore
  1180. ~~~~~~~~~~~~
  1181. The following signature shows the operation syntax::
  1182. ; overloads: SM5.1: f32|i32, SM6.0: f16|f32|i16|i32
  1183. ; returns: status
  1184. declare void @dx.op.textureStore.f32(
  1185. i32, ; opcode
  1186. %dx.types.Handle, ; texture handle
  1187. i32, ; coordinate c0
  1188. i32, ; coordinate c1
  1189. i32, ; coordinate c2
  1190. float, ; value v0
  1191. float, ; value v1
  1192. float, ; value v2
  1193. float, ; value v3
  1194. i8) ; write mask
The write mask indicates which components are written (x = 1, y = 2, z = 4, w = 8), similar to DXBC. The mask must cover all resource components.
  1196. =================== =================================
  1197. Valid resource type # of active coordinate components
  1198. =================== =================================
  1199. RWTexture1D 1 (c0)
  1200. RWTexture1DArray 2 (c0, c1 = array slice)
  1201. RWTexture2D 2 (c0, c1)
  1202. RWTexture2DArray 3 (c0, c1, c2 = array slice)
  1203. RWTexture3D 3 (c0, c1, c2)
  1204. =================== =================================
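The write-mask encoding used by TextureStore can be sketched in Python as follows. This is an illustrative model only: ``COMPONENT_BITS``, ``write_mask``, ``covers_all_components`` and the ``component_count`` parameter are hypothetical names, not part of DXIL.

```python
# Bit assigned to each component, as listed in the write-mask description.
COMPONENT_BITS = {"x": 1, "y": 2, "z": 4, "w": 8}


def write_mask(components):
    """Combine component names such as "xyz" into a write-mask value."""
    mask = 0
    for name in components:
        mask |= COMPONENT_BITS[name]
    return mask


def covers_all_components(mask, component_count):
    """True if the mask writes every component of a resource that has
    component_count components (the TextureStore requirement)."""
    full = (1 << component_count) - 1
    return (mask & full) == full
```

With this model, ``write_mask("xyzw")`` evaluates to 15, which covers a four-component resource, while ``write_mask("xyz")`` evaluates to 7 and does not.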
  1205. CalculateLOD
  1206. ~~~~~~~~~~~~
  1207. The following signature shows the operation syntax::
  1208. ; returns: LOD
  1209. declare float @dx.op.calculateLOD.f32(
  1210. i32, ; opcode
  1211. %dx.types.Handle, ; texture handle
  1212. %dx.types.Handle, ; sampler handle
  1213. float, ; coordinate c0, [0.0, 1.0]
  1214. float, ; coordinate c1, [0.0, 1.0]
  1215. float, ; coordinate c2, [0.0, 1.0]
  1216. i1) ; true - clamped; false - unclamped
  1217. ============================= =======================
  1218. Valid resource type # of active coordinates
  1219. ============================= =======================
  1220. Texture1D, Texture1DArray 1 (c0)
  1221. Texture2D, Texture2DArray 2 (c0, c1)
  1222. Texture3D 3 (c0, c1, c2)
  1223. TextureCUBE, TextureCUBEArray 3 (c0, c1, c2)
  1224. ============================= =======================
  1225. TextureGather
  1226. ~~~~~~~~~~~~~
  1227. The following signature shows the operation syntax::
  1228. ; overloads: SM5.1: f32|i32, SM6.0: f16|f32|i16|i32
  1229. declare %dx.types.ResRet.f32 @dx.op.textureGather.f32(
  1230. i32, ; opcode
  1231. %dx.types.Handle, ; texture handle
  1232. %dx.types.Handle, ; sampler handle
  1233. float, ; coordinate c0
  1234. float, ; coordinate c1
  1235. float, ; coordinate c2
  1236. float, ; coordinate c3
  1237. i32, ; offset o0
  1238. i32, ; offset o1
  1239. i32) ; channel, constant in {0=red,1=green,2=blue,3=alpha}
  1240. =================== ================================ ===================
  1241. Valid resource type # of active coordinates # of active offsets
  1242. =================== ================================ ===================
  1243. Texture2D 2 (c0, c1) 2 (o0, o1)
  1244. Texture2DArray 3 (c0, c1, c2 = array slice) 2 (o0, o1)
  1245. TextureCUBE 3 (c0, c1, c2) 0
  1246. TextureCUBEArray 4 (c0, c1, c2, c3 = array slice) 0
  1247. =================== ================================ ===================
  1248. TextureGatherCmp
  1249. ~~~~~~~~~~~~~~~~
  1250. The following signature shows the operation syntax::
  1251. ; overloads: SM5.1: f32|i32, SM6.0: f16|f32|i16|i32
  1252. declare %dx.types.ResRet.f32 @dx.op.textureGatherCmp.f32(
  1253. i32, ; opcode
  1254. %dx.types.Handle, ; texture handle
  1255. %dx.types.Handle, ; sampler handle
  1256. float, ; coordinate c0
  1257. float, ; coordinate c1
  1258. float, ; coordinate c2
  1259. float, ; coordinate c3
  1260. i32, ; offset o0
  1261. i32, ; offset o1
  1262. i32, ; channel, constant in {0=red,1=green,2=blue,3=alpha}
  1263. float) ; compare value
  1264. Valid resource types and active components/offsets are the same as for the textureGather operation.
  1265. Texture2DMSGetSamplePosition
  1266. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  1267. The following signature shows the operation syntax::
  1268. declare %dx.types.SamplePos @dx.op.texture2DMSGetSamplePosition(
  1269. i32, ; opcode
  1270. %dx.types.Handle, ; texture handle
  1271. i32) ; sample ID
  1272. Returns sample position of a texture.
  1273. RenderTargetGetSamplePosition
  1274. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  1275. The following signature shows the operation syntax::
  1276. declare %dx.types.SamplePos @dx.op.renderTargetGetSamplePosition(
  1277. i32, ; opcode
  1278. i32) ; sample ID
  1279. Returns sample position of a render target.
  1280. RenderTargetGetSampleCount
  1281. ~~~~~~~~~~~~~~~~~~~~~~~~~~
  1282. The following signature shows the operation syntax::
  1283. declare i32 @dx.op.renderTargetGetSampleCount(
  1284. i32) ; opcode
  1285. Returns sample count of a render target.
  1286. BufferLoad
  1287. ~~~~~~~~~~
  1288. The following signature shows the operation syntax::
  1289. ; overloads: SM5.1: f32|i32, SM6.0: f32|i32
  1290. ; returns: status
  1291. declare %dx.types.ResRet.f32 @dx.op.bufferLoad.f32(
  1292. i32, ; opcode
  1293. %dx.types.Handle, ; resource handle
  1294. i32, ; coordinate c0
  1295. i32) ; coordinate c1
  1296. The call respects SM5.1 OOB and alignment rules.
  1297. ==================== =====================================================
  1298. Valid resource type # of active coordinates
  1299. ==================== =====================================================
  1300. [RW]TypedBuffer 1 (c0 in elements)
  1301. [RW]RawBuffer 1 (c0 in bytes)
  1302. [RW]StructuredBuffer 2 (c0 in elements, c1 = byte offset into the element)
  1303. ==================== =====================================================
  1304. RawBufferLoad
  1305. ~~~~~~~~~~~~~
  1306. The following signature shows the operation syntax::
  1307. ; overloads: SM5.1: f32|i32, SM6.0: f32|i32, SM6.2: f16|f32|i16|i32
  1308. ; returns: status
  1309. declare %dx.types.ResRet.f32 @dx.op.rawBufferLoad.f32(
  1310. i32, ; opcode
  1311. %dx.types.Handle, ; resource handle
  1312. i32, ; coordinate c0 (index)
  1313. i32, ; coordinate c1 (elementOffset)
  1314. i8, ; mask
i32) ; alignment
  1317. The call respects SM5.1 OOB and alignment rules.
  1318. ==================== =====================================================
  1319. Valid resource type # of active coordinates
  1320. ==================== =====================================================
  1321. [RW]RawBuffer 1 (c0 in bytes)
  1322. [RW]StructuredBuffer 2 (c0 in elements, c1 = byte offset into the element)
  1323. ==================== =====================================================
  1324. BufferStore
  1325. ~~~~~~~~~~~
  1326. The following signature shows the operation syntax::
  1327. ; overloads: SM5.1: f32|i32, SM6.0: f32|i32
  1328. declare void @dx.op.bufferStore.f32(
  1329. i32, ; opcode
  1330. %dx.types.Handle, ; resource handle
  1331. i32, ; coordinate c0
  1332. i32, ; coordinate c1
  1333. float, ; value v0
  1334. float, ; value v1
  1335. float, ; value v2
  1336. float, ; value v3
  1337. i8) ; write mask
  1338. The call respects SM5.1 OOB and alignment rules.
The write mask indicates which components are written (x = 1, y = 2, z = 4, w = 8), similar to DXBC. For RWTypedBuffer, the mask must cover all resource components. For RWRawBuffer and RWStructuredBuffer, valid masks are: x, xy, xyz, xyzw.
  1340. =================== =====================================================
  1341. Valid resource type # of active coordinates
  1342. =================== =====================================================
  1343. RWTypedBuffer 1 (c0 in elements)
  1344. RWRawBuffer 1 (c0 in bytes)
  1345. RWStructuredBuffer 2 (c0 in elements, c1 = byte offset into the element)
  1346. =================== =====================================================
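The BufferStore mask rules above can be expressed as a small Python check. This is a sketch under stated assumptions: the function names and the ``component_count`` parameter are hypothetical, and the mask is assumed to be a 4-bit value with x = 1, y = 2, z = 4, w = 8.

```python
def is_prefix_mask(mask):
    """True for the contiguous masks x, xy, xyz, xyzw (values 1, 3, 7, 15)."""
    return 0 < mask <= 0xF and (mask & (mask + 1)) == 0


def is_valid_buffer_store_mask(resource_type, mask, component_count=4):
    """Apply the per-resource-type write-mask rule described for BufferStore."""
    if resource_type == "RWTypedBuffer":
        # The mask must cover every component of the typed element.
        full = (1 << component_count) - 1
        return (mask & full) == full
    if resource_type in ("RWRawBuffer", "RWStructuredBuffer"):
        # Only x, xy, xyz and xyzw are accepted.
        return is_prefix_mask(mask)
    raise ValueError(f"{resource_type} is not a valid BufferStore resource type")
```

The expression ``mask & (mask + 1)`` is zero exactly when the set bits form an unbroken run starting at bit 0, which is why a mask such as xz (5) is rejected for raw and structured buffers.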
  1347. RawBufferStore
  1348. ~~~~~~~~~~~~~~
  1349. The following signature shows the operation syntax::
  1350. ; overloads: SM5.1: f32|i32, SM6.0: f32|i32, SM6.2: f16|f32|i16|i32
  1351. declare void @dx.op.rawBufferStore.f32(
  1352. i32, ; opcode
  1353. %dx.types.Handle, ; resource handle
  1354. i32, ; coordinate c0 (index)
  1355. i32, ; coordinate c1 (elementOffset)
  1356. float, ; value v0
  1357. float, ; value v1
  1358. float, ; value v2
  1359. float, ; value v3
  1360. i8, ; write mask
  1361. i32) ; alignment
  1362. The call respects SM5.1 OOB and alignment rules.
The write mask indicates which components are written (x = 1, y = 2, z = 4, w = 8), similar to DXBC. For RWRawBuffer and RWStructuredBuffer, valid masks are: x, xy, xyz, xyzw.

==================== =====================================================
Valid resource type  # of active coordinates
==================== =====================================================
RWRawBuffer          1 (c0 in bytes)
RWStructuredBuffer   2 (c0 in elements, c1 = byte offset into the element)
==================== =====================================================

  1370. BufferUpdateCounter
  1371. ~~~~~~~~~~~~~~~~~~~
  1372. The following signature shows the operation syntax::
  1373. ; opcodes: bufferUpdateCounter
  1374. declare void @dx.op.bufferUpdateCounter(
  1375. i32, ; opcode
  1376. %dx.types.ResHandle, ; buffer handle
  1377. i8) ; 1 - increment, -1 - decrement
  1378. Valid resource type: RWRawBuffer.
  1379. AtomicBinOp
  1380. ~~~~~~~~~~~
  1381. The following signature shows the operation syntax::
  1382. ; overloads: SM5.1: i32, SM6.0: i32
  1383. ; returns: original value in memory before the operation
  1384. declare i32 @dx.op.atomicBinOp.i32(
  1385. i32, ; opcode
  1386. %dx.types.Handle, ; resource handle
  1387. i32, ; binary operation code: EXCHANGE, IADD, AND, OR, XOR, IMIN, IMAX, UMIN, UMAX
  1388. i32, ; coordinate c0
  1389. i32, ; coordinate c1
  1390. i32, ; coordinate c2
  1391. i32) ; new value
  1392. The call respects SM5.1 OOB and alignment rules.
  1393. =================== =====================================================
  1394. Valid resource type # of active coordinates
  1395. =================== =====================================================
  1396. RWTexture1D 1 (c0)
  1397. RWTexture1DArray 2 (c0, c1 = array slice)
  1398. RWTexture2D 2 (c0, c1)
  1399. RWTexture2DArray 3 (c0, c1, c2 = array slice)
  1400. RWTexture3D 3 (c0, c1, c2)
  1401. RWTypedBuffer 1 (c0 in elements)
  1402. RWRawBuffer 1 (c0 in bytes)
RWStructuredBuffer 2 (c0 in elements, c1 = byte offset into the element)
  1404. =================== =====================================================
  1405. AtomicBinOp subsumes corresponding DXBC atomic operations that do not return the old value in memory. The driver compiler is free to specialize the corresponding GPU instruction if the return value is unused.
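The return-value contract of AtomicBinOp (the call returns the original value in memory) can be modelled with the Python sketch below. It models only the arithmetic, not atomicity or OOB behavior. Memory is represented as a list of 32-bit patterns, and ``MASK32``, ``as_signed``, ``BIN_OPS`` and ``atomic_bin_op`` are hypothetical names; wrap-around on IADD is an assumption based on 32-bit integer arithmetic.

```python
MASK32 = 0xFFFFFFFF


def as_signed(value):
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


# Binary operation codes from the signature; operands are 32-bit patterns.
BIN_OPS = {
    "EXCHANGE": lambda old, new: new,
    "IADD": lambda old, new: old + new,
    "AND": lambda old, new: old & new,
    "OR": lambda old, new: old | new,
    "XOR": lambda old, new: old ^ new,
    "IMIN": lambda old, new: min(old, new, key=as_signed),
    "IMAX": lambda old, new: max(old, new, key=as_signed),
    "UMIN": lambda old, new: min(old, new),
    "UMAX": lambda old, new: max(old, new),
}


def atomic_bin_op(memory, address, op, new_value):
    """Apply op to memory[address] and return the value that was there before."""
    old = memory[address] & MASK32
    memory[address] = BIN_OPS[op](old, new_value & MASK32) & MASK32
    return old
```

The signed and unsigned variants differ only in how the bit patterns are ordered: with 1 in memory and 0xFFFFFFFF as the new value, IMIN stores 0xFFFFFFFF (which is -1 when signed) while UMIN keeps 1.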
  1406. AtomicCompareExchange
  1407. ~~~~~~~~~~~~~~~~~~~~~
  1408. The following signature shows the operation syntax::
  1409. ; overloads: SM5.1: i32, SM6.0: i32
  1410. ; returns: original value in memory before the operation
  1411. declare i32 @dx.op.atomicCompareExchange.i32(
  1412. i32, ; opcode
  1413. %dx.types.Handle, ; resource handle
  1414. i32, ; coordinate c0
  1415. i32, ; coordinate c1
  1416. i32, ; coordinate c2
  1417. i32, ; comparison value
  1418. i32) ; new value
  1419. The call respects SM5.1 OOB and alignment rules.
  1420. =================== =====================================================
  1421. Valid resource type # of active coordinates
  1422. =================== =====================================================
  1423. RWTexture1D 1 (c0)
  1424. RWTexture1DArray 2 (c0, c1 = array slice)
  1425. RWTexture2D 2 (c0, c1)
  1426. RWTexture2DArray 3 (c0, c1, c2 = array slice)
  1427. RWTexture3D 3 (c0, c1, c2)
  1428. RWTypedBuffer 1 (c0 in elements)
  1429. RWRawBuffer 1 (c0 in bytes)
RWStructuredBuffer 2 (c0 in elements, c1 = byte offset into the element)
  1431. =================== =====================================================
  1432. AtomicCompareExchange subsumes DXBC's atomic compare store. The driver compiler is free to specialize the corresponding GPU instruction if the return value is unused.
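A minimal Python model of the AtomicCompareExchange contract is shown below, assuming the usual compare-and-swap meaning: the new value is written only if memory equals the comparison value, and the original value is always returned. Atomicity is not modelled, and the names ``atomic_compare_exchange`` and ``atomic_update`` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def atomic_compare_exchange(memory, address, comparison_value, new_value):
    """Write new_value if memory[address] equals comparison_value.

    Returns the value that was in memory before the operation.
    """
    old = memory[address] & MASK32
    if old == (comparison_value & MASK32):
        memory[address] = new_value & MASK32
    return old


def atomic_update(memory, address, func):
    """Typical retry loop built on compare-exchange: apply func to the stored
    value and retry until no other writer changed it in between."""
    while True:
        expected = memory[address] & MASK32
        desired = func(expected) & MASK32
        if atomic_compare_exchange(memory, address, expected, desired) == expected:
            return expected
```

A caller can tell whether the exchange happened by comparing the returned value with the comparison value, which is what the retry loop in ``atomic_update`` relies on.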
  1433. GetBufferBasePtr (SM6.0)
  1434. ~~~~~~~~~~~~~~~~~~~~~~~~
The following signature shows the operation syntax::

  ; Returns i8* pointer to the base of [RW]RawBuffer instance.
  declare i8 addrspace(ASmemory) * @dx.op.getBufferBasePtr.pASmemory(
      i32,                  ; opcode
      %dx.types.Handle)     ; resource handle

  ; Returns i8* pointer to the base of ConstantBuffer instance.
  declare i8 addrspace(AScbuffer) * @dx.op.getBufferBasePtr.pAScbuffer(
      i32,                  ; opcode
      %dx.types.Handle)     ; resource handle

Given an SM5.1 resource handle, returns the base pointer used to perform pointer-based accesses to the resource memory.

Note: the functionality is requested for SM6.0 to support pointer-based accesses to SM5.1 resources with raw linear memory (raw buffer and cbuffer) in HLSL next. This would be one of the ways a valid pointer is produced in the shader, and would let new-style, pointer-based code access SM5.1 resources through a linear memory view.
  1446. Atomic operations via pointer
  1447. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Groupshared memory atomic operations are performed via the LLVM atomic instructions atomicrmw and cmpxchg. The instructions accept only i32 addrspace(ASgs) * pointers, where ASgs is the addrspace number of groupshared variables. The atomicrmw instruction does not support the 'sub' and 'nand' operations. These constraints may be revisited in the future. OOB behavior is undefined.

SM6.0 will enable a similar mechanism for atomic operations performed on device memory (raw buffer).
  1450. Samplers
  1451. --------
There are no intrinsics for samplers. Sampler reflection data is represented similarly to that of other resources.
  1453. Immediate Constant Buffer
  1454. -------------------------
  1455. There is no immediate constant buffer in DXIL. Instead, indexable constants are represented via LLVM global initialized constants in address space ASicb.
  1456. Texture Buffers
  1457. ---------------
  1458. A texture buffer is mapped to RawBuffer. Texture buffer variable declarations are present for reflection purposes only.
  1459. Groupshared memory
  1460. ------------------
  1461. Groupshared memory (DXBC g-registers) is linear in DXIL. Groupshared variables are declared via global variables in addrspace(ASgs). The optimizer will not group variables; the driver compiler can do this if desired. Accesses to groupshared variables occur via pointer load/store instructions (see below).
  1462. Indexable threadlocal memory
  1463. ----------------------------
  1464. Indexable threadlocal memory (DXBC x-registers) is linear in DXIL. Threadlocal variables are "declared" via alloca instructions. Threadlocal variables are assumed to reside in addrspace(0). The variables are not allocated into some memory pool; the driver compiler can do this, if desired. Accesses to threadlocal variables occur via pointer load/store instructions (see below).
  1465. Load/Store/Atomics via pointer in future SM
  1466. -------------------------------------------
HLSL offers several abstractions over linear memory: buffers, cbuffers, groupshared memory, and indexable threadlocal memory. They are conceptually similar but have different HLSL syntax and some differences in behavior, all of which are exposed to HLSL developers. The plan is to introduce pointers into HLSL to unify the access syntax, so that these linear-memory resources appear conceptually the same to HLSL programmers.
  1468. Each resource memory type is expressed by a unique LLVM address space. The following table shows memory types and their address spaces:
  1469. ========================================= =====================================
  1470. Memory type Address space number n - addrspace(n)
  1471. ========================================= =====================================
  1472. code, local, indexable threadlocal memory AS_default = 0
  1473. device memory ([RW]RawBuffer) AS_memory = 1
  1474. cbuffer-like memory (ConstantBuffer) AS_cbuffer = 2
  1475. groupshared memory AS_groupshared = 3
  1476. ========================================= =====================================
Pointers can be produced in the shader in a variety of ways (see Memory accesses section). Note that if GetBufferBasePtr was used on a [RW]RawBuffer or ConstantBuffer to produce a pointer, the base pointer is stateless; i.e., it "loses its connection" to the underlying resource and is treated as a stateless pointer into a particular memory type.
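To make the address-space table concrete, the Python sketch below maps each memory type to its address space number and spells out the corresponding LLVM pointer type. The enum member names and the ``pointer_type`` helper are hypothetical. The typed-pointer spelling ``T addrspace(n)*`` follows LLVM IR syntax, where address space 0 is written without the ``addrspace`` qualifier.

```python
from enum import IntEnum


class AddressSpace(IntEnum):
    """Address space numbers from the table above."""
    DEFAULT = 0      # code, local, indexable threadlocal memory
    MEMORY = 1       # device memory ([RW]RawBuffer)
    CBUFFER = 2      # cbuffer-like memory (ConstantBuffer)
    GROUPSHARED = 3  # groupshared memory


def pointer_type(element_type, address_space):
    """Spell an LLVM typed pointer to element_type in the given address space."""
    if address_space == AddressSpace.DEFAULT:
        return f"{element_type}*"
    return f"{element_type} addrspace({int(address_space)})*"
```

For instance, a pointer to a groupshared i32 is spelled ``i32 addrspace(3)*``, and a byte pointer into device memory is ``i8 addrspace(1)*``.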
  1478. Additional resource properties
  1479. ------------------------------
  1480. TODO: enumerate all additional resource range properties, e.g., ROV, Texture2DMS, globally coherent, UAV counter, sampler mode, CB: immediate/dynamic indexed.
  1481. Operations
  1482. ==========
  1483. DXIL operations are represented in two ways: using LLVM instructions and using LLVM external functions. The reference list of operations as well as their overloads can be found in the attached Excel spreadsheet "DXIL Operations".
  1484. Operations via instructions
  1485. ---------------------------
  1486. DXIL uses a subset of core LLVM IR instructions that make sense for HLSL, where the meaning of the LLVM IR operation matches the meaning of the HLSL operation.
The following LLVM instructions are valid in a DXIL program, with the specified operand types where applicable. The legend for overload types is: (v)oid, (h)alf, (f)loat, (d)ouble, (1)-bit, (8)-bit, (w)ord, (i)nt, (l)ong.
  1488. .. <py>import hctdb_instrhelp</py>
  1489. .. <py::lines('INSTR-RST')>hctdb_instrhelp.get_instrs_rst()</py>
  1490. .. INSTR-RST:BEGIN
  1491. ============= ======================================================================= =================
  1492. Instruction Action Operand overloads
  1493. ============= ======================================================================= =================
  1494. Ret returns a value (possibly void), from a function. vhfd1wil
  1495. Br branches (conditional or unconditional)
  1496. Switch performs a multiway switch
  1497. Add returns the sum of its two operands wil
  1498. FAdd returns the sum of its two operands hfd
  1499. Sub returns the difference of its two operands wil
  1500. FSub returns the difference of its two operands hfd
  1501. Mul returns the product of its two operands wil
  1502. FMul returns the product of its two operands hfd
  1503. UDiv returns the quotient of its two unsigned operands wil
  1504. SDiv returns the quotient of its two signed operands wil
  1505. FDiv returns the quotient of its two operands hfd
  1506. URem returns the remainder from the unsigned division of its two operands wil
  1507. SRem returns the remainder from the signed division of its two operands wil
  1508. FRem returns the remainder from the division of its two operands hfd
  1509. Shl shifts left (logical) wil
  1510. LShr shifts right (logical), with zero bit fill wil
  1511. AShr shifts right (arithmetic), with 'a' operand sign bit fill wil
  1512. And returns a bitwise logical and of its two operands 1wil
  1513. Or returns a bitwise logical or of its two operands 1wil
  1514. Xor returns a bitwise logical xor of its two operands 1wil
  1515. Alloca allocates memory on the stack frame of the currently executing function
  1516. Load reads from memory
  1517. Store writes to memory
  1518. GetElementPtr gets the address of a subelement of an aggregate value
  1519. AtomicCmpXchg atomically modifies memory
  1520. AtomicRMW atomically modifies memory
  1521. Trunc truncates an integer 1wil
  1522. ZExt zero extends an integer 1wil
  1523. SExt sign extends an integer 1wil
  1524. FPToUI converts a floating point to UInt hfd1wil
  1525. FPToSI converts a floating point to SInt hfd1wil
  1526. UIToFP converts a UInt to floating point hfd1wil
  1527. SIToFP converts a SInt to floating point hfd1wil
  1528. FPTrunc truncates a floating point hfd
  1529. FPExt extends a floating point hfd
  1530. BitCast performs a bit-preserving type cast hfd1wil
  1531. AddrSpaceCast casts a value addrspace
  1532. ICmp compares integers 1wil
  1533. FCmp compares floating points hfd
  1534. PHI is a PHI node instruction
  1535. Call calls a function
  1536. Select selects an instruction
  1537. ExtractValue extracts from aggregate
  1538. ============= ======================================================================= =================
  1539. FAdd
  1540. ~~~~
%dest = fadd float %src0, %src1
  1542. The following table shows the results obtained when executing the instruction with various classes of numbers, assuming that "fp32-denorm-mode"="preserve".
For "fp32-denorm-mode"="ftz" mode, denorm inputs should be treated as the corresponding signed zero, and any resulting denorm is also flushed to zero.
  1544. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  1545. | src0\src1| -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  1546. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  1547. | -inf | -inf | -inf | -inf |-inf|-inf| -inf | -inf | NaN | NaN |
  1548. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  1549. | -F | -inf | -F | -F |src0|src0| -F | +/-F | +inf | NaN |
  1550. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  1551. | -denorm | -inf | -F |-F/denorm |src0|src0| +/-denorm | +F | +inf | NaN |
  1552. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  1553. | -0 | -inf | src1 | src1 |-0 |+0 | src1 | src1 | +inf | NaN |
  1554. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
| +0 | -inf | src1 | src1 |+0 |+0 | src1 | src1 | +inf | NaN |
  1556. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  1557. | +denorm | -inf | -F |+/-denorm |src0|src0| +F/denorm | +F | +inf | NaN |
  1558. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  1559. | +F | -inf | +/-F | +F |src0|src0| +F | +F | +inf | NaN |
  1560. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  1561. | +inf | NaN | +inf | +inf |+inf|+inf| +inf | +inf | +inf | NaN |
  1562. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  1563. | NaN | NaN | NaN | NaN |NaN |NaN | NaN | NaN | NaN | NaN |
  1564. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  1565. FDiv
  1566. ~~~~
  1567. %dest = fdiv float %src0, %src1
The following table shows the results obtained when executing the instruction with various classes of numbers, assuming that the fast math flag is not used and "fp32-denorm-mode"="preserve".

When "fp32-denorm-mode"="ftz", denorm inputs should be interpreted as the corresponding signed zero, and any resulting denorm is also flushed to zero.

When fast math is enabled, an implementation may use the reciprocal form: src0*(1/src1). This may result in evaluating src0*(+/-)INF from src0*(1/(+/-)denorm), which may produce NaN in some cases or (+/-)INF in others.
  1571. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  1572. | src0\\src1| -inf | -F | -1 | -denorm | -0 | +0 | +denorm | +1 | +F | +inf | NaN |
  1573. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  1574. | -inf | NaN | +inf | +inf | +inf |+inf|-inf| -inf | -inf | -inf | NaN | NaN |
  1575. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  1576. | -F | +0 | +F | -src0 | +F |+inf|-inf| -F | src0 | -F | -0 | NaN |
  1577. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  1578. | -denorm | +0 | +denorm| -src0 | +F |+inf|-inf| -F | src0 |-denorm | -0 | NaN |
  1579. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  1580. | -0 | +0 | +0 | +0 | 0 |NaN |NaN | 0 | -0 | -0 | -0 | NaN |
  1581. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  1582. | +0 | -0 | -0 | -0 | 0 |NaN |NaN | 0 | +0 | +0 | +0 | NaN |
  1583. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  1584. | +denorm | -0 | -denorm| -src0 | -F |-inf|+inf| +F | src0 |+denorm | +0 | NaN |
  1585. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  1586. | +F | -0 | -F | -src0 | -F |-inf|+inf| +F | src0 | +F | +0 | NaN |
  1587. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  1588. | +inf | NaN | -inf | -inf | -inf |-inf|+inf| +inf | +inf | +inf | NaN | NaN |
  1589. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  1590. | NaN | NaN | NaN | NaN | NaN |NaN |NaN | NaN | NaN | NaN | NaN | NaN |
  1591. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  1592. .. INSTR-RST:END
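The denorm modes and the fast-math reciprocal form described for FAdd and FDiv above can be explored with the following Python sketch. It is a software model under stated assumptions, not a definition of driver behavior: fp32 values are emulated by rounding Python floats (fp64) through the ``struct`` module, and ``to_f32``, ``flush_denorm``, ``fadd_f32`` and ``fdiv_reciprocal_f32`` are hypothetical helper names.

```python
import math
import struct

F32_MIN_NORMAL = 2.0 ** -126  # smallest positive normal fp32 value


def to_f32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # A finite fp64 value too large for fp32 rounds to infinity.
        return math.copysign(math.inf, x)


def flush_denorm(x):
    """Replace an fp32 denorm by a zero of the same sign ("ftz" behavior)."""
    if x != 0.0 and abs(x) < F32_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


def fadd_f32(a, b, denorm_mode="preserve"):
    """Model of fadd float for "fp32-denorm-mode" = "preserve" or "ftz"."""
    if denorm_mode == "ftz":
        a, b = flush_denorm(a), flush_denorm(b)
    result = to_f32(a + b)
    return flush_denorm(result) if denorm_mode == "ftz" else result


def fdiv_reciprocal_f32(src0, src1):
    """Reciprocal form src0 * (1 / src1), each step rounded to fp32.

    src1 must be nonzero here, because Python raises on float division by zero.
    """
    reciprocal = to_f32(1.0 / src1)
    return to_f32(src0 * reciprocal)
```

With the smallest denorm ``2.0 ** -149`` as divisor, the reciprocal overflows to +INF in fp32. Dividing the denorm ``2.0 ** -140`` by it is exactly 512, yet the reciprocal form yields +INF, and a zero numerator yields NaN instead of zero.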
  1593. Operations via external functions
  1594. ---------------------------------
Operations missing in core LLVM IR, such as abs, fma, discard, etc., are represented by external functions, whose names are prefixed with dx.op.

The very first parameter of each such external function is the opcode of the operation, which is an i32 constant. For example, dx.op.unary computes a unary function T res = opcode(T input). The opcode determines which unary function is performed.

Opcodes are defined on a dense range and will be provided as an enum in a header file. The opcode parameter is introduced for efficiency reasons: it allows operations to be grouped, which reduces the total number of overloads, and it makes property lookup more efficient, e.g., via an array of operation properties rather than a hash table.
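The idea of a dense opcode range with array-based lookup can be sketched in Python as follows. The opcode values are taken from the opcode table in this section. The dispatch table ``UNARY_OPS`` and the function ``dx_op_unary`` are hypothetical, and treating exactly these five operations as members of the unary group is an assumption made for illustration.

```python
import math

# Opcode IDs as listed in the opcode table.
FABS, SATURATE, COS, SIN, SQRT = 6, 7, 12, 13, 24

# Dense table: the list index is the opcode, so a lookup is a plain array
# access rather than a hash-table probe. Sized to cover the opcodes used here.
UNARY_OPS = [None] * (SQRT + 1)
UNARY_OPS[FABS] = abs
UNARY_OPS[SATURATE] = lambda x: min(max(x, 0.0), 1.0)  # NaN handling not modelled
UNARY_OPS[COS] = math.cos
UNARY_OPS[SIN] = math.sin
UNARY_OPS[SQRT] = math.sqrt


def dx_op_unary(opcode, value):
    """Model of T res = opcode(T input): one entry point, many operations."""
    if not 0 <= opcode < len(UNARY_OPS) or UNARY_OPS[opcode] is None:
        raise ValueError(f"opcode {opcode} is not a unary operation in this sketch")
    return UNARY_OPS[opcode](value)
```

A single function serving several opcodes is what keeps the number of overloads small: a new unary operation adds a table entry, not a new function signature.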
  1598. .. <py::lines('OPCODES-RST')>hctdb_instrhelp.get_opcodes_rst()</py>
  1599. .. OPCODES-RST:BEGIN
  1600. === ===================================================== =======================================================================================================================================================================================================================
  1601. ID Name Description
  1602. === ===================================================== =======================================================================================================================================================================================================================
  1603. 0 TempRegLoad_ Helper load operation
  1604. 1 TempRegStore_ Helper store operation
  1605. 2 MinPrecXRegLoad_ Helper load operation for minprecision
  1606. 3 MinPrecXRegStore_ Helper store operation for minprecision
  1607. 4 LoadInput_ Loads the value from shader input
  1608. 5 StoreOutput_ Stores the value to shader output
  1609. 6 FAbs_ returns the absolute value of the input value.
  1610. 7 Saturate_ clamps the result of a single or double precision floating point value to [0.0f...1.0f]
  1611. 8 IsNaN_ Returns true if x is NAN or QNAN, false otherwise.
  1612. 9 IsInf_ Returns true if x is +INF or -INF, false otherwise.
  1613. 10 IsFinite_ Returns true if x is finite, false otherwise.
  1614. 11 IsNormal_ returns IsNormal
  1615. 12 Cos_ returns cosine(theta) for theta in radians.
  1616. 13 Sin_ returns sine(theta) for theta in radians.
  1617. 14 Tan_ returns tan(theta) for theta in radians.
  1618. 15 Acos_ Returns the arccosine of the specified value. Input should be a floating-point value within the range of -1 to 1.
16 Asin_ Returns the arcsine of the specified value. Input should be a floating-point value within the range of -1 to 1.
  1620. 17 Atan_ Returns the arctangent of the specified value. The return value is within the range of -PI/2 to PI/2.
  1621. 18 Hcos_ returns the hyperbolic cosine of the specified value.
  1622. 19 Hsin_ returns the hyperbolic sine of the specified value.
  1623. 20 Htan_ returns the hyperbolic tangent of the specified value.
  1624. 21 Exp_ returns 2^exponent
22 Frc_ extract fractional component.
  1626. 23 Log_ returns log base 2.
  1627. 24 Sqrt_ returns square root
25 Rsqrt_ returns reciprocal square root (1 / sqrt(src))
  1629. 26 Round_ne_ floating-point round to integral float.
  1630. 27 Round_ni_ floating-point round to integral float.
  1631. 28 Round_pi_ floating-point round to integral float.
  1632. 29 Round_z_ floating-point round to integral float.
  1633. 30 Bfrev_ Reverses the order of the bits.
  1634. 31 Countbits_ Counts the number of bits in the input integer.
  1635. 32 FirstbitLo_ Returns the location of the first set bit starting from the lowest order bit and working upward.
  1636. 33 FirstbitHi_ Returns the location of the first set bit starting from the highest order bit and working downward.
  1637. 34 FirstbitSHi_ Returns the location of the first set bit from the highest order bit based on the sign.
  1638. 35 FMax_ returns a if a >= b, else b
  1639. 36 FMin_ returns a if a < b, else b
  1640. 37 IMax_ IMax(a,b) returns a if a > b, else b
  1641. 38 IMin_ IMin(a,b) returns a if a < b, else b
  1642. 39 UMax_ unsigned integer maximum. UMax(a,b) = a > b ? a : b
  1643. 40 UMin_ unsigned integer minimum. UMin(a,b) = a < b ? a : b
  1644. 41 IMul_ multiply of 32-bit operands to produce the correct full 64-bit result.
  1645. 42 UMul_ multiply of 32-bit operands to produce the correct full 64-bit result.
  1646. 43 UDiv_ unsigned divide of the 32-bit operand src0 by the 32-bit operand src1.
  1647. 44 UAddc_ unsigned add of 32-bit operand with the carry
  1648. 45 USubb_ unsigned subtract of 32-bit operands with the borrow
  1649. 46 FMad_ floating point multiply & add
  1650. 47 Fma_ fused multiply-add
  1651. 48 IMad_ Signed integer multiply & add
  1652. 49 UMad_ Unsigned integer multiply & add
  1653. 50 Msad_ masked Sum of Absolute Differences.
  1654. 51 Ibfe_ Integer bitfield extract
  1655. 52 Ubfe_ Unsigned integer bitfield extract
  1656. 53 Bfi_ Given a bit range from the LSB of a number, places that number of bits in another number at any offset
  1657. 54 Dot2_ Two-dimensional vector dot-product
  1658. 55 Dot3_ Three-dimensional vector dot-product
  1659. 56 Dot4_ Four-dimensional vector dot-product
  1660. 57 CreateHandle creates the handle to a resource
  1661. 58 CBufferLoad loads a value from a constant buffer resource
  1662. 59 CBufferLoadLegacy loads a value from a constant buffer resource
  1663. 60 Sample samples a texture
  1664. 61 SampleBias samples a texture after applying the input bias to the mipmap level
  1665. 62 SampleLevel samples a texture using a mipmap-level offset
  1666. 63 SampleGrad samples a texture using a gradient to influence the way the sample location is calculated
  1667. 64 SampleCmp samples a texture and compares a single component against the specified comparison value
  1668. 65 SampleCmpLevelZero samples a texture and compares a single component against the specified comparison value
  1669. 66 TextureLoad reads texel data without any filtering or sampling
  1670. 67 TextureStore writes texel data without any filtering or sampling
  1671. 68 BufferLoad reads from a TypedBuffer
  1672. 69 BufferStore writes to a RWTypedBuffer
  1673. 70 BufferUpdateCounter atomically increments/decrements the hidden 32-bit counter stored with a Count or Append UAV
  1674. 71 CheckAccessFullyMapped determines whether all values from a Sample, Gather, or Load operation accessed mapped tiles in a tiled resource
  1675. 72 GetDimensions gets texture size information
  1676. 73 TextureGather gathers the four texels that would be used in a bi-linear filtering operation
  1677. 74 TextureGatherCmp same as TextureGather, except this instruction performs comparison on texels, similar to SampleCmp
  1678. 75 Texture2DMSGetSamplePosition gets the position of the specified sample
  1679. 76 RenderTargetGetSamplePosition gets the position of the specified sample
  1680. 77 RenderTargetGetSampleCount gets the number of samples for a render target
  1681. 78 AtomicBinOp performs an atomic operation on two operands
  1682. 79 AtomicCompareExchange atomic compare and exchange to memory
  1683. 80 Barrier inserts a memory barrier in the shader
  1684. 81 CalculateLOD calculates the level of detail
  1685. 82 Discard discards the current pixel
  1686. 83 DerivCoarseX_ computes the rate of change per stamp in x direction.
  1687. 84 DerivCoarseY_ computes the rate of change per stamp in y direction.
  1688. 85 DerivFineX_ computes the rate of change per pixel in x direction.
  1689. 86 DerivFineY_ computes the rate of change per pixel in y direction.
  1690. 87 EvalSnapped evaluates an input attribute at pixel center with an offset
  1691. 88 EvalSampleIndex evaluates an input attribute at a sample location
  1692. 89 EvalCentroid evaluates an input attribute at pixel center
  1693. 90 SampleIndex returns the sample index in a sample-frequency pixel shader
  1694. 91 Coverage returns the coverage mask input in a pixel shader
  1695. 92 InnerCoverage returns underestimated coverage input from conservative rasterization in a pixel shader
  1696. 93 ThreadId reads the thread ID
  1697. 94 GroupId reads the group ID (SV_GroupID)
  1698. 95 ThreadIdInGroup reads the thread ID within the group (SV_GroupThreadID)
  1699. 96 FlattenedThreadIdInGroup provides a flattened index for a given thread within a given group (SV_GroupIndex)
  1700. 97 EmitStream emits a vertex to a given stream
  1701. 98 CutStream completes the current primitive topology at the specified stream
  1702. 99 EmitThenCutStream equivalent to an EmitStream followed by a CutStream
  1703. 100 GSInstanceID GSInstanceID
  1704. 101 MakeDouble creates a double value
  1705. 102 SplitDouble splits a double into low and high parts
  1706. 103 LoadOutputControlPoint LoadOutputControlPoint
  1707. 104 LoadPatchConstant LoadPatchConstant
  1708. 105 DomainLocation DomainLocation
  1709. 106 StorePatchConstant StorePatchConstant
  1710. 107 OutputControlPointID OutputControlPointID
  1711. 108 PrimitiveID PrimitiveID
  1712. 109 CycleCounterLegacy CycleCounterLegacy
  1713. 110 WaveIsFirstLane returns 1 for the first lane in the wave
  1714. 111 WaveGetLaneIndex returns the index of the current lane in the wave
  1715. 112 WaveGetLaneCount returns the number of lanes in the wave
  1716. 113 WaveAnyTrue returns 1 if any of the lanes evaluates the value to true
  1717. 114 WaveAllTrue returns 1 if all the lanes evaluate the value to true
  1718. 115 WaveActiveAllEqual returns 1 if all the lanes have the same value
  1719. 116 WaveActiveBallot returns a struct with a bit set for each lane where the condition is true
  1720. 117 WaveReadLaneAt returns the value from the specified lane
  1721. 118 WaveReadLaneFirst returns the value from the first lane
  1722. 119 WaveActiveOp returns the result of the operation across all lanes
  1723. 120 WaveActiveBit returns the result of the operation across all lanes
  1724. 121 WavePrefixOp returns the result of the operation on prior lanes
  1725. 122 QuadReadLaneAt reads from a lane in the quad
  1726. 123 QuadOp returns the result of a quad-level operation
  1727. 124 BitcastI16toF16 bitcast between different sizes
  1728. 125 BitcastF16toI16 bitcast between different sizes
  1729. 126 BitcastI32toF32 bitcast between different sizes
  1730. 127 BitcastF32toI32 bitcast between different sizes
  1731. 128 BitcastI64toF64 bitcast between different sizes
  1732. 129 BitcastF64toI64 bitcast between different sizes
  1733. 130 LegacyF32ToF16 legacy function to convert float (f32) to half (f16) (this is not related to min-precision)
  1734. 131 LegacyF16ToF32 legacy function to convert half (f16) to float (f32) (this is not related to min-precision)
  1735. 132 LegacyDoubleToFloat legacy function to convert double to float
  1736. 133 LegacyDoubleToSInt32 legacy function to convert double to int32
  1737. 134 LegacyDoubleToUInt32 legacy function to convert double to uint32
  1738. 135 WaveAllBitCount returns the count of bits set to 1 across the wave
  1739. 136 WavePrefixBitCount returns the count of bits set to 1 on prior lanes
  1740. 137 AttributeAtVertex_ returns the values of the attributes at the vertex.
  1741. 138 ViewID returns the view index
  1742. 139 RawBufferLoad reads from a raw buffer and structured buffer
  1743. 140 RawBufferStore writes to a RWByteAddressBuffer or RWStructuredBuffer
  1744. 141 InstanceID The user-provided InstanceID on the bottom-level acceleration structure instance within the top-level structure
  1745. 142 InstanceIndex The autogenerated index of the current instance in the top-level structure
  1746. 143 HitKind Returns the value passed as HitKind in ReportIntersection(). If intersection was reported by fixed-function triangle intersection, HitKind will be one of HIT_KIND_TRIANGLE_FRONT_FACE or HIT_KIND_TRIANGLE_BACK_FACE.
  1747. 144 RayFlags uint containing the current ray flags.
  1748. 145 DispatchRaysIndex The current x and y location within the Width and Height
  1749. 146 DispatchRaysDimensions The Width and Height values from the D3D12_DISPATCH_RAYS_DESC structure provided to the originating DispatchRays() call.
  1750. 147 WorldRayOrigin The world-space origin for the current ray.
  1751. 148 WorldRayDirection The world-space direction for the current ray.
  1752. 149 ObjectRayOrigin Object-space origin for the current ray.
  1753. 150 ObjectRayDirection Object-space direction for the current ray.
  1754. 151 ObjectToWorld Matrix for transforming from object-space to world-space.
  1755. 152 WorldToObject Matrix for transforming from world-space to object-space.
  1756. 153 RayTMin float representing the parametric starting point for the ray.
  1757. 154 RayTCurrent float representing the current parametric ending point for the ray
  1758. 155 IgnoreHit Used in an any hit shader to reject an intersection and terminate the shader
  1759. 156 AcceptHitAndEndSearch Used in an any hit shader to abort the ray query and the intersection shader (if any). The current hit is committed and execution passes to the closest hit shader with the closest hit recorded so far
  1760. 157 TraceRay initiates raytrace
  1761. 158 ReportHit returns true if hit was accepted
  1762. 159 CallShader Call a shader in the callable shader table supplied through the DispatchRays() API
  1763. 160 CreateHandleForLib create resource handle from resource struct for library
  1764. 161 PrimitiveIndex PrimitiveIndex for raytracing shaders
  1765. 162 Dot2AddHalf 2D half dot product with accumulate to float
  1766. 163 Dot4AddI8Packed signed dot product of 4 x i8 vectors packed into i32, with accumulate to i32
  1767. 164 Dot4AddU8Packed unsigned dot product of 4 x u8 vectors packed into i32, with accumulate to i32
  1768. 165 WaveMatch returns the bitmask of active lanes that have the same value
  1769. 166 WaveMultiPrefixOp returns the result of the operation on groups of lanes identified by a bitmask
  1770. 167 WaveMultiPrefixBitCount returns the count of bits set to 1 on groups of lanes identified by a bitmask
  1771. 168 SetMeshOutputCounts Mesh shader intrinsic SetMeshOutputCounts
  1772. 169 EmitIndices emit a primitive's vertex indices in a mesh shader
  1773. 170 GetMeshPayload get the mesh payload which is from amplification shader
  1774. 171 StoreVertexOutput stores the value to mesh shader vertex output
  1775. 172 StorePrimitiveOutput stores the value to mesh shader primitive output
  1776. 173 DispatchMesh Amplification shader intrinsic DispatchMesh
  1777. 174 WriteSamplerFeedback updates a feedback texture for a sampling operation
  1778. 175 WriteSamplerFeedbackBias updates a feedback texture for a sampling operation with a bias on the mipmap level
  1779. 176 WriteSamplerFeedbackLevel updates a feedback texture for a sampling operation with a mipmap-level offset
  1780. 177 WriteSamplerFeedbackGrad updates a feedback texture for a sampling operation with explicit gradients
  1781. 178 AllocateRayQuery allocates space for RayQuery and returns a handle
  1782. 179 RayQuery_TraceRayInline initializes RayQuery for raytrace
  1783. 180 RayQuery_Proceed advances a ray query
  1784. 181 RayQuery_Abort aborts a ray query
  1785. 182 RayQuery_CommitNonOpaqueTriangleHit commits a non opaque triangle hit
  1786. 183 RayQuery_CommitProceduralPrimitiveHit commits a procedural primitive hit
  1787. 184 RayQuery_CommittedStatus returns uint status (COMMITTED_STATUS) of the committed hit in a ray query
  1788. 185 RayQuery_CandidateType returns uint candidate type (CANDIDATE_TYPE) of the current hit candidate in a ray query, after Proceed() has returned true
  1789. 186 RayQuery_CandidateObjectToWorld3x4 returns matrix for transforming from object-space to world-space for a candidate hit.
  1790. 187 RayQuery_CandidateWorldToObject3x4 returns matrix for transforming from world-space to object-space for a candidate hit.
  1791. 188 RayQuery_CommittedObjectToWorld3x4 returns matrix for transforming from object-space to world-space for a Committed hit.
  1792. 189 RayQuery_CommittedWorldToObject3x4 returns matrix for transforming from world-space to object-space for a Committed hit.
  1793. 190 RayQuery_CandidateProceduralPrimitiveNonOpaque returns if current candidate procedural primitive is non opaque
  1794. 191 RayQuery_CandidateTriangleFrontFace returns if current candidate triangle is front facing
  1795. 192 RayQuery_CommittedTriangleFrontFace returns if current committed triangle is front facing
  1796. 193 RayQuery_CandidateTriangleBarycentrics returns candidate triangle hit barycentrics
  1797. 194 RayQuery_CommittedTriangleBarycentrics returns committed triangle hit barycentrics
  1798. 195 RayQuery_RayFlags returns ray flags
  1799. 196 RayQuery_WorldRayOrigin returns world ray origin
  1800. 197 RayQuery_WorldRayDirection returns world ray direction
  1801. 198 RayQuery_RayTMin returns float representing the parametric starting point for the ray.
  1802. 199 RayQuery_CandidateTriangleRayT returns float representing the parametric point on the ray for the current candidate triangle hit.
  1803. 200 RayQuery_CommittedRayT returns float representing the parametric point on the ray for the current committed hit.
  1804. 201 RayQuery_CandidateInstanceIndex returns candidate hit instance index
  1805. 202 RayQuery_CandidateInstanceID returns candidate hit instance ID
  1806. 203 RayQuery_CandidateGeometryIndex returns candidate hit geometry index
  1807. 204 RayQuery_CandidatePrimitiveIndex returns candidate hit primitive index
  1808. 205 RayQuery_CandidateObjectRayOrigin returns candidate hit object ray origin
  1809. 206 RayQuery_CandidateObjectRayDirection returns candidate object ray direction
  1810. 207 RayQuery_CommittedInstanceIndex returns committed hit instance index
  1811. 208 RayQuery_CommittedInstanceID returns committed hit instance ID
  1812. 209 RayQuery_CommittedGeometryIndex returns committed hit geometry index
  1813. 210 RayQuery_CommittedPrimitiveIndex returns committed hit primitive index
  1814. 211 RayQuery_CommittedObjectRayOrigin returns committed hit object ray origin
  1815. 212 RayQuery_CommittedObjectRayDirection returns committed object ray direction
  1816. 213 GeometryIndex The autogenerated index of the current geometry in the bottom-level structure
  1817. 214 RayQuery_CandidateInstanceContributionToHitGroupIndex returns candidate hit InstanceContributionToHitGroupIndex
  1818. 215 RayQuery_CommittedInstanceContributionToHitGroupIndex returns committed hit InstanceContributionToHitGroupIndex
  1819. 216 AnnotateHandle annotate handle with resource properties
  1820. 217 CreateHandleFromBinding create resource handle from binding
  1821. 218 CreateHandleFromHeap create resource handle from heap
  1822. 219 Unpack4x8 unpacks 4 8-bit signed or unsigned values into int32 or int16 vector
  1823. 220 Pack4x8 packs vector of 4 signed or unsigned values into a packed datatype, drops or clamps unused bits
  1824. 221 IsHelperLane returns true on helper lanes in pixel shaders
  1825. 222 TextureGatherImm same as TextureGather, except offsets are limited to immediate values between -8 and 7
  1826. 223 TextureGatherCmpImm same as TextureGatherCmp, except offsets are limited to immediate values between -8 and 7
  1827. === ===================================================== =======================================================================================================================================================================================================================

Acos
~~~~

The return value is within the range of 0 to PI.

+----------+------+--------------+---------+------+------+---------+------+-----+
| src      | -inf | [-1,1]       | -denorm | -0   | +0   | +denorm | +inf | NaN |
+----------+------+--------------+---------+------+------+---------+------+-----+
| acos(src)| NaN  | [0,PI]       | PI/2    | PI/2 | PI/2 | PI/2    | NaN  | NaN |
+----------+------+--------------+---------+------+------+---------+------+-----+

Asin
~~~~

The return value is within the range of -PI/2 to PI/2.

+----------+------+--------------+---------+------+------+---------+------+-----+
| src      | -inf | [-1,1]       | -denorm | -0   | +0   | +denorm | +inf | NaN |
+----------+------+--------------+---------+------+------+---------+------+-----+
| asin(src)| NaN  | (-PI/2,+PI/2)| 0       | 0    | 0    | 0       | NaN  | NaN |
+----------+------+--------------+---------+------+------+---------+------+-----+

Atan
~~~~

+----------+------+--------------+---------+------+------+---------+---------------+-----+-----+
| src      | -inf | -F           | -denorm | -0   | +0   | +denorm | +F            |+inf | NaN |
+----------+------+--------------+---------+------+------+---------+---------------+-----+-----+
| atan(src)| -PI/2| (-PI/2,+PI/2)| 0       | 0    | 0    | 0       | (-PI/2,+PI/2) |PI/2 | NaN |
+----------+------+--------------+---------+------+------+---------+---------------+-----+-----+

Returns the arctangent of the specified value. The return value is within the range of -PI/2 to PI/2.

AttributeAtVertex
~~~~~~~~~~~~~~~~~

Returns the values of the attributes at the vertex. VertexID ranges from 0 to 2.

Bfi
~~~

Given a bit range from the LSB of a number, place that number of bits in another number at any offset.

dst = Bfi(src0, src1, src2, src3);

The LSB 5 bits of src0 provide the bitfield width (0-31) to take from src2.
The LSB 5 bits of src1 provide the bitfield offset (0-31) to start replacing bits in the number read from src3.
Given width, offset: bitmask = (((1 << width)-1) << offset) & 0xffffffff, dest = ((src2 << offset) & bitmask) | (src3 & ~bitmask)
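
The Bfi formula above can be sketched in C as follows. This is an illustration only: the name ``bfi_sketch`` is an assumption and is not part of DXIL or any compiler API. Because ``uint32_t`` arithmetic already wraps at 32 bits, the explicit ``& 0xffffffff`` from the formula is implicit here.

```c
#include <stdint.h>

/* Hypothetical helper that mirrors the Bfi formula; not a DXIL API. */
uint32_t bfi_sketch(uint32_t src0, uint32_t src1, uint32_t src2, uint32_t src3)
{
    uint32_t width   = src0 & 31u;                      /* LSB 5 bits of src0 */
    uint32_t offset  = src1 & 31u;                      /* LSB 5 bits of src1 */
    uint32_t bitmask = ((1u << width) - 1u) << offset;  /* wraps at 32 bits */

    /* Take the field from src2, keep every other bit of src3. */
    return ((src2 << offset) & bitmask) | (src3 & ~bitmask);
}
```

For instance, ``bfi_sketch(8, 4, 0xAB, 0xFFFFFFFF)`` inserts the low 8 bits of ``0xAB`` at bit 4 and yields ``0xFFFFFABF``. A width of 0 produces an empty bitmask, so src3 is returned unchanged.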

Bfrev
~~~~~

Reverses the order of the bits. For example, given 0x12345678 the result would be 0x1e6a2c48.
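
A minimal C sketch of the bit reversal, assuming a 32-bit operand. The name ``bfrev_sketch`` is hypothetical; real hardware would typically use a dedicated instruction rather than a loop.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the bit order of a 32-bit value. */
uint32_t bfrev_sketch(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);  /* move the current lowest bit of v into r */
        v >>= 1;
    }
    return r;
}
```

With this sketch, ``bfrev_sketch(0x12345678)`` gives ``0x1e6a2c48``, matching the example in the text.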

Cos
~~~

Theta values can be any IEEE 32-bit floating point values.
The maximum absolute error is 0.0008 in the interval from -100*Pi to +100*Pi.

+----------+------+------------+---------+----+----+---------+------------+------+-----+
| src      | -inf | -F         | -denorm | -0 | +0 | +denorm | +F         | +inf | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+
| cos(src) | NaN  | [-1 to +1] | +1      | +1 | +1 | +1      | [-1 to +1] | NaN  | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+

Countbits
~~~~~~~~~

Counts the number of bits set to 1 in the input integer.
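
As a rough illustration, assuming the count refers to bits set to 1 in a 32-bit operand, the operation could be sketched in C like this. The name ``countbits_sketch`` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: count the bits set to 1 in a 32-bit value. */
uint32_t countbits_sketch(uint32_t v)
{
    uint32_t n = 0;
    while (v != 0) {
        v &= v - 1u;  /* clears the lowest set bit */
        n++;
    }
    return n;
}
```

The loop runs once per set bit, so an input of 0 returns 0 and an input of 0xffffffff returns 32.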

DerivCoarseX
~~~~~~~~~~~~

dst = DerivCoarseX(src);

Computes the rate of change per stamp in x direction. Only a single x derivative pair is computed for each 2x2 stamp of pixels.
The data in the current Pixel Shader invocation may or may not participate in the calculation of the requested derivative, given the derivative will be calculated only once per 2x2 quad:
As an example, the x derivative could be a delta from the top row of pixels.
The exact calculation is up to the hardware vendor. There is also no specification dictating how the 2x2 quads will be aligned/tiled over a primitive.

DerivCoarseY
~~~~~~~~~~~~

dst = DerivCoarseY(src);

Computes the rate of change per stamp in y direction. Only a single y derivative pair is computed for each 2x2 stamp of pixels.
The data in the current Pixel Shader invocation may or may not participate in the calculation of the requested derivative, given the derivative will be calculated only once per 2x2 quad:
As an example, the y derivative could be a delta from the left column of pixels.
The exact calculation is up to the hardware vendor. There is also no specification dictating how the 2x2 quads will be aligned/tiled over a primitive.

DerivFineX
~~~~~~~~~~

dst = DerivFineX(src);

Computes the rate of change per pixel in x direction. Each pixel in the 2x2 stamp gets a unique pair of x derivative calculations.
The data in the current Pixel Shader invocation always participates in the calculation of the requested derivative.
There is no specification dictating how the 2x2 quads will be aligned/tiled over a primitive.

DerivFineY
~~~~~~~~~~

dst = DerivFineY(src);

Computes the rate of change per pixel in y direction. Each pixel in the 2x2 stamp gets a unique pair of y derivative calculations.
The data in the current Pixel Shader invocation always participates in the calculation of the requested derivative.
There is no specification dictating how the 2x2 quads will be aligned/tiled over a primitive.
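
The difference between the coarse and fine derivatives can be sketched on a single 2x2 stamp. Everything here is an assumption made for illustration: the ``Quad`` layout, the function names, and the choice of the top row and left column for the coarse deltas (the text leaves the exact calculation to the hardware vendor).

```c
/* Hypothetical 2x2 stamp: v[row][col], row 0 is the top row, col 0 the left column. */
typedef struct { float v[2][2]; } Quad;

/* Convenience constructor so a stamp can be written inline. */
Quad make_quad(float tl, float tr, float bl, float br)
{
    Quad q = { { { tl, tr }, { bl, br } } };
    return q;
}

/* Coarse: a single x delta for the whole stamp; this sketch uses the top row. */
float deriv_coarse_x(Quad q) { return q.v[0][1] - q.v[0][0]; }

/* Coarse: a single y delta for the whole stamp; this sketch uses the left column. */
float deriv_coarse_y(Quad q) { return q.v[1][0] - q.v[0][0]; }

/* Fine: the x delta of the row that contains the current pixel. */
float deriv_fine_x(Quad q, int row) { return q.v[row][1] - q.v[row][0]; }

/* Fine: the y delta of the column that contains the current pixel. */
float deriv_fine_y(Quad q, int col) { return q.v[1][col] - q.v[0][col]; }
```

For a stamp built with ``make_quad(1, 3, 2, 7)``, the coarse x derivative is 2 for every pixel, while the fine x derivative is 2 in the top row and 5 in the bottom row.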

Dot2
~~~~

Two-dimensional vector dot-product

Dot3
~~~~

Three-dimensional vector dot-product

Dot4
~~~~

Four-dimensional vector dot-product

Exp
~~~

Returns 2^exponent. Note that the HLSL exp intrinsic returns the base-e exponential. Maximum relative error is 2^-21.

+----------+------+------------+---------+----+----+---------+------------+------+-----+
| src      | -inf | -F         | -denorm | -0 | +0 | +denorm | +F         | +inf | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+
| exp(src) | 0    | +F         | 1       | 1  | 1  | 1       | +F         | +inf | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+

FAbs
~~~~

The FAbs instruction simply forces the sign of the number(s) on the source operand positive, including on INF and denorm values.
Applying FAbs on NaN preserves NaN, although the particular NaN bit pattern that results is not defined.

FMad
~~~~

Floating point multiply & add. This operation is not fused for "precise" operations.

FMad(a,b,c) = a * b + c

FMax
~~~~

>= is used instead of > so that if min(x,y) = x then max(x,y) = y.

NaN has special handling: If one source operand is NaN, then the other source operand is returned.
If both are NaN, any NaN representation is returned.
This conforms to new IEEE 754R rules.

Denorms are flushed (sign preserved) before comparison; however, the result written to dest may or may not be denorm flushed.

+------+-----------------------------+
| a    | b                           |
|      +------+--------+------+------+
|      | -inf | F      | +inf | NaN  |
+------+------+--------+------+------+
| -inf | -inf | b      | +inf | -inf |
+------+------+--------+------+------+
| F    | a    | a or b | +inf | a    |
+------+------+--------+------+------+
| +inf | +inf | +inf   | +inf | +inf |
+------+------+--------+------+------+
| NaN  | -inf | b      | +inf | NaN  |
+------+------+--------+------+------+

FMin
~~~~

NaN has special handling: If one source operand is NaN, then the other source operand is returned.
If both are NaN, any NaN representation is returned.
This conforms to new IEEE 754R rules.

Denorms are flushed (sign preserved) before comparison; however, the result written to dest may or may not be denorm flushed.

+------+-----------------------------+
| a    | b                           |
|      +------+--------+------+------+
|      | -inf | F      | +inf | NaN  |
+------+------+--------+------+------+
| -inf | -inf | -inf   | -inf | -inf |
+------+------+--------+------+------+
| F    | -inf | a or b | a    | a    |
+------+------+--------+------+------+
| +inf | -inf | b      | +inf | +inf |
+------+------+--------+------+------+
| NaN  | -inf | b      | +inf | NaN  |
+------+------+--------+------+------+
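
The NaN rule and the ``>=`` versus ``<`` choice described for FMax and FMin can be sketched in C as below. The names ``sketch_fmax`` and ``sketch_fmin`` are hypothetical, and denorm flushing is deliberately not modelled, so this only illustrates the comparison logic.

```c
#include <math.h>   /* isnan, signbit, NAN, INFINITY (macros only) */

/* Hypothetical helper: a if a >= b, else b, with NaN operands skipped. */
float sketch_fmax(float a, float b)
{
    if (isnan(a)) return b;  /* one NaN: return the other operand; both NaN: b is a NaN */
    if (isnan(b)) return a;
    return a >= b ? a : b;
}

/* Hypothetical helper: a if a < b, else b, with NaN operands skipped. */
float sketch_fmin(float a, float b)
{
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    return a < b ? a : b;
}
```

When the operands compare equal, for example +0 and -0, ``sketch_fmax`` returns the first operand and ``sketch_fmin`` returns the second, so the pair never returns the same operand twice. That is the property the ``>=`` comparison is meant to guarantee.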

FirstbitHi
~~~~~~~~~~

Returns the integer position of the first bit set in the 32-bit input starting from the MSB. For example, 0x10000000 would return 3. Returns 0xffffffff if no match was found.

FirstbitLo
~~~~~~~~~~

Returns the integer position of the first bit set in the 32-bit input starting from the LSB. For example, 0x00000002 would return 1. Returns 0xffffffff if no match was found.

FirstbitSHi
~~~~~~~~~~~

Returns the first 0 from the MSB if the number is negative, else the first 1 from the MSB. Returns 0xffffffff if no match was found.
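
The three first-bit searches can be sketched in C as follows. The function names are hypothetical, and the sketch assumes that the MSB-based variants count positions from the MSB (bit 31 is position 0), as the FirstbitHi example suggests.

```c
#include <stdint.h>

/* Hypothetical helper: position of the first set bit, counted from the MSB. */
uint32_t firstbit_hi_sketch(uint32_t v)
{
    for (uint32_t pos = 0; pos < 32; pos++)
        if (v & (0x80000000u >> pos))
            return pos;
    return 0xffffffffu;  /* no bit set */
}

/* Hypothetical helper: position of the first set bit, counted from the LSB. */
uint32_t firstbit_lo_sketch(uint32_t v)
{
    for (uint32_t pos = 0; pos < 32; pos++)
        if (v & (1u << pos))
            return pos;
    return 0xffffffffu;  /* no bit set */
}

/* Hypothetical helper: for negative inputs search for the first 0 from the
 * MSB, which is the same as searching the inverted bits for the first 1. */
uint32_t firstbit_shi_sketch(int32_t v)
{
    uint32_t bits = (uint32_t)v;
    return firstbit_hi_sketch(v < 0 ? ~bits : bits);
}
```

Under these assumptions ``firstbit_hi_sketch(0x10000000)`` returns 3, and both 0 and -1 make ``firstbit_shi_sketch`` return 0xffffffff because no differing bit exists.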

Fma
~~~

Fused multiply-add. This operation is only defined in double precision.

Fma(a,b,c) = a * b + c

Frc
~~~

Extracts the fractional component.

+--------------+------+------+---------+----+----+---------+--------+------+-----+
| src          | -inf | -F   | -denorm | -0 | +0 | +denorm | +F     | +inf | NaN |
+--------------+------+------+---------+----+----+---------+--------+------+-----+
| frc(src)     | NaN  |[+0,1)| +0      | +0 | +0 | +0      | [+0,1) | NaN  | NaN |
+--------------+------+------+---------+----+----+---------+--------+------+-----+

Hcos
~~~~

Returns the hyperbolic cosine of the specified value.

+----------+------+------------+---------+----+----+---------+------------+------+-----+
| src      | -inf | -F         | -denorm | -0 | +0 | +denorm | +F         | +inf | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+
| hcos(src)| +inf | (1, +inf)  | +1      | +1 | +1 | +1      | (1, +inf)  | +inf | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+

Hsin
~~~~

Returns the hyperbolic sine of the specified value.

+----------+------+------------+---------+----+----+---------+------------+------+-----+
| src      | -inf | -F         | -denorm | -0 | +0 | +denorm | +F         | +inf | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+
| hsin(src)| -inf | -F         | 0       | 0  | 0  | 0       | +F         | +inf | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+

Htan
~~~~

Returns the hyperbolic tangent of the specified value.

+----------+------+------------+---------+----+----+---------+------------+------+-----+
| src      | -inf | -F         | -denorm | -0 | +0 | +denorm | +F         | +inf | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+
| htan(src)| -1   | -F         | 0       | 0  | 0  | 0       | +F         | +1   | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+

IMad
~~~~

Signed integer multiply & add

IMad(a,b,c) = a * b + c

IMax
~~~~

IMax(a,b) returns a if a > b, else b. Optional negate modifier on source operands takes 2's complement before performing operation.

IMin
~~~~

IMin(a,b) returns a if a < b, else b. Optional negate modifier on source operands takes 2's complement before performing operation.

IMul
~~~~

IMul(src0, src1) = destHi, destLo

Multiply of 32-bit operands src0 and src1 (note they are signed), producing the correct full 64-bit result.
The low 32 bits are placed in destLo. The high 32 bits are placed in destHi.

Either of destHi or destLo may be specified as NULL instead of specifying a register, in the case high or low 32 bits of the 64-bit result are not needed.

Optional negate modifier on source operands takes 2's complement before performing arithmetic operation.
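
A minimal C sketch of the signed 32x32 to 64-bit multiply and the split into high and low halves. The ``MulResult`` struct and the name ``imul_sketch`` are hypothetical; ignoring one of the two fields plays the role of passing NULL for that destination.

```c
#include <stdint.h>

/* Hypothetical result pair standing in for the two destination registers. */
typedef struct { uint32_t hi; uint32_t lo; } MulResult;

MulResult imul_sketch(int32_t src0, int32_t src1)
{
    /* Widen both operands first so the signed product cannot overflow. */
    uint64_t full = (uint64_t)((int64_t)src0 * (int64_t)src1);

    MulResult r;
    r.lo = (uint32_t)(full & 0xffffffffu);  /* low 32 bits of the product  */
    r.hi = (uint32_t)(full >> 32);          /* high 32 bits of the product */
    return r;
}
```

For example, ``imul_sketch(-2, 3)`` yields ``hi = 0xFFFFFFFF`` and ``lo = 0xFFFFFFFA``, the two halves of the 64-bit two's complement encoding of -6.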

Ibfe
~~~~

dest = Ibfe(src0, src1, src2)

Given a range of bits in a number, shift those bits to the LSB and sign extend the MSB of the range.

width : The LSB 5 bits of src0 (0-31).

offset: The LSB 5 bits of src1 (0-31).

.. code:: c

  if( width == 0 )
  {
      dest = 0
  }
  else if( width + offset < 32 )
  {
      shl dest, src2, 32-(width+offset)
      ishr dest, dest, 32-width
  }
  else
  {
      ishr dest, src2, offset
  }
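
The Ibfe pseudocode above can be written as compilable C roughly as follows. The name ``ibfe_sketch`` is hypothetical. The sketch assumes a compiler where converting an out-of-range ``uint32_t`` to ``int32_t`` wraps and where ``>>`` on a negative ``int32_t`` is an arithmetic shift; both are implementation-defined in C but hold on mainstream compilers.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the Ibfe pseudocode; not a DXIL API. */
int32_t ibfe_sketch(uint32_t src0, uint32_t src1, uint32_t src2)
{
    uint32_t width  = src0 & 31u;  /* LSB 5 bits of src0 */
    uint32_t offset = src1 & 31u;  /* LSB 5 bits of src1 */

    if (width == 0)
        return 0;

    if (width + offset < 32) {
        /* shl: move the top bit of the field up to bit 31. */
        uint32_t field = src2 << (32 - (width + offset));
        /* ishr: arithmetic shift back down, sign extending the field. */
        return (int32_t)field >> (32 - width);
    }

    /* The field reaches bit 31, so a plain arithmetic shift is enough. */
    return (int32_t)src2 >> offset;
}
```

With ``width = 4`` and ``offset = 4``, the input ``0xF0`` extracts the field ``1111`` and sign extends it to -1, while ``0x70`` extracts ``0111`` and returns 7.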

IsFinite
~~~~~~~~

Returns true if x is finite, false otherwise.

IsInf
~~~~~

Returns true if x is +INF or -INF, false otherwise.

IsNaN
~~~~~

Returns true if x is NAN or QNAN, false otherwise.

IsNormal
~~~~~~~~

Returns IsNormal.

LoadInput
~~~~~~~~~

Loads the value from shader input.

Log
~~~

Returns log base 2. Note that the HLSL log intrinsic returns natural log.

+----------+------+------------+---------+----+----+---------+------------+------+-----+
| src      | -inf | -F         | -denorm | -0 | +0 | +denorm | +F         | +inf | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+
| log(src) | NaN  | NaN        | -inf    |-inf|-inf| -inf    | F          | +inf | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+

MinPrecXRegLoad
~~~~~~~~~~~~~~~

Helper load operation for minprecision.

MinPrecXRegStore
~~~~~~~~~~~~~~~~

Helper store operation for minprecision.

Msad
~~~~

Returns the masked Sum of Absolute Differences.

dest = msad(ref, src, accum)

ref: contains 4 packed 8-bit unsigned integers in 32 bits.

src: contains 4 packed 8-bit unsigned integers in 32 bits.

accum: a 32-bit unsigned integer, providing an existing accumulation.

dest receives the result of the masked SAD operation added to the accumulation value.

.. code:: c

  #include <limits.h>          // UINT_MAX

  // Windows-style typedefs so the listing compiles stand-alone.
  typedef unsigned int  UINT;
  typedef unsigned char BYTE;

  UINT msad( UINT ref, UINT src, UINT accum )
  {
      for (UINT i = 0; i < 4; i++)
      {
          BYTE refByte, srcByte, absDiff;

          // A zero reference byte masks this lane out of the sum.
          refByte = (BYTE)(ref >> (i * 8));
          if (!refByte)
          {
              continue;
          }

          srcByte = (BYTE)(src >> (i * 8));
          if (refByte >= srcByte)
          {
              absDiff = refByte - srcByte;
          }
          else
          {
              absDiff = srcByte - refByte;
          }

          // The recommended overflow behavior for MSAD is
          // to do a 32-bit saturate. This is not
          // required, however, and wrapping is allowed.
          // So from an application point of view,
          // overflow behavior is undefined.
          if (UINT_MAX - accum < absDiff)
          {
              accum = UINT_MAX;
              break;
          }
          accum += absDiff;
      }

      return accum;
  }

Round_ne
~~~~~~~~

Floating-point round of the values in src,
writing integral floating-point values to dest.

round_ne rounds towards the nearest integral value. For halfway cases, it rounds to the nearest even value.

+--------------+------+----+---------+----+----+---------+----+------+-----+
| src          | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
+--------------+------+----+---------+----+----+---------+----+------+-----+
| round_ne(src)| -inf | -F | -0      | -0 | +0 | +0      | +F | +inf | NaN |
+--------------+------+----+---------+----+----+---------+----+------+-----+
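
A rough C sketch of round-to-nearest-even for 32-bit floats, under the assumption that halfway cases go to the even neighbour, as the name round_ne suggests. The name ``round_ne_sketch`` is hypothetical. The sketch uses the well-known trick of adding and subtracting 2^23 and relies on the default round-to-nearest-even floating-point mode of the host, so it is not a statement about how any hardware implements the instruction.

```c
/* Hypothetical helper: round a float to the nearest integral float,
 * with ties going to the even neighbour. */
float round_ne_sketch(float x)
{
    const float big = 8388608.0f;      /* 2^23: floats this large are already integral */
    float ax = x < 0.0f ? -x : x;      /* magnitude of x */

    if (!(ax < big) || x == 0.0f)
        return x;                      /* integral, infinite, NaN, or a signed zero */

    /* In [2^23, 2^24) the float spacing is exactly 1, so the addition itself
     * rounds ax to an integer using the current (nearest-even) mode.
     * volatile keeps the compiler from folding the add and subtract away. */
    volatile float t = ax + big;
    float r = t - big;

    return x < 0.0f ? -r : r;          /* restore the sign, giving -0 for tiny negatives */
}
```

With this sketch 0.5 rounds to 0, 1.5 rounds to 2, and 2.5 also rounds to 2, which is the pattern that distinguishes nearest-even from round-half-away-from-zero.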

Round_ni
~~~~~~~~

Floating-point round of the values in src,
writing integral floating-point values to dest.

round_ni rounds towards -INF, commonly known as floor().

+--------------+------+----+---------+----+----+---------+----+------+-----+
| src          | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
+--------------+------+----+---------+----+----+---------+----+------+-----+
| round_ni(src)| -inf | -F | -0      | -0 | +0 | +0      | +F | +inf | NaN |
+--------------+------+----+---------+----+----+---------+----+------+-----+

Round_pi
~~~~~~~~

Floating-point round of the values in src,
writing integral floating-point values to dest.

round_pi rounds towards +INF, commonly known as ceil().

+--------------+------+----+---------+----+----+---------+----+------+-----+
| src          | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
+--------------+------+----+---------+----+----+---------+----+------+-----+
| round_pi(src)| -inf | -F | -0      | -0 | +0 | +0      | +F | +inf | NaN |
+--------------+------+----+---------+----+----+---------+----+------+-----+

Round_z
~~~~~~~

Floating-point round of the values in src,
writing integral floating-point values to dest.

round_z rounds towards zero.

+--------------+------+----+---------+----+----+---------+----+------+-----+
| src          | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
+--------------+------+----+---------+----+----+---------+----+------+-----+
| round_z(src) | -inf | -F | -0      | -0 | +0 | +0      | +F | +inf | NaN |
+--------------+------+----+---------+----+----+---------+----+------+-----+
Rsqrt
~~~~~

Maximum relative error is 2^-21.

+--------------+------+-----+---------+------+------+---------+----+------+-----+
| src          | -inf | -F  | -denorm | -0   | +0   | +denorm | +F | +inf | NaN |
+--------------+------+-----+---------+------+------+---------+----+------+-----+
| rsqrt(src)   | NaN  | NaN | -inf    | -inf | +inf | +inf    | +F | +0   | NaN |
+--------------+------+-----+---------+------+------+---------+----+------+-----+
Saturate
~~~~~~~~

The Saturate instruction performs the following operation on its input value:

min(1.0f, max(0.0f, value))

where min() and max() in the above expression behave in the way Min and Max behave.

Saturate(NaN) returns 0, by the rules for min and max.
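The NaN behaviour can be illustrated with a small C sketch. It assumes the Min/Max rule that when exactly one operand is NaN the other operand is returned; that rule is defined elsewhere in this document and is only paraphrased here. The names ``dx_fmax_sketch``, ``dx_fmin_sketch`` and ``saturate_sketch`` are hypothetical.

```c
/* Assumed Max rule: if one operand is NaN, return the other operand. */
static float dx_fmax_sketch(float a, float b)
{
    if (a != a) return b; /* a is NaN */
    if (b != b) return a; /* b is NaN */
    return a >= b ? a : b;
}

/* Assumed Min rule: if one operand is NaN, return the other operand. */
static float dx_fmin_sketch(float a, float b)
{
    if (a != a) return b;
    if (b != b) return a;
    return a < b ? a : b;
}

/* min(1.0f, max(0.0f, value)) using the helpers above. */
float saturate_sketch(float value)
{
    return dx_fmin_sketch(1.0f, dx_fmax_sketch(0.0f, value));
}
```

With a NaN input, the inner max returns 0.0f (the non-NaN operand), and the outer min then returns 0.0f, which matches the statement that Saturate(NaN) is 0.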
Sin
~~~

Theta values can be any IEEE 32-bit floating point values.

The maximum absolute error is 0.0008 in the interval from -100*Pi to +100*Pi.

+----------+------+------------+---------+----+----+---------+------------+------+-----+
| src      | -inf | -F         | -denorm | -0 | +0 | +denorm | +F         | +inf | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+
| sin(src) | NaN  | [-1 to +1] | -0      | -0 | +0 | +0      | [-1 to +1] | NaN  | NaN |
+----------+------+------------+---------+----+----+---------+------------+------+-----+

Sqrt
~~~~

Precision is 1 ulp.

+--------------+------+-----+---------+----+----+---------+----+------+-----+
| src          | -inf | -F  | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
+--------------+------+-----+---------+----+----+---------+----+------+-----+
| sqrt(src)    | NaN  | NaN | -0      | -0 | +0 | +0      | +F | +inf | NaN |
+--------------+------+-----+---------+----+----+---------+----+------+-----+

StoreOutput
~~~~~~~~~~~

Stores the value to shader output

Tan
~~~

Theta values can be any IEEE 32-bit floating point values.

+----------+----------+----------------+---------+----+----+---------+----------------+------+-----+
| src      | -inf     | -F             | -denorm | -0 | +0 | +denorm | +F             | +inf | NaN |
+----------+----------+----------------+---------+----+----+---------+----------------+------+-----+
| tan(src) | NaN      | [-inf to +inf] | -0      | -0 | +0 | +0      | [-inf to +inf] | NaN  | NaN |
+----------+----------+----------------+---------+----+----+---------+----------------+------+-----+

TempRegLoad
~~~~~~~~~~~

Helper load operation

TempRegStore
~~~~~~~~~~~~

Helper store operation
UAddc
~~~~~

dest0, dest1 = UAddc(src0, src1)

Unsigned add of 32-bit operands src0 and src1, placing the low 32 bits of the sum in dest0.
dest1 is written with: 1 if a carry is produced, 0 otherwise. dest1 can be NULL if the carry is not needed.
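As an illustration only, the add-with-carry behaviour described above can be sketched in C. The function name ``uaddc_sketch`` and the use of pointers for the two destinations are assumptions made for this example; a NULL ``dest1`` models the case where the carry is not needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of UAddc: dest0 receives the low 32 bits of the sum,
 * dest1 (optional) receives 1 if the add carried out of bit 31, else 0. */
void uaddc_sketch(uint32_t src0, uint32_t src1, uint32_t *dest0, uint32_t *dest1)
{
    uint32_t sum = src0 + src1; /* unsigned arithmetic wraps modulo 2^32 */

    *dest0 = sum;
    if (dest1 != NULL) {
        /* A wrapped sum is always smaller than either operand. */
        *dest1 = (sum < src0) ? 1u : 0u;
    }
}
```

For example, adding 0xFFFFFFFF and 1 yields 0 in ``dest0`` with a carry of 1.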
UDiv
~~~~

destQUOT, destREM = UDiv(src0, src1);

Unsigned divide of the 32-bit operand src0 by the 32-bit operand src1.

The results of the divide are the 32-bit quotient (placed in destQUOT) and the 32-bit remainder (placed in destREM).

Divide by zero returns 0xffffffff for both quotient and remainder.

Either destQUOT or destREM may be specified as NULL instead of specifying a register, in the case the quotient or remainder is not needed.
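A minimal C sketch of the UDiv rules above, including the divide-by-zero result, might look as follows. The name ``udiv_sketch`` and the pointer-based destinations are assumptions for illustration; a NULL pointer stands in for a NULL destination.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of UDiv: either destination may be NULL when that
 * result is not needed. Division by zero yields 0xffffffff for both. */
void udiv_sketch(uint32_t src0, uint32_t src1, uint32_t *destQUOT, uint32_t *destREM)
{
    uint32_t quot = 0xffffffffu;
    uint32_t rem  = 0xffffffffu;

    if (src1 != 0u) {
        quot = src0 / src1;
        rem  = src0 % src1;
    }
    if (destQUOT != NULL) *destQUOT = quot;
    if (destREM  != NULL) *destREM  = rem;
}
```

Note that plain C division by zero is undefined behaviour, so the explicit check is what makes the 0xffffffff result well defined in this sketch.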
UMad
~~~~

Unsigned integer multiply & add.

Umad(a,b,c) = a * b + c

UMax
~~~~

Unsigned integer maximum. UMax(a,b) = a > b ? a : b

UMin
~~~~

Unsigned integer minimum. UMin(a,b) = a < b ? a : b

UMul
~~~~

Multiply of 32-bit operands src0 and src1 (note they are unsigned), producing the correct full 64-bit result.
The low 32 bits are placed in destLO. The high 32 bits are placed in destHI.

Either of destHI or destLO may be specified as NULL instead of specifying a register, in the case high or low 32 bits of the 64-bit result are not needed.

USubb
~~~~~

dest0, dest1 = USubb(src0, src1)

Unsigned subtract of 32-bit operand src1 from src0, placing the low 32 bits of the result in dest0.
dest1 is written with: 1 if a borrow is produced, 0 otherwise. dest1 can be NULL if the borrow is not needed.
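For illustration, USubb can be modelled in C as a wrapping subtraction of src1 from src0 plus a borrow flag. The name ``usubb_sketch`` and the pointer-based destinations are assumptions made for this sketch; a NULL ``dest1`` models a borrow that is not needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of USubb: dest0 receives the low 32 bits of
 * src0 - src1, dest1 (optional) receives 1 if a borrow occurred, else 0. */
void usubb_sketch(uint32_t src0, uint32_t src1, uint32_t *dest0, uint32_t *dest1)
{
    *dest0 = src0 - src1; /* wraps modulo 2^32 when src1 > src0 */
    if (dest1 != NULL) {
        *dest1 = (src0 < src1) ? 1u : 0u;
    }
}
```

Subtracting 1 from 0, for example, gives 0xFFFFFFFF with a borrow of 1.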
Ubfe
~~~~

dest = ubfe(src0, src1, src2)

Given a range of bits in a number, shift those bits to the LSB and set remaining bits to 0.

width : The LSB 5 bits of src0 (0-31).

offset: The LSB 5 bits of src1 (0-31).

Given width, offset:

.. code:: c

  if( width == 0 )
  {
      dest = 0
  }
  else if( width + offset < 32 )
  {
      shl dest, src2, 32-(width+offset)
      ushr dest, dest, 32-width
  }
  else
  {
      ushr dest, src2, offset
  }
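The pseudocode above mixes C control flow with shift mnemonics. A plain C rendering, offered as a sketch rather than a normative definition, could look like this. The function name ``ubfe_sketch`` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical C rendering of the Ubfe pseudocode: extract `width` bits of
 * src2 starting at bit `offset`, zero-extended into the result. */
uint32_t ubfe_sketch(uint32_t src0, uint32_t src1, uint32_t src2)
{
    uint32_t width  = src0 & 31u; /* LSB 5 bits of src0 */
    uint32_t offset = src1 & 31u; /* LSB 5 bits of src1 */

    if (width == 0u) {
        return 0u;
    }
    if (width + offset < 32u) {
        /* shl discards the bits above the field, ushr discards those below. */
        uint32_t shifted = src2 << (32u - (width + offset));
        return shifted >> (32u - width);
    }
    /* The field reaches bit 31 (or beyond), so a single shift suffices. */
    return src2 >> offset;
}
```

Every shift amount in this sketch stays in the range 1 to 31, so none of the C shifts are undefined. As a usage example, ``ubfe_sketch(8, 4, 0x00000AB0)`` extracts the byte 0xAB.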
.. OPCODES-RST:END

Custom instructions
-------------------

Instructions for third-party extensions will be specially-prefixed external function calls, identified by a declared extension-set-prefix. Additional metadata will be included to provide hints about uniformity, pure or const guarantees, alignment, etc.

Validation Rules
================

The following rules are verified by the *Validator* component and thus can be relied upon by downstream consumers.

The set of validation rules that are known to hold for a DXIL program is identified by the 'dx.valver' named metadata node, which consists of a two-element tuple of constant int values, a major and minor version. Minor version numbers are incremented as rules are added to a prior table or as the implementation fixes issues.

.. <py::lines('VALRULES-RST')>hctdb_instrhelp.get_valrules_rst()</py>
.. VALRULES-RST:BEGIN

========================================= ========================================================================================================================
Rule Code                                 Description
========================================= ========================================================================================================================
BITCODE.VALID                             Module must be bitcode-valid
CONTAINER.PARTINVALID                     DXIL Container must not contain unknown parts
CONTAINER.PARTMATCHES                     DXIL Container Parts must match Module
CONTAINER.PARTMISSING                     DXIL Container requires certain parts, corresponding to module
CONTAINER.PARTREPEATED                    DXIL Container must have only one of each part type
CONTAINER.ROOTSIGNATUREINCOMPATIBLE       Root Signature in DXIL Container must be compatible with shader
DECL.ATTRSTRUCT                           Attributes parameter must be struct type
DECL.DXILFNEXTERN                         External function must be a DXIL function
DECL.DXILNSRESERVED                       The DXIL reserved prefixes must only be used by built-in functions and types
DECL.EXTRAARGS                            Extra arguments not allowed for shader functions
DECL.FNATTRIBUTE                          Functions should only contain known function attributes
DECL.FNFLATTENPARAM                       Function parameters must not use struct types
DECL.FNISCALLED                           Functions can only be used by call instructions
DECL.NOTUSEDEXTERNAL                      External declaration should not be used
DECL.PARAMSTRUCT                          Callable function parameter must be struct type
DECL.PAYLOADSTRUCT                        Payload parameter must be struct type
DECL.RESOURCEINFNSIG                      Resources not allowed in function signatures
DECL.SHADERMISSINGARG                     payload/params/attributes parameter is required for certain shader types
DECL.SHADERRETURNVOID                     Shader functions must return void
DECL.USEDEXTERNALFUNCTION                 External function must be used
DECL.USEDINTERNAL                         Internal declaration must be used
FLOW.DEADLOOP                             Loop must have break.
FLOW.FUNCTIONCALL                         Function with parameter is not permitted
FLOW.NORECUSION                           Recursion is not permitted.
FLOW.REDUCIBLE                            Execution flow must be reducible.
INSTR.ALLOWED                             Instructions must be of an allowed type.
INSTR.ATTRIBUTEATVERTEXNOINTERPOLATION    Attribute %0 must have nointerpolation mode in order to use GetAttributeAtVertex function.
INSTR.BARRIERMODEFORNONCS                 sync in a non-Compute/Amplification/Mesh Shader must only sync UAV (sync_uglobal).
INSTR.BARRIERMODENOMEMORY                 sync must include some form of memory barrier - _u (UAV) and/or _g (Thread Group Shared Memory). Only _t (thread group sync) is optional.
INSTR.BARRIERMODEUSELESSUGROUP            sync can't specify both _ugroup and _uglobal. If both are needed, just specify _uglobal.
INSTR.BUFFERUPDATECOUNTERONRESHASCOUNTER  BufferUpdateCounter valid only when HasCounter is true.
INSTR.BUFFERUPDATECOUNTERONUAV            BufferUpdateCounter valid only on UAV.
INSTR.CALLOLOAD                           Call to DXIL intrinsic must match overload signature
INSTR.CANNOTPULLPOSITION                  pull-model evaluation of position disallowed
INSTR.CBUFFERCLASSFORCBUFFERHANDLE        Expect Cbuffer for CBufferLoad handle.
INSTR.CBUFFEROUTOFBOUND                   Cbuffer access out of bound.
INSTR.CHECKACCESSFULLYMAPPED              CheckAccessFullyMapped should only be used on resource status.
INSTR.COORDINATECOUNTFORRAWTYPEDBUF       raw/typed buffer don't need 2 coordinates.
INSTR.COORDINATECOUNTFORSTRUCTBUF         structured buffer require 2 coordinates.
INSTR.CREATEHANDLEIMMRANGEID              Local resource must map to global resource.
INSTR.DXILSTRUCTUSER                      Dxil struct types should only be used by ExtractValue.
INSTR.DXILSTRUCTUSEROUTOFBOUND            Index out of bound when extract value from dxil struct types.
INSTR.EVALINTERPOLATIONMODE               Interpolation mode on %0 used with eval_* instruction must be linear, linear_centroid, linear_noperspective, linear_noperspective_centroid, linear_sample or linear_noperspective_sample.
INSTR.EXTRACTVALUE                        ExtractValue should only be used on dxil struct types and cmpxchg.
INSTR.FAILTORESLOVETGSMPOINTER            TGSM pointers must originate from an unambiguous TGSM global variable.
INSTR.HANDLENOTFROMCREATEHANDLE           Resource handle should returned by createHandle.
INSTR.IMMBIASFORSAMPLEB                   bias amount for sample_b must be in the range [%0,%1], but %2 was specified as an immediate.
INSTR.INBOUNDSACCESS                      Access to out-of-bounds memory is disallowed.
INSTR.MINPRECISIONNOTPRECISE              Instructions marked precise may not refer to minprecision values.
INSTR.MINPRECISONBITCAST                  Bitcast on minprecison types is not allowed.
INSTR.MIPLEVELFORGETDIMENSION             Use mip level on buffer when GetDimensions.
INSTR.MIPONUAVLOAD                        uav load don't support mipLevel/sampleIndex.
INSTR.MISSINGSETMESHOUTPUTCOUNTS          Missing SetMeshOutputCounts call.
INSTR.MULTIPLEGETMESHPAYLOAD              GetMeshPayload cannot be called multiple times.
INSTR.MULTIPLESETMESHOUTPUTCOUNTS         SetMeshOUtputCounts cannot be called multiple times.
INSTR.NOGENERICPTRADDRSPACECAST           Address space cast between pointer types must have one part to be generic address space.
INSTR.NOIDIVBYZERO                        No signed integer division by zero.
INSTR.NOINDEFINITEACOS                    No indefinite arccosine.
INSTR.NOINDEFINITEASIN                    No indefinite arcsine.
INSTR.NOINDEFINITEDSXY                    No indefinite derivative calculation.
INSTR.NOINDEFINITELOG                     No indefinite logarithm.
INSTR.NONDOMINATINGDISPATCHMESH           Non-Dominating DispatchMesh call.
INSTR.NONDOMINATINGSETMESHOUTPUTCOUNTS    Non-Dominating SetMeshOutputCounts call.
INSTR.NOREADINGUNINITIALIZED              Instructions should not read uninitialized value.
INSTR.NOTONCEDISPATCHMESH                 DispatchMesh must be called exactly once in an Amplification shader.
INSTR.NOUDIVBYZERO                        No unsigned integer division by zero.
INSTR.OFFSETONUAVLOAD                     uav load don't support offset.
INSTR.OLOAD                               DXIL intrinsic overload must be valid.
INSTR.ONLYONEALLOCCONSUME                 RWStructuredBuffers may increment or decrement their counters, but not both.
INSTR.OPCODERESERVED                      Instructions must not reference reserved opcodes.
INSTR.OPCONST                             DXIL intrinsic requires an immediate constant operand
INSTR.OPCONSTRANGE                        Constant values must be in-range for operation.
INSTR.OPERANDRANGE                        DXIL intrinsic operand must be within defined range
INSTR.PTRBITCAST                          Pointer type bitcast must be have same size.
INSTR.RESOURCECLASSFORLOAD                load can only run on UAV/SRV resource.
INSTR.RESOURCECLASSFORSAMPLERGATHER       sample, lod and gather should be on srv resource.
INSTR.RESOURCECLASSFORUAVSTORE            store should be on uav resource.
INSTR.RESOURCECOORDINATEMISS              coord uninitialized.
INSTR.RESOURCECOORDINATETOOMANY           out of bound coord must be undef.
INSTR.RESOURCEKINDFORBUFFERLOADSTORE      buffer load/store only works on Raw/Typed/StructuredBuffer.
INSTR.RESOURCEKINDFORCALCLOD              lod requires resource declared as texture1D/2D/3D/Cube/CubeArray/1DArray/2DArray.
INSTR.RESOURCEKINDFORGATHER               gather requires resource declared as texture/2D/Cube/2DArray/CubeArray.
INSTR.RESOURCEKINDFORGETDIM               Invalid resource kind on GetDimensions.
INSTR.RESOURCEKINDFORSAMPLE               sample/_l/_d requires resource declared as texture1D/2D/3D/Cube/1DArray/2DArray/CubeArray.
INSTR.RESOURCEKINDFORSAMPLEC              samplec requires resource declared as texture1D/2D/Cube/1DArray/2DArray/CubeArray.
INSTR.RESOURCEKINDFORTEXTURELOAD          texture load only works on Texture1D/1DArray/2D/2DArray/3D/MS2D/MS2DArray.
INSTR.RESOURCEKINDFORTEXTURESTORE         texture store only works on Texture1D/1DArray/2D/2DArray/3D.
INSTR.RESOURCEKINDFORTRACERAY             TraceRay should only use RTAccelerationStructure.
INSTR.RESOURCEMAPTOSINGLEENTRY            Fail to map resource to resource table.
INSTR.RESOURCEOFFSETMISS                  offset uninitialized.
INSTR.RESOURCEOFFSETTOOMANY               out of bound offset must be undef.
INSTR.RESOURCEUSER                        Resource should only be used by Load/GEP/Call.
INSTR.SAMPLECOMPTYPE                      sample_* instructions require resource to be declared to return UNORM, SNORM or FLOAT.
INSTR.SAMPLEINDEXFORLOAD2DMS              load on Texture2DMS/2DMSArray require sampleIndex.
INSTR.SAMPLERMODEFORLOD                   lod instruction requires sampler declared in default mode.
INSTR.SAMPLERMODEFORSAMPLE                sample/_l/_d/_cl_s/gather instruction requires sampler declared in default mode.
INSTR.SAMPLERMODEFORSAMPLEC               sample_c_*/gather_c instructions require sampler declared in comparison mode.
INSTR.SIGNATUREOPERATIONNOTINENTRY        Dxil operation for input output signature must be in entryPoints.
INSTR.STATUS                              Resource status should only be used by CheckAccessFullyMapped.
INSTR.STRUCTBITCAST                       Bitcast on struct types is not allowed.
INSTR.TEXTUREOFFSET                       offset texture instructions must take offset which can resolve to integer literal in the range -8 to 7.
INSTR.TGSMRACECOND                        Race condition writing to shared memory detected, consider making this write conditional.
INSTR.UNDEFINEDVALUEFORUAVSTORE           Assignment of undefined values to UAV.
INSTR.UNDEFRESULTFORGETDIMENSION          GetDimensions used undef dimension %0 on %1.
INSTR.WRITEMASKFORTYPEDUAVSTORE           store on typed uav must write to all four components of the UAV.
INSTR.WRITEMASKMATCHVALUEFORUAVSTORE      uav store write mask must match store value mask, write mask is %0 and store value mask is %1.
META.BARYCENTRICSFLOAT3                   only 'float3' type is allowed for SV_Barycentrics.
META.BARYCENTRICSINTERPOLATION            SV_Barycentrics cannot be used with 'nointerpolation' type.
META.BARYCENTRICSTWOPERSPECTIVES          There can only be up to two input attributes of SV_Barycentrics with different perspective interpolation mode.
META.BRANCHFLATTEN                        Can't use branch and flatten attributes together.
META.CLIPCULLMAXCOMPONENTS                Combined elements of SV_ClipDistance and SV_CullDistance must fit in 8 components
META.CLIPCULLMAXROWS                      Combined elements of SV_ClipDistance and SV_CullDistance must fit in two rows.
META.CONTROLFLOWHINTNOTONCONTROLFLOW      Control flow hint only works on control flow inst.
META.DENSERESIDS                          Resource identifiers must be zero-based and dense.
META.DUPLICATESYSVALUE                    System value may only appear once in signature
META.ENTRYFUNCTION                        entrypoint not found.
META.FLAGSUSAGE                           Flags must match usage.
META.FORCECASEONSWITCH                    Attribute forcecase only works for switch.
META.GLCNOTONAPPENDCONSUME                globallycoherent cannot be used with append/consume buffers: '%0'.
META.INTEGERINTERPMODE                    Interpolation mode on integer must be Constant
META.INTERPMODEINONEROW                   Interpolation mode must be identical for all elements packed into the same row.
META.INTERPMODEVALID                      Interpolation mode must be valid
META.INVALIDCONTROLFLOWHINT               Invalid control flow hint.
META.KNOWN                                Named metadata should be known
META.MAXTESSFACTOR                        Hull Shader MaxTessFactor must be [%0..%1]. %2 specified.
META.NOENTRYPROPSFORENTRY                 Entry point %0 must have entry properties.
META.NOSEMANTICOVERLAP                    Semantics must not overlap
META.REQUIRED                             Required metadata missing.
META.SEMAKINDMATCHESNAME                  Semantic name must match system value, when defined.
META.SEMAKINDVALID                        Semantic kind must be valid
META.SEMANTICCOMPTYPE                     %0 must be %1.
META.SEMANTICINDEXMAX                     System value semantics have a maximum valid semantic index
META.SEMANTICLEN                          Semantic length must be at least 1 and at most 64.
META.SEMANTICSHOULDBEALLOCATED            Semantic should have a valid packing location
META.SEMANTICSHOULDNOTBEALLOCATED         Semantic should have a packing location of -1
META.SIGNATURECOMPTYPE                    signature %0 specifies unrecognized or invalid component type.
META.SIGNATUREDATAWIDTH                   Data width must be identical for all elements packed into the same row.
META.SIGNATUREILLEGALCOMPONENTORDER       Component ordering for packed elements must be: arbitrary < system value < system generated value
META.SIGNATUREINDEXCONFLICT               Only elements with compatible indexing rules may be packed together
META.SIGNATUREOUTOFRANGE                  Signature elements must fit within maximum signature size
META.SIGNATUREOVERLAP                     Signature elements may not overlap in packing location.
META.STRUCTBUFALIGNMENT                   StructuredBuffer stride not aligned
META.STRUCTBUFALIGNMENTOUTOFBOUND         StructuredBuffer stride out of bounds
META.SYSTEMVALUEROWS                      System value may only have 1 row
META.TARGET                               Target triple must be 'dxil-ms-dx'
META.TESSELLATOROUTPUTPRIMITIVE           Invalid Tessellator Output Primitive specified. Must be point, line, triangleCW or triangleCCW.
META.TESSELLATORPARTITION                 Invalid Tessellator Partitioning specified. Must be integer, pow2, fractional_odd or fractional_even.
META.TEXTURETYPE                          elements of typed buffers and textures must fit in four 32-bit quantities.
META.USED                                 All metadata must be used by dxil.
META.VALIDSAMPLERMODE                     Invalid sampler mode on sampler .
META.VALUERANGE                           Metadata value must be within range.
META.VERSIONSUPPORTED                     Version in metadata must be supported.
META.WELLFORMED                           Metadata must be well-formed in operand count and types.
SM.64BITRAWBUFFERLOADSTORE                i64/f64 rawBufferLoad/Store overloads are allowed after SM 6.3.
SM.AMPLIFICATIONSHADERPAYLOADSIZE         For amplification shader with entry '%0', payload size %1 is greater than maximum size of %2 bytes.
SM.AMPLIFICATIONSHADERPAYLOADSIZEDECLARED For amplification shader with entry '%0', payload size %1 is greater than declared size of %2 bytes.
SM.APPENDANDCONSUMEONSAMEUAV              BufferUpdateCounter inc and dec on a given UAV (%d) cannot both be in the same shader for shader model less than 5.1.
SM.CBUFFERARRAYOFFSETALIGNMENT            CBuffer array offset must be aligned to 16-bytes
SM.CBUFFERELEMENTOVERFLOW                 CBuffer elements must not overflow
SM.CBUFFEROFFSETOVERLAP                   CBuffer offsets must not overlap
SM.CBUFFERSIZE                            CBuffer size must not exceed 65536 bytes
SM.CBUFFERTEMPLATETYPEMUSTBESTRUCT        D3D12 constant/texture buffer template element can only be a struct.
SM.COMPLETEPOSITION                       Not all elements of SV_Position were written.
SM.CONSTANTINTERPMODE                     Interpolation mode must be constant for MS primitive output.
SM.COUNTERONLYONSTRUCTBUF                 BufferUpdateCounter valid only on structured buffers.
SM.CSNOSIGNATURES                         Compute shaders must not have shader signatures.
SM.DOMAINLOCATIONIDXOOB                   DomainLocation component index out of bounds for the domain.
SM.DSINPUTCONTROLPOINTCOUNTRANGE          DS input control point count must be [0..%0]. %1 specified.
SM.DXILVERSION                            Target shader model requires specific Dxil Version
SM.GSINSTANCECOUNTRANGE                   GS instance count must be [1..%0]. %1 specified.
SM.GSOUTPUTVERTEXCOUNTRANGE               GS output vertex count must be [0..%0]. %1 specified.
SM.GSTOTALOUTPUTVERTEXDATARANGE           Declared output vertex count (%0) multiplied by the total number of declared scalar components of output data (%1) equals %2. This value cannot be greater than %3.
SM.GSVALIDINPUTPRIMITIVE                  GS input primitive unrecognized.
SM.GSVALIDOUTPUTPRIMITIVETOPOLOGY         GS output primitive topology unrecognized.
SM.HSINPUTCONTROLPOINTCOUNTRANGE          HS input control point count must be [0..%0]. %1 specified.
SM.HULLPASSTHRUCONTROLPOINTCOUNTMATCH     For pass thru hull shader, input control point count must match output control point count
SM.INSIDETESSFACTORSIZEMATCHDOMAIN        InsideTessFactor rows, columns (%0, %1) invalid for domain %2. Expected %3 rows and 1 column.
SM.INVALIDRESOURCECOMPTYPE                Invalid resource return type.
SM.INVALIDRESOURCEKIND                    Invalid resources kind.
SM.INVALIDSAMPLERFEEDBACKTYPE             Invalid sampler feedback type.
SM.INVALIDTEXTUREKINDONUAV                Texture2DMS[Array] or TextureCube[Array] resources are not supported with UAVs.
SM.ISOLINEOUTPUTPRIMITIVEMISMATCH         Hull Shader declared with IsoLine Domain must specify output primitive point or line. Triangle_cw or triangle_ccw output are not compatible with the IsoLine Domain.
SM.MAXMSSMSIZE                            Total Thread Group Shared Memory storage is %0, exceeded %1.
SM.MAXTGSMSIZE                            Total Thread Group Shared Memory storage is %0, exceeded %1.
SM.MAXTHEADGROUP                          Declared Thread Group Count %0 (X*Y*Z) is beyond the valid maximum of %1.
SM.MESHPSIGROWCOUNT                       For shader '%0', primitive output signatures are taking up more than %1 rows.
SM.MESHSHADERINOUTSIZE                    For shader '%0', payload plus output size is greater than %1.
SM.MESHSHADERMAXPRIMITIVECOUNT            MS max primitive output count must be [0..%0]. %1 specified.
SM.MESHSHADERMAXVERTEXCOUNT               MS max vertex output count must be [0..%0]. %1 specified.
SM.MESHSHADEROUTPUTSIZE                   For shader '%0', vertex plus primitive output size is greater than %1.
SM.MESHSHADERPAYLOADSIZE                  For mesh shader with entry '%0', payload size %1 is greater than maximum size of %2 bytes.
SM.MESHSHADERPAYLOADSIZEDECLARED          For mesh shader with entry '%0', payload size %1 is greater than declared size of %2 bytes.
SM.MESHTOTALSIGROWCOUNT                   For shader '%0', vertex and primitive output signatures are taking up more than %1 rows.
SM.MESHVSIGROWCOUNT                       For shader '%0', vertex output signatures are taking up more than %1 rows.
SM.MULTISTREAMMUSTBEPOINT                 When multiple GS output streams are used they must be pointlists
SM.NAME                                   Target shader model name must be known
SM.NOINTERPMODE                           Interpolation mode must be undefined for VS input/PS output/patch constant.
SM.NOPSOUTPUTIDX                          Pixel shader output registers are not indexable.
SM.OPCODE                                 Opcode must be defined in target shader model
SM.OPCODEININVALIDFUNCTION                Invalid DXIL opcode usage like StorePatchConstant in patch constant function
SM.OPERAND                                Operand must be defined in target shader model.
SM.OUTPUTCONTROLPOINTCOUNTRANGE           output control point count must be [0..%0]. %1 specified.
SM.OUTPUTCONTROLPOINTSTOTALSCALARS        Total number of scalars across all HS output control points must not exceed .
SM.PATCHCONSTANTONLYFORHSDS               patch constant signature only valid in HS and DS.
SM.PSCONSISTENTINTERP                     Interpolation mode for PS input position must be linear_noperspective_centroid or linear_noperspective_sample when outputting oDepthGE or oDepthLE and not running at sample frequency (which is forced by inputting SV_SampleIndex or declaring an input linear_sample or linear_noperspective_sample).
SM.PSCOVERAGEANDINNERCOVERAGE             InnerCoverage and Coverage are mutually exclusive.
SM.PSMULTIPLEDEPTHSEMANTIC                Pixel Shader only allows one type of depth semantic to be declared.
SM.PSOUTPUTSEMANTIC                       Pixel Shader allows output semantics to be SV_Target, SV_Depth, SV_DepthGreaterEqual, SV_DepthLessEqual, SV_Coverage or SV_StencilRef, %0 found.
SM.PSTARGETCOL0                           SV_Target packed location must start at column 0.
SM.PSTARGETINDEXMATCHESROW                SV_Target semantic index must match packed row location.
SM.RAYSHADERPAYLOADSIZE                   For shader '%0', %1 size is smaller than argument's allocation size.
SM.RAYSHADERSIGNATURES                    Ray tracing shader '%0' should not have any shader signatures.
SM.RESOURCERANGEOVERLAP                   Resource ranges must not overlap
SM.ROVONLYINPS                            RasterizerOrdered objects are only allowed in 5.0+ pixel shaders.
SM.SAMPLECOUNTONLYON2DMS                  Only Texture2DMS/2DMSArray could has sample count.
SM.SEMANTIC                               Semantic must be defined in target shader model
SM.STREAMINDEXRANGE                       Stream index (%0) must between 0 and %1.
SM.TESSFACTORFORDOMAIN                    Required TessFactor for domain not found declared anywhere in Patch Constant data.
SM.TESSFACTORSIZEMATCHDOMAIN              TessFactor rows, columns (%0, %1) invalid for domain %2. Expected %3 rows and 1 column.
SM.TGSMUNSUPPORTED                        Thread Group Shared Memory not supported %0.
SM.THREADGROUPCHANNELRANGE                Declared Thread Group %0 size %1 outside valid range [%2..%3].
SM.TRIOUTPUTPRIMITIVEMISMATCH             Hull Shader declared with Tri Domain must specify output primitive point, triangle_cw or triangle_ccw. Line output is not compatible with the Tri domain.
SM.UNDEFINEDOUTPUT                        Not all elements of output %0 were written.
SM.VALIDDOMAIN                            Invalid Tessellator Domain specified. Must be isoline, tri or quad.
SM.VIEWIDNEEDSSLOT                        ViewID requires compatible space in pixel shader input signature
SM.WAVESIZENEEDSDXIL16PLUS                WaveSize is valid only for DXIL version 1.6 and higher.
SM.WAVESIZEVALUE                          Declared WaveSize %0 outside valid range [%1..%2], or not a power of 2.
SM.ZEROHSINPUTCONTROLPOINTWITHINPUT       When HS input control point count is 0, no input signature should exist.
TYPES.DEFINED                             Type must be defined based on DXIL primitives
TYPES.I8                                  I8 can only be used as immediate value for intrinsic or as i8* via bitcast by lifetime intrinsics.
TYPES.INTWIDTH                            Int type must be of valid width
TYPES.NOMULTIDIM                          Only one dimension allowed for array type.
TYPES.NOPTRTOPTR                          Pointers to pointers, or pointers in structures are not allowed.
TYPES.NOVECTOR                            Vector types must not be present
========================================= ========================================================================================================================

.. VALRULES-RST:END
Modules and Linking
===================

HLSL has linking capabilities to enable third-party libraries. The linking step happens before shader DXIL is given to the driver compilers.

Experimental library generation was added in DXIL 1.1. A library can be created by compiling with the lib_6_1 profile.

A library is a DXIL container, like the compile result of other shader profiles. The difference is that a library keeps the information needed for linking, such as resource link info and entry function signatures.

Library support is not part of the DXIL spec. The only requirement is that the linked shader must be valid DXIL.
Additional Notes
================

These additional notes are not normative for DXIL, and are included for the convenience of implementers.

Other Versioned Components
--------------------------

In addition to shader model, DXIL and bitcode representation versions, two other interesting versioned components are discussed: the supporting operating system and runtime, and the HLSL language.

Support is provided in the Microsoft Windows family of operating systems, when running on the D3D12 runtime.

The HLSL language is versioned independently of DXIL, and currently follows an 'HLSL <year>' naming scheme. HLSL 2015 is the dialect supported by the d3dcompiler_47 library; a limited form of support is provided in the open source HLSL on LLVM project. HLSL 2016 is the version supported by the current HLSL on LLVM project, which removes some features (primarily effect framework syntax, backquote operator) and adds new ones (wave intrinsics and basic i64 support).
  2523. .. _dxil_container_format:
  2524. DXIL Container Format
  2525. ---------------------
  2526. DXIL is typically encapsulated in a DXIL container. A DXIL container is composed of a header, a sequence of part lengths, and a sequence of parts.
The following C declaration describes this structure::

  struct DxilContainerHeader {
    uint32_t HeaderFourCC;
    uint8_t  Digest[DxilContainerHashSize];
    uint16_t MajorVersion;
    uint16_t MinorVersion;
    uint32_t ContainerSizeInBytes; // From start of this header
    uint32_t PartCount;
    // Structure is followed by uint32_t PartOffset[PartCount];
    // The offset is to a DxilPartHeader.
  };

Each part has a standard header, followed by a part-specific body::

  struct DxilPartHeader {
    uint32_t PartFourCC; // Four char code for part type.
    uint32_t PartSize;   // Byte count for PartData.
    // Structure is followed by uint8_t PartData[PartSize].
  };

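As an illustration of how the two headers fit together, the following sketch walks the part offset table and returns the body of the first part with a given four-character code. It is not taken from any shipping implementation. It assumes a 16-byte digest (``DxilContainerHashSize`` is not defined in this section), a container four-character code of ``DXBC``, and little-endian field encoding; ``find_part``, ``read_u32le`` and ``FOURCC`` are hypothetical names introduced only for this example::

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Assumption: Digest is 16 bytes; the section does not state the value. */
  #define HASH_SIZE 16u
  /* Byte offsets derived from the DxilContainerHeader declaration. */
  #define OFF_CONTAINER_SIZE (4u + HASH_SIZE + 2u + 2u)
  #define OFF_PART_COUNT (OFF_CONTAINER_SIZE + 4u)
  #define OFF_PART_OFFSETS (OFF_PART_COUNT + 4u)
  #define PART_HEADER_SIZE 8u

  /* Packs four characters into a little-endian four-character code. */
  #define FOURCC(a, b, c, d)                                    \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) |                     \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

  /* Reads a little-endian uint32_t without alignment requirements. */
  static uint32_t read_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
  }

  /* Returns a pointer to PartData of the first part whose PartFourCC equals
     fourcc and stores PartSize in *part_size. Returns NULL if the container
     is malformed or has no such part. */
  const uint8_t *find_part(const uint8_t *data, size_t size,
                           uint32_t fourcc, uint32_t *part_size) {
    assert(data != NULL && part_size != NULL);
    if (size < OFF_PART_OFFSETS)
      return NULL;
    /* Assumption: the container FourCC is "DXBC". */
    if (read_u32le(data) != FOURCC('D', 'X', 'B', 'C'))
      return NULL;
    uint32_t container_size = read_u32le(data + OFF_CONTAINER_SIZE);
    uint32_t part_count = read_u32le(data + OFF_PART_COUNT);
    if (container_size < OFF_PART_OFFSETS || container_size > size)
      return NULL;
    /* The whole offset table must fit inside the container. */
    if (part_count > (container_size - OFF_PART_OFFSETS) / 4u)
      return NULL;
    for (uint32_t i = 0; i < part_count; ++i) {
      uint32_t off = read_u32le(data + OFF_PART_OFFSETS + 4u * (size_t)i);
      /* Each offset is measured from the start of the container header. */
      if (off > container_size || container_size - off < PART_HEADER_SIZE)
        return NULL;
      uint32_t cc = read_u32le(data + off);
      uint32_t sz = read_u32le(data + off + 4u);
      if (sz > container_size - off - PART_HEADER_SIZE)
        return NULL;
      if (cc == fourcc) {
        *part_size = sz;
        return data + off + PART_HEADER_SIZE;
      }
    }
    return NULL;
  }

A caller would pass the whole container buffer and the four-character code of the part it wants, for example ``FOURCC('D', 'X', 'I', 'L')`` if the program part is tagged that way. The sketch checks every offset and size against ``ContainerSizeInBytes`` before dereferencing, since container contents may come from an untrusted source.
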
The DXIL program is found in a part with the following body::

  struct DxilProgramHeader {
    uint32_t ProgramVersion; // Major and minor version of shader, including type.
    uint32_t SizeInUint32;   // Size in uint32_t units including this header.
    uint32_t DxilMagic;      // 0x4C495844, ASCII "DXIL".
    uint32_t DxilVersion;    // DXIL version.
    uint32_t BitcodeOffset;  // Offset to LLVM bitcode (from DxilMagic).
    uint32_t BitcodeSize;    // Size of LLVM bitcode.
    // Followed by uint8_t[BitcodeSize] after possible gap from BitcodeOffset
  };

The bitcode payload is defined as per bitcode encoding.

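The note that ``BitcodeOffset`` is measured from ``DxilMagic``, not from the start of the program header, is easy to miss. The following sketch shows one way to locate the bitcode inside a program part body under that reading. It assumes little-endian field encoding; ``locate_bitcode`` and ``load_u32le`` are hypothetical names introduced only for this example::

  #include <assert.h>
  #include <stddef.h>
  #include <stdint.h>

  #define DXIL_MAGIC 0x4C495844u /* ASCII "DXIL", from the declaration */
  /* Byte offsets derived from the DxilProgramHeader declaration. */
  #define OFF_SIZE_IN_UINT32 4u
  #define OFF_DXIL_MAGIC 8u
  #define OFF_BITCODE_OFFSET 16u
  #define OFF_BITCODE_SIZE 20u
  #define PROGRAM_HEADER_SIZE 24u

  /* Reads a little-endian uint32_t without alignment requirements. */
  static uint32_t load_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
  }

  /* Returns a pointer to the LLVM bitcode inside a program part body and
     stores its byte count in *bitcode_size. Returns NULL if the header is
     inconsistent with the part size. */
  const uint8_t *locate_bitcode(const uint8_t *part, size_t part_size,
                                uint32_t *bitcode_size) {
    assert(part != NULL && bitcode_size != NULL);
    if (part_size < PROGRAM_HEADER_SIZE)
      return NULL;
    if (load_u32le(part + OFF_DXIL_MAGIC) != DXIL_MAGIC)
      return NULL;
    /* SizeInUint32 counts 32-bit words and includes the header itself. */
    uint64_t program_bytes =
        (uint64_t)load_u32le(part + OFF_SIZE_IN_UINT32) * 4u;
    if (program_bytes < PROGRAM_HEADER_SIZE || program_bytes > part_size)
      return NULL;
    /* BitcodeOffset is relative to the DxilMagic field. */
    uint64_t start =
        (uint64_t)OFF_DXIL_MAGIC + load_u32le(part + OFF_BITCODE_OFFSET);
    uint32_t size = load_u32le(part + OFF_BITCODE_SIZE);
    if (start < PROGRAM_HEADER_SIZE || start > program_bytes ||
        size > program_bytes - start)
      return NULL;
    *bitcode_size = size;
    return part + start;
  }

With no gap, ``BitcodeOffset`` is 16 (the four fields from ``DxilMagic`` through ``BitcodeSize``), which places the bitcode immediately after the 24-byte header.
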
Future Directions
-----------------

This section provides background on future directions for DXIL that may or may not materialize. They imply a new version of DXIL.

It is desirable to support generic pointers, which can refer to any one of the other kinds of pointers. If the compiler fails to disambiguate, memory access is done via a generic pointer; the HLSL compiler will warn the user about each access that it cannot disambiguate. This is not supported for SM6.

HLSL will eventually support more primitive types such as i8, i16, i32, i64, half, float, double, as well as declspec(align(n)) and #pragma pack(n) directives. SM6.0 will eventually require byte-granularity access support in hardware, especially for writes. This is not supported for SM6.

There will be a Requires32BitAlignedAccesses CAP flag. If absent, this would indicate that the shader requires writes that (1) do not write a full four bytes, or (2) are not aligned on a four-byte boundary. If hardware does not natively support these, the shader is rejected. Programmers can work around this hardware limitation by manually aligning smaller data on a four-byte boundary in HLSL.

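One possible reading of the two conditions above is sketched below: a write counts as 32-bit aligned when its byte offset and its byte count are both multiples of four, and a shader could carry the flag only if every write qualifies. This is an interpretation rather than a specified algorithm; ``write_access``, ``is_32bit_aligned_write`` and ``can_require_32bit_aligned`` are hypothetical names introduced only for this example::

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical description of a single memory write. */
  struct write_access {
    uint64_t byte_offset; /* Start of the write, in bytes. */
    uint32_t byte_count;  /* Number of bytes written. */
  };

  /* True if the write covers whole four-byte units on a four-byte boundary. */
  bool is_32bit_aligned_write(uint64_t byte_offset, uint32_t byte_count) {
    return byte_count != 0 && (byte_offset % 4u) == 0 &&
           (byte_count % 4u) == 0;
  }

  /* True if every write in the list is 32-bit aligned, so that the shader
     would not need byte-granularity write support. */
  bool can_require_32bit_aligned(const struct write_access *writes,
                                 size_t count) {
    assert(writes != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
      if (!is_32bit_aligned_write(writes[i].byte_offset,
                                  writes[i].byte_count))
        return false;
    }
    return true;
  }

Under this reading, a two-byte write at offset 6 fails both conditions, while padding the value to four bytes at offset 8 satisfies them, which is the manual workaround described above.
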
When libraries are supported as first-class DXIL constructs, "lib_*" shader models can specify more than one entry point per module; the other shader models must specify exactly one entry point.

The target machine specification for HLSL might specify a 64-bit pointer size with 64-bit offsets.

Hardware support for generic pointers is essential for HLSL next as a fallback mechanism for cases when the compiler cannot disambiguate a pointer's address space.

Future DXIL will change how half and i16 are treated:

* i16 will have to be supported natively either in hardware or via emulation,
* half's behavior will depend on the value of the RequiresHardwareHalf CAP; if it is not set, half can be treated as a min-precision type (min16float), i.e., computation may be done with values implicitly promoted to floats; if it is set and hardware does not support the half type natively, the driver compiler can either emulate exact IEEE half behavior or fail shader creation.

Pending Specification Work
==========================

The following work on this specification is still pending:

* Consider moving some additional tables and lists into hctdb and cross-reference.
* Complete the extended documentation for instructions.