hctdb_inst_docs.txt 29 KB

  1. # Extended documentation for DXIL instructions.
  2. #
  3. # File format:
  4. # * Inst: [instruction name] - [brief description]
  5. # further remarks
  6. #
  7. # Keep these ordered alphabetically for ease of maintenance.
  8. #
  9. # Dump instructions with no extra documentation with this snippet.
  10. # import hctdb
  11. # h = hctdb.db_dxil()
  12. # for i in [item.name for item in h.instr if item.is_dxil_op and not item.remarks]: print(i)
  13. * Inst: Acos - Returns the arccosine of the specified value. Input should be a floating-point value within the range of -1 to 1.
  14. The return value is within the range of 0 to PI.
  15. +----------+------+--------------+---------+------+------+---------+------+-----+
  16. | src      | -inf | [-1,1]       | -denorm | -0   | +0   | +denorm | +inf | NaN |
  17. +----------+------+--------------+---------+------+------+---------+------+-----+
  18. | acos(src)| NaN  | [0,PI]       | PI/2    | PI/2 | PI/2 | PI/2    | NaN  | NaN |
  19. +----------+------+--------------+---------+------+------+---------+------+-----+
  20. * Inst: Asin - Returns the arcsine of the specified value. Input should be a floating-point value within the range of -1 to 1.
  21. The return value is within the range of -PI/2 to PI/2.
  22. +----------+------+--------------+---------+------+------+---------+------+-----+
  23. | src | -inf | [-1,1] | -denorm | -0 | +0 | +denorm | +inf | NaN |
  24. +----------+------+--------------+---------+------+------+---------+------+-----+
  25. | asin(src)| NaN | (-PI/2,+PI/2)| 0 | 0 | 0 | 0 | NaN | NaN |
  26. +----------+------+--------------+---------+------+------+---------+------+-----+
  27. * Inst: Atan - Returns the arctangent of the specified value. The return value is within the range of -PI/2 to PI/2.
  28. +----------+------+--------------+---------+------+------+---------+---------------+-----+-----+
  29. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F |+inf | NaN |
  30. +----------+------+--------------+---------+------+------+---------+---------------+-----+-----+
  31. | atan(src)| -PI/2| (-PI/2,+PI/2)| 0 | 0 | 0 | 0 | (-PI/2,+PI/2) |PI/2 | NaN |
  32. +----------+------+--------------+---------+------+------+---------+---------------+-----+-----+
  33. Returns the arctangent of the specified value. The return value is within the range of -PI/2 to PI/2
  34. * Inst: Bfrev - Reverses the order of the bits.
  35. Reverses the order of the bits. For example given 0x12345678 the result would be 0x1e6a2c48.
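As a rough illustration of the bit reversal described above, the following Python sketch reverses a 32-bit value. The helper name `bfrev32` is hypothetical and is not part of DXIL or hctdb.

```python
def bfrev32(value):
    """Reverse the bit order of a 32-bit unsigned integer."""
    value &= 0xFFFFFFFF
    result = 0
    for _ in range(32):
        # Shift the result left and append the current lowest bit of value.
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


print(hex(bfrev32(0x12345678)))  # 0x1e6a2c48, the example given above
```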
  36. * Inst: Bfi - Given a bit range from the LSB of a number, places that number of bits in another number at any offset
  37. Given a bit range from the LSB of a number, place that number of bits in another number at any offset.
  38. dst = Bfi(src0, src1, src2, src3);
  39. The LSB 5 bits of src0 provide the bitfield width (0-31) to take from src2.
  40. The LSB 5 bits of src1 provide the bitfield offset (0-31) to start replacing bits in the number read from src3.
  41. Given width, offset: bitmask = (((1 << width)-1) << offset) & 0xffffffff, dest = ((src2 << offset) & bitmask) | (src3 & ~bitmask)
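The bitmask formula above can be sketched in Python as follows. This is only a model of the arithmetic on 32-bit values; the function name `bfi` and the constant `MASK32` are illustrative.

```python
MASK32 = 0xFFFFFFFF


def bfi(src0, src1, src2, src3):
    """Model of Bfi: insert `width` low bits of src2 into src3 at `offset`."""
    width = src0 & 0x1F    # low 5 bits of src0
    offset = src1 & 0x1F   # low 5 bits of src1
    bitmask = (((1 << width) - 1) << offset) & MASK32
    return ((src2 << offset) & bitmask) | (src3 & ~bitmask & MASK32)


# Insert the low 8 bits of 0xAB at bit offset 4 of an all-ones word.
print(hex(bfi(8, 4, 0xAB, 0xFFFFFFFF)))  # 0xfffffabf
```

Note that a width of 0 yields an empty bitmask, so src3 is returned unchanged.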
  42. * Inst: Cos - returns cosine(theta) for theta in radians.
  43. Theta values can be any IEEE 32-bit floating point values.
  44. The maximum absolute error is 0.0008 in the interval from -100*Pi to +100*Pi.
  45. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  46. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  47. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  48. | cos(src) | NaN | [-1 to +1] | +1 | +1 | +1 | +1 | [-1 to +1] | NaN | NaN |
  49. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  50. * Inst: Countbits - Counts the number of bits set to 1 in the input integer.
  51. Counts the number of bits set to 1 in the input integer.
  52. * Inst: DerivCoarseX - computes the rate of change per stamp in x direction.
  53. dst = DerivCoarseX(src);
  54. Computes the rate of change per stamp in x direction. Only a single x derivative pair is computed for each 2x2 stamp of pixels.
  55. The data in the current Pixel Shader invocation may or may not participate in the calculation of the requested derivative, given the derivative will be calculated only once per 2x2 quad:
  56. As an example, the x derivative could be a delta from the top row of pixels.
  57. The exact calculation is up to the hardware vendor. There is also no specification dictating how the 2x2 quads will be aligned/tiled over a primitive.
  58. * Inst: DerivCoarseY - computes the rate of change per stamp in y direction.
  59. dst = DerivCoarseY(src);
  60. Computes the rate of change per stamp in y direction. Only a single y derivative pair is computed for each 2x2 stamp of pixels.
  61. The data in the current Pixel Shader invocation may or may not participate in the calculation of the requested derivative, given the derivative will be calculated only once per 2x2 quad:
  62. As an example, the y derivative could be a delta from the left column of pixels.
  63. The exact calculation is up to the hardware vendor. There is also no specification dictating how the 2x2 quads will be aligned/tiled over a primitive.
  64. * Inst: DerivFineX - computes the rate of change per pixel in x direction.
  65. dst = DerivFineX(src);
  66. Computes the rate of change per pixel in x direction. Each pixel in the 2x2 stamp gets a unique pair of x derivative calculations.
  67. The data in the current Pixel Shader invocation always participates in the calculation of the requested derivative.
  68. There is no specification dictating how the 2x2 quads will be aligned/tiled over a primitive.
  69. * Inst: DerivFineY - computes the rate of change per pixel in y direction.
  70. dst = DerivFineY(src);
  71. Computes the rate of change per pixel in y direction. Each pixel in the 2x2 stamp gets a unique pair of y derivative calculations.
  72. The data in the current Pixel Shader invocation always participates in the calculation of the requested derivative.
  73. There is no specification dictating how the 2x2 quads will be aligned/tiled over a primitive.
  74. * Inst: Dot2 - Two-dimensional vector dot-product
  75. Two-dimensional vector dot-product
  76. * Inst: Dot3 - Three-dimensional vector dot-product
  77. Three-dimensional vector dot-product
  78. * Inst: Dot4 - Four-dimensional vector dot-product
  79. Four-dimensional vector dot-product
  80. * Inst: Exp - returns 2^exponent
  81. Returns 2^exponent. Note that the HLSL exp intrinsic returns e^exponent. Maximum relative error is 2^-21.
  82. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  83. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  84. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  85. | exp(src) | 0 | +F | 1 | 1 | 1 | 1 | +F | +inf | NaN |
  86. +----------+------+------------+---------+----+----+---------+------------+------+-----+
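Because Exp is base 2 while the HLSL exp intrinsic is base e, a base-e exponential can be expressed through Exp by scaling the argument by log2(e). The sketch below shows that identity in Python. The names `dxil_exp` and `hlsl_exp` are hypothetical, and the code does not model the error bound or the special-value table.

```python
import math


def dxil_exp(x):
    """Base-2 exponential, modelling the Exp instruction for finite inputs."""
    return 2.0 ** x


def hlsl_exp(x):
    """Base-e exponential expressed through the base-2 one: e^x = 2^(x * log2(e))."""
    return dxil_exp(x * math.log2(math.e))


print(dxil_exp(3.0))  # 8.0
```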
  87. * Inst: FAbs - returns the absolute value of the input value.
  88. The FAbs instruction simply forces the sign of the number(s) on the source operand positive, including on INF and denorm values.
  89. Applying FAbs on NaN preserves NaN, although the particular NaN bit pattern that results is not defined.
  90. * Inst: FirstbitHi - Returns the location of the first set bit starting from the highest order bit and working downward.
  91. Returns the integer position of the first bit set in the 32-bit input starting from the MSB. For example, 0x10000000 would return 3. Returns 0xffffffff if no match was found.
  92. * Inst: FirstbitLo - Returns the location of the first set bit starting from the lowest order bit and working upward.
  93. Returns the integer position of the first bit set in the 32-bit input starting from the LSB. For example, 0x00000002 would return 1. Returns 0xffffffff if no match was found.
  94. * Inst: FirstbitSHi - Returns the location of the first set bit from the highest order bit based on the sign.
  95. Returns the first 0 from the MSB if the number is negative, else the first 1 from the MSB. Returns 0xffffffff if no match was found.
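The three first-bit searches can be modelled as below. This is a minimal sketch that follows the wording of this file (positions counted from the MSB for the Hi variants and from the LSB for FirstbitLo). The function names are illustrative only.

```python
NOT_FOUND = 0xFFFFFFFF


def firstbit_hi(value):
    """Position of the first 1 bit, counted from the MSB (bit 31 is position 0)."""
    value &= 0xFFFFFFFF
    for pos in range(32):
        if value & (0x80000000 >> pos):
            return pos
    return NOT_FOUND


def firstbit_lo(value):
    """Position of the first 1 bit, counted from the LSB (bit 0 is position 0)."""
    value &= 0xFFFFFFFF
    for pos in range(32):
        if value & (1 << pos):
            return pos
    return NOT_FOUND


def firstbit_shi(value):
    """Signed variant: first 0 from the MSB if negative, else first 1."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        # The first 0 of a negative number is the first 1 of its complement.
        value = ~value & 0xFFFFFFFF
    return firstbit_hi(value)


print(firstbit_hi(0x10000000))  # 3
```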
  96. * Inst: Fma - fused multiply-add
  97. Fused multiply-add. This operation is only defined in double precision.
  98. Fma(a,b,c) = a * b + c
  99. * Inst: FMad - floating point multiply & add
  100. Floating point multiply & add. This operation is not fused for "precise" operations.
  101. FMad(a,b,c) = a * b + c
  102. * Inst: FMax - returns a if a >= b, else b
  103. >= is used instead of > so that if min(x,y) = x then max(x,y) = y.
  104. NaN has special handling: If one source operand is NaN, then the other source operand is returned.
  105. If both are NaN, any NaN representation is returned.
  106. This conforms to new IEEE 754R rules.
  107. Denorms are flushed (sign preserved) before comparison, however the result written to dest may or may not be denorm flushed.
  108. +------+-----------------------------+
  109. | a | b |
  110. | +------+--------+------+------+
  111. | | -inf | F | +inf | NaN |
  112. +------+------+--------+------+------+
  113. | -inf | -inf | b | +inf | -inf |
  114. +------+------+--------+------+------+
  115. | F | a | a or b | +inf | a |
  116. +------+------+--------+------+------+
  117. | +inf | +inf | +inf | +inf | +inf |
  118. +------+------+--------+------+------+
  119. | NaN | -inf | b | +inf | NaN |
  120. +------+------+--------+------+------+
  121. * Inst: FMin - returns a if a < b, else b
  122. NaN has special handling: If one source operand is NaN, then the other source operand is returned.
  123. If both are NaN, any NaN representation is returned.
  124. This conforms to new IEEE 754R rules.
  125. Denorms are flushed (sign preserved) before comparison, however the result written to dest may or may not be denorm flushed.
  126. +------+-----------------------------+
  127. | a | b |
  128. | +------+--------+------+------+
  129. | | -inf | F | +inf | NaN |
  130. +------+------+--------+------+------+
  131. | -inf | -inf | -inf | -inf | -inf |
  132. +------+------+--------+------+------+
  133. | F | -inf | a or b | a | a |
  134. +------+------+--------+------+------+
  135. | +inf | -inf | b | +inf | +inf |
  136. +------+------+--------+------+------+
  137. | NaN | -inf | b | +inf | NaN |
  138. +------+------+--------+------+------+
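The NaN rules for FMax and FMin can be sketched in Python as follows. This model assumes ordinary Python floats and does not attempt to reproduce denorm flushing. The names `fmax` and `fmin` are illustrative.

```python
import math


def fmax(a, b):
    """Returns a if a >= b, else b; a single NaN operand yields the other operand."""
    if math.isnan(a):
        return b  # also covers the case where both are NaN
    if math.isnan(b):
        return a
    return a if a >= b else b


def fmin(a, b):
    """Returns a if a < b, else b; a single NaN operand yields the other operand."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b


print(fmax(float("nan"), 1.0))  # 1.0
```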
  139. * Inst: Frc - extract fractional component.
  140. +--------------+------+------+---------+----+----+---------+--------+------+-----+
  141. | src          | -inf | -F   | -denorm | -0 | +0 | +denorm | +F     | +inf | NaN |
  142. +--------------+------+------+---------+----+----+---------+--------+------+-----+
  143. | frc(src)     | NaN  |[+0,1)| +0      | +0 | +0 | +0      | [+0,1) | NaN  | NaN |
  144. +--------------+------+------+---------+----+----+---------+--------+------+-----+
  145. * Inst: Hcos - returns the hyperbolic cosine of the specified value.
  146. Returns the hyperbolic cosine of the specified value.
  147. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  148. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  149. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  150. | hcos(src)| +inf | (1, +inf) | +1 | +1 | +1 | +1 | (1, +inf) | +inf | NaN |
  151. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  152. * Inst: Hsin - returns the hyperbolic sine of the specified value.
  153. Returns the hyperbolic sine of the specified value.
  154. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  155. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  156. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  157. | hsin(src)| -inf | -F | 0 | 0 | 0 | 0 | +F | +inf | NaN |
  158. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  159. * Inst: Htan - returns the hyperbolic tangent of the specified value.
  160. Returns the hyperbolic tangent of the specified value.
  161. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  162. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  163. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  164. | htan(src)| -1 | -F | 0 | 0 | 0 | 0 | +F | +1 | NaN |
  165. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  166. * Inst: Ibfe - Integer bitfield extract
  167. dest = Ibfe(src0, src1, src2)
  168. Given a range of bits in a number, shift those bits to the LSB and sign extend the MSB of the range.
  169. width : The LSB 5 bits of src0 (0-31).
  170. offset: The LSB 5 bits of src1 (0-31).
  171. * BLOCK-BEGIN
  172. .. code:: c
  173. if( width == 0 )
  174. {
  175. dest = 0
  176. }
  177. else if( width + offset < 32 )
  178. {
  179. shl dest, src2, 32-(width+offset)
  180. ishr dest, dest, 32-width
  181. }
  182. else
  183. {
  184. ishr dest, src2, offset
  185. }
  186. * BLOCK-END
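The pseudocode above mixes C control flow with shift mnemonics. A runnable Python model of the same steps is sketched below. The helper `ashr32` (a 32-bit arithmetic shift right) and the name `ibfe` are illustrative and not part of hctdb.

```python
MASK32 = 0xFFFFFFFF


def ashr32(value, shift):
    """Arithmetic (sign-extending) shift right on a 32-bit bit pattern."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret the pattern as a negative integer
    return (value >> shift) & MASK32


def ibfe(src0, src1, src2):
    """Model of Ibfe: extract `width` bits at `offset` from src2, sign-extended."""
    width = src0 & 0x1F
    offset = src1 & 0x1F
    if width == 0:
        return 0
    if width + offset < 32:
        dest = (src2 << (32 - (width + offset))) & MASK32
        return ashr32(dest, 32 - width)
    return ashr32(src2, offset)


print(hex(ibfe(4, 4, 0xF0)))  # 0xffffffff: the field 0xF is sign-extended to -1
```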
  187. * Inst: IMad - Signed integer multiply & add
  188. Signed integer multiply & add
  189. IMad(a,b,c) = a * b + c
  190. * Inst: IMax - IMax(a,b) returns a if a > b, else b
  191. IMax(a,b) returns a if a > b, else b. Optional negate modifier on source operands takes 2's complement before performing operation.
  192. * Inst: IMin - IMin(a,b) returns a if a < b, else b
  193. IMin(a,b) returns a if a < b, else b. Optional negate modifier on source operands takes 2's complement before performing operation.
  194. * Inst: IMul - multiply of 32-bit operands to produce the correct full 64-bit result.
  195. destHI, destLO = IMul(src0, src1)
  196. Multiply of 32-bit operands src0 and src1 (note they are signed), producing the correct full 64-bit result.
  197. The low 32 bits are placed in destLO. The high 32 bits are placed in destHI.
  198. Either of destHI or destLO may be specified as NULL instead of specifying a register, in the case high or low 32 bits of the 64-bit result are not needed.
  199. Optional negate modifier on source operands takes 2's complement before performing arithmetic operation.
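A minimal Python sketch of the signed 32x32 to 64-bit multiply and the split into high and low halves follows. The helper names are hypothetical, and NULL destinations and negate modifiers are not modelled.

```python
def to_signed32(value):
    """Reinterpret a 32-bit bit pattern as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def imul(src0, src1):
    """Model of IMul: returns (destHI, destLO) of the signed 64-bit product."""
    product = (to_signed32(src0) * to_signed32(src1)) & 0xFFFFFFFFFFFFFFFF
    return product >> 32, product & 0xFFFFFFFF


hi, lo = imul(0xFFFFFFFF, 2)  # -1 * 2 = -2
print(hex(hi), hex(lo))       # 0xffffffff 0xfffffffe
```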
  200. * Inst: IsFinite - Returns true if x is finite, false otherwise.
  201. Returns true if x is finite, false otherwise.
  202. * Inst: IsInf - Returns true if x is +INF or -INF, false otherwise.
  203. Returns true if x is +INF or -INF, false otherwise.
  204. * Inst: IsNaN - Returns true if x is NAN or QNAN, false otherwise.
  205. Returns true if x is NAN or QNAN, false otherwise.
  206. * Inst: IsNormal - returns IsNormal
  207. Returns IsNormal.
  208. * Inst: Log - returns log base 2.
  209. Returns log base 2. Note that hlsl log intrinsic returns natural log.
  210. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  211. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  212. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  213. | log(src) | NaN | NaN | -inf |-inf|-inf| -inf | F | +inf | NaN |
  214. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  215. * Inst: LoadInput - Loads the value from shader input
  216. Loads the value from shader input
  217. * Inst: MinPrecXRegLoad - Helper load operation for minprecision
  218. Helper load operation for minprecision
  219. * Inst: MinPrecXRegStore - Helper store operation for minprecision
  220. Helper store operation for minprecision
  221. * Inst: Msad - masked Sum of Absolute Differences.
  222. Returns the masked Sum of Absolute Differences.
  223. dest = msad(ref, src, accum)
  224. ref: contains 4 packed 8-bit unsigned integers in 32 bits.
  225. src: contains 4 packed 8-bit unsigned integers in 32 bits.
  226. accum: a 32-bit unsigned integer, providing an existing accumulation.
  227. dest receives the result of the masked SAD operation added to the accumulation value.
  228. * BLOCK-BEGIN
  229. .. code:: c
  230. UINT msad( UINT ref, UINT src, UINT accum )
  231. {
  232. for (UINT i = 0; i < 4; i++)
  233. {
  234. BYTE refByte, srcByte, absDiff;
  235. refByte = (BYTE)(ref >> (i * 8));
  236. if (!refByte)
  237. {
  238. continue;
  239. }
  240. srcByte = (BYTE)(src >> (i * 8));
  241. if (refByte >= srcByte)
  242. {
  243. absDiff = refByte - srcByte;
  244. }
  245. else
  246. {
  247. absDiff = srcByte - refByte;
  248. }
  249. // The recommended overflow behavior for MSAD is
  250. // to do a 32-bit saturate. This is not
  251. // required, however, and wrapping is allowed.
  252. // So from an application point of view,
  253. // overflow behavior is undefined.
  254. if (UINT_MAX - accum < absDiff)
  255. {
  256. accum = UINT_MAX;
  257. break;
  258. }
  259. accum += absDiff;
  260. }
  261. return accum;
  262. }
  263. * BLOCK-END
  264. * Inst: Round_ne - floating-point round to integral float.
  265. Floating-point round of the values in src,
  266. writing integral floating-point values to dest.
  267. round_ne rounds towards the nearest integral value. For halfway cases, it rounds to the nearest even value.
  268. +--------------+------+----+---------+----+----+---------+----+------+-----+
  269. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  270. +--------------+------+----+---------+----+----+---------+----+------+-----+
  271. | round_ne(src)| -inf | -F | -0 | -0 | +0 | +0 | +F | +inf | NaN |
  272. +--------------+------+----+---------+----+----+---------+----+------+-----+
  273. * Inst: Round_ni - floating-point round to integral float.
  274. Floating-point round of the values in src,
  275. writing integral floating-point values to dest.
  276. round_ni rounds towards -INF, commonly known as floor().
  277. +--------------+------+----+---------+----+----+---------+----+------+-----+
  278. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  279. +--------------+------+----+---------+----+----+---------+----+------+-----+
  280. | round_ni(src)| -inf | -F | -0 | -0 | +0 | +0 | +F | +inf | NaN |
  281. +--------------+------+----+---------+----+----+---------+----+------+-----+
  282. * Inst: Round_pi - floating-point round to integral float.
  283. Floating-point round of the values in src,
  284. writing integral floating-point values to dest.
  285. round_pi rounds towards +INF, commonly known as ceil().
  286. +--------------+------+----+---------+----+----+---------+----+------+-----+
  287. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  288. +--------------+------+----+---------+----+----+---------+----+------+-----+
  289. | round_pi(src)| -inf | -F | -0 | -0 | +0 | +0 | +F | +inf | NaN |
  290. +--------------+------+----+---------+----+----+---------+----+------+-----+
  291. * Inst: Round_z - floating-point round to integral float.
  292. Floating-point round of the values in src,
  293. writing integral floating-point values to dest.
  294. round_z rounds towards zero.
  295. +--------------+------+----+---------+----+----+---------+----+------+-----+
  296. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  297. +--------------+------+----+---------+----+----+---------+----+------+-----+
  298. | round_z(src) | -inf | -F | -0 | -0 | +0 | +0 | +F | +inf | NaN |
  299. +--------------+------+----+---------+----+----+---------+----+------+-----+
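The four rounding modes can be compared side by side with the Python sketch below. It assumes round-to-nearest-even behaviour for halfway cases in round_ne, as the instruction name suggests, and uses Python's built-in `round`, which has that behaviour. The helper `_integral` is hypothetical. It passes infinities and NaN through unchanged and keeps the sign of zero, as the tables above show.

```python
import math


def _integral(x, rounder):
    """Apply an integer rounding function, preserving inf, NaN, and the sign of zero."""
    if math.isnan(x) or math.isinf(x):
        return x
    return math.copysign(float(rounder(x)), x)


def round_ne(x):
    return _integral(x, round)        # nearest, halfway cases to even


def round_ni(x):
    return _integral(x, math.floor)   # towards -INF


def round_pi(x):
    return _integral(x, math.ceil)    # towards +INF


def round_z(x):
    return _integral(x, math.trunc)   # towards zero


print(round_ne(2.5), round_ni(-1.5), round_pi(-1.5), round_z(-1.7))
# 2.0 -2.0 -1.0 -1.0
```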
  300. * Inst: Rsqrt - returns reciprocal square root (1 / sqrt(src)).
  301. Maximum relative error is 2^-21.
  302. +--------------+------+----+---------+----+----+---------+----+------+-----+
  303. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  304. +--------------+------+----+---------+----+----+---------+----+------+-----+
  305. | rsqrt(src) | NaN | NaN| -inf |-inf|+inf| +inf | +F | +0 | NaN |
  306. +--------------+------+----+---------+----+----+---------+----+------+-----+
  307. * Inst: Saturate - clamps the result of a single or double precision floating point value to [0.0f...1.0f]
  308. The Saturate instruction performs the following operation on its input value:
  309. min(1.0f, max(0.0f, value))
  310. where min() and max() in the above expression behave in the way Min and Max behave.
  311. Saturate(NaN) returns 0, by the rules for min and max.
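To see why Saturate(NaN) evaluates to 0, the expression can be traced with NaN-aware min and max helpers. The sketch below is a Python model under the rule that a single NaN operand yields the other operand. The names `_fmax`, `_fmin`, and `saturate` are illustrative.

```python
import math


def _fmax(a, b):
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a >= b else b


def _fmin(a, b):
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b


def saturate(value):
    """min(1.0, max(0.0, value)) with the NaN handling of Min and Max."""
    return _fmin(1.0, _fmax(0.0, value))


print(saturate(float("nan")))  # 0.0: max(0.0, NaN) returns the non-NaN operand
```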
  312. * Inst: Sin - returns sine(theta) for theta in radians.
  313. Theta values can be any IEEE 32-bit floating point values.
  314. The maximum absolute error is 0.0008 in the interval from -100*Pi to +100*Pi.
  315. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  316. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  317. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  318. | sin(src) | NaN | [-1 to +1] | -0 | -0 | +0 | +0 | [-1 to +1] | NaN | NaN |
  319. +----------+------+------------+---------+----+----+---------+------------+------+-----+
  320. * Inst: Sqrt - returns square root
  321. Precision is 1 ulp.
  322. +--------------+------+----+---------+----+----+---------+----+------+-----+
  323. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  324. +--------------+------+----+---------+----+----+---------+----+------+-----+
  325. | sqrt(src) | NaN | NaN| -0 | -0 | +0 | +0 | +F | +inf | NaN |
  326. +--------------+------+----+---------+----+----+---------+----+------+-----+
  327. * Inst: StoreOutput - Stores the value to shader output
  328. Stores the value to shader output
  329. * Inst: Tan - returns tan(theta) for theta in radians.
  330. Theta values can be any IEEE 32-bit floating point values.
  331. +----------+----------+----------------+---------+----+----+---------+----------------+------+-----+
  332. | src | -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  333. +----------+----------+----------------+---------+----+----+---------+----------------+------+-----+
  334. | tan(src) | NaN | [-inf to +inf] | -0 | -0 | +0 | +0 | [-inf to +inf] | NaN | NaN |
  335. +----------+----------+----------------+---------+----+----+---------+----------------+------+-----+
  336. * Inst: TempRegLoad - Helper load operation
  337. Helper load operation
  338. * Inst: TempRegStore - Helper store operation
  339. Helper store operation
  340. * Inst: UAddc - unsigned add of 32-bit operand with the carry
  341. dest0, dest1 = UAddc(src0, src1)
  342. unsigned add of 32-bit operands src0 and src1, placing the LSB part of the 32-bit result in dest0.
  343. dest1 is written with: 1 if a carry is produced, 0 otherwise. Dest1 can be NULL if the carry is not needed
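The add-with-carry behaviour can be sketched in Python as follows. The function name `uaddc` is illustrative, and the NULL destination option is not modelled.

```python
MASK32 = 0xFFFFFFFF


def uaddc(src0, src1):
    """Model of UAddc: returns (dest0, dest1) = (low 32 bits of the sum, carry)."""
    total = (src0 & MASK32) + (src1 & MASK32)
    return total & MASK32, total >> 32


print(uaddc(0xFFFFFFFF, 1))  # (0, 1): the sum wraps and a carry is produced
```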
  344. * Inst: Ubfe - Unsigned integer bitfield extract
  345. dest = ubfe(src0, src1, src2)
  346. Given a range of bits in a number, shift those bits to the LSB and set remaining bits to 0.
  347. width : The LSB 5 bits of src0 (0-31).
  348. offset: The LSB 5 bits of src1 (0-31).
  349. Given width, offset:
  350. * BLOCK-BEGIN
  351. .. code:: c
  352. if( width == 0 )
  353. {
  354. dest = 0
  355. }
  356. else if( width + offset < 32 )
  357. {
  358. shl dest, src2, 32-(width+offset)
  359. ushr dest, dest, 32-width
  360. }
  361. else
  362. {
  363. ushr dest, src2, offset
  364. }
  365. * BLOCK-END
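A runnable Python model of the unsigned extract steps above is sketched below. The name `ubfe` and the constant `MASK32` are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def ubfe(src0, src1, src2):
    """Model of Ubfe: extract `width` bits at `offset` from src2, zero-extended."""
    width = src0 & 0x1F
    offset = src1 & 0x1F
    src2 &= MASK32
    if width == 0:
        return 0
    if width + offset < 32:
        dest = (src2 << (32 - (width + offset))) & MASK32
        return dest >> (32 - width)  # logical shift: upper bits become 0
    return src2 >> offset


print(hex(ubfe(8, 8, 0x12345678)))  # 0x56
```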
  366. * Inst: UDiv - unsigned divide of the 32-bit operand src0 by the 32-bit operand src1.
  367. destQUOT, destREM = UDiv(src0, src1);
  368. unsigned divide of the 32-bit operand src0 by the 32-bit operand src1.
  369. The results of the divides are the 32-bit quotients (placed in destQUOT) and 32-bit remainders (placed in destREM).
  370. Divide by zero returns 0xffffffff for both quotient and remainder.
  371. Either destQUOT or destREM may be specified as NULL instead of specifying a register, in the case the quotient or remainder are not needed.
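The divide-by-zero rule is easy to overlook, so here is a small Python sketch of the quotient and remainder results. The function name `udiv` is illustrative, and NULL destinations are not modelled.

```python
MASK32 = 0xFFFFFFFF


def udiv(src0, src1):
    """Model of UDiv: returns (destQUOT, destREM) for 32-bit unsigned operands."""
    src0 &= MASK32
    src1 &= MASK32
    if src1 == 0:
        # Divide by zero returns 0xffffffff for both results.
        return MASK32, MASK32
    return src0 // src1, src0 % src1


print(udiv(7, 2))  # (3, 1)
```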
  372. * Inst: UMad - Unsigned integer multiply & add
  373. Unsigned integer multiply & add.
  374. UMad(a,b,c) = a * b + c
  375. * Inst: UMax - unsigned integer maximum. UMax(a,b) = a > b ? a : b
  376. Unsigned integer maximum. UMax(a,b) = a > b ? a : b
  377. * Inst: UMin - unsigned integer minimum. UMin(a,b) = a < b ? a : b
  378. Unsigned integer minimum. UMin(a,b) = a < b ? a : b
  379. * Inst: UMul - multiply of 32-bit operands to produce the correct full 64-bit result.
  380. Multiply of 32-bit operands src0 and src1 (note they are unsigned), producing the correct full 64-bit result.
  381. The low 32 bits are placed in destLO. The high 32 bits are placed in destHI.
  382. Either of destHI or destLO may be specified as NULL instead of specifying a register, in the case high or low 32 bits of the 64-bit result are not needed.
  383. * Inst: USubb - unsigned subtract of 32-bit operands with the borrow
  384. dest0, dest1 = USubb(src0, src1)
  385. Unsigned subtract of 32-bit operands src1 from src0, placing the LSB part of the 32-bit result in dest0.
  386. dest1 is written with: 1 if a borrow is produced, 0 otherwise. Dest1 can be NULL if the borrow is not needed.
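USubb subtracts src1 from src0 as 32-bit unsigned values and reports whether a borrow occurred. A minimal Python sketch follows. The function name `usubb` is illustrative, and the NULL destination option is not modelled.

```python
MASK32 = 0xFFFFFFFF


def usubb(src0, src1):
    """Model of USubb: returns (dest0, dest1) = (low 32 bits of src0 - src1, borrow)."""
    diff = (src0 & MASK32) - (src1 & MASK32)
    return diff & MASK32, 1 if diff < 0 else 0


print(usubb(0, 1))  # (4294967295, 1): the result wraps and a borrow is produced
```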
  387. * Inst: AttributeAtVertex - returns the values of the attributes at the vertex.
  388. returns the values of the attributes at the vertex. VertexID ranges from 0 to 2.
  389. * Inst: FDiv - returns the quotient of its two operands
  390. %dest = fdiv float %src0, %src1
  391. The following table shows the results obtained when executing the instruction with various classes of numbers, assuming that fast math flag is not used and "fp32-denorm-mode"="preserve".
  392. When "fp32-denorm-mode"="ftz", denorm inputs should be interpreted as corresponding signed zero, and any resulting denorm is also flushed to zero.
  393. When fast math is enabled, implementation may use reciprocal form: src0*(1/src1). This may result in evaluating src0*(+/-)INF from src0*(1/(+/-)denorm). This may produce NaN in some cases or (+/-)INF in others.
  394. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  395. | src0\\src1| -inf | -F | -1 | -denorm | -0 | +0 | +denorm | +1 | +F | +inf | NaN |
  396. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  397. | -inf | NaN | +inf | +inf | +inf |+inf|-inf| -inf | -inf | -inf | NaN | NaN |
  398. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  399. | -F | +0 | +F | -src0 | +F |+inf|-inf| -F | src0 | -F | -0 | NaN |
  400. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  401. | -denorm | +0 | +denorm| -src0 | +F |+inf|-inf| -F | src0 |-denorm | -0 | NaN |
  402. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  403. | -0 | +0 | +0 | +0 | 0 |NaN |NaN | 0 | -0 | -0 | -0 | NaN |
  404. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  405. | +0 | -0 | -0 | -0 | 0 |NaN |NaN | 0 | +0 | +0 | +0 | NaN |
  406. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  407. | +denorm | -0 | -denorm| -src0 | -F |-inf|+inf| +F | src0 |+denorm | +0 | NaN |
  408. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  409. | +F | -0 | -F | -src0 | -F |-inf|+inf| +F | src0 | +F | +0 | NaN |
  410. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  411. | +inf | NaN | -inf | -inf | -inf |-inf|+inf| +inf | +inf | +inf | NaN | NaN |
  412. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  413. | NaN | NaN | NaN | NaN | NaN |NaN |NaN | NaN | NaN | NaN | NaN | NaN |
  414. +-----------+----------+--------+-------+---------+----+----+---------+-------+--------+------+-----+
  415. * Inst: FAdd - component-wise add
  416. %dest = fadd float %src0, %src1
  417. The following table shows the results obtained when executing the instruction with various classes of numbers, assuming that "fp32-denorm-mode"="preserve".
  418. For "fp32-denorm-mode"="ftz" mode, denorm inputs should be treated as corresponding signed zero, and any resulting denorm is also flushed to zero.
  419. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  420. | src0\src1| -inf | -F | -denorm | -0 | +0 | +denorm | +F | +inf | NaN |
  421. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  422. | -inf | -inf | -inf | -inf |-inf|-inf| -inf | -inf | NaN | NaN |
  423. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  424. | -F | -inf | -F | -F |src0|src0| -F | +/-F | +inf | NaN |
  425. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  426. | -denorm | -inf | -F |-F/denorm |src0|src0| +/-denorm | +F | +inf | NaN |
  427. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  428. | -0 | -inf | src1 | src1 |-0 |+0 | src1 | src1 | +inf | NaN |
  429. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  430. | +0 | -inf | src1 | src1 |-0 |+0 | src1 | src1 | +inf | NaN |
  431. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  432. | +denorm | -inf | -F |+/-denorm |src0|src0| +F/denorm | +F | +inf | NaN |
  433. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  434. | +F | -inf | +/-F | +F |src0|src0| +F | +F | +inf | NaN |
  435. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  436. | +inf | NaN | +inf | +inf |+inf|+inf| +inf | +inf | +inf | NaN |
  437. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+
  438. | NaN | NaN | NaN | NaN |NaN |NaN | NaN | NaN | NaN | NaN |
  439. +----------+----------+--------+----------+----+----+-----------+--------+------+-----+