9 months ago · ba6224b61a
--- a/core/simd/simd.odin
+++ b/core/simd/simd.odin
@@ -3,7 +3,7 @@ The SIMD support package.
 
 SIMD (Single Instruction Multiple Data), is a CPU hardware feature that
 introduce special registers and instructions which operate on multiple units
-of data at the same time, , which enables faster data processing for
+of data at the same time which enables faster data processing for
 applications with heavy computational workloads.
 
 In Odin SIMD is exposed via a special kinds of arrays, called the *SIMD
@@ -11,7 +11,7 @@ vectors*. The types of SIMD vectors is written as `#simd [N]T`, where N is a
 power of two, and T could be any basic type (integers, floats, etc.). The
 documentation of this package will call *SIMD vectors* just *vectors*.
 
-The elements of SIMD vectors consist of elements, called *scalar values*, or
+SIMD vectors consist of elements, called *scalar values*, or
 *scalars*, each occupying a *lane* of the SIMD vector.
 
 This package implements procedures for working with vectors.
@@ -24,9 +24,9 @@ import "base:intrinsics"
 /*
 Check if SIMD is emulated on a target platform.
 
-This value is `true`, if the compile-time target has the hardware support for
+This value is `false`, if the compile-time target has the hardware support for
 at 128-bit (or wider) SIMD. If the compile-time target lacks the hardware support
-for 128-bit SIMD, this value is `false`, and all SIMD operations will be
+for 128-bit SIMD, this value is `true`, and all SIMD operations will likely be
 emulated.
 */
 IS_EMULATED :: true when (ODIN_ARCH == .amd64 || ODIN_ARCH == .i386) && !intrinsics.has_target_feature("sse2") else
@@ -408,7 +408,7 @@ div :: intrinsics.simd_div
 Shift left lanes of a vector.
 
 This procedure returns a vector, such that each lane holds the result of a
-shift-left (aka shift-up) operation, of lane from the vector `a` by the shift
+shift-left (aka shift-up) operation of the corresponding lane from vector `a` by the shift
 amount from the corresponding lane of the vector `b`.
 
 If the shift amount is greater than the bit-width of a lane, the result is `0`
@@ -481,7 +481,7 @@ Result:
 
 Example:
 
-This example assumes that the `a` vector is of a signed type.
+This example assumes that the `a` vector is of a signed 32 bit type.
 
 	   +-------+-------+-------+-------+
 	a: |  0x11 |  0x55 |  0x03 |  0xff |
@@ -581,10 +581,10 @@ This example assumes that the `a` vector is of a signed type.
 shr_masked :: intrinsics.simd_shr_masked
 
 /*
-Saturated addition of SIMD vectors.
+Saturated addition of vectors.
 
 The *saturated sum* is a sum, that upon overflow or underflow, instead of
-round-tripping, keeps the value clamped between the minimum and the maximum
+wrapping, keeps the value clamped between the minimum and the maximum
 values of the lane type.
 
 This procedure returns a vector where each lane is the saturated sum of the
@@ -629,7 +629,7 @@ Assuming unsigned bytes as the type of the element in a lane:
 saturating_add :: intrinsics.simd_saturating_add
 
 /*
-Saturated subtraction of 2 lanes of vectors.
+Saturated subtraction of vectors.
 
 The *saturated difference* is a difference, that upon overflow or underflow,
 instead of round-tripping, keeps the value clamped between the minimum and the
@@ -677,7 +677,7 @@ Assuming unsigned bytes as the type of the element in a lane:
 saturating_sub :: intrinsics.simd_saturating_sub
 
 /*
-Bitwise AND of the lanes of SIMD vectors.
+Bitwise AND of vectors.
 
 This procedure returns a vector, such that each lane has the result of a bitwise
 AND operation between the corresponding lanes of the vectors `a` and `b`.
@@ -712,7 +712,7 @@ Example:
 bit_and     :: intrinsics.simd_bit_and
 
 /*
-Bitwise OR of the 2 lanes of vectors.
+Bitwise OR of vectors.
 
 This procedure returns a vector, such that each lane has the result of a bitwise
 OR operation between the corresponding lanes of the vectors `a` and `b`.
@@ -747,7 +747,7 @@ Example:
 bit_or      :: intrinsics.simd_bit_or
 
 /*
-Bitwise XOR of the 2 lanes of vectors.
+Bitwise XOR of vectors.
 
 This procedure returns a vector, such that each lane has the result of a bitwise
 XOR operation between the corresponding lanes of the vectors `a` and `b`.
@@ -782,7 +782,7 @@ Example:
 bit_xor     :: intrinsics.simd_bit_xor
 
 /*
-Bitwise AND NOT of the 2 lanes of vectors.
+Bitwise AND NOT of vectors.
 
 This procedure returns a vector, such that each lane has the result of a bitwise
 AND NOT operation between the corresponding lanes of the vectors `a` and `b`.
@@ -823,7 +823,7 @@ This procedure returns a vector where each lane is the negation of the
 corresponding lane in the vector `a`.
 
 Inputs:
-- `a`: An integer or a float vector to negate
+- `a`: An integer or a float vector to negate.
 
 Returns:
 - Negated vector.
@@ -883,9 +883,9 @@ Example:
 abs   :: intrinsics.simd_abs
 
 /*
-Minimum of each lane of SIMD vectors.
+Minimum of each lane of vectors.
 
-This procedure returns a vector, such that each lane has is the minimum value
+This procedure returns a vector, such that each lane has the minimum value
 between the corresponding lanes in vectors `a` and `b`.
 
 Inputs:
@@ -922,9 +922,9 @@ Example:
 min   :: intrinsics.simd_min
 
 /*
-Maximum of each lane of SIMD vectors.
+Maximum of each lane of vectors.
 
-This procedure returns a vector, such that each lane has is the maximum value
+This procedure returns a vector, such that each lane has the maximum value
 between the corresponding lanes in vectors `a` and `b`.
 
 Inputs:
@@ -961,7 +961,7 @@ Example:
 max   :: intrinsics.simd_max
 
 /*
-Clamp lanes of SIMD vector.
+Clamp lanes of vector.
 
 This procedure returns a vector, where each lane is the result of the
 clamping of the lane from the vector `v` between the values in the corresponding
@@ -1003,7 +1003,7 @@ Example:
 clamp :: intrinsics.simd_clamp
 
 /*
-Check if lanes of SIMD vectors are equal.
+Check if lanes of vectors are equal.
 
 This procedure checks each pair of lanes from vectors `a` and `b` for whether
 they are equal, and if they are, the corresponding lane of the result vector
@@ -1022,7 +1022,7 @@ containing comparison results for each lane.
 
 	for i in 0 ..< len(res) {
 		if a[i] == b[i] {
-			res[i] = unsigned(-1)
+			res[i] = max(T)
 		} else {
 			res[i] = 0
 		}
@@ -1045,7 +1045,7 @@ Example:
 lanes_eq :: intrinsics.simd_lanes_eq
 
 /*
-Check if lanes of SIMD vectors are not equal.
+Check if lanes of vectors are not equal.
 
 This procedure checks each pair of lanes from vectors `a` and `b` for whether
 they are not equal, and if they are, the corresponding lane of the result
@@ -1087,7 +1087,7 @@ Example:
 lanes_ne :: intrinsics.simd_lanes_ne
 
 /*
-Check if values of SIMD vector are less than the values of another SIMD vector.
+Check if lanes of a vector are less than another.
 
 This procedure checks each pair of lanes from vectors `a` and `b` for whether
 the lane of `a` is less than the lane of `b`, and if so, the corresponding lane
@@ -1129,10 +1129,10 @@ Example:
 lanes_lt :: intrinsics.simd_lanes_lt
 
 /*
-Check if values of SIMD vector are less than or equal the values of another
+Check if lanes of a vector are less than or equal than another.
 SIMD vector.
 
-This procedure checks each pair of lanes from vectors `a` and `b` for whether
+This procedure checks each pair of lanes from vectors `a` and `b` for whether the
 lane of `a` is less than or equal to the lane of `b`, and if so, the
 corresponding lane of the result vector will have a value with all bits set
 (`0xff..ff`). Otherwise the lane of the result vector will have the value `0`.
@@ -1172,10 +1172,10 @@ Example:
 lanes_le :: intrinsics.simd_lanes_le
 
 /*
-Check if values of SIMD vector are greater than the values of another SIMD
+Check if lanes of a vector are greater than another.
 vector.
 
-This procedure checks each pair of lanes from vectors `a` and `b` for whether
+This procedure checks each pair of lanes from vectors `a` and `b` for whether the
 lane of `a` is greater than to the lane of `b`, and if so, the corresponding
 lane of the result vector will have a value with all bits set (`0xff..ff`).
 Otherwise the lane of the result vector will have the value `0`.
@@ -1215,10 +1215,10 @@ Example:
 lanes_gt :: intrinsics.simd_lanes_gt
 
 /*
-Check if values of SIMD vector are greater than or equal the values of another
+Check if lanes of a vector are greater than or equal than another.
 SIMD vector.
 
-This procedure checks each pair of lanes from vectors `a` and `b` for whether
+This procedure checks each pair of lanes from vectors `a` and `b` for whether the
 lane of `a` is greater than or equal to the lane of `b`, and if so, the
 corresponding lane of the result vector will have a value with all bits set
 (`0xff..ff`). Otherwise the lane of the result vector will have the value `0`.
@@ -1258,13 +1258,14 @@ Example:
 lanes_ge :: intrinsics.simd_lanes_ge
 
 /*
-Perform a gather load into a SIMD vector.
+Perform a gather load into a vector.
 
 A *gather* operation is memory load operation, that loads values from an vector
 of addresses into a single value vector. This can be used to achieve the
 following results:
 
-- Accessing every N'th element of an array (strided access)
+
+- Accessing every N'th element of an array (strided access).
 - Access of elements according to some computed offsets (indexed access).
 - Access of elements in a different order (shuffling access).
 
@@ -1273,8 +1274,8 @@ for the `ptr` and `mask` parameters.
 
 Inputs:
 - `ptr`: A vector of memory locations. Each pointer points to a single value,
-	of a SIMD vector's lane type, that will be loaded into the vector. Pointer
-	in this vector can be `nil` or any other invalid value, if the corresponding
+	of a vector's lane type, that will be loaded into the vector. Pointers
+	in this vector can be `nil` or any other invalid value if the corresponding
 	value in the `mask` parameter is zero.
 - `val`: A vector of values that will be used at corresponding positions
 	of the result vector, if the corresponding memory location has been
@@ -1286,8 +1287,7 @@ Inputs:
 	the value will be loaded from the `val` vector.
 
 Returns:
-
-This procedure returns a vector with all values from unmasked indices
+- A vector with all values from unmasked indices
 loaded from the pointer vector `ptr`, and all values from masked indices loaded
 from the value vector `val`.
 
@@ -1313,19 +1313,19 @@ value are the addresses of the values that we want to load into the result
 vector, and we'll fill in `nil` for the rest of them. To prevent CPU from
 dereferencing those `nil` addresses we provide the mask that only allows us
 to load valid positions of the `ptrs` array, and the array of defaults which
-will have `127` in each position as the default value.
+will have `127` (`0x7f`) in each position as the default value.
 
-	v1 := [4] f32 {1, 2, 3, 4};
-	v2 := [4] f32 {9, 10,11,12};
-	ptrs := #simd [4]rawptr { &v1[1], nil, &v2[1], nil }
-	mask := #simd [4]bool { true, false, true, false }
-	defaults := #simd [4]f32 { 0x7f, 0x7f, 0x7f, 0x7f }
+	v1 := [4]f32{1, 2, 3, 4}
+	v2 := [4]f32{9, 10,11, 12}
+	ptrs := #simd [4]rawptr{ &v1[1], nil, &v2[1], nil }
+	mask := #simd [4]bool{ true, false, true, false }
+	defaults := #simd [4]f32{ 0x7f, 0x7f, 0x7f, 0x7f }
 	res := simd.gather(ptrs, defaults, mask)
 	fmt.println(res)
 
-The code would print `<2, 127, 10, 127>`. First and the third positions came
-from the `ptrs` array, and the other 2 lanes of from the default vector.
-Graphic below shows how the values of the result are decided based on the mask:
+The code would print `<2, 127, 10, 127>`. The first and the third lane came
+from the `ptrs` array, and the other 2 lanes are from the default vector.
+The graphic below shows how the values of the result are decided based on the mask:
 
 	      +-------------------------------+ 
 	mask: |   1   |   0   |   1   |   0   | 
@@ -1348,16 +1348,16 @@ Graphic below shows how the values of the result are decided based on the mask:
 gather  :: intrinsics.simd_gather
 
 /*
-Perform a scatter store from a SIMD vector.
+Perform a scatter store from a vector.
 
 A *scatter* operation is a memory store operation that stores values from a
 vector into multiple memory locations. This operation is effectively the
-opposite from the *gather* operation.
+opposite of the *gather* operation.
 
 Inputs:
 - `ptr`: A vector of memory locations. Each masked location will be written
-	to with a value from the `val` vector. Pointer in this vector can be `nil`
-	or any other invalid value, if the corresponding value in the `mask`
+	to with a value from the `val` vector. Pointers in this vector can be `nil`
+	or any other invalid value if the corresponding value in the `mask`
 	parameter is zero.
 - `val`: A vector of values to write to the memory locations.
 - `mask`: A vector of booleans or unsigned integers, that decides which lanes
@@ -1379,11 +1379,11 @@ Example below writes value `127` to the second element of two different
 vectors. The addresses of store destinations are written to the first and the
 third argument of the `ptr` vector, and the `mask` is set accordingly.
 
-	v1 := [4] f32 {1, 2, 3, 4};
-	v2 := [4] f32 {5, 6, 7, 8};
-	ptrs := #simd [4]rawptr { &v1[1], nil, &v2[1], nil }
-	mask := #simd [4]bool { true, false, true, false }
-	vals := #simd [4]f32 { 0x7f, 0x7f, 0x7f, 0x7f }
+	v1 := [4]f32{1, 2, 3, 4}
+	v2 := [4]f32{5, 6, 7, 8}
+	ptrs := #simd [4]rawptr{ &v1[1], nil, &v2[1], nil }
+	mask := #simd [4]bool{ true, false, true, false }
+	vals := #simd [4]f32{ 0x7f, 0x7f, 0x7f, 0x7f }
 	simd.scatter(ptrs, vals, mask)
 	fmt.println(v1)
 	fmt.println(v2)
@@ -1413,10 +1413,10 @@ Graphic below shows how the data gets written into memory.
 scatter :: intrinsics.simd_scatter
 
 /*
-Perform a masked load into the SIMD vector.
+Perform a masked load into the vector.
 
 This procedure performs a masked load from memory, into the vector. The `ptr`
-argument specifies the base address from which the values of SIMD vector
+argument specifies the base address from which the values of the vector
 will be loaded. The mask selects the source for the result vector's lanes. If
 the mask for the corresponding lane has the value `true` (lowest bit set), the
 result lane is loaded from memory. Otherwise the result lane is loaded from the
@@ -1430,7 +1430,7 @@ Inputs:
 - `mask`: The mask that selects where to load the values from.
 
 Returns:
-- The loaded vector. The lanes for which the mask was set, are loaded from
+- The loaded vector. The lanes for which the mask was set are loaded from
 memory, and the other lanes are loaded from the `val` vector.
 
 **Operation**:
@@ -1450,9 +1450,9 @@ The following code loads two values from the `src` vector, the first and the
 third value (selected by the mask). The masked-off values are given the value
 of 127 (`0x7f`).
 
-	src := [4] f32 {1, 2, 3, 4};
-	mask := #simd [4]bool { true, false, true, false }
-	vals := #simd [4]f32 { 0x7f, 0x7f, 0x7f, 0x7f }
+	src := [4]f32{1, 2, 3, 4}
+	mask := #simd [4]bool{ true, false, true, false }
+	vals := #simd [4]f32{ 0x7f, 0x7f, 0x7f, 0x7f }
 	res := simd.masked_load(&src, vals, mask)
 	fmt.println(res)
 
@@ -1485,7 +1485,7 @@ masked_load  :: intrinsics.simd_masked_load
 /*
 Perform a masked store to memory.
 
-This procedure performs a masked store, from a vector `val`, into memory at
+This procedure performs a masked store from a vector `val`, into memory at
 address `ptr`, with the `mask` deciding which lanes are going to be stored,
 and which aren't. If the mask at a corresponding lane has the value `true`
 (lowest bit set), the lane is stored into memory. Otherwise the lane is not
@@ -1498,7 +1498,7 @@ Inputs:
 
 **Operation**:
 
-	for i in len(val) {
+	for i in 0 ..< len(val) {
 		if mask[i]&1 == 1 {
 			ptr[i] = val
 		}
@@ -1509,9 +1509,9 @@ Example:
 Example below stores the value 127 into the first and the third slot of the
 vector `v`.
 
-	v := [4] f32 {1, 2, 3, 4};
-	mask := #simd [4]bool { true, false, true, false }
-	vals := #simd [4]f32 { 0x7f, 0x7f, 0x7f, 0x7f }
+	v := [4]f32{1, 2, 3, 4}
+	mask := #simd [4]bool{ true, false, true, false }
+	vals := #simd [4]f32{ 0x7f, 0x7f, 0x7f, 0x7f }
 	simd.masked_store(&v, vals, mask)
 	fmt.println(v)
 
@@ -1538,16 +1538,16 @@ Graphic below shows the flow of lanes:
 masked_store :: intrinsics.simd_masked_store
 
 /*
-Load consecutive scalar values and expand into a SIMD vector.
+Load consecutive scalar values and expand into a vector.
 
 This procedure loads a number of consecutive scalar values from an address,
 specified by the `ptr` parameter, and stores them in a result vector, according
 to the mask. The number of values read from memory is the number of set bits
-in the mask. The lanes, for which the mask has the value `true` get the next
-consecutive value from the memory, otherwise if the mask is `false` for the
+in the mask. The lanes for which the mask has the value `true` get the next
+consecutive value from memory, otherwise if the mask is `false` for the
 lane, its value is filled from the corresponding lane of the `val` parameter.
 
-This procedure acts like `masked_store`, except the values from the memory are
+This procedure acts like `masked_store`, except the values from memory are
 read consecutively, and not according to the lanes. The memory values are read
 and assigned to the result vector's masked lanes in order of increasing
 addresses.
@@ -1583,9 +1583,9 @@ first lane of the result vector, and the second memory item will be read into
 the third lane of the result vector. All the other lanes of the result vector
 will be initialized to the default value `127`.
 
-	v := [2] f64 {1, 2};
-	mask := #simd [4]bool { true, false, true, false }
-	vals := #simd [4]f64 { 0x7f, 0x7f, 0x7f, 0x7f }
+	v := [2]f64{1, 2}
+	mask := #simd [4]bool{ true, false, true, false }
+	vals := #simd [4]f64{ 0x7f, 0x7f, 0x7f, 0x7f }
 	res := simd.masked_expand_load(&v, vals, mask)
 	fmt.println(res)
 
@@ -1640,13 +1640,13 @@ Inputs:
 
 Example:
 
-The code below fills the vector `v` with two values from a 4-element SIMD
+The code below fills the vector `v` with two values from a 4-element
 vector, the first and the third value. The items in the mask are set to `true`
 in those lanes.
 
-	v := [2] f64 { };
-	mask := #simd [4]bool { true, false, true, false }
-	vals := #simd [4]f64 { 1, 2, 3, 4 }
+	v: [2]f64
+	mask := #simd [4]bool{ true, false, true, false }
+	vals := #simd [4]f64{ 1, 2, 3, 4 }
 	simd.masked_compress_store(&v, vals, mask)
 	fmt.println(v)
 
@@ -1673,14 +1673,14 @@ Graphical representation of the operation:
 masked_compress_store :: intrinsics.simd_masked_compress_store
 
 /*
-Extract scalar from a SIMD vector's lane.
+Extract scalar from a vector's lane.
 
-This procedure returns the scalar, from the lane at the specified index of the
+This procedure returns the scalar from the lane at the specified index of the
 vector.
 
 Inputs:
 - `a`: The vector to extract from.
-- `idx`: Lane index.
+- `idx`: The lane index.
 
 Returns:
 - The value of the lane at the specified index.
@@ -1692,15 +1692,15 @@ Returns:
 extract :: intrinsics.simd_extract
 
 /*
-Replace the value in a SIMD vector's lane.
+Replace the value in a vector's lane.
 
-This procedure places a scalar value at the lane at the specified index of
+This procedure places a scalar value at the lane corresponding to the given index of
 the vector.
 
 Inputs:
-- `a`: The vector to replace a lane.
-- `idx`: Lane index.
-- `elem`: Scalar to place.
+- `a`: The vector to replace a lane in.
+- `idx`: The lane index.
+- `elem`: The scalar to place.
 
 Returns:
 - Vector with the specified lane replaced.
@@ -1712,15 +1712,15 @@ Returns:
 replace :: intrinsics.simd_replace
 
 /*
-Reduce SIMD vector to a scalar by adding all the lanes.
+Reduce a vector to a scalar by adding up all the lanes.
 
-This procedure returns a scalar, that is the ordered sum of all SIMD lanes. The
+This procedure returns a scalar that is the ordered sum of all lanes. The
 ordered sum may be important for accounting for precision errors in
 floating-point computation, as floating-point addition is not associative,
-that is `(a+b)+c` may not equal to `a+(b+c)`.
+that is `(a+b)+c` may not be equal to `a+(b+c)`.
 
 Inputs:
-- `a`: Vector to reduce
+- `a`: The vector to reduce.
 
 Result:
 - Sum of all lanes, as a scalar.
@@ -1735,15 +1735,15 @@ Result:
 reduce_add_ordered :: intrinsics.simd_reduce_add_ordered
 
 /*
-Reduce SIMD vector to a scalar by multiplying all the lanes.
+Reduce a vector to a scalar by multiplying all the lanes.
 
-This procedure returns a scalar, that is the ordered product of all SIMD lanes.
+This procedure returns a scalar that is the ordered product of all lanes.
 The ordered product may be important for accounting for precision errors in
 floating-point computation, as floating-point multiplication is not associative,
-that is `(a*b)*c` may not equal to `a*(b*c)`.
+that is `(a*b)*c` may not be equal to `a*(b*c)`.
 
 Inputs:
-- `a`: Vector to reduce
+- `a`: The vector to reduce.
 
 Result:
 - Product of all lanes, as a scalar.
@@ -1758,13 +1758,13 @@ Result:
 reduce_mul_ordered :: intrinsics.simd_reduce_mul_ordered
 
 /*
-Reduce SIMD vector to a scalar by finding the minimum value between all of the lanes.
+Reduce a vector to a scalar by finding the minimum value between all of the lanes.
 
-This procedure returns a scalar, that is the minimum value of all the lanes
+This procedure returns a scalar that is the minimum value of all the lanes
 in a vector.
 
 Inputs:
-- `a`: Vector to reduce
+- `a`: The vector to reduce.
 
 Result:
 - Minimum value of all lanes, as a scalar.
@@ -1779,16 +1779,16 @@ Result:
 reduce_min :: intrinsics.simd_reduce_min
 
 /*
-Reduce SIMD vector to a scalar by finding the maximum value between all of the lanes.
+Reduce a vector to a scalar by finding the maximum value between all of the lanes.
 
-This procedure returns a scalar, that is the maximum value of all the lanes
+This procedure returns a scalar that is the maximum value of all the lanes
 in a vector.
 
 Inputs:
-- `a`: Vector to reduce
+- `a`: The vector to reduce.
 
 Result:
-- Minimum value of all lanes, as a scalar.
+- Maximum value of all lanes, as a scalar.
 
 **Operation**:
 
@@ -1800,13 +1800,13 @@ Result:
 reduce_max :: intrinsics.simd_reduce_max
 
 /*
-Reduce SIMD vector to a scalar by performing bitwise AND of all of the lanes.
+Reduce a vector to a scalar by performing bitwise AND of all of the lanes.
 
-This procedure returns a scalar, that is the result of the bitwise AND operation
+This procedure returns a scalar that is the result of the bitwise AND operation
 between all of the lanes in a vector.
 
 Inputs:
-- `a`: Vector to reduce
+- `a`: The vector to reduce.
 
 Result:
 - Bitwise AND of all lanes, as a scalar.
@@ -1821,16 +1821,16 @@ Result:
 reduce_and :: intrinsics.simd_reduce_and
 
 /*
-Reduce SIMD vector to a scalar by performing bitwise OR of all of the lanes.
+Reduce a vector to a scalar by performing bitwise OR of all of the lanes.
 
-This procedure returns a scalar, that is the result of the bitwise OR operation
+This procedure returns a scalar that is the result of the bitwise OR operation
 between all of the lanes in a vector.
 
 Inputs:
-- `a`: Vector to reduce
+- `a`: The vector to reduce.
 
 Result:
-- Bitwise AND of all lanes, as a scalar.
+- Bitwise OR of all lanes, as a scalar.
 
 **Operation**: