Ragdoll and ConvexMesh test are now producing the same results on ARM as on x86 (#197)

* Fixed ARM vs x86 difference in matrix inversion
* Switched determinism check to use LinearCast as that exercises more code
* Updated documentation
Jorrit Rouwe, 3 years ago
Commit a90a740b3e
3 files changed, 20 insertions and 13 deletions
  1. .github/workflows/determinism_check.yml (+8 -8)
  2. Docs/Architecture.md (+8 -1)
  3. Jolt/Math/Mat44.inl (+4 -4)

+ 8 - 8
.github/workflows/determinism_check.yml

@@ -1,8 +1,8 @@
 name: Determinism Check
 
 env:
-    CONVEX_VS_MESH_HASH: '0x4245064f2c076b63'
-    RAGDOLL_HASH: '0xffa62c540d58c892'
+    CONVEX_VS_MESH_HASH: '0x485e1d8e739a3c9d'
+    RAGDOLL_HASH: '0xc29b4c0ea4cf1876'
 
 on:
   push:
@@ -32,10 +32,10 @@ jobs:
       run: cmake --build ${{github.workspace}}/Build/Linux_Distribution --config Distribution
     - name: Test ConvexVsMesh
       working-directory: ${{github.workspace}}/Build/Linux_Distribution
-      run: ./PerformanceTest -q=Discrete -t=2 -s=ConvexVsMesh -validate_hash=${CONVEX_VS_MESH_HASH}
+      run: ./PerformanceTest -q=LinearCast -t=2 -s=ConvexVsMesh -validate_hash=${CONVEX_VS_MESH_HASH}
     - name: Test Ragdoll
       working-directory: ${{github.workspace}}/Build/Linux_Distribution
-      run: ./PerformanceTest -q=Discrete -t=2 -s=Ragdoll -validate_hash=${RAGDOLL_HASH}
+      run: ./PerformanceTest -q=LinearCast -t=2 -s=Ragdoll -validate_hash=${RAGDOLL_HASH}
 
   msvc_cl:
     runs-on: windows-latest
@@ -52,10 +52,10 @@ jobs:
       run: msbuild Build\VS2022_CL\JoltPhysics.sln /property:Configuration=Distribution
     - name: Test ConvexVsMesh
       working-directory: ${{github.workspace}}/Build/VS2022_CL/Distribution
-      run: ./PerformanceTest -q=Discrete -t=2 -s=ConvexVsMesh "-validate_hash=$env:CONVEX_VS_MESH_HASH"
+      run: ./PerformanceTest -q=LinearCast -t=2 -s=ConvexVsMesh "-validate_hash=$env:CONVEX_VS_MESH_HASH"
     - name: Test Ragdoll
       working-directory: ${{github.workspace}}/Build/VS2022_CL/Distribution
-      run: ./PerformanceTest -q=Discrete -t=2 -s=Ragdoll "-validate_hash=$env:RAGDOLL_HASH"
+      run: ./PerformanceTest -q=LinearCast -t=2 -s=Ragdoll "-validate_hash=$env:RAGDOLL_HASH"
 
   macos:
     runs-on: macos-latest
@@ -70,7 +70,7 @@ jobs:
       run: cmake --build ${{github.workspace}}/Build/Linux_Distribution --config Distribution
     - name: Test ConvexVsMesh
       working-directory: ${{github.workspace}}/Build/Linux_Distribution
-      run: ./PerformanceTest -q=Discrete -t=2 -s=ConvexVsMesh -validate_hash=${CONVEX_VS_MESH_HASH}
+      run: ./PerformanceTest -q=LinearCast -t=2 -s=ConvexVsMesh -validate_hash=${CONVEX_VS_MESH_HASH}
     - name: Test Ragdoll
       working-directory: ${{github.workspace}}/Build/Linux_Distribution
-      run: ./PerformanceTest -q=Discrete -t=2 -s=Ragdoll -validate_hash=${RAGDOLL_HASH}
+      run: ./PerformanceTest -q=LinearCast -t=2 -s=Ragdoll -validate_hash=${RAGDOLL_HASH}
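The workflow above compares a hash of the final simulation state against a fixed expected value, which is why both hashes had to be updated when the query mode and the matrix inversion changed. A minimal sketch of that idea follows. It assumes a 64-bit FNV-1a hash over the raw bytes of the state; the hash function that PerformanceTest actually uses is not shown in this diff, and `HashBytesFNV1a`, `HashState` and `ValidateHash` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical 64-bit FNV-1a hash over a byte range (assumption: not necessarily the hash PerformanceTest uses)
inline uint64_t HashBytesFNV1a(const void *inData, size_t inSize, uint64_t inSeed = 0xcbf29ce484222325ull)
{
	assert(inData != nullptr || inSize == 0);

	uint64_t hash = inSeed;
	const uint8_t *bytes = static_cast<const uint8_t *>(inData);
	for (size_t i = 0; i < inSize; ++i)
	{
		hash ^= bytes[i];
		hash *= 0x100000001b3ull; // FNV prime
	}
	return hash;
}

// Hash the raw bit patterns of the state (a flat list of floats stands in for body positions and rotations)
inline uint64_t HashState(const std::vector<float> &inState)
{
	return HashBytesFNV1a(inState.data(), inState.size() * sizeof(float));
}

// Compare a computed hash with an expected value given as a hex string such as "0x485e1d8e739a3c9d"
inline bool ValidateHash(uint64_t inHash, const std::string &inExpectedHex)
{
	return inHash == std::stoull(inExpectedHex, nullptr, 16);
}
```

Because the hash covers the exact bit patterns, a difference in the last bit of a single float on one platform is enough to make the check fail, which is the property a determinism check needs.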

+ 8 - 1
Docs/Architecture.md

@@ -311,7 +311,14 @@ The physics simulation is deterministic provided that:
 * The APIs that modify the simulation are called in exactly the same order. For example, bodies and constraints need to be added/removed/modified in exactly the same order so that the state at the beginning of a simulation step is exactly the same for both simulations.
 * The same binary code is used to run the simulation. For example, when you run the simulation on Windows it doesn't matter if you have an AMD or Intel processor. 
 
-If you want cross platform determinism (e.g. Linux vs Windows) then please turn on the CROSS_PLATFORM_DETERMINISTIC option in CMake. This will compile the library without fused multiply add instructions and with precise math (so it will come at a performance cost). It has been tested with the PerformanceTest (both with Ragdoll and ConvexVsMesh test) to result in the same simulation regardless if the library was compiled on MSVC2022 or clang, in Debug and Release mode and on Windows or Linux. Note that the library is not yet cross platform deterministic between ARM vs x86. Also note that it is quite difficult to verify cross platform determinism, so this feature is less tested than other features.
+If you want cross platform determinism then please turn on the CROSS_PLATFORM_DETERMINISTIC option in CMake. This will make the library approximately 8% slower but the simulation will be deterministic regardless of:
+
+* Compiler used to compile the library (tested MSVC2022 vs clang)
+* Configuration (Debug, Release or Distribution)
+* OS (tested Windows vs Linux)
+* Architecture (x86 or ARM)
+
+Note that the same source code must be used to compile the library on all platforms. Also note that it is quite difficult to verify cross platform determinism, so this feature is less tested than other features.
 
 When running the Samples Application you can press ESC, Physics Settings and check the 'Check Determinism' checkbox. Before every simulation step we will record the state using the [StateRecorder](@ref StateRecorder) interface, rewind the simulation and do the step again to validate that the simulation runs deterministically. Some of the tests (e.g. the MultiThreaded) test will explicitly disable the check because they randomly add/remove bodies from different threads. This violates the first rule so will not result in a deterministic simulation.
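The record, rewind and replay check described in the documentation text above can be sketched roughly as follows. This is a toy illustration only: `ToyWorld`, `StatesIdentical` and `StepAndCheckDeterminism` are hypothetical names, and the flat float buffer stands in for whatever the real StateRecorder interface stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical toy simulation: particles moving along one axis under gravity
struct ToyWorld
{
	std::vector<float> mPositions;
	std::vector<float> mVelocities;

	void Step(float inDeltaTime)
	{
		assert(inDeltaTime > 0.0f);
		for (size_t i = 0; i < mPositions.size(); ++i)
		{
			mVelocities[i] += -9.81f * inDeltaTime;
			mPositions[i] += mVelocities[i] * inDeltaTime;
		}
	}

	// Record the full state as a flat buffer: all positions followed by all velocities
	std::vector<float> SaveState() const
	{
		std::vector<float> state(mPositions);
		state.insert(state.end(), mVelocities.begin(), mVelocities.end());
		return state;
	}

	// Rewind to a previously recorded state
	void RestoreState(const std::vector<float> &inState)
	{
		size_t n = inState.size() / 2;
		mPositions.assign(inState.begin(), inState.begin() + n);
		mVelocities.assign(inState.begin() + n, inState.end());
	}
};

// Bitwise comparison: deterministic means identical bits, not "close enough"
inline bool StatesIdentical(const std::vector<float> &inA, const std::vector<float> &inB)
{
	return inA.size() == inB.size()
		&& (inA.empty() || std::memcmp(inA.data(), inB.data(), inA.size() * sizeof(float)) == 0);
}

// Record the state, step, rewind, step again and verify that both steps gave the same result
inline bool StepAndCheckDeterminism(ToyWorld &ioWorld, float inDeltaTime)
{
	std::vector<float> before = ioWorld.SaveState();
	ioWorld.Step(inDeltaTime);
	std::vector<float> first = ioWorld.SaveState();

	ioWorld.RestoreState(before);
	ioWorld.Step(inDeltaTime);
	return StatesIdentical(first, ioWorld.SaveState());
}
```

A check like this only validates that the same binary reproduces its own results. It says nothing about cross platform determinism, which needs a comparison between builds.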
 

+ 4 - 4
Jolt/Math/Mat44.inl

@@ -557,8 +557,8 @@ Mat44 Mat44::Inversed() const
 	minor3 = _mm_add_ps(_mm_mul_ps(row1, tmp1), minor3);
 
 	__m128 det = _mm_mul_ps(row0, minor0);
-	det = _mm_add_ps(_mm_shuffle_ps(det, det, _MM_SHUFFLE(1, 0, 3, 2)), det);
-	det = _mm_add_ss(_mm_shuffle_ps(det, det, _MM_SHUFFLE(2, 3, 0, 1)), det);
+	det = _mm_add_ps(_mm_shuffle_ps(det, det, _MM_SHUFFLE(2, 3, 0, 1)), det); // Original code did (x + z) + (y + w), changed to (x + y) + (z + w) to match the ARM code below and make the result cross platform deterministic
+	det = _mm_add_ss(_mm_shuffle_ps(det, det, _MM_SHUFFLE(1, 0, 3, 2)), det);
 	det = _mm_div_ss(_mm_set_ss(1.0f), det);
 	det = _mm_shuffle_ps(det, det, _MM_SHUFFLE(0, 0, 0, 0));
 	
@@ -860,8 +860,8 @@ Mat44 Mat44::Inversed3x3() const
 	minor1 = _mm_sub_ps(minor1, _mm_mul_ps(row3, tmp1));
 
 	__m128 det = _mm_mul_ps(row0, minor0);
-	det = _mm_add_ps(_mm_shuffle_ps(det, det, _MM_SHUFFLE(1, 0, 3, 2)), det);
-	det = _mm_add_ss(_mm_shuffle_ps(det, det, _MM_SHUFFLE(2, 3, 0, 1)), det);
+	det = _mm_add_ps(_mm_shuffle_ps(det, det, _MM_SHUFFLE(2, 3, 0, 1)), det); // Original code did (x + z) + (y + w), changed to (x + y) + (z + w) to match the ARM code below and make the result cross platform deterministic
+	det = _mm_add_ss(_mm_shuffle_ps(det, det, _MM_SHUFFLE(1, 0, 3, 2)), det);
 	det = _mm_div_ss(_mm_set_ss(1.0f), det);
 	det = _mm_shuffle_ps(det, det, _MM_SHUFFLE(0, 0, 0, 0));
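The reason the order of the two shuffles matters is that floating point addition is not associative, so (x + z) + (y + w) and (x + y) + (z + w) can round differently. The scalar sketch below illustrates this with deliberately extreme values. It assumes strict IEEE 754 single precision evaluation (no fast-math, no excess precision); `Float4`, `SumXZ_YW`, `SumXY_ZW` and `DemoSummationOrder` are hypothetical names and are not part of Mat44.inl.

```cpp
#include <cassert>

// Hypothetical stand-in for the four lanes of the __m128 register
struct Float4
{
	float x, y, z, w;
};

// Order used by the original x86 code: (x + z) + (y + w)
inline float SumXZ_YW(const Float4 &inV)
{
	float xz = inV.x + inV.z;
	float yw = inV.y + inV.w;
	return xz + yw;
}

// Order used after this change, matching the ARM code path: (x + y) + (z + w)
inline float SumXY_ZW(const Float4 &inV)
{
	float xy = inV.x + inV.y;
	float zw = inV.z + inV.w;
	return xy + zw;
}

inline void DemoSummationOrder()
{
	// Near 1e8 adjacent floats are 8 apart, so adding 1 to +/-1e8 is lost to rounding
	const Float4 v = { 1.0e8f, 1.0f, -1.0e8f, 1.0f };

	// (1e8 + -1e8) + (1 + 1): the large terms cancel first, so both ones survive
	assert(SumXZ_YW(v) == 2.0f);

	// (1e8 + 1) + (-1e8 + 1): each one is rounded away before the large terms cancel
	assert(SumXY_ZW(v) == 0.0f);
}
```

Real determinants rarely differ this dramatically, but a difference in the last bit is enough to make two platforms diverge over many simulation steps, so both code paths need to add the terms in the same order.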