2
0

InvokeLeakTest_Timing_Diagram.md 9.6 KB

InvokeLeakTest Timing Diagram

Normal Operation (Without Debugger)

Timeline (milliseconds):
0ms    10ms   20ms   30ms   40ms   50ms   60ms   70ms   80ms   90ms   100ms
|------|------|------|------|------|------|------|------|------|------|
│
│ Background Thread 1: Task.Run → Sleep(2-4ms) → Invoke()
│                                                     ↓
│ Background Thread 2: Task.Run → Sleep(2-4ms) → Invoke()
│                                                     ↓
│ Background Thread 3: Task.Run → Sleep(2-4ms) → Invoke()
│                                                     ↓
│                                                 [All added to _timeouts]
│
│ Main Loop: ──────────[Iter]───────[Iter]───────[Iter]───────[Iter]────
│                         ↓           ↓           ↓           ↓
│                    RunTimers   RunTimers   RunTimers   RunTimers
│                         ↓           ↓           ↓           ↓
│                    Execute 0   Execute 5   Execute 10  Execute 15
│
│ Counter:     0 ───────→ 0 ─────→ 5 ───────→ 15 ────→ 30 ────→ 45 ─────→ 50
│
│ Test Check:                                                           ✓ PASS
│                         └──────────────100ms window────────────────┘

Result: All invocations execute within 100ms → Test passes


Problem Scenario (With Debugger - @BDisp's Machine)

Timeline (milliseconds):
0ms    10ms   20ms   30ms   40ms   50ms   60ms   70ms   80ms   90ms   100ms  110ms
|------|------|------|------|------|------|------|------|------|------|------|
│
│ Background Threads: 500 Tasks launch rapidly
│                     ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓ (burst of invocations)
│                     All added to _timeouts within same DateTime.UtcNow tick
│                     Timestamps: T-100, T-99, T-98, T-97, ... (NudgeToUniqueKey)
│
│ Main Loop (SLOW due to debugger overhead):
│           ────────────────[Iter 1]────────────────────[Iter 2]──────────────
│                             25ms                         60ms
│                              ↓                            ↓
│                         RunTimers                    RunTimers
│                              ↓                            ↓
│                      DateTime.UtcNow                DateTime.UtcNow
│                            = T0                          = T1
│                              ↓                            ↓
│                      Check: k < now?               Check: k < now?
│                              ↓                            ↓
│           ┌─────────────────┴──────────────┐             │
│           │ Some timeouts: k >= now !      │             │ Execute some
│           │ These are NOT executed         │             │ timeouts
│           │ Added back to _timeouts        │             │
│           └────────────────────────────────┘             ↓
│                                                      Counter += 300
│ Counter:  0 ────────────────→ 0 ──────────────────────→ 300 ────────────→ 450
│
│ Test Check at 100ms:                                ✗ FAIL
│                     └──────────100ms window──────┘
│                     Counter = 300, Expected = 500
│                     Missing 200 invocations!
│
│ (Those 200 invocations execute later, around 110ms)

Result: Not all invocations execute within 100ms → TimeoutException


The DateTime.UtcNow Resolution Problem

Real Time:        0.000ms  0.001ms  0.002ms  0.003ms  ...  15.6ms  15.7ms
                  │        │        │        │               │       │
DateTime.UtcNow:  T0───────────────────────────────────────→T1─────→T2
                  └─────────────15.6ms tick──────────────────┘
                           All invocations here get T0

When 100 invocations happen within 15.6ms:

Invoke #1:  k = T0 - 100 ticks
Invoke #2:  k = T0 - 99 ticks   (NudgeToUniqueKey increments)
Invoke #3:  k = T0 - 98 ticks
...
Invoke #100: k = T0 + 0 ticks   (This is T0!)

When RunTimers() checks at time T0 + 50 ticks:

Invoke #1-50:  k < now  → Execute ✓
Invoke #51-100: k >= now → NOT executed! Added back to queue ✗

Why Debugger Makes It Worse

Without Debugger

Main Loop Iteration Time: ~10-20ms
│ Invoke batch: 10 tasks ────→ Execute 10 ────→ Next batch: 10 tasks
│                          10ms              10ms
│ Small batches processed quickly

With Debugger

Main Loop Iteration Time: ~25-50ms (2-5x slower!)
│ Invoke batch: 100 tasks (burst!) ────────→ Execute 50 ──→ 50 still queued
│                                      50ms                    ↓
│                                                      Need another 50ms!
│ Large batches accumulate, processing delayed

Effect: More invocations queue up between iterations, increasing likelihood of timestamp collisions and race conditions.


The Race Condition Explained

Thread Timeline:

Background Thread                      Main Thread
─────────────────                      ───────────

[Call Invoke()]
      ↓
[Lock _timeoutsLockToken]
      ↓
k = DateTime.UtcNow.Ticks              
  = 1000                               
      ↓
k -= 100 (= 900)                       
      ↓                                 
[Add to _timeouts with k=900]          
      ↓
[Release lock]                         [Lock _timeoutsLockToken]
                                            ↓
                                       [Copy _timeouts]
                                            ↓
                                       [Release lock]
                                            ↓
                                       now = DateTime.UtcNow.Ticks
                                           = 850  ⚠️ (Timer hasn't updated!)
                                            ↓
                                       Check: k < now?
                                       900 < 850? → FALSE!
                                            ↓
                                       [Timeout NOT executed]
                                       [Added back to _timeouts]

Problem: Between when k is calculated (900) and when it's checked (now=850), the system timer hasn't updated! This can happen because:

  1. DateTime.UtcNow has coarse resolution (~15ms)
  2. Thread scheduling can cause the check to happen "early"
  3. Debugger makes timing less predictable

Solution Comparison

Current: DateTime.UtcNow

Resolution: ~15.6ms (Windows), varies by platform
Precision:  Low
Stability:  Affected by system time changes
Debugger:   Timing issues

Time:     0ms  ────────────→ 15.6ms ────────────→ 31.2ms
Reading:  T0                 T1                     T2
          └─────All values here are T0─────┘

Proposed: Stopwatch.GetTimestamp()

Resolution: ~1 microsecond (typical)
Precision:  High
Stability:  Not affected by system time changes
Debugger:   More reliable

Time:     0ms  →  0.001ms → 0.002ms → 0.003ms → ...
Reading:  T0      T1        T2        T3        ...
          Each reading is unique and monotonic

Benefit: With microsecond resolution, even 1000 rapid invocations get unique timestamps, eliminating the NudgeToUniqueKey workaround and race conditions.


Test Scenarios

Scenario 1: Fast Machine, No Debugger

Iteration time: 5-10ms
Invoke rate: 20-30/ms
Result: ✓ PASS (plenty of time margin)

Scenario 2: Normal Machine, No Debugger

Iteration time: 10-20ms
Invoke rate: 10-20/ms
Result: ✓ PASS (adequate time margin)

Scenario 3: ARM Machine, Debugger (@tig's ARM Windows)

Iteration time: 20-30ms
Invoke rate: 15-20/ms
ARM timer resolution: Better than x64
Result: ✓ PASS (ARM timer architecture handles it)

Scenario 4: x64 Machine, Debugger (@BDisp's x64, @tig's x64 Windows) - CONFIRMED

Iteration time: 30-50ms
Invoke rate: 10-15/ms
DateTime.UtcNow resolution: 15-20ms (x64 TSC/HPET timer)
Result: ✗ FAIL (exceeds 100ms window)

CONFIRMED: @tig reproduced on x64 but NOT on ARM

Recommendations

Immediate Fix (Test-Level)

// Increase tolerance for debugger scenarios
#if DEBUG
private const int POLL_MS = Debugger.IsAttached ? 500 : 100;
#else
private const int POLL_MS = 100;
#endif

Long-Term Fix (Production Code)

// In TimedEvents.cs, replace DateTime.UtcNow with Stopwatch
private static long GetTimestampTicks() 
{
    return Stopwatch.GetTimestamp() * (TimeSpan.TicksPerSecond / Stopwatch.Frequency);
}

// Use in AddTimeout:
long k = GetTimestampTicks() + time.Ticks;

This provides microsecond resolution and eliminates the race condition entirely.