InvokeLeakTest Failure Analysis

Status: FIXED ✅

Fixed in commit a6d064a - Replaced DateTime.UtcNow with Stopwatch.GetTimestamp() in TimedEvents.cs

Fix Results

✅ InvokeLeakTest now passes on x64 under debugger
✅ All 3128 unit tests pass
✅ Added 5 new comprehensive tests for high-frequency scenarios
✅ Cross-platform consistent (x64 and ARM)

Original Issue Summary

The InvokeLeakTest stress test was failing only on x64 machines when running under a debugger:

Visual Studio 2022 on Windows (x64)
Visual Studio 2022 on macOS (Intel-based VM)
Visual Studio Code on Windows

The test passed in CI/CD environments and when run without a debugger.

Test Description

InvokeLeakTest is a stress test (not a unit test) located in Tests/StressTests/ApplicationStressTests.cs. It:

Spawns multiple concurrent tasks that call Application.Invoke() from background threads
Each invocation updates a TextField and increments a counter using Interlocked.Increment
The test verifies that all invocations complete successfully (no "leaks")
Runs for 50 passes with 500 increments each (25,000 total invocations)

Test Flow

// Main thread blocks in Application.Run()
Application.Run(top);

// Background thread spawns tasks
for (var j = 0; j < NUM_PASSES; j++) {
    for (var i = 0; i < NUM_INCREMENTS; i++) {
        Task.Run(() => {
            Thread.Sleep(r.Next(2, 4));  // Random 2-4ms delay
            Application.Invoke(() => {
                tf.Text = $"index{r.Next()}";
                Interlocked.Increment(ref _tbCounter);
            });
        });
    }
    // Wait for counter to reach expected value with 100ms polling
    while (_tbCounter != expectedValue) {
        _wakeUp.Wait(POLL_MS);  // POLL_MS = 100ms
        if (_tbCounter hasn't changed) {
            throw new TimeoutException("Invoke lost");
        }
    }
}

How Application.Invoke Works

Call Chain

Application.Invoke(action) → calls ApplicationImpl.Instance.Invoke(action)
ApplicationImpl.Invoke() checks if on main thread:
- If on main thread: Execute action immediately
- If on background thread: Add to _timedEvents with TimeSpan.Zero
TimedEvents.Add():
- Calculates timestamp: k = (DateTime.UtcNow + time).Ticks
- For TimeSpan.Zero, subtracts 100 ticks to ensure immediate execution: k -= 100
- Adds to sorted list: _timeouts.Add(NudgeToUniqueKey(k), timeout)
MainLoop.RunIteration() calls TimedEvents.RunTimers() every iteration
TimedEvents.RunTimers():
- Takes a copy of _timeouts and creates a new list (under lock)
- Iterates through copy, executing callbacks where k < now
- Non-repeating callbacks (return false) are not re-added

Critical Code Paths

ApplicationImpl.Invoke (Terminal.Gui/App/ApplicationImpl.cs:306-322)

public void Invoke (Action action)
{
    // If we are already on the main UI thread
    if (Application.MainThreadId == Thread.CurrentThread.ManagedThreadId)
    {
        action ();
        return;
    }

    _timedEvents.Add (TimeSpan.Zero,
                      () =>
                      {
                          action ();
                          return false;  // One-shot execution
                      }
                     );
}

TimedEvents.AddTimeout (Terminal.Gui/App/Timeout/TimedEvents.cs:124-139)

private void AddTimeout (TimeSpan time, Timeout timeout)
{
    lock (_timeoutsLockToken)
    {
        long k = (DateTime.UtcNow + time).Ticks;

        // if user wants to run as soon as possible set timer such that it expires right away
        if (time == TimeSpan.Zero)
        {
            k -= 100;  // Subtract 100 ticks to ensure it's "in the past"
        }

        _timeouts.Add (NudgeToUniqueKey (k), timeout);
        Added?.Invoke (this, new (timeout, k));
    }
}

TimedEvents.RunTimersImpl (Terminal.Gui/App/Timeout/TimedEvents.cs:160-192)

private void RunTimersImpl ()
{
    long now = DateTime.UtcNow.Ticks;
    SortedList<long, Timeout> copy;

    lock (_timeoutsLockToken)
    {
        copy = _timeouts;
        _timeouts = new ();
    }

    foreach ((long k, Timeout timeout) in copy)
    {
        if (k < now)  // Execute if scheduled time is in the past
        {
            if (timeout.Callback ())  // Returns false for Invoke actions
            {
                AddTimeout (timeout.Span, timeout);
            }
        }
        else  // Future timeouts - add back to list
        {
            lock (_timeoutsLockToken)
            {
                _timeouts.Add (NudgeToUniqueKey (k), timeout);
            }
        }
    }
}

Hypothesis: Why It Fails Under Debugger on @BDisp's Machine

Primary Hypothesis: DateTime.UtcNow Resolution and Debugger Timing

The test failure likely occurs due to a combination of factors:

1. DateTime.UtcNow Resolution Issues

The code uses DateTime.UtcNow.Ticks for timing, which has platform-dependent resolution:

Windows: ~15.6ms resolution (system timer tick)
Some systems: Can be lower/higher depending on timer configuration
Debugger impact: Can affect system timer behavior

When TimeSpan.Zero invocations are added:

long k = (DateTime.UtcNow + TimeSpan.Zero).Ticks;
k -= 100;  // Subtract 100 ticks (10 microseconds)

The problem: If two Invoke calls happen within the same timer tick (< ~15ms on Windows), they get the SAME DateTime.UtcNow value. The NudgeToUniqueKey function increments by 1 tick each collision, but this creates a sequence of timestamps like:

First call: now - 100
Second call (same UtcNow): now - 99
Third call (same UtcNow): now - 98
...and so on

2. Race Condition in RunTimersImpl

In RunTimersImpl, this check determines if a timeout should execute:

if (k < now)  // k is scheduled time, now is current time

The race: Between when timeouts are added (with k = UtcNow - 100) and when they're checked (with fresh DateTime.UtcNow), time passes. However, if:

Multiple invocations are added rapidly (within same timer tick)
The system is under debugger (slower iteration loop)
The main loop iteration happens to sample DateTime.UtcNow at an unlucky moment

Some timeouts might have k >= now even though they were intended to be "immediate" (TimeSpan.Zero).

3. Debugger-Specific Timing Effects

When running under a debugger:

a) Slower Main Loop Iterations

Debugger overhead slows each iteration
More time between RunTimers calls
Allows more tasks to queue up between iterations

b) Timer Resolution Changes

Debuggers can affect OS timer behavior
May change quantum/scheduling of threads
Different thread priorities under debugger

c) DateTime.UtcNow Sampling

More invocations can accumulate in a single UtcNow "tick"
Larger batches of timeouts with near-identical timestamps
Higher chance of k >= now race condition

4. The "Lost Invoke" Scenario

Failure scenario:

Time T0: Background thread calls Invoke()
         - k = UtcNow - 100 (let's say 1000 ticks - 100 = 900)
         - Added to _timeouts with k=900

Time T1: MainLoop iteration samples UtcNow = 850 ticks (!)
         - This can happen if system timer hasn't updated yet
         - Check: is k < now? Is 900 < 850? NO!
         - Timeout is NOT executed, added back to _timeouts

Time T2: Next iteration, UtcNow = 1100 ticks
         - Check: is k < now? Is 900 < 1100? YES!
         - Timeout executes

But if the test's 100ms polling window expires before T2, it throws TimeoutException.

5. Why x64 Machines Specifically?

UPDATE: @tig confirmed he can reproduce on his x64 Windows machine but NOT on his ARM Windows machine, validating this hypothesis.

Architecture-specific factors:

CPU/Chipset: Intel/AMD x64 vs ARM have fundamentally different timer implementations
- x64: Uses legacy TSC (Time Stamp Counter) or HPET (High Precision Event Timer)
- ARM: Uses different timer architecture with potentially better resolution
VM/Virtualization: MacOS VM on Intel laptop may have timer virtualization quirks
OS Configuration: Windows timer resolution settings (can be 1ms to 15.6ms)
Debugger Version: Specific VS2022 build with different debugging hooks
System Load: Background processes affecting timer accuracy
Hardware: Specific timer hardware behavior on x64 architecture

Secondary Hypothesis: Thread Scheduling Under Debugger

The test spawns tasks with Task.Run() and small random delays (2-4ms). Under a debugger:

Thread scheduling may be different
Task scheduling might be more synchronous
More tasks could complete within same timer resolution window
Creates "burst" of invocations that all get same timestamp

Why It Doesn't Fail on ARM

CONFIRMED: @tig cannot reproduce on ARM Windows machine, only on x64 Windows.

ARM environments:

Run without debugger (no debugging overhead) in CI/CD
Different timer characteristics - ARM timer architecture has better resolution
Faster iterations (less time for race conditions)
ARM CPU architecture uses different timer implementation than x64
ARM timer subsystem may have higher base resolution or better behavior under load

Evidence Supporting the Hypothesis

Test uses 100ms polling: _wakeUp.Wait(POLL_MS) where POLL_MS = 100
- This gives a narrow window for all invocations to complete
- Any delay beyond 100ms triggers failure
Test spawns 500 concurrent tasks per pass: Each with 2-4ms delay
- Under debugger, these could all queue up in < 100ms
- But execution might take > 100ms due to debugger overhead
Only fails under debugger: Strong indicator of timing-related issue
- Debugger affects iteration speed and timer behavior
Architecture-specific (CONFIRMED): @tig reproduced on x64 Windows but NOT on ARM Windows
- This strongly supports the timer resolution hypothesis
- x64 timer implementation is more susceptible to this race condition
- ARM timer architecture handles the scenario more gracefully

Additional Investigation Needed

To confirm hypothesis, @BDisp could:

Add diagnostics to test:

var sw = Stopwatch.StartNew();
while (_tbCounter != expectedValue) {
_wakeUp.Wait(pollMs);
if (_tbCounter != tbNow) continue;
    
// Log timing information
Console.WriteLine($"Timeout at {sw.ElapsedMilliseconds}ms");
Console.WriteLine($"Counter: {_tbCounter}, Expected: {expectedValue}");
Console.WriteLine($"Missing: {expectedValue - _tbCounter}");
    
// Check if invokes are still queued
Console.WriteLine($"TimedEvents count: {Application.TimedEvents?.Timeouts.Count}");
}

Test timer resolution:

var samples = new List<long>();
for (int i = 0; i < 100; i++) {
samples.Add(DateTime.UtcNow.Ticks);
}
var deltas = samples.Zip(samples.Skip(1), (a, b) => b - a).Where(d => d > 0);
Console.WriteLine($"Min delta: {deltas.Min()} ticks ({deltas.Min() / 10000.0}ms)");

Monitor TimedEvents queue:
Add logging in TimedEvents.RunTimersImpl to see when timeouts are deferred
Check if k >= now condition is being hit

Conclusion

The InvokeLeakTest failure under debugger is likely caused by:

Low resolution of DateTime.UtcNow combined with rapid invocations
Race condition in timeout execution check (k < now)
Debugger overhead exacerbating timing issues
Platform-specific timer behavior on @BDisp's hardware/VM

The most robust fix is to use Stopwatch for timing instead of DateTime.UtcNow, providing:

Higher resolution timing
Better consistency across platforms
Reduced susceptibility to debugger effects

This is a timing/performance issue in the stress test environment, not a functional bug in the production code. The test is correctly identifying edge cases in high-concurrency scenarios that are more likely to manifest under debugger overhead.

InvokeLeakTest_Analysis.md 13 KB

Permalink History Raw

InvokeLeakTest Failure Analysis

Status: FIXED ✅

Fix Results

Original Issue Summary

Test Description

Test Flow

How Application.Invoke Works

Call Chain

Critical Code Paths

ApplicationImpl.Invoke (Terminal.Gui/App/ApplicationImpl.cs:306-322)

TimedEvents.AddTimeout (Terminal.Gui/App/Timeout/TimedEvents.cs:124-139)

TimedEvents.RunTimersImpl (Terminal.Gui/App/Timeout/TimedEvents.cs:160-192)

Hypothesis: Why It Fails Under Debugger on @BDisp's Machine

Primary Hypothesis: DateTime.UtcNow Resolution and Debugger Timing

1. DateTime.UtcNow Resolution Issues

2. Race Condition in RunTimersImpl

3. Debugger-Specific Timing Effects

4. The "Lost Invoke" Scenario

5. Why x64 Machines Specifically?

Secondary Hypothesis: Thread Scheduling Under Debugger

Why It Doesn't Fail on ARM

Evidence Supporting the Hypothesis

Recommended Solutions

Solution 1: Use Stopwatch Instead of DateTime.UtcNow (Recommended)

Solution 2: Increase TimeSpan.Zero Buffer

Solution 3: Add Wakeup Call on Invoke

Solution 4: Test-Specific Changes

Solution 5: Use Interlocked Operations More Defensively

Additional Investigation Needed

Conclusion

InvokeLeakTest_Analysis.md 13 KB Permalink History Raw

InvokeLeakTest Failure Analysis

Status: FIXED ✅

Fix Results

Original Issue Summary

Test Description

Test Flow

How Application.Invoke Works

Call Chain

Critical Code Paths

ApplicationImpl.Invoke (Terminal.Gui/App/ApplicationImpl.cs:306-322)

TimedEvents.AddTimeout (Terminal.Gui/App/Timeout/TimedEvents.cs:124-139)

TimedEvents.RunTimersImpl (Terminal.Gui/App/Timeout/TimedEvents.cs:160-192)

Hypothesis: Why It Fails Under Debugger on @BDisp's Machine

Primary Hypothesis: DateTime.UtcNow Resolution and Debugger Timing

1. DateTime.UtcNow Resolution Issues

2. Race Condition in RunTimersImpl

3. Debugger-Specific Timing Effects

4. The "Lost Invoke" Scenario

5. Why x64 Machines Specifically?

Secondary Hypothesis: Thread Scheduling Under Debugger

Why It Doesn't Fail on ARM

Evidence Supporting the Hypothesis

Recommended Solutions

Solution 1: Use Stopwatch Instead of DateTime.UtcNow (Recommended)

Solution 2: Increase TimeSpan.Zero Buffer

Solution 3: Add Wakeup Call on Invoke

Solution 4: Test-Specific Changes

Solution 5: Use Interlocked Operations More Defensively

Additional Investigation Needed

Conclusion

InvokeLeakTest_Analysis.md 13 KB

Permalink History Raw