Skip to content

Conversation

@Fix-Point
Copy link
Contributor

@Fix-Point Fix-Point commented Dec 22, 2025

Summary

This PR is part III of the #17556 and based on #17584, it primarily introduces fix-up and improvements to the WDOG module, including the following key changes:

  • Removed the wd_recover interface to simplify the WDOG module, reduce coupling, and decrease code size.
  • Added the list_delete_fast interface to improve WDOG deletion performance. Theoretically, this saves at least 1 write operation on the path where a WDOG needs to be deleted, and 2 write operations when a WDOG needs to be reinserted.
  • Reverted the WDOG spin-lock to the big kernel lock (enter_critical_section) to fix concurrency-related bugs.
  • Simplified wd_timer to reduce performance overhead and Worst-Case Execution Time (WCET) for timeout handling.
  • Modified code organization to comply with the MISRA-C coding standard.

Special attention is required regarding why we reverted the fine-grained WDOG spin-lock to the big kernel lock (enter_critical_section).

In the earliest implementation, the WDOG had only two states:

%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#FFFFFF',
      'primaryTextColor' : '#000000',
      'mermiad-container': "#FFFFFF",
      'primaryBorderColor': '#000000',
      'lineColor': '#000000',
      'secondaryColor': '#FFFFFF',
      'tertiaryColor': '#000000'
    },
    'sequence': { 'mirrorActors': false }
  }
}%%

stateDiagram-v2
    WDOG_INACTIVE --> WDOG_INACTIVE : wd_cancel
    WDOG_INACTIVE --> WDOG_ACTIVE : wd_start
    WDOG_ACTIVE --> WDOG_ACTIVE : wd_start
    WDOG_ACTIVE --> WDOG_INACTIVE : wd_cancel or callback finished
Loading

Figure 1. Correct WDOG state diagram.

This implementation was functionally correct in an SMP environment. However, after switching to a spin-lock and relaxing the critical section, the WDOG entered three states:

%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#FFFFFF',
      'primaryTextColor' : '#000000',
      'mermiad-container': "#FFFFFF",
      'primaryBorderColor': '#000000',
      'lineColor': '#000000',
      'secondaryColor': '#FFFFFF',
      'tertiaryColor': '#000000'
    },
    'sequence': { 'mirrorActors': false }
  }
}%%

stateDiagram-v2
    WDOG_INACTIVE|private --> WDOG_ACTIVE|shared : wd_start
    WDOG_ACTIVE|shared --> WDOG_INACTIVE|shared :  execute callback
    WDOG_INACTIVE|shared --> WDOG_ACTIVE|shared : wd_start in callback to restart
    WDOG_INACTIVE|shared --> WDOG_INACTIVE|private : callback finished
    WDOG_ACTIVE|shared --> WDOG_INACTIVE|private : wd_cancel
    WDOG_INACTIVE|shared --> WDOG_INACTIVE|shared : wd_cancel failed
Loading

Figure 2. Incorrect WDOG state diagram in current implementation.

After modifying the critical section, the execution of the WDOG callback function is no longer atomic. This inevitably introduces a new state: WDOG_INACTIVE | shared. In this state, even though wdog->func == NULL, its ownership does not belong to the user thread, as shown in Figure 2. If the user repurposes the WDOG memory space at this moment, a race condition occurs.

Race Condition Example:

  1. At time $$t_0$$: The interrupt handler thread executes the WDOG callback and is about to restart the WDOG within the callback.
  2. At time $$t_1$$: The user thread calls wd_cancel, reads wdog->func == NULL, mistakenly assumes the WDOG is in the WDOG_INACTIVE | private state, and returns directly. The user thread then calls memset to set the WDOG memory space to 0x55aaaa55.
  3. At time $$t_2$$: The interrupt handler thread calls wd_start, reads a non-NULL wdog->func (0x55aaaa55), and attempts to remove the WDOG.
  4. At time $$t_3$$: During the removal process, the interrupt handler thread reads the pointer 0x55aaaa55, causing a memory access fault.

To address this issue, a straightforward idea is to call wd_cancel before using the WDOG to ensure it enters the WDOG_INACTIVE | private state. However, even this simple approach is not easily achievable. Closer inspection of the state diagram reveals:

  • Only by calling wd_cancel in the WDOG_ACTIVE | shared state can the WDOG return to the user-available WDOG_INACTIVE | private state.
  • In the WDOG_INACTIVE | shared state, wd_cancel fails. If the user thread mistakenly believes the WDOG is in the WDOG_INACTIVE | private state and requeues the WDOG, a race condition still occurs, leading to runtime errors.

In this scenario, to ensure WDOG concurrency correctness, it becomes necessary to loop wd_cancel until it succeeds. However, this looping approach introduces the following problems:

  • Starvation Risk: Suppose the WDOG continuously cycles through WDOG_INACTIVE | shared $$\rightarrow^{wd_start}$$ WDOG_ACTIVE | shared $$\rightarrow^{\text{execute callback}}$$ WDOG_INACTIVE | shared as a periodic timer. If wd_cancel is called precisely when the WDOG is in the WDOG_INACTIVE | shared state (executing the callback), wd_cancel might fail indefinitely. In other words, wd_cancel can starve, violating the system's real-time and deterministic properties.
  • Requires Waiting for Callback Completion: Even if the WDOG is not a periodic timer (i.e., no WDOG_INACTIVE | shared $$\rightarrow^{wd_start}$$ WDOG_ACTIVE | shared transition), using the WDOG still requires waiting for the callback to finish. However, the current implementation lacks such a waiting mechanism.

Explanation of other related points:
Why not improve the state machine based on the new fine-grained spin-lock?
After introducing the third state WDOG_INACTIVE | shared, we could enhance the state machine to achieve the following:

  • On single-core platforms, it would be possible to implement semantics identical to the original wd_cancel.
  • On SMP platforms, hazard pointers could be used to avoid the issue where periodic WDOGs cannot be canceled. However, this would require waiting for remote threads to complete, meaning we cannot achieve semantics fully identical to the original wd_cancel. Consequently, we would have to reimplement all code that uses WDOGs.

For example, under the asynchronous wd_cancel semantics the following problematic thread interleaving could still occur:

   The previous `nxsem_timeout` callback function caused the second semaphore wait to fail.
    
        Core 0 [nxsem_clockwait]         Core 1
        enter_critical_section()       |  ...
        wd_start(nxsem_timeout)        |  ...
        nxsem_wait(sem)                |  wd_expiration() --> nxsem_timeout
        waked up by others             |
        wd_cancel(&rtcb->waitdog)      |  try enter_critical_section()
        leave_critical_section()       |  Failed retry...(Blocking by Core n)
        ... call nxsem_clockwait again |  Failed retry...
        enter_critical_section()       |  Failed retry...
        wd_start(nxsem_timeout)        |  Failed retry...
        nxsem_wait(sem)                |  Core 1 enter the critical section
                                       |  nxsem_wait_irq(wtcb, ETIMEDOUT) -> incorrectly wake-up the rtcb.
  • According to statistics, WDOG usage appears in 219/189 files across 400/427 locations where wd_start/wd_cancel are called. Reviewing and rewriting all related logic would clearly be impractical.

Why not use a recursive spin-lock (rspin_lock) but instead revert to a big kernel lock?
Because it could lead to circular waiting and deadlock:

  • Attempting to acquire the big kernel lock while holding the WDOG lock: Calling system functions within the WDOG timeout callback will attempt to acquire the big kernel lock.
  • Attempting to acquire the WDOG lock while holding the big kernel lock: This occurs when tcb->waitdog is restarted within a critical section.

Why is the most time-consuming operation, nxsched_reassess_timer(), also placed within the critical section?

  • If we do not place it within the critical section, after inserting a timer that needs to trigger immediately, the thread might be preempted (e.g., by a network interrupt). This could delay the timer setup, potentially causing uncertain timer delays and causing tasks to miss their deadlines.

Impact

This change will affect the WDOG module's code size, performance, critical section entry/exit time, and the WCET (Worst-Case Execution Time) of timeout handling.

  • The removal of wd_recover should result in an overall reduction in code size.
  • Adding list_delete_fast and replacing the original WDOG deletion interface reduces code size and slightly improves performance.
  • While simplifying wd_timer optimizes the performance and WCET of timer timeout handling, reverting the spin-lock to a big-kernel-lock significantly increases the execution time within the critical section. However, this sacrifice is necessary for functional correctness. Without functional correctness, high performance is meaningless. Once the HRTimer is integrated, we will gradually replace WDOG calls with hrtimer and apply optimizations to minimize the size of WDOG's critical sections as much as possible.

Testing

We ran ostest on rv-virt:smp to verify functional correctness, with CONFIG_SCHED_TICKLESS=y and CONFIG_RR_INTERVAL=0 enabled.
We also launched 4 instances and executed ostest 64 times in total. All test results passed.

@github-actions github-actions bot added Area: Documentation Improvements or additions to documentation Arch: arm Issues related to ARM (32-bit) architecture Arch: avr Issues related to all AVR(8-bit or 32-bit) architectures Area: Drivers Drivers issues Area: OS Components OS Components issues Size: L The size of the change in this PR is large labels Dec 22, 2025
acassis
acassis previously approved these changes Dec 22, 2025
Copy link
Contributor

@acassis acassis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fix-Point really good description and explanation about the issue! Kudos!!

jerpelea
jerpelea previously approved these changes Dec 22, 2025
This commit Removed wd_recover.

Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
This commit reduced 1 write operation for wdog deleting and decoupled
the WDOG_ISACTIVE with the list implementation.

Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
This commit removed the workaround for wdog latency.

Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
…ection).

If the wdog use the fine-grained spin-lock, and allow the callback execution without the lock held, there will be incorrect interleaving.
E.g. The first `nxsem_timeout` callback function caused the second semaphore wait to fail.

    Core 0 [nxsem_clockwait]       |  Core 1
    enter_critical_section()       |  ...
    wd_start(nxsem_timeout)        |  ...
    nxsem_wait(sem)                |  wd_expiration() --> nxsem_timeout
    wd_cancel(&rtcb->waitdog)      |  try enter_critical_section()
    leave_critical_section()       |  Failed retry...
    ....nxsem_clockwait            |  Failed retry...
    enter_critical_section()       |   Failed retry...
    wd_start(nxsem_timeout)        |  Failed retry...
    nxsem_wait(sem)                |  Core 1 enter the critical section
                                   |  nxsem_wait_irq(wtcb, ETIMEDOUT) -> incorrectly wake-up the rtcb.

Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
This commit simplified wd_timer() to reduce WCET when `noswitches` is false.

Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
@github-actions github-actions bot added Size: M The size of the change in this PR is medium and removed Area: Documentation Improvements or additions to documentation Arch: arm Issues related to ARM (32-bit) architecture Arch: avr Issues related to all AVR(8-bit or 32-bit) architectures Area: Drivers Drivers issues labels Dec 22, 2025
acassis
acassis previously approved these changes Dec 22, 2025
@wangchdo
Copy link
Contributor

wangchdo commented Dec 23, 2025

Great improvements for wdog. and I really hope we can work together to improve scheduler and hrtimer separately with you focusing on scheduler related and I focusing on hrtimer related.

You can refer to #17642 , where I believe the issues you raised regarding the current hrtimer behavior in SMP mode have been addressed.

Copy link
Contributor

@anchao anchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was critical section changed back from spinlock? Weren't you optimizing SMP performance?

This commit fixed the MISRA C-2004 violation rule 11.1, 10.4 and more.

Signed-off-by: jiangtao16 <jiangtao16@xiaomi.com>
Signed-off-by: wangzhi16 <wangzhi16@xiaomi.com>
Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
@Fix-Point
Copy link
Contributor Author

Fix-Point commented Dec 23, 2025

Why was critical section changed back from spinlock? Weren't you optimizing SMP performance?

There's a functional correctness issue using spinlock.
Based on spin-lock implementation, we cannot guarantee the functional correctness without changing the semantics of wd_start/wd_cancel. It means we should rewrite all wd_start/wd_cancel code, so we have no way but to revert to the big kernel lock (enter_critical_section).

After modifying the critical section, the execution of the WDOG callback function is no longer atomic. This inevitably introduces a new state: WDOG_INACTIVE | shared. In this state, even though wdog->func == NULL, its ownership does not belong to the user thread, as shown in Figure 2. If the user repurposes the WDOG memory space at this moment, a race condition occurs.

This commmit simplified the code and correct the critical_section.

Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>
Copy link
Contributor

@anchao anchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xiaoxiang781216 xiaoxiang781216 merged commit 2dc2b30 into apache:master Dec 24, 2025
40 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area: OS Components OS Components issues Size: L The size of the change in this PR is large Size: M The size of the change in this PR is medium

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants