sched/wdog: Fix-up for SMP and Improvements. #17640

Fix-Point · 2025-12-22T12:13:33Z

Summary

This PR is part III of the #17556 and based on #17584, it primarily introduces fix-up and improvements to the WDOG module, including the following key changes:

Removed the wd_recover interface to simplify the WDOG module, reduce coupling, and decrease code size.
Added the list_delete_fast interface to improve WDOG deletion performance. Theoretically, this saves at least 1 write operation on the path where a WDOG needs to be deleted, and 2 write operations when a WDOG needs to be reinserted.
Reverted the WDOG spin-lock to the big kernel lock (enter_critical_section) to fix concurrency-related bugs.
Simplified wd_timer to reduce performance overhead and Worst-Case Execution Time (WCET) for timeout handling.
Modified code organization to comply with the MISRA-C coding standard.

Special attention is required regarding why we reverted the fine-grained WDOG spin-lock to the big kernel lock (enter_critical_section).

In the earliest implementation, the WDOG had only two states:

%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#FFFFFF',
      'primaryTextColor' : '#000000',
      'mermiad-container': "#FFFFFF",
      'primaryBorderColor': '#000000',
      'lineColor': '#000000',
      'secondaryColor': '#FFFFFF',
      'tertiaryColor': '#000000'
    },
    'sequence': { 'mirrorActors': false }
  }
}%%

stateDiagram-v2
    WDOG_INACTIVE --> WDOG_INACTIVE : wd_cancel
    WDOG_INACTIVE --> WDOG_ACTIVE : wd_start
    WDOG_ACTIVE --> WDOG_ACTIVE : wd_start
    WDOG_ACTIVE --> WDOG_INACTIVE : wd_cancel or callback finished

Figure 1. Correct WDOG state diagram.

This implementation was functionally correct in an SMP environment. However, after switching to a spin-lock and relaxing the critical section, the WDOG entered three states:

%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#FFFFFF',
      'primaryTextColor' : '#000000',
      'mermiad-container': "#FFFFFF",
      'primaryBorderColor': '#000000',
      'lineColor': '#000000',
      'secondaryColor': '#FFFFFF',
      'tertiaryColor': '#000000'
    },
    'sequence': { 'mirrorActors': false }
  }
}%%

stateDiagram-v2
    WDOG_INACTIVE|private --> WDOG_ACTIVE|shared : wd_start
    WDOG_ACTIVE|shared --> WDOG_INACTIVE|shared :  execute callback
    WDOG_INACTIVE|shared --> WDOG_ACTIVE|shared : wd_start in callback to restart
    WDOG_INACTIVE|shared --> WDOG_INACTIVE|private : callback finished
    WDOG_ACTIVE|shared --> WDOG_INACTIVE|private : wd_cancel
    WDOG_INACTIVE|shared --> WDOG_INACTIVE|shared : wd_cancel failed

Figure 2. Incorrect WDOG state diagram in current implementation.

After modifying the critical section, the execution of the WDOG callback function is no longer atomic. This inevitably introduces a new state: WDOG_INACTIVE | shared. In this state, even though wdog->func == NULL, its ownership does not belong to the user thread, as shown in Figure 2. If the user repurposes the WDOG memory space at this moment, a race condition occurs.

Race Condition Example:

At time $$t_0$$: The interrupt handler thread executes the WDOG callback and is about to restart the WDOG within the callback.
At time $$t_1$$: The user thread calls wd_cancel, reads wdog->func == NULL, mistakenly assumes the WDOG is in the WDOG_INACTIVE | private state, and returns directly. The user thread then calls memset to set the WDOG memory space to 0x55aaaa55.
At time $$t_2$$: The interrupt handler thread calls wd_start, reads a non-NULL wdog->func (0x55aaaa55), and attempts to remove the WDOG.
At time $$t_3$$: During the removal process, the interrupt handler thread reads the pointer 0x55aaaa55, causing a memory access fault.

To address this issue, a straightforward idea is to call wd_cancel before using the WDOG to ensure it enters the WDOG_INACTIVE | private state. However, even this simple approach is not easily achievable. Closer inspection of the state diagram reveals:

Only by calling wd_cancel in the WDOG_ACTIVE | shared state can the WDOG return to the user-available WDOG_INACTIVE | private state.
In the WDOG_INACTIVE | shared state, wd_cancel fails. If the user thread mistakenly believes the WDOG is in the WDOG_INACTIVE | private state and requeues the WDOG, a race condition still occurs, leading to runtime errors.

In this scenario, to ensure WDOG concurrency correctness, it becomes necessary to loop wd_cancel until it succeeds. However, this looping approach introduces the following problems:

Starvation Risk: Suppose the WDOG continuously cycles through WDOG_INACTIVE | shared $$\rightarrow^{wd_start}$$ WDOG_ACTIVE | shared $$\rightarrow^{\text{execute callback}}$$ WDOG_INACTIVE | shared as a periodic timer. If wd_cancel is called precisely when the WDOG is in the WDOG_INACTIVE | shared state (executing the callback), wd_cancel might fail indefinitely. In other words, wd_cancel can starve, violating the system's real-time and deterministic properties.
Requires Waiting for Callback Completion: Even if the WDOG is not a periodic timer (i.e., no WDOG_INACTIVE | shared $$\rightarrow^{wd_start}$$ WDOG_ACTIVE | shared transition), using the WDOG still requires waiting for the callback to finish. However, the current implementation lacks such a waiting mechanism.

Explanation of other related points:
Why not improve the state machine based on the new fine-grained spin-lock?
After introducing the third state WDOG_INACTIVE | shared, we could enhance the state machine to achieve the following:

On single-core platforms, it would be possible to implement semantics identical to the original wd_cancel.
On SMP platforms, hazard pointers could be used to avoid the issue where periodic WDOGs cannot be canceled. However, this would require waiting for remote threads to complete, meaning we cannot achieve semantics fully identical to the original wd_cancel. Consequently, we would have to reimplement all code that uses WDOGs.

For example, under the asynchronous wd_cancel semantics the following problematic thread interleaving could still occur:

   The previous `nxsem_timeout` callback function caused the second semaphore wait to fail.
    
        Core 0 [nxsem_clockwait]         Core 1
        enter_critical_section()       |  ...
        wd_start(nxsem_timeout)        |  ...
        nxsem_wait(sem)                |  wd_expiration() --> nxsem_timeout
        waked up by others             |
        wd_cancel(&rtcb->waitdog)      |  try enter_critical_section()
        leave_critical_section()       |  Failed retry...(Blocking by Core n)
        ... call nxsem_clockwait again |  Failed retry...
        enter_critical_section()       |  Failed retry...
        wd_start(nxsem_timeout)        |  Failed retry...
        nxsem_wait(sem)                |  Core 1 enter the critical section
                                       |  nxsem_wait_irq(wtcb, ETIMEDOUT) -> incorrectly wake-up the rtcb.

According to statistics, WDOG usage appears in 219/189 files across 400/427 locations where wd_start/wd_cancel are called. Reviewing and rewriting all related logic would clearly be impractical.

Why not use a recursive spin-lock (rspin_lock) but instead revert to a big kernel lock?
Because it could lead to circular waiting and deadlock:

Attempting to acquire the big kernel lock while holding the WDOG lock: Calling system functions within the WDOG timeout callback will attempt to acquire the big kernel lock.
Attempting to acquire the WDOG lock while holding the big kernel lock: This occurs when tcb->waitdog is restarted within a critical section.

Why is the most time-consuming operation, nxsched_reassess_timer(), also placed within the critical section?

If we do not place it within the critical section, after inserting a timer that needs to trigger immediately, the thread might be preempted (e.g., by a network interrupt). This could delay the timer setup, potentially causing uncertain timer delays and causing tasks to miss their deadlines.

Impact

This change will affect the WDOG module's code size, performance, critical section entry/exit time, and the WCET (Worst-Case Execution Time) of timeout handling.

The removal of wd_recover should result in an overall reduction in code size.
Adding list_delete_fast and replacing the original WDOG deletion interface reduces code size and slightly improves performance.
While simplifying wd_timer optimizes the performance and WCET of timer timeout handling, reverting the spin-lock to a big-kernel-lock significantly increases the execution time within the critical section. However, this sacrifice is necessary for functional correctness. Without functional correctness, high performance is meaningless. Once the HRTimer is integrated, we will gradually replace WDOG calls with hrtimer and apply optimizations to minimize the size of WDOG's critical sections as much as possible.

Testing

We ran ostest on rv-virt:smp to verify functional correctness, with CONFIG_SCHED_TICKLESS=y and CONFIG_RR_INTERVAL=0 enabled.
We also launched 4 instances and executed ostest 64 times in total. All test results passed.

acassis

@Fix-Point really good description and explanation about the issue! Kudos!!

This commit Removed wd_recover. Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>

This commit reduced 1 write operation for wdog deleting and decoupled the WDOG_ISACTIVE with the list implementation. Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>

This commit removed the workaround for wdog latency. Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>

…ection). If the wdog use the fine-grained spin-lock, and allow the callback execution without the lock held, there will be incorrect interleaving. E.g. The first `nxsem_timeout` callback function caused the second semaphore wait to fail. Core 0 [nxsem_clockwait] | Core 1 enter_critical_section() | ... wd_start(nxsem_timeout) | ... nxsem_wait(sem) | wd_expiration() --> nxsem_timeout wd_cancel(&rtcb->waitdog) | try enter_critical_section() leave_critical_section() | Failed retry... ....nxsem_clockwait | Failed retry... enter_critical_section() | Failed retry... wd_start(nxsem_timeout) | Failed retry... nxsem_wait(sem) | Core 1 enter the critical section | nxsem_wait_irq(wtcb, ETIMEDOUT) -> incorrectly wake-up the rtcb. Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>

This commit simplified wd_timer() to reduce WCET when `noswitches` is false. Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>

wangchdo · 2025-12-23T01:46:55Z

Great improvements for wdog. and I really hope we can work together to improve scheduler and hrtimer separately with you focusing on scheduler related and I focusing on hrtimer related.

You can refer to #17642 , where I believe the issues you raised regarding the current hrtimer behavior in SMP mode have been addressed.

sched/wdog/wd_cancel.c

anchao

Why was critical section changed back from spinlock? Weren't you optimizing SMP performance?

sched/wdog/wd_cancel.c

sched/wdog/wd_gettime.c

sched/wdog/wd_start.c

This commit fixed the MISRA C-2004 violation rule 11.1, 10.4 and more. Signed-off-by: jiangtao16 <jiangtao16@xiaomi.com> Signed-off-by: wangzhi16 <wangzhi16@xiaomi.com> Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>

Fix-Point · 2025-12-23T09:33:16Z

Why was critical section changed back from spinlock? Weren't you optimizing SMP performance?

There's a functional correctness issue using spinlock.
Based on spin-lock implementation, we cannot guarantee the functional correctness without changing the semantics of wd_start/wd_cancel. It means we should rewrite all wd_start/wd_cancel code, so we have no way but to revert to the big kernel lock (enter_critical_section).

After modifying the critical section, the execution of the WDOG callback function is no longer atomic. This inevitably introduces a new state: WDOG_INACTIVE | shared. In this state, even though wdog->func == NULL, its ownership does not belong to the user thread, as shown in Figure 2. If the user repurposes the WDOG memory space at this moment, a race condition occurs.

sched/wdog/wd_cancel.c

This commmit simplified the code and correct the critical_section. Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>

anchao

LGTM

Fix-Point requested review from GUIDINGLI, acassis, anchao, antmerlino, davids5, gustavonihei, hartmannathan, jerpelea, michallenc, pussuw, raiden00pl, simbit18 and xiaoxiang781216 as code owners December 22, 2025 12:13

acassis previously approved these changes Dec 22, 2025

View reviewed changes

Fix-Point force-pushed the hrtimer_p3 branch from 4d539bf to 87f046e Compare December 22, 2025 12:31

jerpelea previously approved these changes Dec 22, 2025

View reviewed changes

Fix-Point added 5 commits December 22, 2025 21:30

sched/wdog: Remove wd_recover.

ccce1fc

This commit Removed wd_recover. Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>

sched/wdog: Faster wdog deleting.

15fa90b

This commit reduced 1 write operation for wdog deleting and decoupled the WDOG_ISACTIVE with the list implementation. Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>

sched/wdog: Remove workaround for wdog latency.

dd342a8

This commit removed the workaround for wdog latency. Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>

sched/wdog: Simplify wd_timer to reduce WCET.

ad729bb

This commit simplified wd_timer() to reduce WCET when `noswitches` is false. Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>

Fix-Point dismissed stale reviews from jerpelea and acassis via c9dbe72 December 22, 2025 13:30

Fix-Point force-pushed the hrtimer_p3 branch from 87f046e to c9dbe72 Compare December 22, 2025 13:30

acassis previously approved these changes Dec 22, 2025

View reviewed changes

wangchdo mentioned this pull request Dec 23, 2025

sched/hrtimer: hrtimer state machine improvement for SMP cases #17642

Open

xiaoxiang781216 approved these changes Dec 23, 2025

View reviewed changes

xiaoxiang781216 requested changes Dec 23, 2025

View reviewed changes

sched/wdog/wd_cancel.c Outdated Show resolved Hide resolved

Fix-Point dismissed acassis’s stale review via ebc4ea4 December 23, 2025 06:53

Fix-Point force-pushed the hrtimer_p3 branch from c9dbe72 to ebc4ea4 Compare December 23, 2025 06:53

anchao requested changes Dec 23, 2025

View reviewed changes

sched/wdog: compatible with MISRA-C

3e686ee

This commit fixed the MISRA C-2004 violation rule 11.1, 10.4 and more. Signed-off-by: jiangtao16 <jiangtao16@xiaomi.com> Signed-off-by: wangzhi16 <wangzhi16@xiaomi.com> Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>

Fix-Point force-pushed the hrtimer_p3 branch from ebc4ea4 to e685cc1 Compare December 23, 2025 09:27

xiaoxiang781216 previously approved these changes Dec 23, 2025

View reviewed changes

anchao reviewed Dec 24, 2025

View reviewed changes

sched/wdog/wd_cancel.c Outdated Show resolved Hide resolved

sched/wdog: Simplify the code and correct the critical_section.

7012342

This commmit simplified the code and correct the critical_section. Signed-off-by: ouyangxiangzhen <ouyangxiangzhen@xiaomi.com>

Fix-Point dismissed xiaoxiang781216’s stale review via 7012342 December 24, 2025 02:20

Fix-Point force-pushed the hrtimer_p3 branch from e685cc1 to 7012342 Compare December 24, 2025 02:20

anchao approved these changes Dec 24, 2025

View reviewed changes

xiaoxiang781216 approved these changes Dec 24, 2025

View reviewed changes

xiaoxiang781216 merged commit 2dc2b30 into apache:master Dec 24, 2025
40 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

sched/wdog: Fix-up for SMP and Improvements. #17640

sched/wdog: Fix-up for SMP and Improvements. #17640

Uh oh!

Fix-Point commented Dec 22, 2025 •

edited

Loading

Uh oh!

acassis left a comment

Uh oh!

wangchdo commented Dec 23, 2025 •

edited

Loading

Uh oh!

Uh oh!

anchao left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Fix-Point commented Dec 23, 2025 •

edited

Loading

Uh oh!

Uh oh!

anchao left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

sched/wdog: Fix-up for SMP and Improvements. #17640

sched/wdog: Fix-up for SMP and Improvements. #17640

Uh oh!

Conversation

Fix-Point commented Dec 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Impact

Testing

Uh oh!

acassis left a comment

Choose a reason for hiding this comment

Uh oh!

wangchdo commented Dec 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

anchao left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Fix-Point commented Dec 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

anchao left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Fix-Point commented Dec 22, 2025 •

edited

Loading

wangchdo commented Dec 23, 2025 •

edited

Loading

Fix-Point commented Dec 23, 2025 •

edited

Loading