feat: report/propagate client start/run failures#432

Open
xdustinface wants to merge 1 commit into v0.42-dev from feat/monitor-network-errors

Conversation


@xdustinface xdustinface commented Feb 12, 2026

We currently silently ignore any failure inside the sync coordinator or the `monitor_network` thread. This PR propagates those failures and also adds an FFI callback structure that can be set to receive notifications about errors inside the SPV client.
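For illustration, here is a minimal consumer-side sketch of the new callback API in Rust, mirroring the raw user_data pattern the new unit tests use. The struct layout follows the review diff further down this page; the callback's (const char *, void *) shape and the set function's exact signature are assumptions, not copied from the header:

use std::ffi::{c_char, c_void, CStr};

// Mirrors the FFIClientErrorCallback struct shown in the review diff.
#[repr(C)]
pub struct FFIClientErrorCallback {
    pub on_error: extern "C" fn(*const c_char, *mut c_void),
    pub user_data: *mut c_void,
}

extern "C" {
    // Assumed signature: client pointer plus the callback struct by value.
    fn dash_spv_ffi_client_set_client_error_callback(
        client: *mut c_void,
        callback: FFIClientErrorCallback,
    ) -> i32;
}

// Handler invoked from the client's background thread on fatal errors.
extern "C" fn on_error(msg: *const c_char, _user_data: *mut c_void) {
    let text = unsafe { CStr::from_ptr(msg) }.to_string_lossy();
    eprintln!("spv client error: {text}");
}

// Register before dash_spv_ffi_client_run so asynchronous start/sync
// failures are delivered instead of being silently dropped.
unsafe fn install_error_callback(client: *mut c_void) {
    let cb = FFIClientErrorCallback {
        on_error,
        user_data: std::ptr::null_mut(),
    };
    dash_spv_ffi_client_set_client_error_callback(client, cb);
}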

Summary by CodeRabbit

  • New Features

    • Client error callback mechanism to register and clear handlers for fatal client errors
  • Improvements

    • Improved error propagation and visibility across client lifecycle and sync operations; background sync now reports failures via the callback
  • Refactor

    • Run flow adjusted to subscribe to events before starting background sync; sync loop now propagates errors instead of swallowing them
  • Tests

    • Added unit tests for callback registration, dispatch, null/no-callback and lifecycle edge cases
  • Documentation

    • Updated API docs to include the new client error callback endpoints and usage notes


coderabbitai bot commented Feb 12, 2026

📝 Walkthrough

Adds a fatal client error callback API (set/clear) with C header and docs updates, implements types and dispatch in Rust, wires client/sync threads to invoke the callback on fatal errors, refactors sync coordinators to propagate errors, and adds unit tests for callback behavior.

Changes

  • API Declaration & Documentation (dash-spv-ffi/include/dash_spv_ffi.h, dash-spv-ffi/FFI_API.md)
    Adds the OnClientErrorCallback typedef and FFIClientErrorCallback struct; declares dash_spv_ffi_client_set_client_error_callback and dash_spv_ffi_client_clear_client_error_callback; documents the new functions.
  • Callback Implementation (dash-spv-ffi/src/callbacks.rs)
    Introduces the OnClientErrorCallback type and FFIClientErrorCallback struct with Default, Send/Sync, and a dispatch(&self, error: &str) method that converts Rust strings and invokes the C callback (sketched below).
  • Client Lifecycle Integration (dash-spv-ffi/src/client.rs)
    Adds client_error_callback: Arc<Mutex<Option<FFIClientErrorCallback>>> to FFIDashSpvClient; initializes it; implements the FFI set/clear functions; starts the sync task after subscribing and dispatches the client error callback on start/sync failures.
  • Coordinator Error Propagation (dash-spv/src/client/sync_coordinator.rs, dash-spv/src/sync/sync_coordinator.rs)
    Refactors monitor/tick loops to propagate and return explicit errors (converting manager failures/panics to Err) instead of silently continuing.
  • Unit Tests (dash-spv-ffi/tests/unit/test_client_lifecycle.rs)
    Adds tests for callback firing on start failure, direct dispatch capture, null-client set/clear error codes, and safe behavior with no callback; uses Arc<Mutex<Option<String>>> and raw user_data pointer patterns.
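A hedged sketch of what that dispatch method plausibly looks like; only the CString conversion is implied by the summary above, and the NUL-handling fallback is an assumption:

impl FFIClientErrorCallback {
    /// Convert the Rust error string to a C string and invoke the C callback.
    pub fn dispatch(&self, error: &str) {
        // CString::new fails if the message contains interior NUL bytes;
        // fall back to a placeholder rather than panicking across the FFI.
        let c_msg = std::ffi::CString::new(error)
            .unwrap_or_else(|_| std::ffi::CString::new("invalid error message").unwrap());
        (self.on_error)(c_msg.as_ptr(), self.user_data);
    }
}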

Sequence Diagram

sequenceDiagram
    actor App as Application
    participant C as C API
    participant Client as FFIDashSpvClient
    participant Monitor as Sync Monitor Thread
    participant Callback as Error Callback

    App->>C: dash_spv_ffi_client_set_client_error_callback(cb, user_data)
    C->>Client: store callback (Arc<Mutex<Option<FFIClientErrorCallback>>>)
    C-->>App: return status

    App->>C: dash_spv_ffi_client_run()
    C->>Client: spawn Monitor thread
    Monitor->>Monitor: perform start / sync operations
    alt fatal error occurs
        Monitor->>Client: lock & read client_error_callback
        Client-->>Monitor: callback ref (if set)
        Monitor->>Callback: dispatch(error_str)
        Callback->>App: on_error(error_ptr, user_data)
    end
    Monitor->>Client: continue or shutdown

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I twitched my nose and heard a click,
A tiny callback solved the trick.
From sync-thread thumps to C-side cheer,
I hop and log — the error's clear.
Tests nod kindly — carrot cake near! 🐇

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'feat: report/propagate client start/run failures' directly corresponds to the main objective of adding error propagation and callback mechanisms for client failures.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@dash-spv/src/client/sync_coordinator.rs`:
- Around line 61-63: The select arm currently ignores the Result from progress_updates.changed(), causing a busy-loop if the watch sender is dropped; match on progress_updates.changed() instead of using `_ =`, and on Ok(_) log the progress with progress_updates.borrow() and continue as intended, but on Err(RecvError) break the loop (e.g., `break None`) or otherwise disable the arm so it doesn't repeatedly resolve immediately; update the select arm handling of the `progress_updates` watch in the sync coordinator so the closed-channel case is explicitly handled.
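A minimal sketch of that fix, assuming progress_updates is a tokio::sync::watch::Receiver (names taken from the prompt above, other select arms elided):

// Match the Result from changed() instead of discarding it with `_ =`,
// so a dropped sender ends this arm rather than busy-looping on Err.
loop {
    tokio::select! {
        changed = progress_updates.changed() => match changed {
            Ok(()) => {
                // The sender published a new value: log current progress.
                tracing::info!("sync progress: {:?}", *progress_updates.borrow());
            }
            // The watch channel is closed (sender dropped): stop polling it
            // so the arm does not resolve immediately on every iteration.
            Err(_) => break,
        },
        // ... other select arms (commands, shutdown) elided ...
    }
}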
🧹 Nitpick comments (2)
dash-spv-ffi/tests/unit/test_client_lifecycle.rs (1)

213-277: Timing-dependent assertion may be flaky on slow CI.

Line 255 uses a fixed 500ms sleep to wait for the sync thread to fail and invoke the callback. On heavily loaded CI runners this could be insufficient, leading to intermittent failures.

Consider retrying the assertion in a polling loop with a timeout instead of a single sleep.

🔧 Suggested polling approach
-            // Give the sync thread time to start and fail
-            thread::sleep(Duration::from_millis(500));
-
-            let received_error = error_store.lock().unwrap();
-            assert!(
-                received_error.is_some(),
-                "Client error callback should have been called on start failure"
-            );
+            // Poll for the callback to fire, with a generous timeout
+            let deadline = std::time::Instant::now() + Duration::from_secs(5);
+            loop {
+                if error_store.lock().unwrap().is_some() {
+                    break;
+                }
+                assert!(
+                    std::time::Instant::now() < deadline,
+                    "Client error callback should have been called on start failure"
+                );
+                thread::sleep(Duration::from_millis(50));
+            }
+
+            let received_error = error_store.lock().unwrap();
dash-spv/src/client/sync_coordinator.rs (1)

54-58: Consider whether every handle_command error should be fatal to the monitoring loop.

Currently, if handle_command returns an error (e.g., ChannelFailure because a requester dropped its oneshot receiver before the response was sent), the entire monitor_network loop terminates. A caller dropping its response channel is arguably not a client-fatal condition — it's a transient per-request failure.

You could log-and-continue for non-fatal command errors to avoid tearing down the whole client:

Possible approach
-                        Some(command) => self.handle_command(command).await.err(),
+                        Some(command) => {
+                            if let Err(e) = self.handle_command(command).await {
+                                tracing::warn!("Command handling error: {}", e);
+                            }
+                            None
+                        },

If the intent is truly "any command failure = shut down", that's fine — just wanted to flag it.

@xdustinface xdustinface force-pushed the feat/monitor-network-errors branch from 19354bc to 532083b on February 12, 2026 20:47
@github-actions github-actions bot added the merge-conflict label (The PR conflicts with the target branch.) on Feb 19, 2026
@github-actions

This PR has merge conflicts with the base branch. Please rebase or merge the base branch into your branch to resolve them.

@xdustinface xdustinface force-pushed the feat/monitor-network-errors branch from 532083b to 97f4788 on February 19, 2026 20:54
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (4)
dash-spv-ffi/src/callbacks.rs (2)

591-592: Missing SAFETY comments on unsafe impl Send / unsafe impl Sync.

FFISyncEventCallbacks documents why its Send + Sync impls are sound (function pointers are inherently thread-safe; user_data thread-safety is the caller's responsibility). FFIClientErrorCallback should carry the same explanation for consistency and auditability.

✏️ Proposed addition
+// SAFETY: FFIClientErrorCallback is safe to send/share between threads because:
+// 1. The on_error function pointer is an extern "C" fn with no captured state.
+// 2. Thread safety of user_data is the caller's responsibility.
 unsafe impl Send for FFIClientErrorCallback {}
 unsafe impl Sync for FFIClientErrorCallback {}

Based on learnings: "Exercise careful memory safety handling at FFI boundaries."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dash-spv-ffi/src/callbacks.rs` around lines 591 - 592, Add a SAFETY comment
above the two unsafe impls for FFIClientErrorCallback explaining why
implementing Send and Sync is sound: state that the struct only contains raw
function pointers (which are inherently thread-safe) and a user_data pointer
whose thread-safety is the caller's responsibility, mirroring the justification
used for FFISyncEventCallbacks; ensure the comment mentions any invariants the
caller must uphold (e.g., user_data must remain valid for the callback's
lifetime and be accessed safely across threads).

572-611: FFIClientErrorCallback is placed between FFIWalletEventCallbacks's struct definition and its dispatch impl, fragmenting the wallet type.

The FFIWalletEventCallbacks struct is declared at line 551, but its dispatch impl doesn't appear until line 613 because lines 572–611 insert the entirely unrelated FFIClientErrorCallback type. Moving FFIClientErrorCallback to after the FFIWalletEventCallbacks::dispatch impl (after line 664) would restore cohesion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dash-spv-ffi/src/callbacks.rs` around lines 572 - 611, FFIClientErrorCallback
is interrupting the logical grouping of FFIWalletEventCallbacks by being
declared between the FFIWalletEventCallbacks struct and its impl dispatch; move
the entire FFIClientErrorCallback declaration (type alias, struct, unsafe impls,
Default impl, and impl dispatch) to below the FFIWalletEventCallbacks::dispatch
impl so the wallet struct and its methods remain contiguous, keeping the symbols
FFIWalletEventCallbacks and its dispatch impl together and relocating
FFIClientErrorCallback (including its OnClientErrorCallback, Default, unsafe
impls, and impl dispatch) after that block.
dash-spv-ffi/src/client.rs (2)

406-410: Redundant explicit drop(guard).

The guard goes out of scope at the end of the if let Err(e) block regardless. The explicit drop on line 410 is unnecessary and is inconsistent with the analogous start-failure path (lines 397–401) which relies on implicit drop.

✂️ Proposed cleanup
             let guard = error_callback.lock().unwrap();
             if let Some(cb) = guard.as_ref() {
                 cb.dispatch(&e.to_string());
             }
-            drop(guard);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dash-spv-ffi/src/client.rs` around lines 406 - 410, The explicit drop(guard)
after dispatching the error callback is redundant because guard (from let guard
= error_callback.lock().unwrap()) will be dropped at the end of the if let
Err(e) block; remove the drop(guard) call to match the implicit-drop style used
in the start-failure path and avoid unnecessary code. Ensure you update the
block around error_callback.lock().unwrap(), cb.dispatch(&e.to_string()), and
the if let Err(e) { ... } so no explicit drop is present.

397-410: Document the re-entrancy / deadlock constraint in the safety docs for set_client_error_callback.

Both dispatch sites (start failure at lines 397–401 and monitor error at lines 406–410) invoke cb.dispatch() while holding error_callback.lock(). If the C on_error implementation calls back into dash_spv_ffi_client_set_client_error_callback or dash_spv_ffi_client_clear_client_error_callback on the same thread, both functions will attempt to acquire the same std::sync::Mutex, deadlocking.

The # Safety doc for dash_spv_ffi_client_set_client_error_callback should explicitly prohibit re-entrant calls into the callback management API:

📝 Proposed doc addition (lines 829–833)
 /// # Safety
 /// - `client` must be a valid, non-null pointer to an `FFIDashSpvClient`.
 /// - The `callback` struct and its `user_data` must remain valid until the callback is cleared.
 /// - The callback must be thread-safe as it may be called from a background thread.
+/// - The callback **must not** call `dash_spv_ffi_client_set_client_error_callback` or
+///   `dash_spv_ffi_client_clear_client_error_callback` from within the callback body;
+///   doing so will deadlock on the internal mutex.

Based on learnings: "Exercise careful memory safety handling at FFI boundaries."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dash-spv-ffi/src/client.rs` around lines 397 - 410, The callback dispatch
calls (cb.dispatch) occur while holding the error_callback Mutex in both the
startup error path and in monitor_network, which can deadlock if the C on_error
handler calls back into dash_spv_ffi_client_set_client_error_callback or
dash_spv_ffi_client_clear_client_error_callback; update the safety docs for
dash_spv_ffi_client_set_client_error_callback (and mention
dash_spv_ffi_client_clear_client_error_callback) to explicitly prohibit
re-entrant calls into the callback-management API and note the Mutex/deadlock
risk, and preferably change the dispatch sites to drop the guard (release
error_callback.lock()) before calling cb.dispatch or document that callers must
not call those FFI functions from the callback (choose one approach and state it
clearly in the # Safety text).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dash-spv-ffi/tests/unit/test_client_lifecycle.rs`:
- Around line 218-267: The test frees the Sender early
(drop(Box::from_raw(tx_ptr))) while the client's sync thread (spawned by
dash_spv_ffi_client_run and running monitor_network()) may still call the FFI
callback on_error via the dangling user_data pointer; move the
drop(Box::from_raw(tx_ptr)) so it happens only after calling
dash_spv_ffi_client_stop (and dash_spv_ffi_client_destroy if needed) to ensure
threads have been cancelled/joined, and optionally replace the thread::sleep
readiness heuristic with a synchronization primitive or channel from the sync
thread to the test to reliably know when monitor_network() is running before
attempting the second run.
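A hedged sketch of that teardown order; tx_ptr, client, and the stop/destroy calls are taken from the prompt above, and their exact signatures are assumptions:

unsafe {
    // Stop first: this cancels/joins the sync thread, so no further
    // on_error invocations can reach user_data after it returns.
    dash_spv_ffi_client_stop(client);
    dash_spv_ffi_client_destroy(client);
    // Only now is the Sender unreachable from any background thread,
    // so reclaiming the Box cannot leave a dangling user_data pointer.
    drop(Box::from_raw(tx_ptr));
}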

In `@dash-spv/src/sync/sync_coordinator.rs`:
- Around line 237-244: Add a new SyncError::TaskPanicked variant in error.rs
(include a String or JoinError-wrapping payload and Display/From impls as
needed), then update the panic-handling branch in sync_coordinator.rs (the
Err(e) arm that currently returns SyncError::InvalidState(...)) to return
SyncError::TaskPanicked(format!("Manager task panicked: {}", e)) instead of
using InvalidState; ensure any call sites or conversions compile by adding
From<JoinError> for SyncError or matching construction in place.
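A minimal sketch of such a variant, assuming the SyncError enum uses thiserror (an assumption; its real definition is not shown here):

use thiserror::Error;

#[derive(Debug, Error)]
pub enum SyncError {
    #[error("invalid state: {0}")]
    InvalidState(String),
    /// New variant: a spawned manager task panicked.
    #[error("manager task panicked: {0}")]
    TaskPanicked(String),
    // ... other existing variants elided ...
}

// Lets the Err(e) join-handle arm convert with `?` or `.into()`.
impl From<tokio::task::JoinError> for SyncError {
    fn from(e: tokio::task::JoinError) -> Self {
        SyncError::TaskPanicked(format!("Manager task panicked: {}", e))
    }
}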


@xdustinface xdustinface force-pushed the feat/monitor-network-errors branch from 97f4788 to 15a84d4 on February 20, 2026 16:56
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
dash-spv-ffi/src/client.rs (1)

294-312: run() doc comment no longer reflects the error-reporting model.

The # Returns / 0 on success, error code on failure note in both the Rust doc and the C header implies synchronous start-failure reporting. After this PR, run() practically always returns FFIErrorCode::Success (only a null client yields a non-zero code); start failures and sync errors are delivered asynchronously via client_error_callback. Callers who relied on the return value to gate subsequent work will silently miss start failures.

📝 Suggested doc update
 /// # Returns
-/// 0 on success, error code on failure.
+/// 0 if background tasks were successfully spawned, non-zero on immediate failure
+/// (e.g. null client pointer).
+///
+/// Note: start failures and sync errors are reported **asynchronously** via the
+/// client error callback set with `dash_spv_ffi_client_set_client_error_callback`.
+/// Set the callback before calling `run()` to receive these notifications.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dash-spv-ffi/src/client.rs` around lines 294 - 312, The doc for run() is
misleading about error reporting; update the Rust doc comment (and corresponding
C header comment if generated) to state that run() returns FFIErrorCode::Success
in normal cases (only a null client returns a non-zero code) and that start and
runtime errors are reported asynchronously via the client_error_callback; also
advise callers to register their callbacks (set_sync_event_callbacks,
set_network_event_callbacks, set_wallet_event_callbacks, and
client_error_callback) before calling run() so they will receive start/sync
errors instead of relying on the synchronous return value.
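A short usage sketch of the ordering this implies, with signatures assumed as in the earlier registration sketch:

unsafe {
    // Register first: start/sync failures arrive asynchronously through
    // the callback, not through run()'s return value.
    dash_spv_ffi_client_set_client_error_callback(client, cb);
    let rc = dash_spv_ffi_client_run(client);
    // Only an immediate failure such as a null client yields non-zero here.
    assert_eq!(rc, 0);
}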
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@dash-spv/src/sync/sync_coordinator.rs`:
- Around line 241-243: The code reuses SyncError::InvalidState when the manager task panics; create and use a distinct error variant (e.g., SyncError::ManagerPanicked or SyncError::TaskJoinError) instead of InvalidState to avoid duplicate/ambiguous errors: add the new variant to the SyncError enum, update its Display/From/serialization implementations as needed, and replace the return Err(SyncError::InvalidState(...)) in the Err(e) arm inside sync_coordinator.rs (the block that currently logs "Manager task panicked: {}") to return the new variant with the panic details; also adjust any callers/tests that match on InvalidState to handle the new variant.

Comment on lines +589 to +596

#[repr(C)]
pub struct FFIClientErrorCallback {
    pub on_error: OnClientErrorCallback,
    pub user_data: *mut c_void,
}

unsafe impl Send for FFIClientErrorCallback {}
unsafe impl Sync for FFIClientErrorCallback {}

🛠️ Refactor suggestion | 🟠 Major

FFIClientErrorCallback is missing #[derive(Clone)] and // SAFETY: comments.

Every other callback struct in this file derives Clone (enabling the lock-release-before-dispatch pattern used in spawn_broadcast_monitor/spawn_progress_monitor). The absence here forces the dispatch code in client.rs to hold the Arc<Mutex<…>> lock while invoking the C callback, creating a deadlock if the callback re-enters the FFI to set or clear the error callback (see client.rs comment below).

Additionally, the unsafe impl Send / unsafe impl Sync blocks lack the // SAFETY: rationale present on every other callback struct in the file.

♻️ Proposed fix
+/// Client error callback configuration.
 #[repr(C)]
+#[derive(Clone)]
 pub struct FFIClientErrorCallback {
     pub on_error: OnClientErrorCallback,
     pub user_data: *mut c_void,
 }
 
+// SAFETY: FFIClientErrorCallback is safe to send/share between threads because:
+// 1. The callback function pointer is an extern "C" fn with no captured state.
+// 2. The user_data pointer is treated as opaque and managed by the caller.
+// 3. Thread safety of user_data is the caller's responsibility.
 unsafe impl Send for FFIClientErrorCallback {}
 unsafe impl Sync for FFIClientErrorCallback {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dash-spv-ffi/src/callbacks.rs` around lines 589 - 596, FFIClientErrorCallback
is missing Clone and safety rationale comments which forces holding the
Arc<Mutex<...>> lock while calling the C callback; add #[derive(Clone)] to the
FFIClientErrorCallback struct so it can be cloned and released before dispatch
(consistent with other callbacks used by spawn_broadcast_monitor /
spawn_progress_monitor) and add the same "// SAFETY:" comment block above the
unsafe impl Send and unsafe impl Sync for FFIClientErrorCallback explaining why
it is safe to send/sync (mirror the wording used by the other callback structs)
so reviewers can see the invariants that justify the unsafe impls.

Comment on lines +385 to +401

        if let Err(e) = spv_client.start().await {
            tracing::error!("Sync thread: client start error: {}", e);
            let guard = error_callback.lock().unwrap();
            if let Some(cb) = guard.as_ref() {
                cb.dispatch(&e.to_string());
            }
            return;
        }

        if let Err(e) = spv_client.monitor_network(shutdown_token).await {
            tracing::error!("Sync task: sync error: {}", e);
            let guard = error_callback.lock().unwrap();
            if let Some(cb) = guard.as_ref() {
                cb.dispatch(&e.to_string());
            }
            drop(guard);
        }

⚠️ Potential issue | 🟠 Major

Deadlock: client_error_callback mutex held during C callback invocation.

Both the start-failure path (lines 387–390) and the sync-error path (lines 396–399) acquire the Arc<Mutex<Option<FFIClientErrorCallback>>> guard and then call cb.dispatch(...) — which invokes the raw C function pointer — while the guard is still held.

If the C callback re-enters the FFI (a natural pattern, e.g. to clear or replace the callback after receiving a fatal error):

// In user's callback:
void on_error(const char *msg, void *ctx) {
    dash_spv_ffi_client_clear_client_error_callback(client); // → tries to lock same mutex
}

dash_spv_ffi_client_clear_client_error_callback calls client.client_error_callback.lock().unwrap(). std::sync::Mutex is not re-entrant; the calling thread blocks waiting for the lock that it already holds → deadlock.

This is inconsistent with every other dispatch site in this file (spawn_broadcast_monitor, spawn_progress_monitor) which correctly clone the callback data, release the lock, and then dispatch:

// Correct pattern used everywhere else:
let cb = callbacks.lock().unwrap().clone();  // lock released here
if let Some(ref cb) = cb {
    dispatch_fn(cb, &event);                 // dispatched without lock
}

The fix requires two coordinated changes:

  1. Add #[derive(Clone)] to FFIClientErrorCallback in callbacks.rs (see comment there).
  2. Switch both dispatch sites in this file to the clone-first pattern.
🔒 Proposed fix — both dispatch sites
-        if let Err(e) = spv_client.start().await {
-            tracing::error!("Sync thread: client start error: {}", e);
-            let guard = error_callback.lock().unwrap();
-            if let Some(cb) = guard.as_ref() {
-                cb.dispatch(&e.to_string());
-            }
-            return;
-        }
+        if let Err(e) = spv_client.start().await {
+            tracing::error!("Sync thread: client start error: {}", e);
+            let cb = error_callback.lock().unwrap().clone();
+            if let Some(cb) = cb {
+                cb.dispatch(&e.to_string());
+            }
+            return;
+        }

         if let Err(e) = spv_client.monitor_network(shutdown_token).await {
             tracing::error!("Sync task: sync error: {}", e);
-            let guard = error_callback.lock().unwrap();
-            if let Some(cb) = guard.as_ref() {
-                cb.dispatch(&e.to_string());
-            }
-            drop(guard);
+            let cb = error_callback.lock().unwrap().clone();
+            if let Some(cb) = cb {
+                cb.dispatch(&e.to_string());
+            }
         }

And in dash-spv-ffi/src/callbacks.rs:

 #[repr(C)]
+#[derive(Clone)]
 pub struct FFIClientErrorCallback {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dash-spv-ffi/src/client.rs` around lines 385 - 401, The error callback mutex
(client_error_callback) is held while calling the C callback in both the
spv_client.start() error branch and the spv_client.monitor_network(...) error
branch, causing a deadlock if the C callback re-enters the FFI and tries to lock
the same mutex; fix it by deriving Clone on FFIClientErrorCallback in
callbacks.rs (add #[derive(Clone)] to that struct) and change both dispatch
sites to clone the Option<FFIClientErrorCallback> while holding the lock, drop
the lock, and then call dispatch on the cloned callback (i.e., let cb =
client_error_callback.lock().unwrap().clone(); if let Some(ref cb) = cb {
cb.dispatch(...); }) so no C callback is invoked while the mutex is held.

@github-actions github-actions bot removed the merge-conflict label (The PR conflicts with the target branch.) on Feb 20, 2026