Reduce dashboard update cycle cost #690
base: multi-session-status
Conversation
relabel() clones the entire Element, triggering deep copies of all Param metadata (kdims, vdims, etc.). Passing label= directly to the HoloViews constructor avoids this overhead entirely. Benchmarks show ~40-47% speedup for LinePlotter and ImagePlotter compute(), and ~23% for Overlay1DPlotter (5 curves). Prompt: Profiling indicates that calling `relabel` in holoviews elements is a substantial fraction of the cost in PlotterCompute; help me understand why, and whether this can be avoided, either by setting the label directly, or potentially reusing objects. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
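A minimal sketch of the difference, using illustrative data rather than the actual plotter code:

```python
import numpy as np
import holoviews as hv

xs = np.linspace(0.0, 1.0, 100)
ys = np.sin(2 * np.pi * xs)

# Before: relabel() returns a clone of the Element, deep-copying its Param
# metadata (kdims, vdims, ...) just to change the label.
curve_slow = hv.Curve((xs, ys)).relabel('detector_panel_0')

# After: passing label= at construction sets the same label without a clone.
curve_fast = hv.Curve((xs, ys), label='detector_panel_0')

assert curve_slow.label == curve_fast.label
```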
The epoch and timezone offset scalars are identical across calls, so compute them once and reuse. This also collapses two array-wide additions into one per extract() call. Subsequent calls take a fast path that skips dtype, unit, and coord-existence checks since they are implied by the cached _time_origin. Prompt: Benchmarks indicate substantial cost is spent in extractors.py, mostly in _ensure_datetime_coord, which I think could easily be avoided since most of it is "static" and the same for every call? Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
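A rough sketch of the caching idea; the real extractor works on scipp DataArrays and the class, attribute, and method names here are illustrative:

```python
import numpy as np

class TimeOriginCache:
    """Illustrative only: fold the epoch and the timezone offset into one
    cached scalar so each call needs a single array-wide addition."""

    def __init__(self) -> None:
        self._time_origin = None  # cached epoch + timezone offset

    def to_local_datetime(self, elapsed_ns: np.ndarray) -> np.ndarray:
        if self._time_origin is None:
            # Slow path, taken once: dtype/unit/coord checks would live here.
            epoch = np.datetime64('1970-01-01T00:00:00', 'ns')
            utc_offset = np.timedelta64(3600, 's')  # placeholder local offset
            self._time_origin = epoch + utc_offset
        # Fast path: one addition per call instead of two.
        return self._time_origin + elapsed_ns.astype('timedelta64[ns]')
```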
The data from TemporalBuffer is in chronological order, so start_time[0] and end_time[-1] give the same result as min/max without scanning the arrays. Prompt: The min and max from _ensure_datetime_coord also takes substantial time. Could we just use elements 0 and -1? Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
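A small sanity check of the equivalence, assuming ordered data as TemporalBuffer guarantees:

```python
import numpy as np

times = np.arange(1_000_000)  # chronologically ordered timestamps

# Same values, but indexing is O(1) while min()/max() scan the whole array.
assert times[0] == times.min()
assert times[-1] == times.max()
```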
Only call `session_layer.update_pipe()` for layers on the currently active grid tab. Since `dynamic=True` on Tabs means hidden tabs have no materialized Bokeh models, sending data to them is wasted work (88% of poll cycle time per profiling). Skipped layers keep their dirty flag set; on tab switch the next poll cycle sends the latest cached state. Also add `peek_grid()` to PlotOrchestrator, which returns the internal grid without `copy.deepcopy()`. The poll method only reads grid config, so the defensive copy was unnecessary overhead. Prompt: Implement the following plan: Skip pipe.send for hidden tabs
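A sketch of the skip logic; the attribute names (`_session_layers`, `dirty`) are hypothetical stand-ins for the real ones:

```python
def poll(self) -> None:
    """Sketch only: push data only to layers on the active grid tab."""
    active_grid_id = self._get_active_grid_id()
    for grid_id, layers in self._session_layers.items():
        for layer in layers:
            if not layer.dirty:
                continue
            if grid_id != active_grid_id:
                # Hidden tab (dynamic=True means no Bokeh models exist).
                # Keep the dirty flag so the first poll after a tab switch
                # pushes the latest cached state.
                continue
            layer.update_pipe()
            layer.dirty = False
```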
The modal overlay obscures all plots, making pipe updates wasted rendering work. Reuses the existing hidden-tab mechanism by forcing active_grid_id to None when the modal is open. Dirty flags are preserved so the first poll after the modal closes pushes the latest cached state. Prompt: The latest commit skips plot update in hidden tabs. Could we do something similar for when a plot config modal dialog is opened (either to add a new plot or to reconfigure an existing one)?
Wrap the periodic update body in doc.models.freeze() alongside the existing pn.io.hold(). This collapses N redundant recompute() calls into a single one at the end of the cycle. Each pipe.send() and layout mutation (e.g. updating widget lists) triggers HoloViews' hold_render decorator, which wraps the plot update in doc.models.freeze(). When multiple such operations run in sequence, each one's unfreeze reaches zero and triggers recompute() — a full BFS traversal of every Bokeh model in the document via collect_models(), costing O(M) where M is the total model count. With ~1500 models in a typical dashboard session, each recompute takes ~90 ms. For 4 active cells per update cycle this means ~350 ms (44% of the cycle) spent in _pop_freeze → recompute → collect_models, with the first N-1 results immediately invalidated by the next operation. Since Bokeh's freeze is reentrant (counter-based), adding an outer freeze() keeps the counter above zero so inner freeze/unfreeze cycles are no-ops. Only the outermost exit triggers recompute, once. Expected saving: (N-1)/N of the _pop_freeze cost (~75% for N=4). Also adds an early return with a warning log if pn.state.curdoc is None, which is a defensive guard for the doc.models.freeze() call. See docs/developer/plans/bokeh-freeze-batching.md for the full analysis. Prompt: Read bokeh-freeze-batching.md plan and implement the fix. Followed by investigation of why pn.io.hold doesn't use freeze internally, prep of a Discourse post for the Panel maintainers, and review of the staged implementation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
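The reentrancy that makes this safe can be seen on a bare Bokeh document (assuming Bokeh 3.x, where `Document.models.freeze()` is the counter-based context manager the commit relies on):

```python
from bokeh.document import Document
from bokeh.models import ColumnDataSource

doc = Document()
doc.add_root(ColumnDataSource(data={'x': [1, 2, 3]}))

# The inner with-blocks stand in for the freeze/unfreeze done by each
# pipe.send(); while the outer freeze holds the counter above zero they
# do not trigger recompute().
with doc.models.freeze():
    with doc.models.freeze():
        pass  # counter 2 -> 1 on exit, no recompute
    with doc.models.freeze():
        pass  # still no recompute
# Counter reaches zero here: a single recompute() for the whole batch.
```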
Force-pushed from f8963c4 to c3af8bc
The stale session cleanup path (heartbeat timeout) called updater.cleanup() but never stopped the periodic callback, since it had no reference to it. The callback was only stopped via pn.state.on_session_destroyed(), which doesn't fire on browser crashes or network disconnects. The orphaned callback kept firing after the session document was gone, and doc.models.freeze() (added in c3af8bc) triggered Panel's busy counter watchers which accessed state.curdoc.session_context on a None curdoc, producing an AttributeError. Fix: store the periodic callback on SessionUpdater so cleanup() can stop it. Both cleanup paths (normal session destroy and stale cleanup) already call updater.cleanup(). Prompt: "Sometimes we get this exception in the logs. Was it introduced in this branch or already in the base (multi-session): ERROR:bokeh.util.tornado:Error thrown from periodic callback: AttributeError: 'NoneType' object has no attribute 'session_context'" Follow-up: "yes, fix it"
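A sketch of the fix, with hypothetical attribute names; the key point is that both cleanup paths go through the same `cleanup()`:

```python
import panel as pn

class SessionUpdater:
    """Sketch only: keep a handle to the periodic callback so cleanup()
    can stop it on the normal path and on the stale-session path alike."""

    def __init__(self) -> None:
        self._periodic_callback = None

    def start(self, period_ms: int = 1000) -> None:
        self._periodic_callback = pn.state.add_periodic_callback(
            self._update, period=period_ms
        )

    def cleanup(self) -> None:
        if self._periodic_callback is not None:
            self._periodic_callback.stop()
            self._periodic_callback = None

    def _update(self) -> None:
        ...  # periodic update body
```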
… return
The early return when pn.state.curdoc is None prevented custom handlers and notifications from running. Replace with a _batched_update context manager that conditionally applies the freeze optimization only when a document exists. Prompt: "Should we avoid the early return to fix the test? And just not call doc.models.freeze if doc is None?"
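A sketch of the resulting context manager; the real one lives on SessionUpdater and the exact names may differ:

```python
from contextlib import contextmanager

import panel as pn

@contextmanager
def _batched_update():
    """Hold Panel events for the whole cycle and, when a document exists,
    add the outer Bokeh freeze so inner freeze/unfreeze cycles are no-ops."""
    doc = pn.state.curdoc
    with pn.io.hold():
        if doc is not None:
            with doc.models.freeze():
                yield
        else:
            # No document (e.g. in tests): custom handlers and notifications
            # still run, just without the freeze optimization.
            yield
```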
def plot(self, data: sc.DataArray, data_key: ResultKey, **kwargs) -> hv.Rectangles:
def plot(
    self, data: sc.DataArray, data_key: ResultKey, *, label: str = '', **kwargs
label is missing in the docstring. (And probably elsewhere.) Here in particular, can you add an explanation of why the argument exists but is unused?
@@ -0,0 +1,95 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (c) 2025 Scipp contributors (https://github.com/scipp)
Suggested change:
- # Copyright (c) 2025 Scipp contributors (https://github.com/scipp)
+ # Copyright (c) 2026 Scipp contributors (https://github.com/scipp)
# Dirty flags are preserved; the first poll after the modal closes will
# push the latest cached state.
modal_is_open = self._current_modal is not None
active_grid_idx = self._tabs.active - self._static_tabs_count
This looks fishy. Why is the index this difference of counts? (I am assuming self._tabs.active is the number of active tabs.) Also, the if 0 <= active_grid_idx < len(grid_keys) check below makes me think that this difference doesn't always correspond to an index.
    grid_keys[active_grid_idx]
    if 0 <= active_grid_idx < len(grid_keys) and not modal_is_open
    else None
)
Can you move all of the above into a separate function? It is logically separate from the code below.
if dim not in data.coords:
    return data
Should this check be swapped with the other if above? That would more closely correspond to the old code.
Swapped now, but _time_origin would never have been set, so the check seems redundant, right?
Hard to say from this class alone, because it depends on the inputs of __init__ and extract.
# the document via _pop_freeze → recompute → collect_models, at O(M)
# cost. The outer freeze keeps the counter above zero so that inner
# freeze/unfreeze cycles (e.g. HoloViews hold_render) are no-ops,
# collapsing N recomputes into 1.
This comment references the contents of _batched_update and so should be in that function instead of here.
- Add `label` param documentation to ROI readback plotter docstrings, explaining why it exists but is unused.
- Fix copyright year (2025 → 2026) in plotter_compute_benchmark.py.
- Extract active grid ID determination into a `_get_active_grid_id()` method with a clear docstring explaining the tab-index-to-grid-index mapping (a sketch follows below).
- Swap check order in `FullHistoryExtractor._to_local_datetime` so `dim not in data.coords` is checked before using the cached time origin, matching the original guard order.
- Move the freeze/hold explanation comment from `_update` into the `_batched_update` docstring where it belongs.

Prompt: "Please address review comments on the PR. Discuss if unclear how." Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
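For reference, a sketch of what `_get_active_grid_id()` might look like after the extraction, reconstructed from the diff fragments quoted above; exact attribute names may differ:

```python
def _get_active_grid_id(self):
    """Map the selected tab to a grid key, or None if no grid is visible.

    self._tabs.active is the index of the currently selected tab; the first
    self._static_tabs_count tabs are static (non-grid) tabs, so subtracting
    gives an index into the ordered grid keys. A negative or out-of-range
    index means a static tab is selected, and an open modal hides all plots.
    """
    grid_keys = list(self._orchestrator.peek_grid())
    modal_is_open = self._current_modal is not None
    active_grid_idx = self._tabs.active - self._static_tabs_count
    return (
        grid_keys[active_grid_idx]
        if 0 <= active_grid_idx < len(grid_keys) and not modal_is_open
        else None
    )
```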
@jl-wynen Thanks for the useful comments, addressed in the latest commit!
Background-thread status callbacks from JobService and ServiceRegistry caused ~2x recompute calls vs baseline. Each callback was serialized to MainThread via next_tick_callback with its own freeze/unfreeze cycle, producing separate recompute calls outside the batched periodic update. Convert all three affected widgets (JobStatusListWidget, WorkflowStatusWidget, BackendStatusWidget) to the polling model already used by SessionStatusWidget: register a refresh method as a custom handler with SessionUpdater, so all status updates run inside the periodic _batched_update() context (pn.io.hold + doc.models.freeze). No version counters are needed — the widgets already have internal diffing or set-on-every-cycle property updates that Panel handles as no-ops when unchanged. Also removes the subscriber/notification machinery from JobService and ServiceRegistry, fixing the callback cleanup gap (no unregister mechanism = leaked callbacks from dead sessions), and removes the temporary LIVEDATA_DISABLE_STATUS_CALLBACKS diagnostic env var. Prompt: Read status-callback-performance-investigation.md - we need to continue this investigation and develop a complete and deep understanding of everything involved. Follow-up: Do we actually need the complication of version counters in all cases? Updates in JobService are low frequency (mainly heartbeat timestamp), if we make the polling low frequency we can just always set the latest state as the widget value.
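A sketch of the polling pattern; the handler-registration and service method names here (`register_custom_handler`, `job_statuses`) are hypothetical:

```python
class JobStatusListWidget:
    """Sketch only: refresh from the periodic update instead of from
    background-thread callbacks."""

    def __init__(self, job_service, session_updater) -> None:
        self._job_service = job_service
        # Runs inside SessionUpdater._batched_update() on every poll cycle.
        session_updater.register_custom_handler(self.refresh)

    def refresh(self) -> None:
        # Setting the same value again is a no-op for Panel, so no version
        # counter is needed for this low-frequency state.
        self._apply(self._job_service.job_statuses())

    def _apply(self, statuses) -> None:
        ...  # update widget properties / internal diffing
```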
WorkflowStatusWidget previously subscribed to JobOrchestrator lifecycle events (on_staged_changed, on_workflow_committed, on_workflow_stopped) via WidgetLifecycleCallbacks. These callbacks fired synchronously from the triggering session and rebuilt ALL subscribed widgets across ALL sessions outside any batched context, causing unbatched recomputes and cross-session Bokeh model mutations. Add a version counter to WorkflowState that increments on every state mutation. The widget's refresh() method (already called inside SessionUpdater._batched_update()) checks the version and triggers a full rebuild only when it changes. This ensures all widget updates run in the correct session context with batched recomputation. Removed: WidgetLifecycleCallbacks, WidgetLifecycleSubscriptionId, subscribe_to_widget_lifecycle(), unsubscribe_from_widget_lifecycle(), _widget_subscriptions dict, cleanup() on WorkflowStatusWidget. Added: WorkflowState.version field, get_workflow_state_version() getter, version tracking in WorkflowStatusWidget.refresh(). Prompt: Implement the following plan: Convert WorkflowStatusWidget Lifecycle Callbacks to Version-Based Polling Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
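A sketch of the version-counter pattern described above, with names simplified from the commit message:

```python
from dataclasses import dataclass, field

@dataclass
class WorkflowState:
    version: int = 0
    workflows: dict = field(default_factory=dict)

    def commit_workflow(self, name: str, config: dict) -> None:
        self.workflows[name] = config
        self.version += 1  # every mutation bumps the version

class WorkflowStatusWidget:
    def __init__(self, state: WorkflowState) -> None:
        self._state = state
        self._seen_version = -1

    def refresh(self) -> None:
        # Called from SessionUpdater._batched_update(); rebuild only when the
        # state actually changed since the last poll.
        if self._state.version == self._seen_version:
            return
        self._seen_version = self._state.version
        self._rebuild()

    def _rebuild(self) -> None:
        ...  # full rebuild of the widget contents
```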
Summary
This branch targets the main performance bottlenecks in the dashboard's periodic update cycle, identified through profiling.
Data preparation optimizations reduce the per-update cost of converting and extracting time-series data:
- cb6e9bff Avoid hv.Element.relabel() — relabel() clones the entire Element (deep-copying all Param metadata). Passing label= at construction avoids this, yielding ~40-47% speedup for LinePlotter/ImagePlotter.compute() and ~23% for Overlay1DPlotter.
- 99697faf Cache time origin in FullHistoryExtractor — the epoch + timezone offset are identical across calls, so compute once and fast-path subsequent calls by skipping dtype/unit/coord checks.
- b0a2f831 Use indexing instead of min/max for time bounds — data from TemporalBuffer is chronologically ordered, so [0]/[-1] replaces full-array scans.

Rendering optimizations eliminate wasted Bokeh work when the user can't see the result:
- b1777172 Skip pipe.send for hidden tabs — with dynamic=True on Tabs, hidden tabs have no Bokeh models; sending data to them was 88% of poll cycle time per profiling. Also adds peek_grid() to avoid an unnecessary deepcopy.
- bb184963 Skip pipe.send while config modal is open — the modal obscures all plots; reuses the hidden-tab mechanism.
- f8963c42 Batch Bokeh model-graph recomputation — wraps the entire update body in doc.models.freeze(). Each pipe.send() triggers recompute() (BFS over all ~1500 models, ~90 ms each). Batching collapses N recomputes into 1, saving ~(N-1)/N of the cost (~75% for 4 active cells).

Further optimization
Two things need to be explored more: