fix: infinite loading when reopening workflow in collaborative editor #4175

elias-ba · 2025-12-10T14:15:50Z

Description

This PR fixes the infinite loading issue when reopening a workflow in the collaborative editor.

1. extract_lock_version not handling nil values (persistence.ex)

When a workflow is opened before first save, the Y.Doc gets lock_version: nil. The extract_lock_version/1 function didn't handle the {:ok, nil} case from Yex.Map.fetch/2, causing it to fall through incorrectly. The stale?/2 function was also refactored for clarity.

2. merge_updates applying deltas to empty doc (persistence_writer.ex)

This was the root cause of the infinite loading. When the PersistenceWriter batches multiple Yjs updates together, merge_updates/1 was applying them to an empty Yex.Doc.new(). Since Yjs deltas only contain changes (not full state), applying them to an empty doc produced corrupted 2-byte outputs, losing all workflow data.

Evidence from reproduction:

Before fix: merged updates = 2 bytes (corrupted/empty)
After fix: merged updates = 969-1742 bytes (correct full state)

The fix loads the existing persisted state (checkpoint + updates) from the database before applying new deltas on top.

How this caused the `nil` lock_version crash

User creates new workflow → Y.Doc initialized with lock_version: nil
After 2s debounce → First batch saved with full state (including lock_version: nil) ~1000 bytes ✓
User saves workflow → DB gets real lock_version, Y.Doc updated via merge_saved_workflow_into_ydoc
This generates a delta update → Only contains the lock_version change
After 2s debounce → merge_updates merges pending deltas
BUG: Deltas applied to empty doc → Produces 2-byte corrupted output, lock_version update lost
User reopens workflow → Loads first update (lock_version: nil) + corrupted update (2 bytes)
Result: Y.Doc has lock_version: nil, DB has real value → stale check fails → crash

Closes #4164 #4154

Validation steps

Create a new workflow in collab editor and save it
Add a job node (but don't save)
Navigate back to workflows list
Open the workflow again
Verify workflow loads correctly (no infinite loading)

Additional notes for the reviewer

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

You can read more details in our Responsible AI Policy

Pre-submission checklist

I have performed a self-review of my code.
I have implemented and tested all related authorization policies. (e.g., :owner, :admin, :editor, :viewer)
I have updated the changelog.
I have ticked a box in "AI usage" in this PR

The bug was in merge_updates/1 in persistence_writer.ex. When multiple Yjs delta updates were batched together, they were applied to an empty Yex.Doc.new() instead of the current persisted state. Since Yjs deltas only contain changes (not full state), applying them to an empty doc produced corrupted 2-byte outputs, losing all workflow data including lock_version. The fix loads the existing persisted state (checkpoint + updates) from the database before applying new deltas on top. Evidence from reproduction: - Before fix: merged updates = 2 bytes (corrupted/empty) - After fix: merged updates = 969-1742 bytes (correct full state) Closes #4164

codecov · 2025-12-10T14:31:05Z

Codecov Report

❌ Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 89.09%. Comparing base (9eb285d) to head (7578dd2).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
lib/lightning/collaboration/document_state.ex	94.11%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4175      +/-   ##
==========================================
+ Coverage   89.03%   89.09%   +0.05%     
==========================================
  Files         425      425              
  Lines       19646    19658      +12     
==========================================
+ Hits        17492    17514      +22     
+ Misses       2154     2144      -10

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@type

Move duplicated checkpoint and update query logic from Persistence and PersistenceWriter into DocumentState module where data access belongs. - Add get_checkpoint_and_updates/1, get_latest_checkpoint/1, get_updates_since/2, apply_to_doc/3, and load_into_doc/2 functions - Update Persistence to use shared DocumentState functions - Update PersistenceWriter.merge_updates/2 to use DocumentState.load_into_doc/2 - Remove duplicated load_persisted_state_into_doc/2 from PersistenceWriter - Add @type t and proper @SPEC annotations to DocumentState This eliminates code duplication while keeping the fix focused and avoiding conflicts with #4178.

* Fix incorrect field access in create_checkpoint The create_checkpoint/1 function was using .data instead of .state_data when accessing DocumentState records. This would cause a KeyError crash when a document reached 500 updates and triggered checkpoint creation. Closes #4176 * Fix checkpoint creation bugs and add tests Two bugs fixed in create_checkpoint/1: 1. Used .data instead of .state_data when accessing DocumentState records 2. Used string "checkpoint" instead of atom :checkpoint for Ecto.Enum Added tests to verify checkpoint creation works correctly: - Test creating checkpoint from multiple updates - Test merging existing checkpoint with new updates Both bugs would cause crashes when a document reached 500 updates and triggered automatic checkpoint creation. Closes #4176

doc-han · 2025-12-11T09:19:28Z

lib/lightning/collaboration/persistence_writer.ex

  end

-  defp merge_updates([update]), do: update
+  defp merge_updates([update], _document_name), do: update


The fix for my issue #4154 overlaps with what you're doing here. I want to get rid of this function all together. If there's a single update doesn't mean that

we don't want to generate a delta between the current state of the document(_document_name)

also, properly encode the changes using Yex.encode_state_as_update!

with 2. I don't think that just returning the single update is the same as passing it through a new doc and encoding.

On my end, I've consistently reproduced the error whenever there's a single update coming through ydoc which leads to this function. checking whether I can reproduce on this branch.

Awesome. I think this PR fixes #4154 also. just revalidated whether a single update coming through causes an issue and it actually doesn't. Hence the issue with #4154 also had to do with the corrupted bytes. Took me a long route to get to merge_updates function.

github-project-automation bot added this to v2 Dec 10, 2025

github-project-automation bot moved this to New Issues in v2 Dec 10, 2025

elias-ba changed the title ~~Fix infinite loading when reopening workflow in collaborative editor~~ fix: infinite loading when reopening workflow in collaborative editor Dec 10, 2025

elias-ba force-pushed the 4164-fix-nil-lock-version-crash branch from c53ecda to cf3886f Compare December 10, 2025 14:23

This was referenced Dec 10, 2025

create_checkpoint uses non-existent .data field instead of .state_data #4176

Open

fix: incorrect field access in create_checkpoint #4178

Merged

elias-ba requested review from martalovescoffee, stuartc and theroinaochieng December 10, 2025 15:24

elias-ba added 2 commits December 10, 2025 15:54

doc-han reviewed Dec 11, 2025

View reviewed changes

doc-han approved these changes Dec 11, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: infinite loading when reopening workflow in collaborative editor #4175

fix: infinite loading when reopening workflow in collaborative editor #4175

Uh oh!

elias-ba commented Dec 10, 2025 •

edited by doc-han

Loading

Uh oh!

codecov bot commented Dec 10, 2025 •

edited

Loading

Uh oh!

doc-han Dec 11, 2025 •

edited

Loading

Uh oh!

doc-han Dec 11, 2025

Uh oh!

doc-han Dec 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fix: infinite loading when reopening workflow in collaborative editor #4175

Are you sure you want to change the base?

fix: infinite loading when reopening workflow in collaborative editor #4175

Uh oh!

Conversation

elias-ba commented Dec 10, 2025 • edited by doc-han Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

How this caused the nil lock_version crash

Validation steps

Additional notes for the reviewer

AI Usage

Pre-submission checklist

Uh oh!

codecov bot commented Dec 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

doc-han Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

doc-han Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

doc-han Dec 11, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

elias-ba commented Dec 10, 2025 •

edited by doc-han

Loading

How this caused the `nil` lock_version crash

codecov bot commented Dec 10, 2025 •

edited

Loading

doc-han Dec 11, 2025 •

edited

Loading