Draft: Add loom simulations of spfs clean data race #1064
```
$ cargo test --bin clean-loom
```

This shows three variations on how `spfs clean` can race with writers to a repo.

The first one tries to simulate the existing behavior, and the test fails as expected.

The other two demonstrate how a write-ahead log might prevent races. The first of these requires the write-ahead log to be locked while reading in all the tags, which in turn would block writes for a significant amount of time. The second proposes that the write-ahead log could be pruned over time with a Kafka-like tombstone concept, and only requires the log to be locked while actually deleting files.

In either case, the code that holds the write-ahead log lock could exit early in order to not block writers for an extended amount of time. In this way the clean could make incremental progress, deleting some number of objects in each pass.

Signed-off-by: J Robert Ray <jrray@jrray.org>
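For readers unfamiliar with loom, here is a minimal sketch of how a model of the first (racy) case might look. It is not the PR's actual test; the repo structure and digest names are hypothetical stand-ins, with tags and objects modeled as in-memory sets.

```rust
use loom::sync::{Arc, Mutex};
use loom::thread;
use std::collections::HashSet;

struct Repo {
    objects: Mutex<HashSet<&'static str>>,
    // Each tag is a strong reference to the object of the same digest.
    tags: Mutex<HashSet<&'static str>>,
}

fn main() {
    // loom explores every meaningful interleaving of the two threads.
    loom::model(|| {
        let repo = Arc::new(Repo {
            objects: Mutex::new(HashSet::new()),
            tags: Mutex::new(HashSet::new()),
        });

        // Writer: store an object, then publish a tag referencing it.
        let writer = {
            let repo = Arc::clone(&repo);
            thread::spawn(move || {
                repo.objects.lock().unwrap().insert("digest-a");
                repo.tags.lock().unwrap().insert("digest-a");
            })
        };

        // Cleaner: snapshot the tags, then delete unreferenced objects.
        let cleaner = {
            let repo = Arc::clone(&repo);
            thread::spawn(move || {
                let referenced: HashSet<_> =
                    repo.tags.lock().unwrap().iter().copied().collect();
                repo.objects
                    .lock()
                    .unwrap()
                    .retain(|o| referenced.contains(o));
            })
        };

        writer.join().unwrap();
        cleaner.join().unwrap();

        // Fails for the interleaving where the tag lands after the
        // cleaner's snapshot: the object is deleted and the tag dangles.
        let objects = repo.objects.lock().unwrap();
        for tag in repo.tags.lock().unwrap().iter() {
            assert!(objects.contains(tag), "tag {tag} dangles");
        }
    });
}
```

As in the PR's first case, this test is expected to fail: loom finds the interleaving where the cleaner reads the tags before the writer publishes one.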
Force-pushed from 28130af to 1ee6d24
UPDATE: this approach still has a race, but the original version wasn't testing the condition where the race exists. The test has been updated to show that a race still exists.

This approach doesn't require any "global" lock while performing a clean, but the trade-off is more overhead both for writing objects and for performing a clean.

In summary, when writing an object, a `<digest>.hold` file is written first. The existence of this file prevents `<digest>` from being considered garbage. The `<digest>.hold` file is not removed until after a tag exists that creates a strong reference to the object.

The clean process collects what it thinks are garbage objects as usual, but then it checks for `.hold` files for each candidate garbage object. Then, a second pass over all the tags is needed to discover any strong references that didn't exist in the first pass. Since looking for tags happens after checking for `.hold` files, and the tag must exist by the time the `.hold` file is deleted, the new tags are guaranteed to be found.

Signed-off-by: J Robert Ray <jrray@jrray.org>
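To make the ordering concrete, here is a single-threaded sketch of the hold-file protocol, using in-memory sets in place of the filesystem and assuming each step is an atomic operation. The type and method names are hypothetical, not spfs APIs.

```rust
use std::collections::HashSet;

struct Repo {
    objects: HashSet<String>,
    holds: HashSet<String>, // "<digest>.hold" markers
    tags: HashSet<String>,  // tags strongly referencing a digest
}

impl Repo {
    // Writer side: the hold must exist before the object, and must not
    // be removed until a tag strongly references the object.
    fn write(&mut self, digest: &str) {
        self.holds.insert(digest.to_owned());   // 1. <digest>.hold
        self.objects.insert(digest.to_owned()); // 2. the object itself
        self.tags.insert(digest.to_owned());    // 3. a strong reference
        self.holds.remove(digest);              // 4. hold no longer needed
    }

    // Cleaner side: candidates -> drop held ones -> rescan tags -> delete.
    fn clean(&mut self) {
        let mut garbage: HashSet<String> = self
            .objects
            .iter()
            .filter(|d| !self.tags.contains(*d))
            .cloned()
            .collect();
        // Any object with a hold may be about to gain a tag; keep it.
        garbage.retain(|d| !self.holds.contains(d));
        // Second tag pass: a tag must exist by the time its hold is
        // removed, so tags created since the first pass are visible here.
        garbage.retain(|d| !self.tags.contains(d));
        for d in &garbage {
            self.objects.remove(d);
        }
    }
}
```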
The two cases that use locking continue to pass.

Signed-off-by: J Robert Ray <jrray@jrray.org>
Force-pushed from 1ee6d24 to 0badde8
This approach avoids the need to hold a lock for the span of a whole clean operation, and instead uses a lock per object to prevent newly-written objects from being cleaned. The key is stopping a writer from writing an object just before the cleaner deletes it.

A staging file is used by the writer to mark an object as non-garbage. Synchronization is still needed so that either the deleter sees the staging file, or the writer waits until it is safe to write a new copy of the object.

Signed-off-by: J Robert Ray <jrray@jrray.org>
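A minimal sketch of the per-object locking idea, assuming a hypothetical lock keyed by digest. The lock makes "check staging marker, then delete" and "create staging marker, then write" mutually exclusive per object, which is exactly the synchronization the paragraph above calls for.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct Repo {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
    staged: Mutex<HashSet<String>>,  // staging files marking non-garbage
    objects: Mutex<HashSet<String>>,
}

impl Repo {
    fn object_lock(&self, digest: &str) -> Arc<Mutex<()>> {
        Arc::clone(
            self.locks
                .lock()
                .unwrap()
                .entry(digest.to_owned())
                .or_default(),
        )
    }

    // Writer: under the object's lock, mark it staged and (re)write it.
    // If the cleaner won the lock first, the write happens only after the
    // delete completed, so a fresh copy of the object is written.
    fn write(&self, digest: &str) {
        let lock = self.object_lock(digest);
        let _guard = lock.lock().unwrap();
        self.staged.lock().unwrap().insert(digest.to_owned());
        self.objects.lock().unwrap().insert(digest.to_owned());
    }

    // Cleaner: under the same lock, skip anything staged.
    fn delete_if_garbage(&self, digest: &str) {
        let lock = self.object_lock(digest);
        let _guard = lock.lock().unwrap();
        if self.staged.lock().unwrap().contains(digest) {
            return; // a writer marked it non-garbage
        }
        self.objects.lock().unwrap().remove(digest);
    }
}
```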
This one tries to detect any deletes that would produce a dangling reference. It uses the concept of staging again, plus a "global" lock, but it has a nice property: it blocks the cleaner instead of blocking the writer, at least in the sense that the lock doesn't have to be continuously held by the cleaner. It is also safe for concurrent writers, but the cleaner can only make progress while there are no writers.

However, there are some onerous requirements to make this work in practice. The whole hierarchy of objects intended to be written needs to be marked as staged. It isn't safe to see that some sub-hierarchy already exists in the destination and skip staging it, because it may already be in the process of being deleted. The cleaner is required to visit/delete objects in top-down order; a sketch of this scheme follows below.

The cases added here are passing, but more work is needed to exhaustively test this approach.

Signed-off-by: J Robert Ray <jrray@jrray.org>
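One way to read "blocks the cleaner instead of the writer" is as a reader-writer lock: writers share the gate (so they never block each other), while the cleaner must briefly take it exclusively for each delete, making progress only while no writer is active. The sketch below models it that way; this is an interpretation, not the PR's implementation, and all names are hypothetical.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, RwLock};

#[derive(Default)]
struct Repo {
    gate: RwLock<()>, // writers: shared; cleaner: exclusive per delete
    staged: Mutex<HashSet<String>>,
    objects: Mutex<HashSet<String>>,
}

impl Repo {
    // Writer: stage the *entire* hierarchy before writing any of it,
    // even parts that appear to already exist in the destination
    // (they may be mid-deletion).
    fn write_hierarchy(&self, digests: &[&str]) {
        let _shared = self.gate.read().unwrap();
        {
            let mut staged = self.staged.lock().unwrap();
            for d in digests {
                staged.insert((*d).to_owned());
            }
        }
        let mut objects = self.objects.lock().unwrap();
        for d in digests {
            objects.insert((*d).to_owned());
        }
    }

    // Cleaner: visit candidates in top-down order, taking the gate
    // exclusively per object, so a parent is never left pointing at a
    // deleted child.
    fn clean_top_down(&self, candidates_top_down: &[&str]) {
        for d in candidates_top_down {
            let _exclusive = self.gate.write().unwrap();
            if self.staged.lock().unwrap().contains(*d) {
                continue; // a writer claimed this object
            }
            self.objects.lock().unwrap().remove(*d);
        }
    }
}
```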
Force-pushed from 05b7dfb to 386dc32
I made some progress with the recent update, and have been thinking about how to optimize writes now that a writer has to mark all the objects that are going to be sent over.

I'm imagining a change to the Syncer where it first sends over a list of top-level object digests. The server side would mark as many of these as possible (including children of these objects), and send back a list of digests that are unknown. Then the client uses this to produce a list of the remaining digests that the server still needs. This list is sent over to complete the marking phase. Then the objects/payloads are transferred.

Doing it this way allows for the possibility of not having to transfer some objects that already exist in the destination, while still being able to recover and resend objects that weren't marked in time to prevent a concurrent clean from deleting them. As the new test case shows, as long as objects are written and cleaned in topological order, repo consistency can be maintained throughout.
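A sketch of what that handshake could look like. The trait and method names are hypothetical stand-ins, not an actual spfs API; the point is the two marking round trips that happen before any data moves.

```rust
type Digest = String;

trait RemoteRepo {
    // Server marks every digest it already has (and their children) as
    // non-garbage, and reports back the digests it does not know.
    fn mark_known(&self, top_level: &[Digest]) -> Vec<Digest>;
    // Server marks digests it is about to receive.
    fn mark_incoming(&self, digests: &[Digest]);
    fn upload(&self, digest: &Digest, payload: &[u8]);
}

struct LocalRepo;
impl LocalRepo {
    // Stub: expand unknown roots into the full set of missing digests.
    fn expand_children(&self, roots: &[Digest]) -> Vec<Digest> {
        roots.to_vec()
    }
    // Stub: load an object's payload.
    fn payload(&self, _digest: &Digest) -> Vec<u8> {
        Vec::new()
    }
}

fn sync(remote: &dyn RemoteRepo, top_level: &[Digest], local: &LocalRepo) {
    // Round trip 1: let the server mark what it already has.
    let unknown = remote.mark_known(top_level);
    // Client expands the unknown roots into the digests still needed.
    let remaining = local.expand_children(&unknown);
    // Round trip 2: complete the marking phase before any data moves.
    remote.mark_incoming(&remaining);
    // Only now are objects/payloads transferred; anything marked too
    // late to survive a concurrent clean is simply resent.
    for digest in &remaining {
        remote.upload(digest, &local.payload(digest));
    }
}
```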
Thinking about this more: since the cleaner is not allowed to create dangling pointers, then for pushing existing objects that can be marked in time, it wouldn't be necessary to mark the children. Keeping the parent alive keeps all the children alive.