Draft: Add loom simulations of spfs clean data race #1064
```
$ cargo test --bin clean-loom
```

This shows three variations on how `spfs clean` can race with writers to a repo.

The first one tries to simulate the existing behavior, and the test fails as expected.

The other two demonstrate how a write-ahead log might prevent races. The first of these requires the write-ahead log to be locked while reading in all the tags, which in turn would block writes for a significant amount of time. The second proposes that the write-ahead log could be pruned over time with a Kafka-like tombstone concept, and only requires the log to be locked while actually deleting files.

In either case, the code that holds the write-ahead log lock could exit early in order to not block writers for an extended amount of time. In this way the clean could make incremental progress, deleting some number of objects in each pass.

Signed-off-by: J Robert Ray <jrray@jrray.org>
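For readers unfamiliar with loom, here is a minimal sketch of how a model of the first (racy) case might look. It is not the PR's actual test; the repo structure and digest names are hypothetical stand-ins, with tags and objects modeled as in-memory sets.

```rust
use loom::sync::{Arc, Mutex};
use loom::thread;
use std::collections::HashSet;

struct Repo {
    objects: Mutex<HashSet<&'static str>>,
    // Each tag is a strong reference to the object of the same digest.
    tags: Mutex<HashSet<&'static str>>,
}

fn main() {
    // loom explores every meaningful interleaving of the two threads.
    loom::model(|| {
        let repo = Arc::new(Repo {
            objects: Mutex::new(HashSet::new()),
            tags: Mutex::new(HashSet::new()),
        });

        // Writer: store an object, then publish a tag referencing it.
        let writer = {
            let repo = Arc::clone(&repo);
            thread::spawn(move || {
                repo.objects.lock().unwrap().insert("digest-a");
                repo.tags.lock().unwrap().insert("digest-a");
            })
        };

        // Cleaner: snapshot the tags, then delete unreferenced objects.
        let cleaner = {
            let repo = Arc::clone(&repo);
            thread::spawn(move || {
                let referenced: HashSet<_> =
                    repo.tags.lock().unwrap().iter().copied().collect();
                repo.objects
                    .lock()
                    .unwrap()
                    .retain(|o| referenced.contains(o));
            })
        };

        writer.join().unwrap();
        cleaner.join().unwrap();

        // Fails for the interleaving where the tag lands after the
        // cleaner's snapshot: the object is deleted and the tag dangles.
        let objects = repo.objects.lock().unwrap();
        for tag in repo.tags.lock().unwrap().iter() {
            assert!(objects.contains(tag), "tag {tag} dangles");
        }
    });
}
```

As in the PR's first case, this test is expected to fail: loom finds the interleaving where the cleaner reads the tags before the writer publishes one.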
Force-pushed from 28130af to 1ee6d24
UPDATE: this approach still has a race, but the original version wasn't testing the condition where the race exists. The test has been updated to show that a race still exists.

This approach doesn't require any "global" lock while performing a clean, but the trade-off is more overhead both for writing objects and for performing a clean.

In summary, when writing an object, a `<digest>.hold` file is written first. The existence of this file prevents `<digest>` from being considered garbage. The `<digest>.hold` file is not removed until after a tag exists that creates a strong reference to the object.

The clean process collects what it thinks are garbage objects as usual, but then it checks for `.hold` files for each candidate garbage object. Then, a second pass over all the tags is needed to discover any strong references that didn't exist in the first pass. Since looking for tags happens after checking for `.hold` files, and the tag must exist by the time the `.hold` file is deleted, the new tags are guaranteed to be found.

Signed-off-by: J Robert Ray <jrray@jrray.org>
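To make the ordering concrete, here is a single-threaded sketch of the hold-file protocol, using in-memory sets in place of the filesystem and assuming each step is an atomic operation. The type and method names are hypothetical, not spfs APIs.

```rust
use std::collections::HashSet;

struct Repo {
    objects: HashSet<String>,
    holds: HashSet<String>, // "<digest>.hold" markers
    tags: HashSet<String>,  // tags strongly referencing a digest
}

impl Repo {
    // Writer side: the hold must exist before the object, and must not
    // be removed until a tag strongly references the object.
    fn write(&mut self, digest: &str) {
        self.holds.insert(digest.to_owned());   // 1. <digest>.hold
        self.objects.insert(digest.to_owned()); // 2. the object itself
        self.tags.insert(digest.to_owned());    // 3. a strong reference
        self.holds.remove(digest);              // 4. hold no longer needed
    }

    // Cleaner side: candidates -> drop held ones -> rescan tags -> delete.
    fn clean(&mut self) {
        let mut garbage: HashSet<String> = self
            .objects
            .iter()
            .filter(|d| !self.tags.contains(*d))
            .cloned()
            .collect();
        // Any object with a hold may be about to gain a tag; keep it.
        garbage.retain(|d| !self.holds.contains(d));
        // Second tag pass: a tag must exist by the time its hold is
        // removed, so tags created since the first pass are visible here.
        garbage.retain(|d| !self.tags.contains(d));
        for d in &garbage {
            self.objects.remove(d);
        }
    }
}
```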
The two cases that use locking continue to pass.

Signed-off-by: J Robert Ray <jrray@jrray.org>
Force-pushed from 1ee6d24 to 0badde8
This approach avoids the need to hold a lock for the span of a whole clean operation, and instead uses a lock per object to prevent newly-written objects from being cleaned. The key is stopping a writer from writing an object just before the cleaner deletes it.

A staging file is used by the writer to mark an object as non-garbage. Synchronization is still needed so that either the deleter sees the staging file, or the writer waits until it is safe to write a new copy of the object.

Signed-off-by: J Robert Ray <jrray@jrray.org>
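A minimal sketch of the per-object locking idea, assuming a hypothetical lock keyed by digest. The lock makes "check staging marker, then delete" and "create staging marker, then write" mutually exclusive per object, which is exactly the synchronization the paragraph above calls for.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct Repo {
    locks: Mutex<HashMap<String, Arc<Mutex<()>>>>,
    staged: Mutex<HashSet<String>>,  // staging files marking non-garbage
    objects: Mutex<HashSet<String>>,
}

impl Repo {
    fn object_lock(&self, digest: &str) -> Arc<Mutex<()>> {
        Arc::clone(
            self.locks
                .lock()
                .unwrap()
                .entry(digest.to_owned())
                .or_default(),
        )
    }

    // Writer: under the object's lock, mark it staged and (re)write it.
    // If the cleaner won the lock first, the write happens only after the
    // delete completed, so a fresh copy of the object is written.
    fn write(&self, digest: &str) {
        let lock = self.object_lock(digest);
        let _guard = lock.lock().unwrap();
        self.staged.lock().unwrap().insert(digest.to_owned());
        self.objects.lock().unwrap().insert(digest.to_owned());
    }

    // Cleaner: under the same lock, skip anything staged.
    fn delete_if_garbage(&self, digest: &str) {
        let lock = self.object_lock(digest);
        let _guard = lock.lock().unwrap();
        if self.staged.lock().unwrap().contains(digest) {
            return; // a writer marked it non-garbage
        }
        self.objects.lock().unwrap().remove(digest);
    }
}
```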
This one tries to detect any deletes that would produce a dangling reference. It uses the concept of staging again, plus a "global" lock, but it has a nice property: it blocks the cleaner instead of blocking the writer, at least in the sense that the lock doesn't have to be continuously held by the cleaner. It is also safe for concurrent writers, but the cleaner can only make progress while there are no writers.

However, there are some onerous requirements to make this work in practice. The whole hierarchy of objects intended to be written needs to be marked as staged. It isn't safe to see that some sub-hierarchy already exists in the destination and skip staging it, because it may already be in the process of being deleted. The cleaner is required to visit/delete objects in top-down order; a sketch of this scheme follows below.

The cases added here are passing, but more work is needed to exhaustively test this approach.

Signed-off-by: J Robert Ray <jrray@jrray.org>
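One way to read "blocks the cleaner instead of the writer" is as a reader-writer lock: writers share the gate (so they never block each other), while the cleaner must briefly take it exclusively for each delete, making progress only while no writer is active. The sketch below models it that way; this is an interpretation, not the PR's implementation, and all names are hypothetical.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, RwLock};

#[derive(Default)]
struct Repo {
    gate: RwLock<()>, // writers: shared; cleaner: exclusive per delete
    staged: Mutex<HashSet<String>>,
    objects: Mutex<HashSet<String>>,
}

impl Repo {
    // Writer: stage the *entire* hierarchy before writing any of it,
    // even parts that appear to already exist in the destination
    // (they may be mid-deletion).
    fn write_hierarchy(&self, digests: &[&str]) {
        let _shared = self.gate.read().unwrap();
        {
            let mut staged = self.staged.lock().unwrap();
            for d in digests {
                staged.insert((*d).to_owned());
            }
        }
        let mut objects = self.objects.lock().unwrap();
        for d in digests {
            objects.insert((*d).to_owned());
        }
    }

    // Cleaner: visit candidates in top-down order, taking the gate
    // exclusively per object, so a parent is never left pointing at a
    // deleted child.
    fn clean_top_down(&self, candidates_top_down: &[&str]) {
        for d in candidates_top_down {
            let _exclusive = self.gate.write().unwrap();
            if self.staged.lock().unwrap().contains(*d) {
                continue; // a writer claimed this object
            }
            self.objects.lock().unwrap().remove(*d);
        }
    }
}
```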
Force-pushed from 05b7dfb to 386dc32
I made some progress with the recent update, and have been thinking about how to optimize writes now that a writer has to mark all the objects that are going to be sent over.

I'm imagining a change to the Syncer where it first sends over a list of top-level object digests. The server side would mark as many of these as possible (including children of these objects), and send back a list of digests that are unknown. Then the client uses this to produce a list of the remaining digests that the server still needs. This list is sent over to complete the marking phase. Then the objects/payloads are transferred.

Doing it this way allows for the possibility of not having to transfer some objects that already exist in the destination, while still being able to recover and resend objects that weren't marked in time to prevent a concurrent clean from deleting them. As the new test case shows, as long as objects are written and cleaned in topological order, repo consistency can be maintained throughout.
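A sketch of what that handshake could look like. The trait and method names are hypothetical stand-ins, not an actual spfs API; the point is the two marking round trips that happen before any data moves.

```rust
type Digest = String;

trait RemoteRepo {
    // Server marks every digest it already has (and their children) as
    // non-garbage, and reports back the digests it does not know.
    fn mark_known(&self, top_level: &[Digest]) -> Vec<Digest>;
    // Server marks digests it is about to receive.
    fn mark_incoming(&self, digests: &[Digest]);
    fn upload(&self, digest: &Digest, payload: &[u8]);
}

struct LocalRepo;
impl LocalRepo {
    // Stub: expand unknown roots into the full set of missing digests.
    fn expand_children(&self, roots: &[Digest]) -> Vec<Digest> {
        roots.to_vec()
    }
    // Stub: load an object's payload.
    fn payload(&self, _digest: &Digest) -> Vec<u8> {
        Vec::new()
    }
}

fn sync(remote: &dyn RemoteRepo, top_level: &[Digest], local: &LocalRepo) {
    // Round trip 1: let the server mark what it already has.
    let unknown = remote.mark_known(top_level);
    // Client expands the unknown roots into the digests still needed.
    let remaining = local.expand_children(&unknown);
    // Round trip 2: complete the marking phase before any data moves.
    remote.mark_incoming(&remaining);
    // Only now are objects/payloads transferred; anything marked too
    // late to survive a concurrent clean is simply resent.
    for digest in &remaining {
        remote.upload(digest, &local.payload(digest));
    }
}
```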
Thinking about this more: since the cleaner is not allowed to create dangling pointers, then for pushing existing objects that can be marked in time, it wouldn't be necessary to mark the children. Keeping the parent alive keeps all the children alive.