@nickva nickva commented Dec 23, 2025

Previous PR [1] failed to account for the shards db itself. The shards db (`_dbs`) is managed differently than regular shard copies. Its `mem3:shards(Dbs)` result is a single-element shard list with a `#shard{}` having `node = node()` and `range = [0, ff..]`. Because the shards db is replicated in a ring across all nodes, we expect to find a purge checkpoint only for pushing changes to the "next" node in the ring.

[1] #5827
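
To illustrate the ring idea above, here is a minimal Python sketch of picking the "next" node. It assumes the ring is the sorted node list with wrap-around; the thread does not spell out how the order is chosen, so treat both that ordering and the helper name `next_node` as hypothetical, not as the mem3 implementation.

```python
from typing import Optional, Sequence


def next_node(nodes: Sequence[str], current: str) -> Optional[str]:
    """Return the node after `current` in the ring, or None if it is alone.

    Assumption: the ring order is the sorted node list, wrapping around
    from the last node back to the first.
    """
    ring = sorted(nodes)
    if len(ring) < 2:
        # A single node has nobody to push to.
        return None
    idx = ring.index(current)
    return ring[(idx + 1) % len(ring)]
```

Under this assumption, each node would expect a purge checkpoint for pushes to exactly one peer, the one returned here.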


@jiahuili430 jiahuili430 left a comment

+1

@nickva nickva merged commit 2c92efd into main Dec 23, 2025
49 checks passed
@nickva nickva deleted the handle-dbs-specially-for-purge-checkpoints branch December 23, 2025 19:11
nickva added a commit that referenced this pull request Dec 24, 2025
In the previous PR [1] we added special handling for the shards db since it has a
custom ring replication topology. In PR [1] we considered only the checkpoint
for pushes from the current node to the "next" node in the ring. However, we
should also consider the checkpoints created by the "previous" node when it
pulls purges from the current node, so that is what we fix in this PR.

As a reminder, a replication job from node A to node B will:

  1) pull purges from B to A (checkpoint on B with a B->A purge checkpoint doc)
  2) push purges from A to B (checkpoint on A with an A->B purge checkpoint doc)
  3) push doc updates from A to B (checkpoint on A with an A->B sync checkpoint doc,
      and on B also with an A->B sync checkpoint doc)

[1] #5832
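
The three steps in the commit message above can be written out as data. This is only an illustrative Python sketch: the `Checkpoint` type and the `replication_checkpoints` helper are hypothetical names, not part of CouchDB, and they model only which node stores each checkpoint doc and which direction it records.

```python
from typing import List, NamedTuple


class Checkpoint(NamedTuple):
    kind: str       # "purge" or "sync"
    stored_on: str  # node that holds the checkpoint doc
    source: str     # direction recorded in the doc: source -> target
    target: str


def replication_checkpoints(a: str, b: str) -> List[Checkpoint]:
    """Checkpoints left behind by one replication job from node a to node b."""
    return [
        # 1) pull purges from b to a: doc on b, recorded as b -> a
        Checkpoint("purge", stored_on=b, source=b, target=a),
        # 2) push purges from a to b: doc on a, recorded as a -> b
        Checkpoint("purge", stored_on=a, source=a, target=b),
        # 3) push doc updates from a to b: a -> b sync doc on both nodes
        Checkpoint("sync", stored_on=a, source=a, target=b),
        Checkpoint("sync", stored_on=b, source=a, target=b),
    ]
```

Note that step 1 leaves a purge checkpoint on the target node B even though the job runs from A, which is why a node also sees checkpoints caused by jobs it did not start.
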

nickva commented Dec 24, 2025

There is a continuation PR after this one which adds handling for "previous" purge checkpoints, for when the previous node in the ring replicates to us. When it pulls purges from "us", it also creates its own checkpoint going from the current node to the previous node.

I added the previous node consideration in a new PR #5833
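
As a rough Python sketch of the combined rule described here: on a given node, the shards db would be expected to hold purge checkpoints pointing at both its "next" and its "previous" ring neighbor. The sorted-ring ordering and the helper name `expected_purge_targets` are assumptions made for illustration; the actual check lives in the PRs referenced above.

```python
from typing import Sequence, Set


def expected_purge_targets(nodes: Sequence[str], current: str) -> Set[str]:
    """Peers that `current` should hold shards db purge checkpoints for.

    Assumption: the ring is the sorted node list with wrap-around.
    - "next": current pushes purges to it (current -> next doc).
    - "previous": it pulls purges from current and leaves a
      current -> previous doc on current.
    """
    ring = sorted(nodes)
    if len(ring) < 2:
        return set()
    idx = ring.index(current)
    nxt = ring[(idx + 1) % len(ring)]
    prev = ring[(idx - 1) % len(ring)]
    # With two nodes, next and previous are the same peer.
    return {nxt, prev}
```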

nickva added a commit that referenced this pull request Dec 25, 2025