Skip to content

Immutability and infinite undo/redo stack #1

@rebornix

Description

@rebornix

We use a regular red black tree in our implementation. It's fast to insert/delete content and search speed in the tree is reasonable (O(n)). In overall we are satisfied.

However, our current implementation is not 100% immutable. The underlining string buffers are immutable (original ones) and append-only (new ones) but the tree nodes are not. We will do tree node zipping when necessary to avoid creating too many tree nodes. This leads challenges to Edit History (undo/redo stack) and Snapshots:

  • We have to track both range and edited content in undo/redo stack and even worse when we are traversing in the stack, we keep inserting redundant nodes into the tree, simply because the tree doesn't support undo/redo.
  • When we want to save the text buffer onto disk, we have to copy the whole tree. Again, it's because the tree nodes are mutable.

It's not a big deal when working on relatively small files but it might suffer when the file size grows significantly. Imaging you did a replace all in a Gig Byte file and found yourself make a typo, you press Cmd+Z and wish the text buffer can revert back to its initial state in the blink of an eye, but the editor hangs for a few seconds as the text buffer has to apply the reversed replace operations on the rb tree.

To mitigate this issue, we may want to introduce builtin undo/redo in the text buffer, which involves two steps

Make tree nodes immutable

The idea is we no longer modify tree nodes and whenever we are making edits, we replace the tree node and its all ancestors with new ones.

        A
      /   \
    B      C
  /         \
D            E

Say we are inserting a character at offset 0, it will insert a new node F before D. In current implementation, the tree will now look like below (before rotation fix)

          A
        /   \
      B      C
    /         \
  D            E
 /
F

Then we replace F's ancestors with their clones. After the replacement, there are now two roots

          A'      A
        /   \   /   \
      B'      C      B
    /         |       \
  D'          E        D
 /
F

Ok, sorry I lied. The tree doesn't turn to above graph as every tree node has an evil property parent. If we replace all F's ancestors with their clones, we need to modify C as well since its parent pointer needs to be updated. In an immutable world, it means C needs to replaced its clone C', which pointing to A'. After all fixups, we basically cloned the whole tree. That would be the worst immutable data structure.

The fix is straight forward, we no longer maintain parent pointer in every node. It makes most tree operations a bit complex as from now on, when we are traversing in the tree, we need to maintain an iterator from the root to the active node. It can be stored in an array or a stack.

Once we make that happen. We now use A' as the new root node.

          A'      A
        /   \   /   \
      B'      C      B
    /         |       \
  D'          E        D
 /
F

Undo/Redo stack

The undo redo stack is pretty close to [A, A']. When users press cmd+z, we change the root node from A' to A. We may want to store edit range in the stack otherwise we need to compare two trees A and A'. The edit range can be two digits [offset, len].

Taking snapshots now comes at not cost. We only need to do an in order traverse for node A. As every node is immutable, we no longer need to copy the tree.

Metadata

Metadata

Assignees

No one assigned

    Labels

    help wantedIssues identified as good community contribution opportunities

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions