Topic: The tombstone set in ORSet grows without bound; production systems (e.g., Riak) use causal context or garbage collection to address this

Date: 2026-05-29

Time: 14:03

The ORSet Tombstone Growth Problem

The ORSet in conflict-free-replicated-data-types/crdts.py tracks removals by accumulating tombstones in self._tombstones (line 140), a set that only ever grows — nothing in the implementation ever removes entries from it.

How tombstones accumulate

When you call remove() (line 148), every active tag for that element moves into _tombstones:


self._tombstones.update(self._elements[element])
del self._elements[element]

When you call merge() (line 158), the two tombstone sets are unioned:


merged_tombstones = self._tombstones | other._tombstones

Every tag that was ever removed stays in _tombstones forever. If a replica adds and removes the same element 1,000 times, that's 1,000 tags permanently in the tombstone set — even though only the most recent state matters.

Why the tombstones can't simply be dropped

The tombstone set is load-bearing during merge. At line 165:


merged_tags = (self_tags | other_tags) - merged_tombstones

If replica A removes element "x" and replica B hasn't synced yet, B still has "x"'s tags in its active set. When they merge, A's tombstones are what suppress B's stale tags. Drop the tombstones too early and removed elements reappear — a "zombie resurrection" bug.

What production systems do instead

This implementation uses the "tag + tombstone" approach, which is correct but naive. Production systems avoid unbounded tombstone growth through two main strategies:

Causal context (Riak's approach). Instead of storing individual tombstones, each replica maintains a compact *version vector* (also called a "causal context" or "dot context") that summarizes all tags it has ever seen. A tag (replicaid, seq) is implicitly tombstoned if seq is at or below the version vector's entry for that replica. This compresses an arbitrary number of tombstones into a single integer per replica. The grep for VectorClock|versionvector|dot|tag returned zero matches in the CRDT module — this implementation has no such mechanism.

Garbage collection with causal stability. Once all replicas have observed a remove operation (confirmed via anti-entropy or protocol-level acknowledgment), the tombstone is "causally stable" and safe to discard. This requires distributed coordination — you need to know what every replica has seen — which is why the textbook implementation skips it.

Contrast with the LSM tree

Interestingly, the LSM tree module in this same repo (log-structured-merge-tree/lsm.py, line 320) *does* implement tombstone garbage collection: its compact() method removes tombstones during compaction. That works because the LSM tree is a single-node data structure — there's no risk of a remote replica holding a stale view that needs the tombstone to prevent resurrection. The distributed setting of CRDTs makes the same optimization fundamentally harder.

Topics to Explore

Beliefs