Date: 2026-05-29
Time: 09:03
conflict-free-replicated-data-types/crdts.pyThis file implements four state-based CRDTs (Conflict-Free Replicated Data Types) — the core data structures from DDIA Chapter 5 that allow multiple replicas to update independently and converge to the same state without coordination. It also provides CRDTReplicaGroup, a test harness that simulates a cluster of replicas with network sync.
Each CRDT satisfies the mathematical requirement for convergence: the merge operation is commutative, associative, and idempotent, meaning replicas can sync in any order, any number of times, and always reach the same result.
GCounter — Grow-Only CounterA counter that can only increase. Each replica maintains its own slot in a {replica_id: count} dictionary. value() sums all slots. Merge takes the element-wise max — this works because counts are monotonically increasing per replica, so max always preserves the highest observed value without double-counting.
PNCounter — Positive-Negative CounterSupports both increment and decrement by composing two GCounter instances: p for positive increments, n for negative (decrements). The value is p.value() - n.value(). This is a textbook example of building richer CRDTs by composing simpler ones — you can't subtract from a G-Counter, but you can track subtractions in a separate G-Counter.
LWWRegister — Last-Writer-Wins RegisterA single-value register where conflicts are resolved by timestamp, with replicaid as a deterministic tiebreaker (lexicographic comparison via Python's tuple ordering). Has an internal auto-incrementing clock so callers don't need to supply timestamps manually. When merging, the clock advances to max(self.clock, other._clock) to avoid future collisions.
ORSet — Observed-Remove SetThe most complex CRDT here. Each add() creates a unique tag (replicaid, seqnum). remove() tombstones all currently-known tags for an element. Merge unions active tags from both sides, then subtracts the union of both tombstone sets. This gives add-wins semantics: if replica A adds element X concurrently with replica B removing it, the add survives because its tag wasn't in B's tombstone set at removal time.
CRDTReplicaGroup — Test HarnessCreates a set of replicas from any CRDT class and manages sync. sync(from, to) does a one-directional merge using deepcopy to simulate network serialization (the source is copied so mutation of the target doesn't affect the source). sync_all() runs two full rounds of all-pairs sync, which is sufficient for convergence in any state-based CRDT.
Uniform CRDT interface. Every CRDT follows the same shape: constructor takes replicaid, has a merge(other) that mutates and returns self, a state() for serialization, and eq_ for convergence checks. This lets CRDTReplicaGroup work polymorphically with any of them.
Merge returns self. All merge methods mutate in-place and return self. This is a fluent pattern that allows chaining but — more importantly — makes the merge direction unambiguous: the caller's state is updated, the argument is read-only.
Composition over inheritance. PNCounter doesn't subclass GCounter; it composes two instances. This mirrors the DDIA approach of building complex CRDTs from simple building blocks.
Deep copy for network simulation. CRDTReplicaGroup.sync uses deepcopy to prevent aliasing — without it, merging mutable sets/dicts would create shared references between replicas, hiding bugs where merge accidentally mutates the source.
Imports: Only deepcopy from the standard library — no external dependencies. This keeps the CRDT implementations self-contained and focused on the algorithms.
Imported by: testcrdts.py and testertest_crdts.py — the test suite and its meta-test validator.
A typical usage cycle:
1. Create replicas (directly or via CRDTReplicaGroup)
2. Each replica applies local mutations (increment, set, add, remove)
3. Replicas exchange state via merge (or sync/sync_all through the group)
4. After sufficient sync rounds, all replicas converge to the same value
For ORSet, the internal flow during merge is:
1. Collect all elements referenced by either side
2. Union both tombstone sets
3. For each element, union active tags from both sides, subtract the merged tombstones
4. Keep only elements with surviving tags
increment rejects negative amounts (ValueError). Per-replica counts only grow, guaranteeing the element-wise max in merge is a valid join operation.(timestamp, writer_id) tuple comparison — since replica IDs are unique, this is a total order. No two concurrent writes can result in ambiguity.(replicaid, seqnum) with a per-replica monotonic counter. After merge, seq advances to max(self.seq, other._seq) to prevent tag collisions.sync_all runs 2 full rounds of all-pairs sync. For state-based CRDTs with commutative/associative/idempotent merge, this is sufficient for convergence regardless of prior state.Minimal by design — CRDTs have narrow failure modes:
GCounter.increment raises ValueError on negative amounts — the only explicit validation.ORSet.remove returns False if the element isn't present rather than raising, which is the correct semantic (removing something you haven't seen is a no-op, not an error).conflict-free-replicated-data-types/test_crdts.py — See how convergence, concurrent operations, and edge cases are tested for each CRDT typeor-set-tombstone-growth — The tombstone set in ORSet grows without bound; production systems (e.g., Riak) use causal context or garbage collection to address thisconflict-free-replicated-data-types/crdts.py:LWWRegister.set — The auto-clock mechanism and its interaction with explicit timestamps during merge is subtle and worth tracing throughvector-clocks/vector_clock.py — Vector clocks are the causal ordering primitive that more sophisticated CRDTs build on; compare with the simpler timestamp approach in LWWRegisterdelta-state-crdts — These are state-based (CvRDTs); delta-state and operation-based (CmRDTs) variants trade different costs — worth understanding the tradeoffs discussed in DDIAcrdt-merge-is-idempotent — All four CRDT merge methods are idempotent: merging the same state twice produces the same result as merging oncepncounter-composes-gcounters — PNCounter delegates entirely to two GCounter instances; it contains no independent merge or counting logicorset-add-wins-semantics — Concurrent add and remove of the same element in ORSet resolves in favor of the add, because the add's tag is not in the remover's tombstone setlww-tiebreak-is-deterministic — LWWRegister.merge uses (timestamp, writer_id) tuple comparison, making conflict resolution a total order with no ambiguitysync-all-two-rounds-suffices — CRDTReplicaGroup.sync_all runs exactly 2 rounds of all-pairs sync, which is sufficient for convergence of any state-based CRDT with a valid merge lattice