Topic: These are state-based (CvRDTs); delta-state and operation-based (CmRDTs) variants trade different costs — worth understanding the tradeoffs discussed in DDIA

Date: 2026-05-29

Time: 14:04

State-Based CRDTs (CvRDTs) in This Implementation

Every CRDT in conflict-free-replicated-data-types/crdts.py follows the same pattern: each replica maintains its full local state, and synchronization happens by shipping that entire state to peers via a merge() method. This is the defining characteristic of convergent replicated data types (CvRDTs) — also called state-based CRDTs.

How State-Based Replication Works Here

Look at GCounter.merge() (line 24–27):


def merge(self, other):
    for rid, count in other.counts.items():
        self.counts[rid] = max(self.counts.get(rid, 0), count)
    return self

The entire counts dictionary is the state. To sync, one replica sends its full counts map to another, and the receiver takes the element-wise max. This works because max forms a join semilattice — it's commutative, associative, and idempotent. The tests in test_crdts.py lines 30–50 (TestGCounterProperties) explicitly verify all three properties.

The same pattern repeats for every type:

Every merge() mutates self and returns self, making it easy to chain. Every type also exposes a state() method that serializes the full replica state — this is what would go over the wire in a real deployment.

What This Approach Costs (and What the Alternatives Trade)

DDIA discusses three CRDT dissemination strategies. Here's how they compare:

| | State-based (CvRDT) — this code | Operation-based (CmRDT) | Delta-state |

|---|---|---|---|

| What's sent | Full state on every sync | Individual operations (e.g., "increment by 3") | Only the state *difference* since last sync |

| Network cost | O(state size) per sync — GCounter sends all replica counts even if only one changed | O(operation size) — just the op | O(delta size) — between the two extremes |

| Delivery requirements | None — merge is idempotent, redelivery is fine | Exactly-once, causally ordered delivery | At-least-once with causal metadata |

| Implementation complexity | Low — just implement merge() | Higher — need reliable causal broadcast middleware | Medium — need delta tracking and anti-entropy |

| Metadata growth | Visible here: ORSet._tombstones (line 142) grows monotonically and is never compacted | Same problem, but can sometimes be GC'd with causal stability | Can compact deltas after acknowledgment |

The ORSet implementation makes the state-based cost concrete. Every remove() (line 155–160) moves tags into _tombstones, and merge() (line 168) unions tombstones from both sides. Tombstones never shrink. In a production system with frequent add/remove churn, this set grows without bound. An operation-based OR-Set avoids this because the "remove" operation only needs to reference the specific tags being removed — no persistent tombstone set needed — but it requires the network layer to guarantee causal delivery.

Why This Implementation Chose State-Based

State-based CRDTs are the right choice for a reference implementation because they're self-contained. The CRDTReplicaGroup helper (imported in tests, line 6) can simulate replication by just calling merge() between in-memory objects — no message broker, no causal delivery layer, no sequence numbers. The tests at lines 83–89 (TestPNCounterMerge) show how simple this makes verification: create replicas, mutate them independently, call sync_all(), assert convergence.

An operation-based version would need infrastructure (reliable broadcast, duplicate detection, causal ordering) that would obscure the CRDT logic itself. Delta-state would be a reasonable middle ground but adds bookkeeping for tracking what each peer has already seen.

Topics to Explore

Beliefs