Function: MultiLeaderCluster in multi-leader-replication/multi_leader.py

Date: 2026-05-29

Time: 14:13

MultiLeaderCluster

Purpose

MultiLeaderCluster is a simulation harness for multi-leader (multi-master) replication, the architecture described in DDIA Chapter 5 where multiple nodes independently accept writes and asynchronously replicate changes to each other. It owns a set of ReplicaNode instances, wires them together according to a chosen replication topology, and drives the sync protocol that propagates pending writes between nodes.

The class exists to make the replication topology and conflict resolution strategy configurable at the cluster level, so callers can write to any node directly and then call sync() to simulate an asynchronous replication round — without manually wiring up which node talks to which.

Contract

Preconditions:

Postconditions:

Invariants:

Parameters

| Parameter | Type | Default | Meaning |

|-----------|------|---------|---------|

| node_ids | list[str] | — | Identifiers for each replica node. Order defines the ring for RING topology. |

| strategy | ConflictStrategy | LASTWRITEWINS | How concurrent writes to the same key are resolved. LWW compares (timestamp, node_id) tuples lexicographically. |

| mergefn | Optional[Callable] | None | User-supplied merge function, required when strategy is CUSTOMMERGE. Ignored otherwise. |

| topology | Topology | ALLTOALL | Replication graph shape. ALLTOALL sends every change to every other node per round. RING sends only to the next node in the list. |

Edge cases:

Return Values

| Method | Returns | Conditions |

|--------|---------|------------|

| node(id) | ReplicaNode | Raises KeyError if id is unknown. |

| sync() | int — number of change deliveries attempted | Counts every applyremotechange call, including idempotent no-ops (where the change was already seen). |

| allconverged() | bool | True when all nodes have identical key sets and matching (value, timestamp, istombstone) tuples for every key. |

| syncuntilconverged() | int — rounds taken | Raises RuntimeError if not converged after max_rounds. |

Algorithm

sync()

1. Snapshot pending changes: Iterates nodeorder and calls getpendingchanges() on each node. This drains and returns each node's outbound log, so the same change is never sent twice across rounds.

2. Deliver based on topology:

3. The return count is the number of applyremotechange calls, not the number of *accepted* changes. Idempotent skips inside applyremotechange (via the _seen set) are still counted.

all_converged()

1. Takes the first node as the reference.

2. Checks that all other nodes have the same set of keys.

3. For each key, compares (value, timestamp, istombstone) — tuple indices (0, 1, 3). Notably skips index 2 (originnode_id), because different nodes may record different origins for the same resolved value depending on arrival order.

syncuntilconverged()

Calls sync() then allconverged() in a loop up to maxrounds times. For ALLTOALL, one round suffices if there are no custom-merge cascades. For RING with N nodes, convergence takes at least N−1 rounds because each hop only reaches the next neighbor.

Side Effects

Error Handling

Usage Patterns


cluster = MultiLeaderCluster(["dc1", "dc2", "dc3"])

# Write to different leaders concurrently
cluster.node("dc1").put("user:1", "Alice")
cluster.node("dc2").put("user:1", "Bob")  # concurrent write → conflict

# Replicate
cluster.sync()

# Check convergence
assert cluster.all_converged()

# Or just sync until done (useful for RING topology)
cluster = MultiLeaderCluster(["a", "b", "c"], topology=Topology.RING)
cluster.node("a").put("x", 1)
rounds = cluster.sync_until_converged()  # will take 2 rounds for 3 nodes

Caller obligations:

Dependencies

Topics to Explore

Beliefs