Date: 2026-05-29
Time: 13:13
A strict quorum guarantees that any read overlaps with at least one node that saw the latest write (because W + R > N). A sloppy quorum breaks this guarantee by counting *hint storage on non-preferred nodes* toward the write quorum — nodes that reads will never consult.
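The overlap argument can be checked by brute force. The sketch below is illustrative only and is not taken from the repository; `quorums_always_overlap` is a hypothetical helper that enumerates every possible write set of size W and read set of size R over N nodes and reports whether they always share at least one node.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Return True if every w-node write set shares a node with every r-node read set."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)  # an empty intersection is falsy
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


print(quorums_always_overlap(3, 2, 2))  # W + R = 4 > 3  -> True
print(quorums_always_overlap(3, 1, 2))  # W + R = 3 == N -> False
```

The second call fails because a write to node 0 alone can be missed by a read of nodes 1 and 2, which is the same kind of miss a sloppy quorum allows when hints stand in for real replica writes.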
In hinted-handoff/hinted_handoff.py:139-141, the coordinator calculates whether a write "succeeded":
    total_acks = len(replicas_written) + len(hints_stored)
    success = total_acks >= self.write_quorum
This is the critical line. A write with write_quorum=2 can succeed with 1 real replica write and 1 hint stored on a substitute node. The write is *durable* (it exists on disk somewhere), but not on the nodes that readers will query.
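To make the arithmetic concrete, here is a minimal standalone sketch of the same check. The function name `sloppy_write_succeeded` and the node labels are hypothetical; only the sum-and-compare logic mirrors the two lines quoted above.

```python
def sloppy_write_succeeded(replicas_written, hints_stored, write_quorum):
    """Mirror of the quoted check: hints count toward the quorum like real writes."""
    total_acks = len(replicas_written) + len(hints_stored)
    return total_acks >= write_quorum


# One real replica write plus one hint on a substitute node meets write_quorum=2.
print(sloppy_write_succeeded(["C"], ["D"], write_quorum=2))       # True
# Zero real writes also pass if enough hints are stored.
print(sloppy_write_succeeded([], ["D", "E"], write_quorum=2))     # True
# Without hints, a single real write is not enough.
print(sloppy_write_succeeded(["C"], [], write_quorum=2))          # False
```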
In hinted-handoff/hinted_handoff.py:148-161, reads only consult preferred replicas:
    def get(self, key: str) -> dict:
        preferred = self.get_preferred_nodes(key)
        results = []
        for nid in preferred:
            node = self.nodes[nid]
            if node.is_available():
                result = node.get(key)
                ...
The read never looks at non-preferred nodes. If a write was acknowledged partly through hints on substitute nodes, the read won't find the latest value until trigger_handoff (hinted_handoff.py:170) delivers those hints back to the recovered preferred replica.
Consider this scenario with write_quorum=2, replication_factor=3, preferred nodes [A, B, C]:
1. Nodes A and B go down.
2. A write succeeds: C gets the real write, hints are stored on D and E (non-preferred) for A and B. total_acks = 1 + 2 = 3 >= 2. Success.
3. A read comes in. It queries preferred nodes [A, B, C]. A and B are still down, so only C responds. If read_quorum=2, the read *fails entirely*. If read_quorum=1, C returns the value, but only because C happened to be available. Had C been down as well, the write would still have succeeded on hints alone (3 hints >= 2), yet the read would return nothing.
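The scenario above can be replayed with a toy model. Everything in this sketch (`ToyNode`, `ToySloppyStore`, `run_scenario`, and the way substitutes are picked) is a hypothetical stand-in written for illustration, not the repository's HintedHandoffStore; it only assumes the two behaviours described so far: hints count toward the write quorum, and reads consult preferred nodes only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    node_id: str
    up: bool = True
    data: dict = field(default_factory=dict)   # real replica writes: key -> value
    hints: list = field(default_factory=list)  # (target_id, key, value) held for others


class ToySloppyStore:
    def __init__(self, node_ids, preferred, write_quorum, read_quorum):
        self.nodes = {nid: ToyNode(nid) for nid in node_ids}
        self.preferred = preferred
        self.write_quorum = write_quorum
        self.read_quorum = read_quorum

    def put(self, key, value) -> bool:
        replicas_written, hints_stored = [], []
        substitutes = [n for nid, n in self.nodes.items()
                       if nid not in self.preferred and n.up]
        for nid in self.preferred:
            node = self.nodes[nid]
            if node.up:
                node.data[key] = value
                replicas_written.append(nid)
            elif substitutes:
                # Assumption: spread hints round-robin over available substitutes.
                sub = substitutes[len(hints_stored) % len(substitutes)]
                sub.hints.append((nid, key, value))
                hints_stored.append(nid)
        # Hints count toward the quorum, as in the check quoted earlier.
        return len(replicas_written) + len(hints_stored) >= self.write_quorum

    def get(self, key):
        """Consult preferred nodes only; return None if the read quorum is not met."""
        responses = [self.nodes[nid].data.get(key)
                     for nid in self.preferred if self.nodes[nid].up]
        if len(responses) < self.read_quorum:
            return None
        found = [v for v in responses if v is not None]
        return found[0] if found else None


def run_scenario(down, read_quorum):
    store = ToySloppyStore(["A", "B", "C", "D", "E"], ["A", "B", "C"],
                           write_quorum=2, read_quorum=read_quorum)
    for nid in down:
        store.nodes[nid].up = False
    write_ok = store.put("k", "v1")
    return write_ok, store.get("k")


print(run_scenario({"A", "B"}, read_quorum=2))       # (True, None)
print(run_scenario({"A", "B"}, read_quorum=1))       # (True, 'v1')
print(run_scenario({"A", "B", "C"}, read_quorum=1))  # (True, None)
```

In all three runs the write is acknowledged; the value is readable only in the one case where a preferred node took a real write and the read quorum is low enough for that single node to answer.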
Test test_sloppy_quorum_succeeds (test_hinted_handoff.py:108-117) exercises exactly this: two preferred nodes are down and the write succeeds via hints. No test follows up with a read to show that the value is invisible. That's the gap DDIA warns about.
The leaderless-replication/dynamo.py implementation takes a different approach. At line 108, hints are only stored *after* the write quorum is already met on real nodes:
    if self.sloppy_quorum:
        unavailable_ids = [nid for nid, n in self.nodes.items() if not n.is_available]
        available_nodes = [n for n in self.nodes.values() if n.is_available]
        for target_id in unavailable_ids:
            if available_nodes:
                available_nodes[0].add_hint(key, value, version, target_id)
Here hints are a *bonus* for durability — the quorum was already met by real writes. This is closer to a strict quorum with opportunistic hints, not a true sloppy quorum. The HintedHandoffStore implementation is the one that genuinely counts hints toward the quorum threshold.
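The difference between the two strategies can be reduced to where the quorum comparison sits relative to hint storage. The two functions below are hypothetical simplifications written for this comparison, not code from either file; in particular, returning no hints when the quorum is missed is an assumption based on the description that hints are stored only after the quorum is met.

```python
def put_hints_count_as_acks(real_acks, down_targets, write_quorum):
    """HintedHandoffStore-style as described: store hints first, then count them."""
    hints = list(down_targets)
    return real_acks + len(hints) >= write_quorum, hints


def put_hints_as_bonus(real_acks, down_targets, write_quorum):
    """DynamoCluster-style as described: quorum from real writes only, hints afterwards."""
    if real_acks < write_quorum:
        return False, []  # assumption: a failed write stores no hints
    return True, list(down_targets)


# One real ack, two preferred nodes down, write_quorum=2.
print(put_hints_count_as_acks(1, ["A", "B"], 2))  # (True, ['A', 'B'])
print(put_hints_as_bonus(1, ["A", "B"], 2))       # (False, [])
# Two real acks, one node down: both succeed, and the hint is extra durability.
print(put_hints_as_bonus(2, ["A"], 2))            # (True, ['A'])
```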
trigger_handoff at hinted_handoff.py:170-193 is what eventually restores consistency. It iterates all nodes, finds hints targeting the recovered node, and replays them:
    recovered_node.put(hint.key, hint.value, hint.version)
Until this runs, the "successfully written" data is stranded on non-preferred nodes that no reader will ever consult. The test at test_hinted_handoff.py:55-67 (test_handoff_delivers_hints) confirms that the recovered node only has the data *after* handoff, not before.
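The replay loop described above might look roughly like the following. This is a sketch under assumptions: `ToyHint`, `HandoffNode`, and `toy_trigger_handoff` are hypothetical names, and the real trigger_handoff may differ in how it removes delivered hints or resolves versions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHint:
    target_id: str  # the preferred node this write was meant for
    key: str
    value: str
    version: int


@dataclass
class HandoffNode:
    node_id: str
    data: dict = field(default_factory=dict)   # key -> (value, version)
    hints: list = field(default_factory=list)  # hints held on behalf of other nodes

    def put(self, key, value, version):
        self.data[key] = (value, version)


def toy_trigger_handoff(nodes: dict, recovered_id: str) -> int:
    """Replay every hint addressed to recovered_id, then drop it from its holder."""
    recovered = nodes[recovered_id]
    delivered = 0
    for node in nodes.values():
        remaining = []
        for hint in node.hints:
            if hint.target_id == recovered_id:
                recovered.put(hint.key, hint.value, hint.version)
                delivered += 1
            else:
                remaining.append(hint)
        node.hints = remaining
    return delivered


nodes = {"A": HandoffNode("A"), "D": HandoffNode("D")}
nodes["D"].hints.append(ToyHint("A", "k", "v1", 1))

print("k" in nodes["A"].data)             # False: stranded on D before handoff
print(toy_trigger_handoff(nodes, "A"))    # 1
print(nodes["A"].data["k"])               # ('v1', 1)
```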
Even the durability guarantee is bounded. Hints have a TTL (hinted_handoff.py:19-21), and expire_hints removes the ones that have outlived it. If a node stays down longer than hint_ttl, the hints expire and the data is lost from those replicas permanently; only the nodes that received real writes still have it. Test test_hint_expiry (test_hinted_handoff.py:96-106) validates this with hint_ttl=10.
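A minimal sketch of TTL-based expiry, assuming each hint records a creation time and a TTL. `TimedHint` and `toy_expire_hints` are hypothetical, the clock is passed in explicitly to keep the example deterministic, and whether the real boundary check is inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TimedHint:
    target_id: str
    key: str
    value: str
    created_at: float
    ttl: float

    def is_expired(self, now: float) -> bool:
        # Assumption: a hint expires once created_at + ttl has been reached.
        return now >= self.created_at + self.ttl


def toy_expire_hints(hints, now):
    """Return only the hints that are still alive at time `now`."""
    return [h for h in hints if not h.is_expired(now)]


hints = [TimedHint("A", "k", "v1", created_at=0.0, ttl=10.0)]

print(len(toy_expire_hints(hints, now=5.0)))   # 1: still deliverable
print(len(toy_expire_hints(hints, now=11.0)))  # 0: dropped before handoff could run
```

Once the hint is dropped, a later handoff for node A has nothing to replay, so A never receives that write through this path.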
- hinted-handoff/hinted_handoff.py:HintedHandoffStore.put: Trace exactly which code paths count hints toward the quorum vs. which require real replica writes
- read-repair/read_repair.py: Read repair is the complementary mechanism that fixes staleness on *available* replicas during reads; compare how it restores consistency opportunistically vs. handoff doing it on recovery
- leaderless-replication/dynamo.py:DynamoCluster.anti_entropy_repair: The third consistency-restoration mechanism (background full sync), and how it differs from both read repair and hinted handoff
- sloppy-vs-strict-quorum-test-gap: The test suite for HintedHandoffStore doesn't test reading after a sloppy write but *before* handoff; writing this test would concretely demonstrate the visibility window DDIA describes
- leaderless-replication/testdynamotester.py: Contains test_spec_sloppy_quorum_example and test_no_hints_without_sloppy_quorum, showing how the DynamoCluster's more conservative hint strategy behaves differently
- sloppy-quorum-counts-hints-as-acks: HintedHandoffStore.put adds len(replicas_written) + len(hints_stored) and compares against write_quorum, meaning a write can succeed with zero real replica writes if enough hints are stored
- reads-never-consult-hint-nodes: HintedHandoffStore.get only queries nodes returned by get_preferred_nodes, so data stored only as hints on substitute nodes is invisible to readers until handoff completes
- dynamo-cluster-hints-after-quorum: DynamoCluster.put only stores hints after the write quorum is already met on real nodes (lines 107-113), unlike HintedHandoffStore which counts hints toward the quorum
- hint-expiry-bounds-durability: Hints in HintedHandoffStore have a TTL; if trigger_handoff doesn't run before created_at + ttl, the hint is silently dropped by expire_hints, permanently losing that replica's copy