File: hinted-handoff/hinted_handoff.py

Date: 2026-05-29

Time: 08:56

Hinted Handoff — hinted_handoff.py

Purpose

This file implements hinted handoff, a technique from Dynamo-style leaderless replication systems (DDIA Chapter 5). When a write's preferred replica is temporarily unavailable, another node stores the write as a "hint" and replays it once the target recovers. This lets the system maintain write availability during partial failures without permanently re-routing data to the wrong nodes.

The file owns the full lifecycle: routing writes to preferred replicas, falling back to sloppy quorum via hints, expiring stale hints, and delivering hints when nodes recover.

Key Components

Hint

A write record destined for a node that was down at write time. It's a pure data object — target node, key/value/version, creation timestamp, and TTL. The only behavior is isexpired(), which checks currenttime >= created_at + ttl (a closed boundary — hints expire exactly at the TTL, not after).

Node

A storage node with two concerns:

Availability is a simple boolean flag toggled externally — this is a simulation, not a network layer.

HintedHandoffStore

The coordinator. It doesn't represent a single node — it's an omniscient cluster-level controller (appropriate for a teaching implementation). Key responsibilities:

1. Bump a global version counter for the key.

2. Try each preferred replica; collect successes and failures.

3. If sloppy_quorum=False and not enough preferred nodes responded, fail immediately.

4. For each unavailable preferred node, find the first available non-preferred node and store a hint there.

5. Count replicaswritten + hintsstored against write_quorum to determine success.

Patterns

Sloppy quorum — The sloppy_quorum flag controls whether hint-holding nodes count toward the write quorum. When True (default), the system trades consistency for availability: a write "succeeds" even if the data only reached non-preferred nodes as hints. When False, only actual preferred-replica writes count, and hints are stored optimistically but don't affect the success/failure decision.

Coordinator patternHintedHandoffStore is a god-object coordinator, not a peer in a distributed protocol. This is deliberate for a reference implementation — it avoids the complexity of message passing while demonstrating the algorithm clearly.

Version-gated writesNode.put() accepts a write only if version >= existing_version. This is a simplified last-writer-wins conflict resolution. The coordinator manages version assignment centrally (monotonic per key), so conflicts can't arise in the normal path — but during handoff replay, stale hints are harmlessly absorbed because their versions will be <= the current version.

Time as a parameter — All time-dependent operations (put, triggerhandoff, expireallhints) take currenttime as an explicit argument rather than calling time.time(). This makes the implementation fully deterministic and testable.

Dependencies

Imports: None — the module is entirely self-contained with no external dependencies.

Imported by:

Flow

Write path (put)


put(key, value, current_time)
  │
  ├─ bump version counter
  ├─ get_preferred_nodes(key) → [A, B, C]
  │
  ├─ for each preferred node:
  │     available? → node.put() → add to replicas_written
  │     unavailable? → add to unavailable_preferred
  │
  ├─ if strict quorum and replicas_written < write_quorum → FAIL
  │
  ├─ for each unavailable preferred node:
  │     find first available non-preferred node
  │     store Hint on that node
  │     add to hints_stored
  │
  └─ success = (replicas_written + hints_stored) >= write_quorum

Handoff path (trigger_handoff)


trigger_handoff(recovered_node_id, current_time)
  │
  ├─ for every OTHER node in the cluster:
  │     get hints targeting recovered_node_id
  │     for each hint:
  │       expired? → count as expired, skip
  │       valid? → recovered_node.put(key, value, version)
  │     remove all hints for recovered_node_id
  │
  └─ return { hints_delivered, hints_expired, keys_recovered }

Invariants

1. One hint per unavailable target per write — for each unavailable preferred node, only *one* non-preferred node stores the hint (the first available one found). This prevents hint explosion.

2. Hints live on non-preferred nodes only — the hint-storing loop explicitly excludes preferred nodes from the candidate set. A preferred node never stores hints for a peer in the same preference list.

3. Version monotonicity per keyself.versions[key] is incremented on every put() call, so versions are globally monotonic per key. This means handoff replays of older writes are safely absorbed by the version check in Node.put().

4. Reads never consult non-preferred nodesget() only queries preferred replicas. This means data stored only as hints is invisible to reads until handoff completes. This is a deliberate trade-off: sloppy quorum improves write availability but not read availability.

5. Hint cleanup is all-or-nothing per targetremovehintsfor() purges all hints for a target regardless of whether individual hints succeeded or expired. After trigger_handoff, no hints for that node remain on any other node.

Error Handling

There is essentially none, by design. The module signals write failure through the success: False return value — it never raises exceptions. Invalid node IDs would raise KeyError from the self.nodes dict, but this is an unchecked precondition rather than intentional error handling. The is_expired check silently discards stale hints during handoff rather than surfacing them as errors.

Topics to Explore

Beliefs