File: consistent-hashing/consistent_hashing.py

Date: 2026-05-29

Time: 08:53

consistent-hashing/consistent_hashing.py

Purpose

This file implements a consistent hash ring — the core data partitioning primitive described in DDIA Chapter 6 (Partitioning). It owns the mapping from arbitrary string keys to physical nodes, handling the three main concerns of consistent hashing: virtual nodes for load balancing, weighted nodes for heterogeneous hardware, and replication via preference lists.

This is the kind of component that sits inside a distributed storage coordinator (like Dynamo, Cassandra, or Riak) to decide which node owns which key range.

Key Components

Constants

_hash(key: str) -> int

A free function that hashes strings to ring positions using MD5, masked to 32 bits. MD5 is not cryptographically relevant here — what matters is uniform distribution across the ring. The & 0xFFFFFFFF mask ensures the output fits within RING_SIZE.

ConsistentHashRing

The main class. Its internal state is three parallel structures:

| Field | Type | Role |

|-------|------|------|

| ringpositions | List[int] | Sorted list of virtual node hash positions |

| ringnodes | List[str] | Node ID at each position (parallel to ringpositions) |

| _nodes | Dict[str, float] | Physical node registry: node ID → weight |

Constructor parameters:

#### Core Operations

addnode(nodeid, weight=1.0) -> Dict — Inserts a physical node into the ring by generating int(numvnodes * weight) virtual nodes. Each virtual node's position is hash(f"{nodeid}:{i}"). For each inserted position, it computes the transfer: which arc of the ring is being taken from which existing node. Returns a transfer map of {(arcstart, arcend): (oldowner, new_owner)}.

removenode(nodeid) -> Dict — Removes all virtual nodes for the given physical node. Iterates indices in reverse to avoid index invalidation during deletion. For each removed position, finds the successor node that inherits the arc. Returns the same transfer map format.

get_node(key) -> str — The primary lookup. Hashes the key, then uses bisect.bisect to find the first ring position ≥ the hash. If the hash exceeds all positions, it wraps to index 0. This is the classic clockwise-walk on the ring.

getnodes(key) -> List[str] — Builds a preference list of replicationfactor distinct physical nodes by walking clockwise from the key's position, skipping duplicate physical nodes (since consecutive virtual nodes might belong to the same physical node). This is exactly how Dynamo constructs its coordinator list.

#### Analytical Methods

Patterns

1. Sorted array + bisect — Rather than a balanced BST or skip list, the ring uses bisect on a sorted Python list for O(log n) lookups and O(n) inserts. This is the right tradeoff for a ring that changes rarely (node add/remove) but is queried constantly (key lookups).

2. Parallel arraysringpositions and ringnodes are kept in lockstep rather than using a list of tuples. This allows bisect to work directly on positions without a custom comparator.

3. Virtual nodes with deterministic naming — Virtual node positions are hash(f"{nodeid}:{i}"), making them reproducible. The same node added on different machines produces the same ring positions — important for consistency across replicas of the routing table.

4. Transfer maps — Both addnode and removenode return structured descriptions of what data must move, rather than performing the movement themselves. This separates the partitioning decision from the data migration concern.

Dependencies

Imports:

Imported by:

No runtime dependencies beyond the standard library.

Flow

Adding a node and looking up a key


add_node("A", weight=1.0)
  → generates 150 positions via _hash("A:0"), _hash("A:1"), ...
  → for each position, bisect_left finds insertion point
  → records which existing node loses that arc
  → inserts into _ring_positions and _ring_nodes

get_node("user:123")
  → _hash("user:123") → position P
  → bisect(positions, P) → index of first position ≥ P
  → wrap to 0 if past end
  → return _ring_nodes[index]

Preference list construction (get_nodes)


get_nodes("user:123")
  → find starting index via bisect
  → walk clockwise through _ring_nodes
  → collect nodes into result, skipping duplicates
  → stop when len(result) == replication_factor

The walk can traverse up to len(ringpositions) entries in the worst case (when most virtual nodes belong to a small number of physical nodes), but terminates early once enough distinct nodes are found.

Invariants

1. ringpositions is always sorted — enforced by bisect.insort-style insertion (bisect_left + insert).

2. ringpositions and ringnodes are always the same length — every insert/remove touches both.

3. Each physical node gets int(num_vnodes * weight) virtual nodes — weight < 1.0 reduces presence on the ring.

4. getnodes always returns exactly replicationfactor distinct physical nodes — guaranteed by the precondition check that len(self.nodes) >= self.replicationfactor.

5. Adding the same node twice is a no-op — the early return if nodeid in self.nodes: return {} prevents duplicate virtual nodes.

6. Ring positions wrap at RINGSIZE (2^32) — arc computations and the getnode wrap-around handle this explicitly.

Error Handling

Errors are raised eagerly at the API boundary:

No try/except blocks exist in this module — it raises on invalid state and trusts callers to provide valid inputs.

Topics to Explore

Beliefs