Topic: DDIA Chapter 6 covers the theory behind partition strategies, rebalancing, and request routing that this code implements

Date: 2026-05-29

Time: 14:00

DDIA Chapter 6: Partitioning — From Theory to Code

Chapter 6 of *Designing Data-Intensive Applications* lays out three core problems: how to partition data, how to rebalance when nodes change, and how to route requests to the right partition. This codebase implements all three, each in a separate module that isolates one strategy.

Rebalancing

Dynamic Splitting and Merging

Range partitioning uses dynamic rebalancing via split and merge, the strategy Kleppmann associates with HBase and RethinkDB. When a partition exceeds max_partition_size, put automatically splits it at the median key (lines 118–121 of range_partitioning.py). The Partition.split method (lines 69–79) creates a new right partition and adjusts boundaries. Conversely, merge_small_partitions (lines 152–166) recombines adjacent undersized partitions.
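The split-and-merge cycle can be sketched in a few lines. This is a minimal illustration, not the code in the module referenced above: the RangeStore class, its delete method, and the min_size parameter are assumptions made for the example, and only the names max_partition_size, put, and merge_small_partitions mirror the text.

```python
from bisect import bisect_right


class RangeStore:
    """Hypothetical range-partitioned store that splits and merges by size."""

    def __init__(self, max_partition_size=4):
        self.max_partition_size = max_partition_size
        self.boundaries = []    # sorted split keys; boundaries[i] is the lowest key of partitions[i + 1]
        self.partitions = [{}]  # one dict per contiguous key range

    def _index_for(self, key):
        # Keys equal to a boundary belong to the partition on its right.
        return bisect_right(self.boundaries, key)

    def put(self, key, value):
        i = self._index_for(key)
        self.partitions[i][key] = value
        if len(self.partitions[i]) > self.max_partition_size:
            self._split(i)

    def get(self, key):
        return self.partitions[self._index_for(key)].get(key)

    def delete(self, key):
        self.partitions[self._index_for(key)].pop(key, None)

    def _split(self, i):
        # Split at the median key: the median and everything above it move right.
        items = self.partitions[i]
        keys = sorted(items)
        median = keys[len(keys) // 2]
        left = {k: v for k, v in items.items() if k < median}
        right = {k: v for k, v in items.items() if k >= median}
        self.partitions[i:i + 1] = [left, right]
        self.boundaries.insert(i, median)

    def merge_small_partitions(self, min_size=2):
        # Merge neighbours when one is undersized and the result still fits.
        i = 0
        while i < len(self.partitions) - 1:
            left, right = self.partitions[i], self.partitions[i + 1]
            undersized = len(left) < min_size or len(right) < min_size
            if undersized and len(left) + len(right) <= self.max_partition_size:
                left.update(right)
                del self.partitions[i + 1]
                del self.boundaries[i]  # drop the boundary between the two
            else:
                i += 1
```

In this sketch, inserting a fifth key into a partition capped at four triggers a split, and deleting keys followed by merge_small_partitions collapses the neighbours back into one range. Requiring the merged partition to fit within max_partition_size avoids an immediate re-split.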

Consistent Hashing Rebalancing

When a node is added to or removed from the hash ring, add_node (lines 27–46) and remove_node (lines 48–66) return transfer maps of the form {(arc_start, arc_end): (from_node, to_node)}. This models what Kleppmann describes as the key advantage of consistent hashing: only keys in the affected arcs move, not the entire dataset. The weight parameter (line 27) supports heterogeneous nodes by scaling the virtual node count.
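To make the transfer-map idea concrete, here is a minimal sketch of a ring whose add_node and remove_node report which arcs change hands. It is written under assumptions and is not the module's actual code: the HashRing class, the ring_hash helper, the vnodes_per_weight setting, and the convention that a vnode owns the arc ending at its position are all choices made for the example.

```python
import hashlib
from bisect import bisect_left, insort


def ring_hash(value):
    # Stable hash (Python's built-in hash() is randomised per process for str).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring; each vnode owns the arc (predecessor, position]."""

    def __init__(self, vnodes_per_weight=8):
        self.vnodes_per_weight = vnodes_per_weight
        self.positions = []  # sorted vnode positions
        self.owner = {}      # position -> physical node

    def lookup(self, key):
        if not self.positions:
            return None
        idx = bisect_left(self.positions, ring_hash(key)) % len(self.positions)
        return self.owner[self.positions[idx]]

    def add_node(self, node, weight=1):
        transfers = {}
        for i in range(weight * self.vnodes_per_weight):
            pos = ring_hash(f"{node}#{i}")
            if pos in self.owner:
                continue  # position already taken; skip this vnode
            if self.positions:
                idx = bisect_left(self.positions, pos) % len(self.positions)
                predecessor = self.positions[idx - 1]
                previous_owner = self.owner[self.positions[idx]]
                if previous_owner != node:
                    # Only the arc ending at the new vnode changes owner.
                    transfers[(predecessor, pos)] = (previous_owner, node)
            insort(self.positions, pos)
            self.owner[pos] = node
        return transfers

    def remove_node(self, node):
        old = list(self.positions)
        removed = [p for p in old if self.owner[p] == node]
        self.positions = [p for p in old if self.owner[p] != node]
        for p in removed:
            del self.owner[p]
        transfers = {}
        if not self.positions:
            return transfers  # nobody left to take over
        for p in removed:
            predecessor = old[old.index(p) - 1]
            idx = bisect_left(self.positions, p) % len(self.positions)
            transfers[(predecessor, p)] = (node, self.owner[self.positions[idx]])
        return transfers
```

In this sketch a node with weight=2 places twice as many virtual nodes as one with weight=1, so it picks up roughly twice the key space. An arc whose start is greater than its end wraps around the top of the ring.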

Consumer Group Rebalancing

partitioned-log/partitioned_log.py implements Kafka-style consumer group rebalancing in rebalance (line 317). When consumers join or leave a group (lines 305, 309), partitions are redistributed across the group's current members. The Producer class (lines 144–175) handles the routing side: keyed messages hash to a fixed partition (line 161), while unkeyed messages round-robin (lines 163–165). These are the two strategies Kleppmann describes for Kafka producers.
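Both halves of this section fit in a short sketch. The code below is an illustration under assumptions rather than the contents of partitioned_log.py: choose_partition is a hypothetical method name, CRC32 stands in for whatever hash the module uses, and the round-robin assignment in rebalance is one possible policy that the source may not share.

```python
import zlib
from itertools import count


class Producer:
    """Hypothetical producer: keyed messages stick, unkeyed messages rotate."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._counter = count()  # advances only for unkeyed messages

    def choose_partition(self, key=None):
        if key is not None:
            # A stable hash keeps every message for one key in one partition,
            # which preserves per-key ordering.
            return zlib.crc32(key.encode("utf-8")) % self.num_partitions
        return next(self._counter) % self.num_partitions


def rebalance(consumers, num_partitions):
    """Assign partitions round-robin over the sorted group members (assumed policy)."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    if not members:
        return assignment
    for partition in range(num_partitions):
        assignment[members[partition % len(members)]].append(partition)
    return assignment


# Usage: recompute the assignment whenever membership changes.
group = {"consumer-a", "consumer-b"}
assignment = rebalance(group, num_partitions=5)
group.add("consumer-c")  # a consumer joins
assignment = rebalance(group, num_partitions=5)
```

Recomputing the whole assignment on every membership change is the simplest policy, but it can move partitions between consumers that did not join or leave. An assignment that minimises movement would need to take the previous assignment as input.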

---

Request Routing

The code demonstrates what Kleppmann calls the partition-aware client approach. There's no separate routing tier; instead, the store itself maintains routing metadata:

The get_nodes method in consistent hashing (lines 86–101) extends routing to support replication: it walks the ring collecting replication_factor distinct physical nodes, implementing the preference list concept from Dynamo that Kleppmann describes.
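The ring walk described above can be sketched as a standalone function. This is a minimal version under assumptions, not the method in the codebase: build_ring and ring_hash are hypothetical helpers, and the ring is represented as a sorted list of (position, node) pairs for simplicity.

```python
import hashlib
from bisect import bisect_left


def ring_hash(value):
    # Stable hash so that placement is identical across processes.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def build_ring(nodes, vnodes=8):
    """Return a sorted list of (position, physical_node) virtual nodes."""
    return sorted((ring_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))


def get_nodes(ring, key, replication_factor):
    """Walk clockwise from the key's position, collecting distinct physical nodes."""
    if not ring or replication_factor <= 0:
        return []
    positions = [pos for pos, _ in ring]
    start = bisect_left(positions, ring_hash(key))
    preference_list = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        # Skip further virtual nodes of a physical node already chosen, so
        # replicas always land on different machines.
        if node not in preference_list:
            preference_list.append(node)
            if len(preference_list) == replication_factor:
                break
    return preference_list


# Usage: the first entry is the primary owner, the rest hold replicas.
ring = build_ring(["node-a", "node-b", "node-c"])
replicas = get_nodes(ring, "user:42", replication_factor=2)
```

If replication_factor exceeds the number of physical nodes, this sketch returns every node once rather than raising an error. Whether the real method does the same is not stated in the text.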

---

Topics to Explore

Beliefs