File: range-partitioning/range_partitioning.py

Date: 2026-05-29

Time: 09:03

Purpose

This file implements range-based partitioning — the technique described in DDIA Chapter 6 where a keyspace is divided into contiguous, non-overlapping ranges, each assigned to a partition. It owns two responsibilities: storing sorted key-value data within individual partitions (Partition), and routing operations to the correct partition plus managing partition lifecycle — splits and merges — (RangePartitionedStore).

This is the counterpart to the consistent-hashing implementation in the same repo. Where hash partitioning distributes keys pseudo-randomly for uniform load, range partitioning preserves key ordering, which makes range scans efficient but creates hot-spot risk on sequential writes.

Key Components

Partition

A single shard holding key-value pairs in sorted order within a half-open range [startkey, endkey). end_key=None means unbounded to the right.

PartitionInfo

A read-only dataclass snapshot of partition metadata — used by getpartitioninfo() and getpartitionfor_key() to expose partition state without leaking mutable internals.

RangePartitionedStore

The coordinator that maintains the partition list and routes operations.

Patterns

Parallel arrays for routing. partitions and boundaries are kept in sync — index i in both refers to the same partition. This is a simple alternative to a balanced tree of partition metadata; it trades O(n) insert/delete on the lists for O(1) indexed access and O(log n) binary search routing.

Median-index split. The split point is the key at position len // 2, not the lexicographic midpoint of the range. This produces roughly equal-sized partitions by count but can produce unequal ranges if keys are clustered.

Half-open intervals [start, end). The last partition uses end_key=None as an open upper bound. Every key belongs to exactly one partition — no gaps, no overlaps.

Automatic split, manual merge. Splits are triggered inline during put(), but merges require an explicit call to mergesmallpartitions(). This asymmetry is typical in real systems (e.g., HBase, CockroachDB) where splits are urgent (a partition is too large to serve) but merges can be deferred to a background process.

Dependencies

Imports: Standard library only — bisect for binary search, uuid for partition IDs, dataclass and typing for structure.

Imported by: testrangepartitioning.py and testertestrange_partitioning.py — the test suite and its validation harness.

Flow

A typical write path:

1. RangePartitionedStore.put("customer:42", data) is called.

2. findpartitionindex does bisectright(self._boundaries, "customer:42") - 1 to find partition index.

3. The partition's put() does bisect_left to find the sorted insertion point, inserts or overwrites.

4. If the partition now exceeds maxpartitionsize, split() is called — the partition divides its data at the median, a new right partition is created, and partitions/boundaries are updated.

A typical range scan:

1. range_scan("customer:10", "customer:50") identifies the start and end partition indices.

2. Iterates partitions from start to end index, calling each partition's range_scan with the same bounds.

3. Each partition does two bisect_left calls to slice its sorted lists, returns the matching pairs.

4. Results are concatenated in partition order, producing a globally sorted result.

Invariants

Error Handling

Minimal. The code does not validate inputs or guard against misuse:

Topics to Explore

Beliefs