Topic: This implementation has no locking — explore how production B-trees (e.g., InnoDB, PostgreSQL) use latching and lock coupling for concurrent access

Date: 2026-05-29

Time: 10:25

No Locking in This B-Tree — How Production Systems Do It

The Gap

This B-tree implementation (b-tree-storage-engine/btree.py) is entirely single-threaded. PageManager.readpage (line 70) and writepage (line 77) operate directly on a single file handle with no synchronization. allocatepage (line 88) reads metadata, computes the next free page, and writes it back — a classic read-modify-write sequence with no atomicity guarantee. If two threads called allocatepage concurrently, both could allocate the same page number.

The standalone write-ahead-log/wal.py does use a threading.Lock (line 80: self.lock = threading.Lock()), and every mutating method (append at line 139, appendbatch at line 150, checkpoint at line 163) acquires it. But this protects only the WAL's internal state — it doesn't coordinate B-tree structural modifications like splits.

This is fine for a teaching implementation. Production systems need a very different approach.

Latching vs. Locking

Production databases distinguish two concepts that this codebase collapses:

The B-tree in btree.py needs latches, not locks. The problem isn't transaction isolation — it's that two threads traversing the tree simultaneously could see a half-written page or a node mid-split.

Lock Coupling (Crabbing)

The naive approach — latch the entire tree for every operation — kills concurrency. Production systems use lock coupling (also called "crabbing"):

1. Latch the parent before descending.

2. Latch the child page.

3. Release the parent if the child is "safe" (won't split on insert, won't merge on delete).

4. Repeat down the tree.

Consider a lookup in this implementation's tree. BTree.get would traverse from root to leaf, calling read_page at each level (the test at line 95 asserts reads ≤ height + 1). With lock coupling, the traversal would hold at most two latches at any time — the current node and its parent — releasing the parent once the child is confirmed safe.

For writes, "safe" means the child node has room for an insertion without splitting (i.e., numkeys < maxkeysperpage). If the child might split, you must hold the parent's latch because the split will insert a new separator key into the parent.

Why This Implementation Would Break

Look at what happens during a split in btree.py. The code must:

1. Read the full node (read_page)

2. Allocate a new sibling page (allocate_page, line 88)

3. Write the split contents to both pages (write_page)

4. Insert the separator key into the parent

5. Possibly split the parent too, cascading up to the root

Without latching, a concurrent reader could see the original node after step 3 but before step 4 — the sibling exists but the parent doesn't point to it yet. Keys in the new sibling become invisible. Worse, a concurrent writer might follow a stale pointer and corrupt data.

How InnoDB Does It

InnoDB uses a combination of techniques:

How PostgreSQL Does It

PostgreSQL's nbtree uses the Lehman-Yao algorithm, which is specifically designed to reduce latching:

The next_sibling field already present in this implementation's leaf serialization is the seed of this idea, but Lehman-Yao requires a high key (the upper bound of keys in each node) and right-links on internal nodes, neither of which this implementation has.

The WAL Interaction

The WAL class in btree.py (line 122) logs page writes before applying them. In a concurrent system, the WAL and latching must coordinate carefully:

Topics to Explore

Beliefs