Topic: Compare WAL-based crash recovery with shadow paging (copy-on-write) as used in LMDB and SQLite's legacy rollback journal

Date: 2026-05-29

Time: 10:57

WAL-Based Crash Recovery vs. Shadow Paging

What the codebase shows — and what it doesn't

This repo implements WAL-based crash recovery in three different storage engines. It does not implement shadow paging. The grep for shadow, copy.on.write, cow, and rollback found zero relevant hits. So this explanation covers WAL in depth from the code, and contrasts with shadow paging conceptually based on how LMDB and SQLite's rollback journal work.

Shadow Paging: The Alternative (Not Implemented)

Shadow paging takes the opposite approach: instead of logging what you intend to change, you write changes to new pages and atomically swap a root pointer.

How LMDB does it: LMDB maintains two copies of the root (meta) page. A write transaction copies every B-tree page it modifies to a new location (copy-on-write), building a new tree that shares unmodified pages with the old one. When done, it writes the new root to the *inactive* meta page and fsyncs. The root pointer swap is atomic because a single meta page write either completes or doesn't — there's no intermediate state. Recovery is trivial: read both meta pages, use the one with the higher transaction ID that has a valid checksum.

How SQLite's rollback journal does it: Before modifying a page in the database file, SQLite copies the *original* page to a rollback journal. If the transaction commits, the journal is deleted. If the process crashes, the journal contains the pre-modification pages needed to restore the database to its last consistent state. This is shadow paging in reverse — the "shadow" is the old state, not the new.

Why the distinction matters

| Property | WAL (this codebase) | Shadow Paging (LMDB-style) |

|----------|---------------------|---------------------------|

| Write amplification | Log entry + eventual data write | Copy every modified page + ancestors to root |

| Read path during recovery | Must replay log sequentially | Just read the correct root — instant |

| Concurrent readers | Need coordination (or MVCC) | Readers see a consistent snapshot for free (old root) |

| Sequential I/O | WAL appends are sequential; excellent for HDDs | Page copies scatter across the file |

| Space overhead | Log grows until truncated/checkpointed | Must keep free pages for shadows; fragmentation risk |

| Recovery time | Proportional to log length since last checkpoint | Constant — just pick the valid root |

The B-tree WAL in this codebase (btree.py:130-185) is the closest analog to compare against LMDB. Both operate on fixed-size pages and both use checksums for integrity. But the B-tree WAL writes pages to a log first and then copies them to the data file, while LMDB writes modified pages to *new locations in the data file itself* and swaps the root pointer. LMDB never overwrites live data, which means it never needs a recovery phase — but it does need a free-page manager to reclaim shadow pages after all readers using the old root have finished.

---

What's Missing from This Codebase

To do a true side-by-side comparison, the repo would need:

The existing PageManager.free_page() method (btree.py:97-102) hints at the free-list machinery that shadow paging requires, but it's used for B-tree node reuse, not copy-on-write.

---

Topics to Explore

Beliefs