Date: 2026-05-28
Time: 18:28
b-tree-storage-engine/test_btree.pyThis is the test suite for a disk-backed B-Tree storage engine — one of the core data structure implementations from *Designing Data-Intensive Applications*. It validates the B-Tree as a complete key-value store with on-disk persistence, write-ahead logging, and crash recovery. The file owns correctness verification across the full surface area: CRUD operations, range queries, page-level I/O behavior, durability guarantees, and data integrity (CRC32 checksums).
The suite is 11 standalone test functions, no class hierarchy — each creates a fresh BTree in a tempfile.TemporaryDirectory so tests are fully isolated.
testbasic — The integration smoke test. Exercises put, get, contains_, range scan, iteration order, min/max key, update-in-place, delete, stats, and persistence across close/reopen. This single test covers more surface area than most of the others combined.testlarge — Inserts 1000 keys with zero-padded names (key00000–key_00999) and verifies all retrievals, sorted iteration, and that the tree has grown beyond height 1. Confirms the B-Tree scales past trivial sizes.testrangescan — Tests bounded (k05 to k10, exclusive end) and unbounded (k15 to end) range scans. Validates that range boundaries are half-open: start-inclusive, end-exclusive.testdeleteand_reinsert — Deletes every even-indexed key, reinserts them with new values, and verifies both the reinserted and untouched keys. Confirms the tree handles tombstone-to-live transitions correctly.testwalrecovery — Simulates a crash by closing raw file handles (tree.pm.f, tree.wal.f) without calling tree.close(). Reopens and verifies all 10 keys survived. This tests that the WAL replays uncommitted page writes on startup.testwaluncommittedentries — Goes further: after a clean close, manually appends a WAL entry (via WAL.logwrite) *without* a commit marker, then reopens. Verifies the tree still loads correctly — uncommitted WAL entries are replayed but shouldn't corrupt committed state.testcrc32detects_corruption — Writes a WAL entry, then flips a byte inside it to corrupt the CRC. On reopen, the corrupted entry must be silently skipped while previously committed data remains intact.testpageio_counts — After inserting 100 keys, asserts that a single get reads at most height + 1 pages. Validates that lookups follow the B-Tree's O(log n) page-read guarantee, not scanning.testtoolargekv — Confirms that inserting a value larger than the page can hold raises ValueError. The tree uses pagesize=128 and a 200-byte value to trigger this.testdeletefreesemptyleaf — Deletes keys from a non-root leaf and verifies the tree stays consistent: sorted iteration works, range scans skip deleted keys, and the gap doesn't break traversal.testmetadataconsistencyaftersplit — After 20 inserts (forcing multiple splits), reads the metadata page directly (tree.readmeta()) and asserts the root pointer references a valid INTERNAL node. Reopens and verifies metadata is identical — the root pointer, height, and key count survive persistence.Temp-directory isolation. Every test uses with tempfile.TemporaryDirectory() as tmpdir — the B-Tree writes real files to disk, so this gives each test a clean filesystem namespace with automatic cleanup.
Close-reopen persistence checks. Several tests (testbasic, testwalrecovery, testmetadataconsistencyafter_split) close the tree and reopen it from the same directory to verify durability. This is the signature pattern for testing storage engines.
Internal-API probing. Tests like testpageiocounts reach into tree.pm.pagesread, and testmetadataconsistencyaftersplit calls tree.readmeta() and pagetype(). These aren't black-box tests — they verify internal invariants that a pure API test can't catch.
Lazy imports. Several tests import B-Tree internals (WAL, serializeleaf, pagetype, INTERNAL, HEADER_FMT, LEAF) inside the function body rather than at module scope. This keeps the top-level import surface minimal and makes it clear which tests depend on internals.
maxkeysper_page=4. Nearly every test uses this small fanout to force splits with few keys, making tree height and structure predictable for assertions.
Imports:
tempfile, os, struct — standard library, for test scaffolding and low-level WAL manipulationbtree.BTree — the system under testbtree.WAL, btree.serializeleaf, btree.HEADERFMT, btree.LEAF, btree.page_type, btree.INTERNAL — internal APIs used only in durability/corruption testsImported by: Nothing — this is a leaf test module. The companion file testertestbtree.py likely wraps or re-runs these tests in a harness.
Each test follows the same arc:
1. Create a temp directory
2. Instantiate BTree(tmpdir, maxkeysper_page=4) — this either creates fresh files or opens existing ones
3. Perform writes (put), reads (get), deletes (delete), or scans (range_scan, iteration)
4. Assert correctness of results, stats, or internal state
5. Optionally close and reopen to test persistence
6. tree.close() (or simulate crash by skipping it)
7. Print "test_X PASSED" — the tests use print-based reporting, not a framework like pytest
When run as _main_, all 11 tests execute sequentially, and a final "All tests passed!" confirms the full suite.
allkeys == sorted(allkeys) — the B-Tree must maintain key order.rangescan(start, end) is start-inclusive, end-exclusive — testrange_scan asserts "k05" is included but "k10" is not.height + 1 pages (one per tree level plus possibly the data page).page_size must raise ValueError, never silently corrupt.readmeta() must return accurate root page, height, and key count, and these must survive close/reopen.The tests themselves don't handle errors — they assert expected outcomes and let AssertionError propagate on failure. The one exception is testtoolargekv, which uses a try/except to verify that BTree.put raises ValueError for oversized values. WAL corruption (testcrc32detectscorruption) verifies that the *implementation* handles errors silently — a corrupted entry is skipped, not propagated to the caller.
b-tree-storage-engine/btree.py — The implementation being tested: page manager, WAL, split logic, and the serialization format (serializeleaf, HEADER_FMT)b-tree-storage-engine/btree.py:BTree.range_scan — How half-open range iteration is implemented across leaf pagesb-tree-storage-engine/btree.py:WAL.logwrite — WAL entry format, CRC32 checksumming, and the commit protocol that testwaluncommittedentries probesb-tree-storage-engine/testertestbtree.py — The test harness wrapper; understanding how it relates to this file clarifies the project's test infrastructurebtree-split-and-merge-strategy — How the B-Tree handles node splits (tested by testmetadataconsistencyaftersplit) and whether it implements merge-on-delete or lazy tombstonesbtree-range-scan-half-open — BTree.range_scan(start, end) uses half-open intervals: start-inclusive, end-exclusive; passing no end scans to the last keybtree-max-keys-forces-splits — With maxkeysper_page=4, inserting 5+ keys forces at least one split, producing a tree of height >= 2btree-wal-replays-without-commit — The WAL replays all valid entries on recovery regardless of whether a commit marker is presentbtree-crc32-skips-corrupt-entries — Corrupted WAL entries (bad CRC) are silently skipped during recovery rather than raising an errorbtree-lookup-reads-bounded-by-height — A single BTree.get reads at most height + 1 disk pages