File: b-tree-storage-engine/test_btree.py

Date: 2026-05-28

Time: 18:28

b-tree-storage-engine/test_btree.py

Purpose

This is the test suite for a disk-backed B-Tree storage engine — one of the core data structure implementations from *Designing Data-Intensive Applications*. It validates the B-Tree as a complete key-value store with on-disk persistence, write-ahead logging, and crash recovery. The file owns correctness verification across the full surface area: CRUD operations, range queries, page-level I/O behavior, durability guarantees, and data integrity (CRC32 checksums).

Key Components

The suite is 11 standalone test functions, no class hierarchy — each creates a fresh BTree in a tempfile.TemporaryDirectory so tests are fully isolated.

Core functionality tests

Durability and crash recovery tests

Structural integrity tests

Patterns

Temp-directory isolation. Every test uses with tempfile.TemporaryDirectory() as tmpdir — the B-Tree writes real files to disk, so this gives each test a clean filesystem namespace with automatic cleanup.

Close-reopen persistence checks. Several tests (testbasic, testwalrecovery, testmetadataconsistencyafter_split) close the tree and reopen it from the same directory to verify durability. This is the signature pattern for testing storage engines.

Internal-API probing. Tests like testpageiocounts reach into tree.pm.pagesread, and testmetadataconsistencyaftersplit calls tree.readmeta() and pagetype(). These aren't black-box tests — they verify internal invariants that a pure API test can't catch.

Lazy imports. Several tests import B-Tree internals (WAL, serializeleaf, pagetype, INTERNAL, HEADER_FMT, LEAF) inside the function body rather than at module scope. This keeps the top-level import surface minimal and makes it clear which tests depend on internals.

maxkeysper_page=4. Nearly every test uses this small fanout to force splits with few keys, making tree height and structure predictable for assertions.

Dependencies

Imports:

Imported by: Nothing — this is a leaf test module. The companion file testertestbtree.py likely wraps or re-runs these tests in a harness.

Flow

Each test follows the same arc:

1. Create a temp directory

2. Instantiate BTree(tmpdir, maxkeysper_page=4) — this either creates fresh files or opens existing ones

3. Perform writes (put), reads (get), deletes (delete), or scans (range_scan, iteration)

4. Assert correctness of results, stats, or internal state

5. Optionally close and reopen to test persistence

6. tree.close() (or simulate crash by skipping it)

7. Print "test_X PASSED" — the tests use print-based reporting, not a framework like pytest

When run as _main_, all 11 tests execute sequentially, and a final "All tests passed!" confirms the full suite.

Invariants

Error Handling

The tests themselves don't handle errors — they assert expected outcomes and let AssertionError propagate on failure. The one exception is testtoolargekv, which uses a try/except to verify that BTree.put raises ValueError for oversized values. WAL corruption (testcrc32detectscorruption) verifies that the *implementation* handles errors silently — a corrupted entry is skipped, not propagated to the caller.

Topics to Explore

Beliefs