Date: 2026-05-29
Time: 11:39
Now I have a thorough understanding. Here's the overview:
- write-ahead-log/wal.py: The foundational crash-recovery primitive; its binary record format, fsync modes, and truncation semantics establish patterns reused by the B-tree and Bitcask modules.
- unbundled-database/unbundled_database.py: The capstone module composing CDC, log-based messaging, and secondary indexes into a unified data system, reflecting DDIA Chapter 12's thesis about composing derived data.
- b-tree-storage-engine/btree.py:_delete: The delete path's three-valued return (False/True/'empty') and its incomplete cleanup (only depth-2 parents, never leftmost child) reveal real-world tradeoffs in B-tree deletion.
- crash-safety-gaps: Multiple modules have documented crash-safety gaps (Bitcask compaction, B-tree page allocation, LSM compaction) where multi-step file mutations lack atomicity; comparing these reveals a consistent design choice to favor simplicity over production-grade durability.
- raft-consensus/raft.py: The most complex protocol implementation; understanding its term-based safety, leader election, and partition simulation illuminates why consensus is hard.

---
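
To make the crash-safety-gaps item above concrete, here is a minimal sketch of the usual way a multi-step file mutation is made atomic: write a temporary file, fsync it, then rename it over the target. This is a generic stdlib pattern, not code from the repo; `atomic_write` is a hypothetical helper name, and the directory fsync step assumes a POSIX system.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` so readers see the old or the new contents, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push file contents to stable storage
        os.replace(tmp_path, path)  # single atomic swap of the directory entry
    except BaseException:
        # Nothing was swapped in; remove the leftover temp file if it still exists.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Assumption: POSIX. fsync the directory so the rename itself survives a crash.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

A crash before `os.replace` leaves the original file untouched plus a stray temp file; a crash after it leaves the complete new file. The gaps listed above arise when a module instead performs several separate writes, renames, or deletes with no single commit point like this one.
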
- modules-are-fully-isolated: No module imports from any other module; each top-level directory is a self-contained implementation with zero cross-module dependencies, including cases where integration would be natural (bloom filter + LSM tree).
- all-implementations-stdlib-only: Every implementation uses only Python standard library modules; no external runtime dependencies exist, enforcing first-principles implementation of all algorithms.
- dual-test-suite-convention: Each module maintains two parallel test files: test*.py (pytest, broader coverage) and testertest_*.py (standalone scripts using a stdout PASSED/FAILED protocol), with the tester files averaging about 60% of the line count of their pytest counterparts.
- crc-covers-payload-not-headers: The WAL, Bitcask, and B-tree storage engine all compute CRC32 over data payloads only, excluding header/framing metadata; this consistent design choice across the repo leaves corruption in header fields undetectable by the checksum.
- no-steal-buffer-management: All storage engine implementations use a NO-STEAL policy: uncommitted transaction data never reaches disk, eliminating undo logging at the cost of requiring all active transaction state to fit in memory.
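
The crc-covers-payload-not-headers observation can be illustrated with a small sketch. The record layout here (CRC32, timestamp, payload length, payload) and the names `encode`, `decode`, and `flip_bit` are hypothetical, chosen only for illustration; the actual field layouts in the repo's modules may differ. The `cover_header` flag shows the alternative of checksumming the header fields as well.

```python
import struct
import zlib

# Hypothetical layout: crc32 (4 bytes) | timestamp (8 bytes) | payload length (4 bytes) | payload
HEADER = struct.Struct(">IQI")


def encode(timestamp: int, payload: bytes, cover_header: bool = False) -> bytes:
    """Build one record; the CRC covers the payload only unless cover_header is set."""
    meta = struct.pack(">QI", timestamp, len(payload))
    crc = zlib.crc32(meta + payload) if cover_header else zlib.crc32(payload)
    return struct.pack(">I", crc) + meta + payload


def decode(record: bytes, cover_header: bool = False) -> tuple[int, bytes]:
    """Parse one record and verify its CRC; raise ValueError on mismatch."""
    crc, timestamp, length = HEADER.unpack_from(record)
    payload = record[HEADER.size:HEADER.size + length]
    # Bytes 4..15 hold the timestamp and length fields that follow the CRC.
    checked = record[4:HEADER.size] + payload if cover_header else payload
    if zlib.crc32(checked) != crc:
        raise ValueError("CRC mismatch")
    return timestamp, payload


def flip_bit(record: bytes, byte_index: int) -> bytes:
    """Simulate corruption by flipping the lowest bit of one byte."""
    damaged = bytearray(record)
    damaged[byte_index] ^= 0x01
    return bytes(damaged)
```

With a payload-only CRC, `decode(flip_bit(record, 11))` returns a silently altered timestamp (byte 11 is the timestamp's last byte in this layout), while flipping a payload byte such as index 16 raises `ValueError`. Encoding and decoding with `cover_header=True` makes the same timestamp corruption raise as well, at the cost of checksumming a few extra bytes per record.
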