Date: 2026-05-28
Time: 19:06
I have enough from the observations to give a thorough explanation. Let me trace through the code.
log-structured-merge-tree/lsm.py - Shows how the LSM tree uses a simpler WAL class (lsm.py:13) as a component, including replay on startup (lsm.py:230) and truncation after memtable flush (lsm.py:314)
write-ahead-log/wal.py:append_batch - Trace how batched writes achieve atomicity by buffering all records plus a COMMIT marker and writing them in a single write() call
write-ahead-log/wal.py:truncate - Understand how old records are pruned: the method rewrites files keeping only records with seqnum > upto_seq, removing files that become empty
fsync-durability-guarantees - Explore what os.fsync actually guarantees on different filesystems (ext4, XFS, APFS) and why flush() alone is insufficient
record-boundary-recovery - Compare this "stop at first error" approach with alternatives like length-prefixed scanning with magic bytes or block-aligned records (as in LevelDB's log format)
wal-record-length-prefixed - Every WAL record is prefixed with a 4-byte little-endian length that covers all subsequent fields (CRC through value), enabling the reader to know exactly how many bytes to consume
wal-crc-covers-payload-only - The CRC32 checksum covers optype + key + value but excludes seqnum, record_length, key_len, and val_len, meaning sequence number corruption would go undetected
wal-corruption-stops-file-scan - Both partial reads (None return) and CRC mismatches (ValueError) cause the reader to stop processing the current WAL file, discarding any subsequent records in that file even if they are valid
wal-batch-atomicity-via-commit-record - A batch is only considered complete if its trailing COMMIT record is present and passes CRC; incomplete batches (missing commit) are excluded by replay, which filters on checkpoint/commit boundaries
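
To make the record-format notes above concrete, here is a minimal Python sketch of a length-prefixed record with a payload-only CRC, a "stop at first error" scan, and a COMMIT-terminated batch. It is not the code in wal.py: the field order, the op codes, and every helper name (encode_record, read_record, scan, encode_batch, committed, append_durably) are assumptions made for illustration, and committed() assumes every write belongs to a batch.

```python
import io
import os
import struct
import zlib

OP_PUT, OP_COMMIT = 1, 3                 # hypothetical op codes
HEADER = struct.Struct("<IQBII")         # crc, seqnum, op_type, key_len, val_len


def encode_record(seqnum, op_type, key=b"", value=b""):
    # CRC covers op_type + key + value only, as described in the notes.
    crc = zlib.crc32(bytes([op_type]) + key + value)
    body = HEADER.pack(crc, seqnum, op_type, len(key), len(value)) + key + value
    # 4-byte little-endian length prefix covers everything after it.
    return struct.pack("<I", len(body)) + body


def read_record(stream):
    """Return (seqnum, op_type, key, value), None on a partial read,
    or raise ValueError on a malformed record or CRC mismatch."""
    prefix = stream.read(4)
    if len(prefix) < 4:
        return None
    (length,) = struct.unpack("<I", prefix)
    body = stream.read(length)
    if len(body) < length:
        return None                      # torn write at the tail
    if length < HEADER.size:
        raise ValueError("record shorter than its header")
    crc, seqnum, op_type, key_len, val_len = HEADER.unpack_from(body)
    if key_len + val_len != length - HEADER.size:
        raise ValueError("length fields disagree")
    key = body[HEADER.size:HEADER.size + key_len]
    value = body[HEADER.size + key_len:]
    if zlib.crc32(bytes([op_type]) + key + value) != crc:
        raise ValueError("CRC mismatch")
    return seqnum, op_type, key, value


def scan(stream):
    """Collect records until the first partial read or error, then stop."""
    records = []
    while True:
        try:
            rec = read_record(stream)
        except ValueError:
            break
        if rec is None:
            break
        records.append(rec)
    return records


def encode_batch(first_seq, items):
    """Buffer all records plus a trailing COMMIT so they can go out together."""
    buf = bytearray()
    seq = first_seq
    for key, value in items:
        buf += encode_record(seq, OP_PUT, key, value)
        seq += 1
    buf += encode_record(seq, OP_COMMIT)
    return bytes(buf)


def committed(records):
    """Keep only records that are followed by a COMMIT record."""
    out, pending = [], []
    for rec in records:
        if rec[1] == OP_COMMIT:
            out.extend(pending)
            pending = []
        else:
            pending.append(rec)
    return out


def append_durably(path, data):
    with open(path, "ab") as f:
        f.write(data)
        f.flush()                # moves data from Python's buffer to the OS
        os.fsync(f.fileno())     # asks the OS to push it to stable storage
```

In this sketch, truncating the tail of an encoded batch drops the COMMIT record, so committed(scan(...)) returns nothing for that batch. Flipping a byte inside the seqnum field still decodes without error, which is the blind spot the wal-crc-covers-payload-only note points at. A CRC that also covered the header fields would close it.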