Date: 2026-05-29
Time: 08:12
WAL.recover — Write-Ahead Log Crash Recoveryrecover replays uncommitted WAL entries back into the data file after a crash. When the process dies between writing pages and calling commit(), the WAL file retains a record of intended writes. On the next startup, recover re-applies those writes to bring the data file to a consistent state. This is the core crash-safety mechanism — without it, a crash mid-operation leaves the B-tree with partially written pages.
pagemanager must accept writepage(page_num, data) and sync() calls.| Parameter | Type | Description |
|-----------|------|-------------|
| pagemanager | PageManager | The page I/O layer that owns the data file. Receives replayed writes via writepage and is fsynced via sync(). |
Returns an int — the count of successfully recovered (replayed) entries. Returns 0 if the WAL was empty or contained no valid entries. The caller (BTree._init_) doesn't act on this value, but it's useful for diagnostics.
1. Read the entire WAL into memory (self.f.seek(0); data = self.f.read()). Early-return 0 if empty.
2. Parse entries sequentially. Each entry has the layout:
`
[seq: 4B] [pagenum: 4B] [datalen: 4B] [pagedata: datalen B] [checksum: 4B]
`
3. Bounds check before each read:
offset + ENTRYHEADERSIZE <= len(data))offset + data_len + 4 > len(data)). If not, break — this is a torn write at the tail.4. Checksum verification: Compute CRC32 of page_data and compare against the stored checksum. Only replay if they match. A mismatch means the entry was incompletely written during a crash — skip it silently.
5. Replay: Call pagemanager.writepage(pagenum, pagedata) to overwrite the page in the data file.
6. Finalize:
page_manager.sync() — fsync the data file so all replayed writes hit disk.commit(), marking recovery complete.page_manager.sync().There is no explicit error handling. If the WAL file is corrupt in a way that still parses (e.g., valid header pointing to wrong data), the checksum gate catches it. However:
write_page or sync will propagate as an unhandled exception, leaving the WAL intact (truncation hasn't happened yet) — this is safe because the next startup will re-attempt recovery.sync() is critical: if the process crashes after replaying but before truncating, the next startup replays again harmlessly (idempotent writes).Called exactly once during BTree._init_:
self.wal = WAL(wal_path)
self.wal.recover(self.pm)
This is unconditional — recovery runs on every startup. If the WAL is empty (normal shutdown called commit()), it returns 0 immediately. The caller doesn't need to check whether recovery is needed; the method handles the empty case.
struct — for unpacking binary WAL entry headers and checksumszlib.crc32 — via _checksum(), used for integrity verificationos.fsync — ensures WAL truncation is durablePageManager.write_page / PageManager.sync — the target for replayed writes1. Page data in the WAL matches pagemanager.pagesize. recover writes whatever datalen says without checking it against the page manager's expected size. PageManager.writepage pads or truncates, but silently truncating recovered data would corrupt the page.
2. WAL entries were written by the same code version with the same ENTRY_HEADER format. There's no version or magic number in the WAL file.
3. The sequence number (seq) is parsed but never used — entries are replayed in file order regardless of sequence. If entries were somehow reordered on disk, the last write to a given page wins.
4. No deduplication: if the WAL contains two entries for the same page (e.g., metadata page 0 written multiple times in a transaction), both are replayed. The last one wins, which is correct only because WAL entries are appended in causal order.
wal-recover-checksum-gate — WAL recovery only replays entries whose CRC32 checksum matches, silently skipping torn writes from crasheswal-recover-then-truncate — Recovery fsyncs the data file before truncating the WAL, ensuring replayed writes survive a crash during recovery itselfwal-recover-idempotent — Replaying the WAL multiple times produces the same result because write_page overwrites unconditionallywal-seq-unused-in-recovery — The sequence number field is parsed during recovery but never consulted; replay order is determined solely by file offsetwal-recover-on-every-startup — BTree._init_ calls recover() unconditionally; an empty WAL short-circuits with no I/O beyond the initial read