Date: 2026-05-29
Time: 08:23
WAL.log_write — Append a Page Write to the Write-Ahead Loglogwrite records a pending page write to the WAL file before the actual data file is modified. This is the core of the WAL protocol: if the process crashes after logwrite but before (or during) the actual page write, WAL.recover can replay the logged entry to bring the data file back to a consistent state. Without this method, a crash mid-write could leave a half-written page in the data file — corrupting the B-tree.
Preconditions:
self._f is open and writable.pagedata is a bytes object representing a complete page (callers always pad to pagesize before calling — see BTree.walwrite_page at line 179).page_num is a valid page index (0 for metadata, ≥1 for data pages).Postconditions:
fsync).self._seq has been incremented by 1.Invariant: Entries in the WAL file are strictly ordered by monotonically increasing sequence numbers, with no gaps within a single session.
| Parameter | Type | Description |
|-----------|------|-------------|
| page_num | int | The page number in the data file this write targets. Packed as a 4-byte big-endian unsigned int. |
| pagedata | bytes | The full page contents to be written. Length is encoded in the header, so variable-length data is supported at the WAL level (though callers always pass fixed pagesize buffers). |
None. This is a side-effect-only method. The caller is responsible for subsequently calling WAL.commit to clear the log after the transaction completes.
1. Increment sequence counter — self._seq += 1. This provides ordering and lets recover process entries in the order they were written.
2. Pack the header — 12 bytes total: sequence number (4B), page number (4B), data length (4B), all big-endian unsigned ints.
3. Compute and pack checksum — CRC32 of page_data only (not the header), stored as a 4-byte big-endian unsigned int. This detects torn writes during recovery.
4. Seek to end of file — seek(0, 2) positions the cursor at EOF so entries are appended, never overwriting earlier entries in the same transaction.
5. Write the entry — header + pagedata + checksum as a single write call. The entry format is: [seq:4][pagenum:4][datalen:4][pagedata:variable][crc32:4].
6. Flush + fsync — flush() pushes Python's userspace buffer to the OS, then os.fsync() forces the OS to flush to stable storage. This is what makes the durability guarantee real.
fsync blocks until the hardware confirms persistence.self._seq is incremented. This counter is not persisted independently; it resets to 0 when the WAL is committed or when a new WAL instance is created (it's reconstructed implicitly during recovery by reading entries).None. If struct.pack fails (wrong types), write fails (disk full), or fsync fails (I/O error), the exception propagates uncaught. This is appropriate — a WAL write failure means the transaction cannot be made durable and must not proceed. The caller (BTree.walwrite_page) does not catch these either, so a failed WAL write aborts the entire put/delete operation.
logwrite is never called directly by external code. It's called by BTree.walwritepage and BTree.walwrite_meta, which form a transaction bracket:
# Typical transaction pattern (from BTree.put):
self.wal.log_write(page_num, padded) # log intent
self.pm.write_page(page_num, padded) # apply to data file
# ... possibly more log_write + write_page pairs ...
self.wal.commit(self.pm) # clear WAL, transaction complete
A single B-tree mutation (e.g., a split) may call log_write multiple times before a single commit. All logged entries are replayed atomically during recovery — there is no partial replay.
struct — binary packing for the fixed-format header and checksum.zlib — via _checksum (CRC32). Used for integrity checking, not security.os.fsync — the POSIX durability primitive. This is what separates a WAL from a regular buffered log.1. pagedata is bytes, not str — struct.pack('>I', self.checksum(page_data)) would fail on str, but the caller is responsible for encoding.
2. Single-writer — there's no file locking. If two processes open the same WAL, _seq values will collide and appended entries could interleave, corrupting the log.
3. write is atomic for the payload size — the code does a single write(header + page_data + checksum). For typical 4KB pages, this is ~4KB + 16 bytes, which is generally atomic on local filesystems but not guaranteed by POSIX. A torn write here is detected by the checksum in recover, which skips corrupted entries.
4. The checksum covers only pagedata, not the header — a corrupted header (wrong datalen) could cause recover to read the wrong number of bytes, potentially misaligning all subsequent entries. The recovery loop handles this by bailing out when offset + data_len + 4 > len(data).
b-tree-storage-engine/btree.py:WAL.recover — The other half of the durability contract: replays logged entries and validates checksums after a crashb-tree-storage-engine/btree.py:WAL.commit — Clears the WAL after a successful transaction; understanding when and why it truncates is key to the atomicity guaranteeb-tree-storage-engine/btree.py:BTree.walwrite_page — The caller that pads data and coordinates WAL logging with actual page writesfsync-durability-guarantees — How fsync interacts with disk caches, battery-backed write caches, and different filesystems (ext4, ZFS, etc.) — the WAL's correctness depends on fsync actually reaching stable storageb-tree-storage-engine/test_btree.py — Tests that exercise crash recovery scenarios to verify the WAL protocol holds under simulated failureswal-append-only-until-commit — WAL entries are only appended, never overwritten; the file is truncated to zero only on commit or after successful recoverwal-checksum-covers-data-only — The CRC32 checksum covers page_data but not the entry header, so header corruption is detected indirectly via length mismatch rather than checksum failurewal-fsync-per-entry — Each log_write call forces an fsync, guaranteeing durability of individual entries at the cost of one sync per logged pagewal-seq-monotonic-per-session — The sequence counter increments from 0 on each WAL instantiation and resets on commit; it is not globally unique across sessionswal-single-writer-assumed — The WAL has no file locking; concurrent writers would corrupt the log with interleaved entries and duplicate sequence numbers