test_corruption's 5-byte overwrite strategyDate: 2026-05-28
Time: 18:47
The WAL uses a two-layer integrity scheme: a 4-byte length prefix for framing and a CRC32 checksum for content verification. Understanding both layers is key to seeing why test_corruption's 5-byte overwrite reliably detects corruption.
encoderecord (wal.py:29-35) builds each record as:
[record_length: u32][crc: u32][seq_num: u64][op_type: u8][key_len: i32][key][val_len: i32][value]
← 4 bytes → ← 4 → ← 8 → ← 1 → ← 4 → ← 4 →
The length prefix (record_length, first 4 bytes) tells the reader how many bytes to consume *after* the prefix itself. This is computed at line 33:
record_length = 4 + 8 + 1 + 4 + len(key) + 4 + len(value) # 21 + key + value
Note: the length prefix counts the payload including CRC but *not* itself — readrecord reads 4 bytes for the length, then reads record_length more bytes.
The CRC32 (via zlib.crc32, line 31) covers only the semantic content: optypebyte + key + value. It deliberately excludes seqnum, keylen, and val_len from the checksum. This means:
ValueError raised at line 53None (treated as EOF/truncation)seq_num → undetected (silent wrong ordering)In testcorruption (testwal.py:48-58):
f.seek(-5, 2) # 5 bytes before end of file
f.write(b"\xff\xff\xff\xff\xff") # overwrite with 0xFF
Two records are written: PUT "a" "1" and PUT "b" "2". For these short key/value pairs, each record is roughly 25–27 bytes total (4-byte length prefix + 21 bytes header + tiny key/value). The 5-byte overwrite from the end lands squarely in the second record's value or trailing bytes.
This triggers detection through one of two paths:
1. CRC mismatch: If the corruption hits the value bytes, zlib.crc32 of the corrupted content won't match the stored CRC → ValueError at line 53 → recoverseq_num breaks out of the read loop (line 90), and replay stops, yielding only the first record.
2. Length/framing corruption: If the corruption hits a length field, readrecord reads the wrong number of bytes → either a short read (returns None) or misaligned parsing that cascades into a CRC failure.
Either way, the result is the same: replay() returns only the first record (len(records) == 1), proving the WAL correctly stops at the corruption boundary rather than returning garbage.
The WAL follows a fail-fast, truncate-at-corruption strategy. In recoverseq_num (line 83), a ValueError from CRC mismatch causes the scanner to break — it does not skip corrupted records to find valid ones later in the file. This is the standard approach for sequential WALs: everything after corruption is untrusted.
write-ahead-log/wal.py:readrecord — The deserialization path where CRC validation and framing errors are detected; trace it byte-by-byte to understand failure modeswrite-ahead-log/wal.py:encoderecord — Compare what the CRC covers (op+key+value) vs. what it excludes (seq_num, lengths) — this is a deliberate design choice with tradeoffswal-recovery-semantics — The WAL stops at the first corrupted record rather than scanning past it; explore whether this is always safe (what if corruption hits the middle of a multi-file WAL?)write-ahead-log/test_wal.py — The truncation and rotation tests reveal how the WAL handles multi-file scenarios, which interact with the corruption-stops-reading behaviorcrc32-vs-cryptographic-hash — CRC32 detects accidental corruption but not adversarial tampering; consider when a stronger integrity check (SHA-256) would be warrantedwal-uses-crc32-not-sha — WAL integrity uses zlib.crc32 (32-bit, non-cryptographic); it detects accidental corruption but not intentional tamperingcrc32-covers-op-key-value-only — The CRC32 checksum covers optype + key + value but excludes seqnum, keylen, and vallen, meaning sequence number corruption is silentwal-stops-at-first-corruption — Recovery (recoverseq_num, replay) breaks on the first CRC error or short read; records after a corrupted record are discarded even if intactrecord-format-is-length-prefixed — Each record starts with a 4-byte little-endian length prefix that does not count itself, enabling the reader to frame records without delimiterscorruption-test-exploits-tail-position — test_corruption overwrites the last 5 bytes of the file, which targets the second record's payload, guaranteeing a CRC mismatch while leaving the first record intact