Topic: What integrity scheme the WAL uses (CRC32, SHA, length prefix) — critical for understanding test_corruption's 5-byte overwrite strategy

Date: 2026-05-28

Time: 18:47

WAL Integrity Scheme: CRC32 + Length Prefix

The WAL uses a two-layer integrity scheme: a 4-byte length prefix for framing and a CRC32 checksum for content verification. Understanding both layers is key to seeing why the 5-byte overwrite in test_corruption is reliably detected.

Record Wire Format

_encode_record (wal.py:29-35) builds each record as:


[record_length: u32][crc: u32][seq_num: u64][op_type: u8][key_len: i32][key][val_len: i32][value]
 ← 4 bytes →         ← 4 →    ← 8 →        ← 1 →       ← 4 →              ← 4 →

The length prefix (record_length, first 4 bytes) tells the reader how many bytes to consume *after* the prefix itself. This is computed at line 33:


record_length = 4 + 8 + 1 + 4 + len(key) + 4 + len(value)  # 21 + key + value

Note: the length prefix counts the payload including the CRC but *not* itself. _read_record reads 4 bytes for the length, then reads record_length more bytes.
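The framing rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in wal.py: the big-endian byte order, the helper names record_length and read_frame, and the choice to return None on a short read are all assumptions made for the example.

```python
import io
import struct

# Assumed big-endian u32; the source does not state the byte order.
LEN_PREFIX = struct.Struct(">I")

def record_length(key: bytes, value: bytes) -> int:
    # crc(4) + seq_num(8) + op_type(1) + key_len(4) + key + val_len(4) + value
    return 4 + 8 + 1 + 4 + len(key) + 4 + len(value)

def read_frame(stream):
    """Return the bytes that follow the length prefix, or None on a short read."""
    prefix = stream.read(LEN_PREFIX.size)
    if len(prefix) < LEN_PREFIX.size:
        return None
    (length,) = LEN_PREFIX.unpack(prefix)
    body = stream.read(length)
    if len(body) < length:
        return None
    return body

# Placeholder payload of the right size for key "a" and value "1".
body = bytes(record_length(b"a", b"1"))
framed = LEN_PREFIX.pack(len(body)) + body
print(record_length(b"a", b"1"), len(framed))  # 23 27
```

The prefix holds 23 for a one-byte key and one-byte value, while the record occupies 27 bytes on disk because the prefix does not count itself.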

CRC32 Coverage

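The CRC32 (via zlib.crc32, line 31) covers only the semantic content: op_type_byte + key + value. It deliberately excludes seq_num, key_len, and val_len from the checksum. This means the checksum alone cannot catch a change confined to seq_num, and damage to key_len or val_len surfaces only indirectly, when the misframed key or value no longer matches the stored CRC or the read comes up short.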
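To make the coverage concrete, here is a minimal Python sketch of an encoder that follows the wire format above and checksums only op_type + key + value. The names encode_record_sketch and crc_matches, the OP_PUT value, and the big-endian byte order are assumptions for illustration; only the field layout and the CRC coverage come from the description.

```python
import struct
import zlib

OP_PUT = 1  # hypothetical op code; the real value is not given in the source

def encode_record_sketch(seq_num: int, op_type: int, key: bytes, value: bytes) -> bytes:
    # The CRC covers op_type byte + key + value only, as described above.
    crc = zlib.crc32(bytes([op_type]) + key + value)
    body = struct.pack(">IQBi", crc, seq_num, op_type, len(key)) + key
    body += struct.pack(">i", len(value)) + value
    return struct.pack(">I", len(body)) + body  # prefix excludes itself

def crc_matches(record: bytes) -> bool:
    # Offsets: 4-byte prefix, then crc(4) + seq_num(8) + op_type(1) + key_len(4) = 17.
    stored_crc, _seq, op_type, key_len = struct.unpack_from(">IQBi", record, 4)
    key_start = 4 + 17
    key = record[key_start:key_start + key_len]
    (val_len,) = struct.unpack_from(">i", record, key_start + key_len)
    val_start = key_start + key_len + 4
    value = record[val_start:val_start + val_len]
    return zlib.crc32(bytes([op_type]) + key + value) == stored_crc

rec = bytearray(encode_record_sketch(7, OP_PUT, b"a", b"1"))

seq_flipped = bytearray(rec)
seq_flipped[15] ^= 0x01          # last byte of seq_num (offsets 8..15)
print(crc_matches(bytes(seq_flipped)))    # True: seq_num is outside the checksum

value_flipped = bytearray(rec)
value_flipped[-1] ^= 0x01        # the single value byte
print(crc_matches(bytes(value_flipped)))  # False: value is inside the checksum
```

In this sketch a flipped bit in seq_num passes the CRC check, while a flipped bit in the value does not, which is the practical consequence of the narrow coverage.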

Why the 5-Byte Overwrite Works

In test_corruption (test_wal.py:48-58):


f.seek(-5, 2)                              # 5 bytes before end of file
f.write(b"\xff\xff\xff\xff\xff")            # overwrite with 0xFF

Two records are written: PUT "a" "1" and PUT "b" "2". With a 1-byte key and a 1-byte value, each record is 27 bytes on disk (4-byte length prefix + 21 bytes of fixed fields + 1 + 1). Given the wire format above, the last 5 bytes of the file are the second record's 4-byte val_len followed by its 1-byte value, so the overwrite corrupts both a length field and the value itself.
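The arithmetic can be checked with an in-memory stand-in for the log file. This is a sketch, assuming big-endian fields and a hypothetical OP_PUT code; the encode helper is written for this example and is not the function in wal.py.

```python
import io
import struct
import zlib

OP_PUT = 1  # hypothetical op code, not taken from the source

def encode(seq_num: int, key: bytes, value: bytes, op_type: int = OP_PUT) -> bytes:
    # Layout from the wire format: prefix, crc, seq_num, op_type, key_len, key, val_len, value.
    crc = zlib.crc32(bytes([op_type]) + key + value)
    body = struct.pack(">IQBi", crc, seq_num, op_type, len(key)) + key
    body += struct.pack(">i", len(value)) + value
    return struct.pack(">I", len(body)) + body

first = encode(1, b"a", b"1")
second = encode(2, b"b", b"2")

f = io.BytesIO(first + second)
f.seek(-5, 2)                       # same seek as the test: 5 bytes before the end
overwrite_start = f.tell()
f.write(b"\xff\xff\xff\xff\xff")
corrupted = f.getvalue()

# Where the overwrite begins, measured from the start of the second record.
offset_in_second = overwrite_start - len(first)
(val_len,) = struct.unpack_from(">i", corrupted, overwrite_start)

print(len(first), len(corrupted), offset_in_second)  # 27 54 22
print(val_len, corrupted[-1:])                        # -1 b'\xff'
```

Offset 22 is exactly where val_len begins (4 + 4 + 8 + 1 + 4 + 1), so under these assumptions the corrupted val_len reads as -1 when interpreted as a signed 32-bit integer and the value byte becomes 0xFF.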

This triggers detection through one of two paths:

1. CRC mismatch: If the corruption hits the value bytes, zlib.crc32 of the corrupted content won't match the stored CRC → ValueError at line 53 → _recover_seq_num breaks out of the read loop (line 90), and replay stops, yielding only the first record.

2. Length/framing corruption: If the corruption hits a length field, _read_record reads the wrong number of bytes → either a short read (returns None) or misaligned parsing that cascades into a CRC failure.

Either way, the result is the same: replay() returns only the first record (len(records) == 1), proving the WAL correctly stops at the corruption boundary rather than returning garbage.
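Both paths can be exercised with a small reader. The following is a sketch, not the implementation in wal.py: the byte order, the OP_PUT value, and the explicit bounds checks on key_len and val_len are assumptions added so that the example behaves deterministically.

```python
import io
import struct
import zlib

OP_PUT = 1  # hypothetical op code, not taken from the source
HEADER = struct.Struct(">IQBi")  # crc, seq_num, op_type, key_len (big-endian assumed)

def encode(seq_num, key, value, op_type=OP_PUT):
    crc = zlib.crc32(bytes([op_type]) + key + value)
    body = HEADER.pack(crc, seq_num, op_type, len(key)) + key
    body += struct.pack(">i", len(value)) + value
    return struct.pack(">I", len(body)) + body

def read_record(stream):
    """Return (seq_num, key, value), None on a short read, or raise ValueError."""
    prefix = stream.read(4)
    if len(prefix) < 4:
        return None                       # end of file or truncated prefix
    (length,) = struct.unpack(">I", prefix)
    body = stream.read(length)
    if len(body) < length:
        return None                       # framing points past the end of the file
    if length < HEADER.size + 4:
        raise ValueError("record too short")
    crc, seq_num, op_type, key_len = HEADER.unpack_from(body)
    key_end = HEADER.size + key_len
    if key_len < 0 or key_end + 4 > length:
        raise ValueError("bad key_len")
    (val_len,) = struct.unpack_from(">i", body, key_end)
    if val_len < 0 or key_end + 4 + val_len != length:
        raise ValueError("bad val_len")
    key, value = body[HEADER.size:key_end], body[key_end + 4:]
    if zlib.crc32(bytes([op_type]) + key + value) != crc:
        raise ValueError("CRC mismatch")
    return seq_num, key, value

def replay(data: bytes):
    stream, records = io.BytesIO(data), []
    while True:
        try:
            rec = read_record(stream)
        except ValueError:
            break                         # corrupt record: stop here
        if rec is None:
            break                         # short read: stop here
        records.append(rec)
    return records

log = encode(1, b"a", b"1") + encode(2, b"b", b"2")
tail = bytearray(log); tail[-5:] = b"\xff" * 5                  # the test's overwrite
value_only = bytearray(log); value_only[-1] ^= 0xFF             # path 1: CRC mismatch
bad_prefix = bytearray(log); bad_prefix[27:31] = b"\xff" * 4    # path 2: short read

for damaged in (tail, value_only, bad_prefix):
    print(len(replay(bytes(damaged))))    # 1 each time
```

Whichever field is damaged, the reader in this sketch surfaces it as either a ValueError or a None, and replay treats both as the end of trustworthy data.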

Recovery Philosophy

The WAL follows a fail-fast, truncate-at-corruption strategy. In _recover_seq_num (line 83), a ValueError from a CRC mismatch causes the scanner to break out of its loop; it does not skip corrupted records to look for valid ones later in the file. This is the standard approach for sequential WALs: everything after the first corrupt record is untrusted.
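A short sketch shows what "does not skip" means in practice. It assumes the same layout as above with big-endian fields; read_seq and scan are hypothetical helpers, and returning the byte count of the valid prefix is an illustrative choice rather than something the source confirms.

```python
import io
import struct
import zlib

OP_PUT = 1  # hypothetical op code, not taken from the source

def encode(seq_num, key, value, op_type=OP_PUT):
    crc = zlib.crc32(bytes([op_type]) + key + value)
    body = struct.pack(">IQBi", crc, seq_num, op_type, len(key)) + key
    body += struct.pack(">i", len(value)) + value
    return struct.pack(">I", len(body)) + body

def read_seq(stream):
    """Return the record's seq_num, None on a short read, or raise ValueError."""
    prefix = stream.read(4)
    if len(prefix) < 4:
        return None
    (length,) = struct.unpack(">I", prefix)
    body = stream.read(length)
    if len(body) < length:
        return None
    if length < 21:
        raise ValueError("record too short")
    crc, seq_num, op_type, key_len = struct.unpack_from(">IQBi", body)
    if not 0 <= key_len <= length - 21:
        raise ValueError("bad key_len")
    key = body[17:17 + key_len]
    (val_len,) = struct.unpack_from(">i", body, 17 + key_len)
    if val_len != length - 21 - key_len:
        raise ValueError("bad val_len")
    value = body[21 + key_len:]
    if zlib.crc32(bytes([op_type]) + key + value) != crc:
        raise ValueError("CRC mismatch")
    return seq_num

def scan(data: bytes):
    """Return (last good seq_num, number of bytes known to be valid)."""
    stream = io.BytesIO(data)
    last_seq, valid_bytes = 0, 0
    while True:
        try:
            seq = read_seq(stream)
        except ValueError:
            break          # first corrupt record ends the scan; no resync attempt
        if seq is None:
            break
        last_seq, valid_bytes = seq, stream.tell()
    return last_seq, valid_bytes

intact = encode(1, b"a", b"1") + encode(2, b"b", b"2") + encode(3, b"c", b"3")
damaged = bytearray(intact)
damaged[53] ^= 0xFF        # value byte of the second record (27 + 26)

print(scan(intact))            # (3, 81)
print(scan(bytes(damaged)))    # (1, 27)
```

The third record is byte-for-byte intact in the damaged log, yet the scan reports sequence number 1, because nothing after the first failure is trusted.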

Topics to Explore

Beliefs