Topic: How CRC32 in read_record (lines 57–59) interacts with OS write atomicity to detect torn writes

Date: 2026-05-29

Time: 06:33

CRC32 and Torn Write Detection in read_record

The Two-Layer Defense

read_record (write-ahead-log/wal.py:37) uses two distinct mechanisms to handle incomplete writes, and each catches a different failure mode:

Layer 1 (length-prefix framing, lines 39–44): Every record is preceded by a 4-byte record_length. On read, if fewer than 4 bytes are available (line 41) or the body is shorter than record_length (line 44), the function silently returns None. This catches the common torn-write case: a crash interrupts write() mid-record, so the file simply ends with a truncated record. The reader treats this as EOF: no error, just "this record didn't land."
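The framing check can be sketched on its own. This is a minimal sketch, not the code in wal.py: it assumes a little-endian 4-byte prefix, and write_frame and read_frame are hypothetical helper names.

```python
import io
import struct
from typing import BinaryIO, Optional

# Assumed framing: a 4-byte little-endian record_length, then the record body.
LEN_PREFIX = struct.Struct("<I")


def write_frame(f: BinaryIO, body: bytes) -> None:
    """Append one length-prefixed record (hypothetical helper)."""
    f.write(LEN_PREFIX.pack(len(body)) + body)


def read_frame(f: BinaryIO) -> Optional[bytes]:
    """Return the next body, or None if the prefix or the body is cut short."""
    header = f.read(LEN_PREFIX.size)
    if len(header) < LEN_PREFIX.size:
        return None  # clean EOF, or a torn length prefix
    (record_length,) = LEN_PREFIX.unpack(header)
    body = f.read(record_length)
    if len(body) < record_length:
        return None  # truncated body: "this record didn't land"
    return body


# Usage: drop the last 3 bytes of the second record, as a crash mid-write would.
buf = io.BytesIO()
write_frame(buf, b"first")
write_frame(buf, b"second record")
reader = io.BytesIO(buf.getvalue()[:-3])
print(read_frame(reader))  # b'first'
print(read_frame(reader))  # None
```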

Layer 2 (CRC32 verification, lines 53–56): If the length prefix *and* the full body are present but the bytes are wrong, the CRC catches it. The check recomputes zlib.crc32 over op_type_byte + key + value and compares the result with the CRC stored at the front of the record. A mismatch raises ValueError, which signals corruption rather than benign truncation.
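A sketch of the CRC step follows. The body layout it uses (crc:u32, seq_num:u64, op_type:u8, key_len:u32, val_len:u32, then key and value, little-endian) is an assumption: it puts seq_num at bytes 5–12 of the body, as described in the section on CRC coverage, but the rest is a guess. encode_body, decode_body, and payload_crc are hypothetical names.

```python
import struct
import zlib
from typing import Tuple

# Assumed body layout (a guess, not read from wal.py), little-endian:
#   crc:u32 | seq_num:u64 | op_type:u8 | key_len:u32 | val_len:u32 | key | value
HEADER = struct.Struct("<IQBII")


def payload_crc(op_type: int, key: bytes, value: bytes) -> int:
    # Same inputs as encode_record's crc_data: op_type byte + key + value.
    # (The mask is a no-op on Python 3, where crc32 is already unsigned.)
    return zlib.crc32(struct.pack("B", op_type) + key + value) & 0xFFFFFFFF


def encode_body(seq_num: int, op_type: int, key: bytes, value: bytes) -> bytes:
    crc = payload_crc(op_type, key, value)
    return HEADER.pack(crc, seq_num, op_type, len(key), len(value)) + key + value


def decode_body(body: bytes) -> Tuple[int, int, bytes, bytes]:
    """Parse a body that layer 1 already found complete; raise on CRC mismatch."""
    stored, seq_num, op_type, key_len, val_len = HEADER.unpack_from(body, 0)
    key = body[HEADER.size:HEADER.size + key_len]
    value = body[HEADER.size + key_len:HEADER.size + key_len + val_len]
    computed = payload_crc(op_type, key, value)
    if computed != stored:
        raise ValueError(f"CRC mismatch: stored {stored:#010x}, computed {computed:#010x}")
    return seq_num, op_type, key, value
```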

Why Both Layers Are Needed

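POSIX write() makes no crash-atomicity guarantee for buffers of arbitrary size: the OS and the disk may persist data in sector-sized units (typically 512 B or 4 KiB), and a crash can land between them. A crash during a multi-sector write can therefore leave a record whose length prefix and leading sectors hold the new data while one or more trailing sectors still hold stale bytes from before the write.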

In this scenario, layer 1 sees a complete record (the length prefix matches the number of body bytes present), so it does not trigger. Layer 2 does: the CRC was computed over the intended payload in encode_record (line 31), so the stale tail bytes produce a different checksum.
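The scenario can be simulated by persisting only the first N "sectors" of a record and leaving the remainder stale. This sketch uses a toy 16-byte sector so a small record spans several sectors, assumes the unpersisted bytes read back as zeros (they could equally be older file contents), and relies on a guessed record layout; encode_record, torn_write, and check here are illustrative stand-ins, not the WAL's code.

```python
import struct
import zlib

# Assumed layout (not taken from wal.py), little-endian:
#   record_length:u32 | crc:u32 | seq_num:u64 | op_type:u8 | key_len:u32 | val_len:u32 | key | value
LEN_PREFIX = struct.Struct("<I")
HEADER = struct.Struct("<IQBII")
SECTOR = 16  # toy sector size; real devices use 512 B or 4 KiB


def encode_record(seq_num: int, op_type: int, key: bytes, value: bytes) -> bytes:
    crc = zlib.crc32(struct.pack("B", op_type) + key + value) & 0xFFFFFFFF
    body = HEADER.pack(crc, seq_num, op_type, len(key), len(value)) + key + value
    return LEN_PREFIX.pack(len(body)) + body


def torn_write(new: bytes, stale: bytes, sectors_persisted: int) -> bytes:
    """Only the first N sectors of `new` persist; later bytes keep `stale` content."""
    cut = sectors_persisted * SECTOR
    return new[:cut] + stale[cut:len(new)]


def check(record: bytes) -> str:
    """Classify a record the way the two layers would."""
    (record_length,) = LEN_PREFIX.unpack_from(record, 0)
    body = record[LEN_PREFIX.size:]
    if len(body) < record_length:
        return "truncated (layer 1)"
    crc, _seq, op_type, key_len, val_len = HEADER.unpack_from(body, 0)
    key = body[HEADER.size:HEADER.size + key_len]
    value = body[HEADER.size + key_len:HEADER.size + key_len + val_len]
    if (zlib.crc32(struct.pack("B", op_type) + key + value) & 0xFFFFFFFF) != crc:
        return "corrupt (layer 2)"
    return "ok"


rec = encode_record(1, 1, b"user:42", b"x" * 36)  # 68 bytes: 4 toy sectors + 4 bytes
stale = bytes(len(rec))                           # unpersisted bytes read back as zeros
print(check(rec))                                 # ok
print(check(rec[:50]))                            # truncated (layer 1)
print(check(torn_write(rec, stale, 4)))           # corrupt (layer 2)
```

In the last case the file length, and hence the length prefix, is intact, so only the checksum exposes the missing final sector.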

What the CRC Covers (and Doesn't)

Looking at encode_record (lines 30–31):


crc_data = struct.pack("B", op_type_byte) + key + value
crc = zlib.crc32(crc_data) & 0xFFFFFFFF

The CRC covers op_type, key, and value: the semantic payload. It does not cover seq_num, key_len, or val_len. This is a pragmatic choice: if key_len or val_len is corrupted by a torn write, the reader slices the wrong bytes for key and value, which almost always produces a CRC mismatch anyway. However, a torn write that corrupts only seq_num (bytes 5–12 of the record body) would go undetected: the record would be accepted with the wrong sequence number.
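The seq_num gap can be demonstrated directly, along with one possible way to close it. payload_crc mirrors the crc_data computation in encode_record; full_crc is a hypothetical alternative that checksums every byte after the CRC field. The layout is assumed as before (crc:u32, seq_num:u64, op_type:u8, key_len:u32, val_len:u32, key, value, little-endian), and all helper names are illustrative.

```python
import struct
import zlib
from typing import Callable

# Assumed body layout, little-endian (seq_num occupies bytes 5-12):
#   crc:u32 | seq_num:u64 | op_type:u8 | key_len:u32 | val_len:u32 | key | value
HEADER = struct.Struct("<IQBII")


def payload_crc(body: bytes) -> int:
    """Mirrors encode_record's crc_data: op_type byte + key + value only."""
    _crc, _seq, op_type, key_len, val_len = HEADER.unpack_from(body, 0)
    key = body[HEADER.size:HEADER.size + key_len]
    value = body[HEADER.size + key_len:HEADER.size + key_len + val_len]
    return zlib.crc32(struct.pack("B", op_type) + key + value) & 0xFFFFFFFF


def full_crc(body: bytes) -> int:
    """Hypothetical alternative: checksum every byte after the 4-byte CRC field."""
    return zlib.crc32(body[4:]) & 0xFFFFFFFF


def make_body(seq_num: int, key: bytes, value: bytes, crc_fn: Callable[[bytes], int]) -> bytes:
    unsigned = HEADER.pack(0, seq_num, 1, len(key), len(value)) + key + value
    return struct.pack("<I", crc_fn(unsigned)) + unsigned[4:]


def stored_crc(body: bytes) -> int:
    return struct.unpack_from("<I", body, 0)[0]


def with_seq(body: bytes, seq_num: int) -> bytes:
    """Overwrite only seq_num (body offsets 4..11); everything else is untouched."""
    return body[:4] + struct.pack("<Q", seq_num) + body[12:]


a = make_body(10, b"k", b"v", payload_crc)
print(payload_crc(with_seq(a, 99)) == stored_crc(a))  # True: wrong seq_num accepted
b = make_body(10, b"k", b"v", full_crc)
print(full_crc(with_seq(b, 99)) == stored_crc(b))     # False: corruption detected
```

Checksumming the whole body after the CRC field costs one crc32 call over a few extra header bytes, and it also makes key_len/val_len corruption a direct checksum failure instead of relying on mis-sliced key/value bytes to change the CRC.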

The Recovery Contract

The design establishes a clear contract: read_record returns None for truncation (benign, and expected at the tail after a crash) and raises ValueError for corruption (a torn write that left a record of plausible length but wrong content). Upstream recovery code can use this distinction to skip tail truncation but flag or halt on CRC errors.
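One way recovery code could apply this contract is sketched below, with hypothetical names and an assumed record layout. Records replay until read_record returns None; a ValueError halts recovery and reports the byte offset of the bad record. The torn_tail_ok flag goes beyond the stated contract and is an assumption: because a torn multi-sector write of the final record also surfaces as a CRC mismatch, a recovery routine might choose to treat a CRC failure on the last record in the file as truncation rather than corruption.

```python
import io
import struct
import zlib
from typing import BinaryIO, List, Optional, Tuple

# Assumed layout (not taken from wal.py), little-endian:
#   record_length:u32 | crc:u32 | seq_num:u64 | op_type:u8 | key_len:u32 | val_len:u32 | key | value
LEN_PREFIX = struct.Struct("<I")
HEADER = struct.Struct("<IQBII")
Record = Tuple[int, int, bytes, bytes]  # (seq_num, op_type, key, value)


def encode_record(seq_num: int, op_type: int, key: bytes, value: bytes) -> bytes:
    crc = zlib.crc32(struct.pack("B", op_type) + key + value) & 0xFFFFFFFF
    body = HEADER.pack(crc, seq_num, op_type, len(key), len(value)) + key + value
    return LEN_PREFIX.pack(len(body)) + body


def read_record(f: BinaryIO) -> Optional[Record]:
    """None on truncation, ValueError on corruption."""
    header = f.read(LEN_PREFIX.size)
    if len(header) < LEN_PREFIX.size:
        return None
    (record_length,) = LEN_PREFIX.unpack(header)
    body = f.read(record_length)
    if len(body) < record_length:
        return None
    if record_length < HEADER.size:
        raise ValueError("record shorter than its fixed header")
    crc, seq_num, op_type, key_len, val_len = HEADER.unpack_from(body, 0)
    key = body[HEADER.size:HEADER.size + key_len]
    value = body[HEADER.size + key_len:HEADER.size + key_len + val_len]
    if (zlib.crc32(struct.pack("B", op_type) + key + value) & 0xFFFFFFFF) != crc:
        raise ValueError("CRC mismatch")
    return seq_num, op_type, key, value


def recover(f: BinaryIO, torn_tail_ok: bool = False) -> List[Record]:
    """Replay records: stop quietly at truncation, halt on corruption."""
    records: List[Record] = []
    while True:
        offset = f.tell()
        try:
            rec = read_record(f)
        except ValueError as exc:
            at_eof = f.read(1) == b""
            if torn_tail_ok and at_eof:
                break  # hypothetical policy: a bad final record is a torn tail
            raise ValueError(f"corrupt WAL record at offset {offset}") from exc
        if rec is None:
            break  # tail truncation: expected after a crash
        records.append(rec)
    return records
```

With the default torn_tail_ok=False this follows the contract as stated: any ValueError stops recovery. A CRC failure followed by further records is treated as corruption in both modes, since a torn write can only affect the last record appended before the crash.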

Notable Gap

The grep for torn|corrupt|truncat|partial in the WAL tests returned zero matches. There are no tests exercising the torn-write detection path, so the CRC-mismatch behavior is untested at the WAL level (though b-tree-storage-engine/test_btree.py:257 does test CRC corruption for the B-tree).
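A test module along these lines could cover the missing WAL cases. It is a sketch: the stand-in encode_record and read_record, and the record layout they use, are assumptions, and a real suite would import the WAL's own functions from wal.py instead. It writes a real temporary file, then truncates it or zeroes its tail to mimic the two crash outcomes.

```python
import os
import struct
import tempfile
import unittest
import zlib

# Stand-ins using an assumed layout; a real test would import the WAL's own
# encoder and reader from wal.py instead (exact names not confirmed here).
LEN_PREFIX = struct.Struct("<I")
HEADER = struct.Struct("<IQBII")  # crc, seq_num, op_type, key_len, val_len


def encode_record(seq_num, op_type, key, value):
    crc = zlib.crc32(struct.pack("B", op_type) + key + value) & 0xFFFFFFFF
    body = HEADER.pack(crc, seq_num, op_type, len(key), len(value)) + key + value
    return LEN_PREFIX.pack(len(body)) + body


def read_record(f):
    header = f.read(LEN_PREFIX.size)
    if len(header) < LEN_PREFIX.size:
        return None
    (record_length,) = LEN_PREFIX.unpack(header)
    body = f.read(record_length)
    if len(body) < record_length:
        return None
    crc, seq_num, op_type, key_len, val_len = HEADER.unpack_from(body, 0)
    key = body[HEADER.size:HEADER.size + key_len]
    value = body[HEADER.size + key_len:HEADER.size + key_len + val_len]
    if (zlib.crc32(struct.pack("B", op_type) + key + value) & 0xFFFFFFFF) != crc:
        raise ValueError("CRC mismatch")
    return seq_num, op_type, key, value


class TornWriteTests(unittest.TestCase):
    def setUp(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)
        self.record = encode_record(1, 1, b"key", b"v" * 32)
        with open(self.path, "wb") as f:
            f.write(self.record)

    def tearDown(self):
        os.remove(self.path)

    def read_back(self):
        with open(self.path, "rb") as f:
            return read_record(f)

    def test_intact_record_round_trips(self):
        self.assertEqual(self.read_back(), (1, 1, b"key", b"v" * 32))

    def test_truncated_tail_returns_none(self):
        os.truncate(self.path, len(self.record) - 5)  # crash cut the record short
        self.assertIsNone(self.read_back())

    def test_stale_tail_raises_value_error(self):
        # Same file length, but the final 4 bytes never landed (read back as zeros).
        with open(self.path, "r+b") as f:
            f.seek(len(self.record) - 4)
            f.write(bytes(4))
        with self.assertRaises(ValueError):
            self.read_back()
```

Saved under a hypothetical name such as test_wal_torn.py, it runs with python -m unittest test_wal_torn.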

Beliefs