Date: 2026-05-29
Time: 06:29
The WAL in write-ahead-log/wal.py:29-34 uses this record layout:
[record_length:4][crc:4][seq_num:8][op_type:1][key_len:4] [key] [val_len:4] [value]
Three problems stand out:
1. The CRC doesn't cover the header. At line 30, only the op type, key, and value are checksummed:
crc_data = struct.pack("B", op_type_byte) + key + value
The seqnum, recordlength, and keylen fields are not protected. A bit flip in seqnum would silently produce a valid-looking record with the wrong sequence number. A corrupted keylen would cause the reader to slice the key/value boundary in the wrong place — the CRC would then fail, but the error message at line 56 would blame the wrong thing (CRC mismatch at seq {seqnum}, where seq_num itself might be garbage).
2. No format version marker. There's no way for a reader to distinguish v1 records from a hypothetical v2. If you change the layout, old readers will silently misparse new files. Contrast this with the SSTable implementation in sstable-and-compaction/sstable.py:10-16, which already does it right:
MAGIC = b"SSTB"
VERSION = 1
HEADER_FMT = ">4sHI" # magic(4) + version(2) + entry_count(4)
The SSTable reader validates the magic at line 119 (assert magic == MAGIC) before touching any data. The WAL has no equivalent guard.
3. No forward compatibility. Every field is positionally encoded. Adding a new field (say, a transaction ID or a compression flag) requires changing the struct format string, which breaks all existing readers. There's no mechanism to skip unknown fields.
Here's the concrete design, building on patterns already present in the codebase:
[magic:1][version:1][record_length:4][header_crc:4] [header fields...] [key] [value] [payload_crc:4]
Magic byte (0xAB or similar): Lets the reader distinguish WAL records from garbage, partial writes, or other file formats. The SSTable uses a 4-byte magic (b"SSTB") but a single byte suffices per-record since we don't need human readability — we need a fast rejection test during recovery scans.
Version byte: Tells the reader which header layout follows. 0x01 = v1 compatibility shim, 0x02 = the new format. A reader seeing an unknown version can skip record_length bytes and continue recovery — degraded but not dead.
Header CRC: A CRC32 over [magic][version][recordlength] and all header fields. This is the key fix. Currently (wal.py:30-31), the CRC covers only the payload, so header corruption goes undetected. In v2, a corrupted seqnum or key_len would be caught before the reader even attempts to parse the key/value data.
Payload CRC: A separate CRC32 over key + value bytes. Splitting the CRCs lets you detect *which part* is corrupt — header damage vs. payload damage require different recovery strategies.
Instead of a fixed struct like <IIQBi (wal.py:33), each header field becomes a TLV (type-length-value) triple:
[field_tag:1][field_len:2][field_data:variable]
Known tags for v2:
0x01 = seq_num (8 bytes)0x02 = op_type (1 byte)0x03 = key_len (4 bytes)0x04 = val_len (4 bytes)A v3 reader encountering an unknown tag (say 0x05 = transactionid) skips fieldlen bytes and keeps going. This is exactly how protocol buffers achieve wire compatibility, and it's why the Bitcask format in hash-index-storage/bitcask.py:10 — which uses a fixed HEADER_FORMAT = "<dII" — would also benefit from this pattern.
The tradeoff: TLV adds ~3 bytes of overhead per field (5 fields = 15 bytes). For a WAL that fsyncs every record (wal.py:120-125), those extra bytes are negligible compared to the disk flush cost. The LSM WAL in log-structured-merge-tree/lsm.py:20-25 uses bare length-prefixed writes with no tags at all — even simpler but even less evolvable.
The current recovery loop in readrecord (wal.py:37-58) does this:
1. Read 4 bytes for record_length
2. Read record_length bytes
3. Parse header fields
4. Verify CRC over payload only
v2 recovery would:
1. Scan for magic byte (resync after corruption)
2. Read version + record_length
3. Verify header CRC — if it fails, skip record_length bytes and scan for next magic
4. Parse TLV fields, skipping unknowns
5. Read key + value
6. Verify payload CRC
The magic-byte scan is the biggest operational improvement. Right now, a single corrupted recordlength at line 40-43 causes read_record to read the wrong number of bytes, and every subsequent record is misframed. With a magic byte, the reader can re-synchronize by scanning forward until it finds the next 0xAB.
RECORD_MAGIC = 0xAB
RECORD_VERSION = 2
def _encode_record_v2(seq_num, op_type_byte, key, value):
# Build TLV header fields
fields = bytearray()
fields += struct.pack("<BH", 0x01, 8) + struct.pack("<Q", seq_num)
fields += struct.pack("<BH", 0x02, 1) + struct.pack("B", op_type_byte)
fields += struct.pack("<BH", 0x03, 4) + struct.pack("<I", len(key))
fields += struct.pack("<BH", 0x04, 4) + struct.pack("<I", len(value))
payload = key + value
record_length = len(fields) + len(payload) + 4 # +4 for payload CRC
# Header: magic + version + length
preamble = struct.pack("BB", RECORD_MAGIC, RECORD_VERSION)
preamble += struct.pack("<I", record_length)
header_crc = zlib.crc32(preamble + fields) & 0xFFFFFFFF
payload_crc = zlib.crc32(payload) & 0xFFFFFFFF
return (preamble + struct.pack("<I", header_crc) +
bytes(fields) + payload +
struct.pack("<I", payload_crc))
The SSTable format (sstable-and-compaction/sstable.py:10-20) is the closest existing example of good practice: it has magic bytes, a version field, and structured footers. The WAL and Bitcask formats are the ones most in need of this upgrade — they're the components that must survive crashes and be readable across software versions.
write-ahead-log/wal.py:readrecord — Trace how a single corrupted record_length cascades into total read failure, motivating the magic-byte resync designsstable-and-compaction/sstable.py — The SSTable already implements magic + version + footer index; compare its evolution-readiness to the WAL'shash-index-storage/bitcask.py:writerecord — Bitcask has the same fixed-header fragility as the WAL; a v2 format would apply identically herecrc-coverage-audit — Audit all struct.pack/CRC sites across the codebase (87 matches found) to identify which components protect only payload vs. full-record integrityb-tree-storage-engine/btree.py — The B-tree WAL (line 120-135) uses its own entry format with checksums; compare its integrity guarantees to the standalone WALwal-crc-excludes-header — The WAL CRC at wal.py:30-31 covers only optype + key + value, leaving seqnum, recordlength, and keylen unprotected against bit flipssstable-has-magic-and-version — The SSTable format (sstable.py:10-11) uses a 4-byte magic b"SSTB" and a 2-byte version field, making it the only binary format in the repo with format versioningwal-no-resync-after-corruption — A corrupted recordlength in the WAL causes read_record to misframe all subsequent records; there is no scan-forward recovery mechanismbitcask-no-integrity-check — The Bitcask store (bitcask.py:10) uses a fixed <dII header with no CRC at all — records have no corruption detection