seq_num can cause silent mis-ordering during recovery, and whether adding seq_num to the CRC input is a worthwhile fix
Date: 2026-05-29
Time: 08:20
Now I have a thorough understanding. Let me write the explanation.
write-ahead-log/wal.py:readallrecords — The corruption-stopping iterator that terminates the entire replay on CRC failure; understanding its break vs return semantics clarifies what happens when a corrupted seq_num fix causes a new CRC mismatch on old data
write-ahead-log/wal.py:appendbatch — How batch atomicity interacts with sequence numbering; a corrupted seq_num on a COMMIT record could cause replay to skip or re-apply an entire batch
wal-record-format-evolution — What a v2 record format would look like: CRC over the full header, a magic/version byte for forward compatibility, and migration strategy for existing WAL files
b-tree-storage-engine/btree.py:recover — The B-tree WAL has the same seq exclusion from CRC; compare whether its recovery logic is equally vulnerable to silent mis-ordering
write-ahead-log/testwal.py — Tests exercise CRC corruption and torn writes but likely don't cover seq_num-specific corruption; a new test injecting a bit flip into the seq_num bytes (offset 8-15 of the record body) would validate the fix

seq-num-corruption-causes-silent-data-loss — A bit flip in a WAL record's seq_num can cause truncate to incorrectly discard committed records (if deflated) or replay to skip committed operations (if deflated below the afterseq watermark), with no error raised
seq-num-crc-fix-is-zero-cost — Adding seq_num to the CRC input requires changing two lines (encoderecord:30 and read_record:53) with no measurable runtime overhead, since CRC32 over 9 extra bytes is negligible
record-length-corruption-is-implicitly-caught — Unlike seq_num, corruption of recordlength cascades into a CRC mismatch because it changes which bytes read_record consumes, making explicit CRC coverage unnecessary
wal-format-change-breaks-compatibility — Changing the CRC input in encoderecord invalidates all existing WAL files; a production deployment would require a version byte and dual-path CRC verification during migration
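
The seq_num bit-flip scenario and the proposed CRC change described above can be sketched in a few lines. This is only an illustration under assumed details: the record layout (crc32, payload length, seq_num, payload), the function names, and the `cover_seq` switch are all hypothetical and are not taken from wal.py. The layout was chosen so that seq_num sits at bytes 8-15, matching the offset mentioned in the notes.

```python
import struct
import zlib

# Hypothetical layout, NOT the real wal.py format:
#   crc32 (4 bytes) | payload length (4 bytes) | seq_num (8 bytes) | payload
# All fields little-endian, so seq_num occupies bytes 8-15 of the record.
HEADER = struct.Struct("<IIQ")


def _crc_input(seq_num: int, payload: bytes, cover_seq: bool) -> bytes:
    """Bytes fed to CRC32; optionally prefixed with the packed seq_num."""
    return struct.pack("<Q", seq_num) + payload if cover_seq else payload


def sketch_encode(seq_num: int, payload: bytes, cover_seq: bool) -> bytes:
    """Encode one record; cover_seq selects whether the CRC covers seq_num."""
    crc = zlib.crc32(_crc_input(seq_num, payload, cover_seq))
    return HEADER.pack(crc, len(payload), seq_num) + payload


def sketch_decode(buf: bytes, cover_seq: bool):
    """Return (seq_num, payload); raise ValueError on CRC mismatch."""
    crc, length, seq_num = HEADER.unpack_from(buf)
    payload = buf[HEADER.size:HEADER.size + length]
    if zlib.crc32(_crc_input(seq_num, payload, cover_seq)) != crc:
        raise ValueError("CRC mismatch")
    return seq_num, payload


def flip_bit(buf: bytes, byte_offset: int, bit: int) -> bytes:
    """Return a copy of buf with a single bit flipped."""
    out = bytearray(buf)
    out[byte_offset] ^= 1 << bit
    return bytes(out)


# Flip bit 6 of the lowest seq_num byte (offset 8): 100 becomes 36.
old_style = flip_bit(sketch_encode(100, b"put k v", cover_seq=False), 8, 6)
new_style = flip_bit(sketch_encode(100, b"put k v", cover_seq=True), 8, 6)

silently_accepted = sketch_decode(old_style, cover_seq=False)

try:
    sketch_decode(new_style, cover_seq=True)
    detected = False
except ValueError:
    detected = True
```

With the CRC computed over the payload only, the corrupted record decodes cleanly with a deflated seq_num, which is the silent case the notes describe. With seq_num included in the CRC input, the same flip raises an error, since CRC32 detects every single-bit error in the data it covers. Records written with one setting do not verify under the other, which is the compatibility cost noted in the last entry.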