Topic: Whether any consumer of replay validates sequence number continuity, which would surface silent data loss from mid-file corruption

Date: 2026-05-29

Time: 08:10

No Consumer Validates Sequence Number Continuity

Short answer: No. Neither replay itself nor any of its callers checks that sequence numbers are contiguous. Mid-file corruption that removes records cleanly (without leaving a partial or CRC-invalid record behind) would cause silent data loss.

How replay handles corruption today

The WAL's replay method (write-ahead-log/wal.py:212) delegates to _read_all_records (wal.py:233), which reads records in a loop and stops at the first corruption: a CRC mismatch raises ValueError, which the caller catches before breaking out of the loop. This means every record after the first corrupt one is dropped, including records that are themselves intact.
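The stop-at-first-corruption behavior can be illustrated with a minimal, self-contained sketch. It is not the code in wal.py: the record layout (seq_num, length, CRC32 header) and the helper names encode, read_record, and read_all_records are assumptions made for illustration only.

```python
import struct
import zlib
from typing import List, Tuple

# Hypothetical record layout: seq_num (u64), payload length (u32), CRC32 (u32).
HEADER = struct.Struct("<QII")


def encode(seq_num: int, payload: bytes) -> bytes:
    """Serialize one record as header + payload."""
    return HEADER.pack(seq_num, len(payload), zlib.crc32(payload)) + payload


def read_record(buf: bytes, offset: int) -> Tuple[int, bytes, int]:
    """Return (seq_num, payload, next_offset); raise ValueError on a bad CRC."""
    seq_num, length, crc = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    payload = buf[start:start + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError(f"corrupt record at offset {offset}")
    return seq_num, payload, start + length


def read_all_records(buf: bytes) -> List[Tuple[int, bytes]]:
    """Read records until the buffer ends or the first corrupt record."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(buf):
        try:
            seq_num, payload, offset = read_record(buf, offset)
        except ValueError:
            break  # everything after this point is dropped without a signal
        records.append((seq_num, payload))
    return records


# Six records with sequence numbers 1..6.
log = b"".join(encode(i, f"v{i}".encode()) for i in range(1, 7))
record_len = len(encode(1, b"v1"))

damaged = bytearray(log)
damaged[2 * record_len + HEADER.size] ^= 0xFF  # flip one payload byte of record 3

print([seq for seq, _ in read_all_records(bytes(damaged))])  # prints [1, 2]
```

In this sketch records 4, 5, and 6 are intact on disk, yet the caller only ever sees records 1 and 2 and gets no indication that anything was skipped.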

What's missing

replay returns a flat List[WALRecord] (wal.py:212). It filters by after_seq (wal.py:226) and skips uncommitted batches, but at no point does it verify that the returned sequence numbers form a contiguous sequence (e.g., [3, 4, 5, 6] rather than [3, 6]).

The two callers

1. LSM tree (log-structured-merge-tree/lsm.py:231): calls self.wal.replay() and iterates over key-value pairs. It does not inspect seq_num at all; the LSM's own replay method (lsm.py:28) returns List[Tuple[str, bytes]], stripping sequence numbers entirely.

2. Tests (write-ahead-log/test_wal.py): the corruption test at line 44 verifies that replay returns fewer records after corruption (len(records) == 5), but it checks only the count. It never asserts that the returned sequence numbers are consecutive.
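A stricter version of the corruption test could assert on the sequence numbers as well as the count. The sketch below is an assumption about how that might look: is_contiguous is a hypothetical helper, and the hard-coded list stands in for the sequence numbers that the real test would collect from the replayed records.

```python
from typing import List


def is_contiguous(seq_nums: List[int]) -> bool:
    """True if seq_nums increases by exactly 1 at every step (or is empty)."""
    if not seq_nums:
        return True
    first = seq_nums[0]
    return seq_nums == list(range(first, first + len(seq_nums)))


def test_replay_after_corruption_is_contiguous() -> None:
    # Stand-in for [r.seq_num for r in wal.replay()]; the real test would
    # build this list from the records returned after corrupting the file.
    replayed = [1, 2, 3, 4, 5]
    assert len(replayed) == 5       # the existing count check
    assert is_contiguous(replayed)  # the additional continuity check


test_replay_after_corruption_is_contiguous()
assert not is_contiguous([3, 6])  # a gap such as [3, 6] is rejected
```

Comparing against a range keeps the assertion to a single line and does not depend on which sequence number the replayed list starts at.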

The gap

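If records 3 and 4 (out of 1–6) are corrupted such that _read_record returns None or raises, replay stops at record 2. Records 5 and 6 are silently lost. No consumer would notice because replay performs no continuity check, the LSM tree discards sequence numbers before using the records, and the tests assert only on the record count.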

A continuity check in replay or _read_all_records, something like flagging when the next record's seq_num jumps by more than 1, would surface this class of silent data loss.
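One possible shape for such a check is sketched below. The WALRecord fields, the SequenceGapError exception, and the find_gaps and check_continuity helpers are all hypothetical; the real WALRecord in wal.py may look different. Because replay also skips uncommitted batches, a check like this would probably need to run on the raw records before that filtering, otherwise legitimately skipped batches could be reported as gaps.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class WALRecord:
    # Hypothetical minimal record; only seq_num matters for the check.
    seq_num: int
    key: str
    value: bytes


class SequenceGapError(Exception):
    """Raised when replayed records skip one or more sequence numbers."""


def find_gaps(records: Iterable[WALRecord]) -> List[Tuple[int, int]]:
    """Return (previous, next) pairs wherever seq_num does not advance by 1."""
    gaps: List[Tuple[int, int]] = []
    prev: Optional[int] = None
    for rec in records:
        if prev is not None and rec.seq_num != prev + 1:
            gaps.append((prev, rec.seq_num))
        prev = rec.seq_num
    return gaps


def check_continuity(records: List[WALRecord]) -> List[WALRecord]:
    """Return records unchanged, or raise if any sequence numbers are missing."""
    gaps = find_gaps(records)
    if gaps:
        raise SequenceGapError(f"non-contiguous sequence numbers: {gaps}")
    return records


good = [WALRecord(n, f"k{n}", b"v") for n in (3, 4, 5, 6)]
bad = [WALRecord(n, f"k{n}", b"v") for n in (3, 6)]

print(find_gaps(good))  # prints []
print(find_gaps(bad))   # prints [(3, 6)]
```

Whether a gap should raise, log a warning, or be returned alongside the records is a design choice. Note that a gap check alone does not detect a log that simply ends early, since a truncated prefix such as [1, 2] is still contiguous.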