Date: 2026-05-28
Time: 19:21
WriteAheadLog.replay
replay reconstructs the state of committed operations after a crash or restart. It reads the WAL files on disk and returns all PUT and DELETE records whose sequence numbers are newer than a given checkpoint. This is the core recovery mechanism: a consumer calls replay after a crash to re-apply operations that were logged but may not have been flushed to the primary data store.
Preconditions:
- after_seq must be a non-negative integer. The method doesn't validate this: a negative value would effectively mean "return everything," which happens to work but isn't intentional.
- self._dir must not be externally modified between the flush and the read. There's no file-level lock protecting against concurrent external writers.

Postconditions:
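- Every returned record has an op_type of either "PUT" or "DELETE".
- Every returned record has seq_num > after_seq.
- Every returned record passed CRC validation (enforced by _read_record).

Invariant: The returned list is a subset of what iterate() would return, filtered to data-bearing operations above the sequence threshold.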
| Parameter | Type | Default | Meaning |
|-----------|------|---------|---------|
| after_seq | int | 0 | Sequence number high-water mark. Only records strictly greater than this are returned. Pass 0 to replay the entire log; pass the sequence number from your last checkpoint to get only what's new. |
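
Because replay does not validate after_seq itself, a caller could guard the value before passing it in. The following is a minimal sketch; the helper name validate_after_seq is hypothetical and not part of WriteAheadLog.

```python
def validate_after_seq(after_seq: int) -> int:
    """Return after_seq unchanged if it is a non-negative int, else raise."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(after_seq, bool) or not isinstance(after_seq, int):
        raise TypeError("after_seq must be an int")
    if after_seq < 0:
        # A negative value would silently mean "return everything".
        raise ValueError("after_seq must be non-negative")
    return after_seq
```

A caller would then write something like wal.replay(after_seq=validate_after_seq(last_checkpoint_seq)), so an accidental negative checkpoint fails loudly instead of replaying the whole log.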
Returns: List[WALRecord], an in-memory list of all qualifying records. The caller gets a snapshot, not a lazy iterator, so this can be large if the WAL is large. Each WALRecord contains seq_num, op_type, key, value, and checksum.
1. Flush buffered writes. Acquires self._lock and flushes the file descriptor. This ensures any append calls that returned before replay was called are visible on disk. The lock is released immediately — the read phase is lock-free.
2. Scan all WAL files. Delegates to _read_all_records(), which iterates WAL files in sorted filename order, reading records sequentially from each. If any record fails CRC validation, _read_all_records stops entirely (it returns rather than continuing), so corruption truncates the replay at that point.
3. Filter by sequence number. Records with seq_num <= after_seq are skipped.
4. Filter by operation type. Only PUT and DELETE records pass through. COMMIT and CHECKPOINT records are control markers and are discarded.
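
Steps 2 through 4 can be sketched over an in-memory list of records. This is an illustration under stated assumptions, not the actual implementation: the checksum layout in crc_of, the helper make_record, and the names scan and replay_sketch are all hypothetical, and the real on-disk format is not shown here. The flush in step 1 is omitted.

```python
import zlib
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class WALRecord:
    seq_num: int
    op_type: str  # "PUT", "DELETE", "COMMIT", or "CHECKPOINT"
    key: bytes
    value: Optional[bytes]
    checksum: int


def crc_of(seq_num: int, op_type: str, key: bytes, value: Optional[bytes]) -> int:
    # Assumed checksum layout, chosen only for this sketch.
    payload = b"|".join([str(seq_num).encode(), op_type.encode(), key, value or b""])
    return zlib.crc32(payload)


def make_record(seq_num: int, op_type: str, key: bytes = b"",
                value: Optional[bytes] = None) -> WALRecord:
    return WALRecord(seq_num, op_type, key, value, crc_of(seq_num, op_type, key, value))


def scan(records: Iterable[WALRecord]) -> Iterator[WALRecord]:
    """Yield records in order, stopping at the first checksum mismatch."""
    for rec in records:
        if crc_of(rec.seq_num, rec.op_type, rec.key, rec.value) != rec.checksum:
            return  # corruption truncates the scan; later records are dropped
        yield rec


def replay_sketch(records: Iterable[WALRecord], after_seq: int = 0) -> List[WALRecord]:
    """Keep only PUT/DELETE records with seq_num strictly above after_seq."""
    return [
        rec for rec in scan(records)
        if rec.seq_num > after_seq and rec.op_type in ("PUT", "DELETE")
    ]


log = [
    make_record(1, "PUT", b"a", b"1"),
    make_record(2, "DELETE", b"b"),
    make_record(3, "COMMIT"),
    make_record(4, "PUT", b"c", b"3"),
]
print([r.seq_num for r in replay_sketch(log)])  # [1, 2, 4]

log[3].checksum ^= 1  # simulate a corrupted record at seq 4
print([r.seq_num for r in replay_sketch(log)])  # [1, 2]
```

The second call shows the truncation behaviour: once the record at sequence 4 fails its checksum, nothing at or after that point is returned.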
Side effects:
- Flushes buffered writes to disk (the flush() call). Then opens and reads every .wal file in the log directory.
- Does not modify self._seq_num, the WAL files, or any other instance state beyond the flush.

Error handling:
- _read_record raises ValueError on checksum failure. _read_all_records catches this and stops iteration, so any records after the corruption are silently lost. This is a deliberate design choice: corruption means the write was incomplete (likely a crash mid-write), so everything from that point forward is suspect.
- If _read_record returns None, iteration ends for that file. _read_all_records then moves to the next file.
- … OSError.

Typical crash-recovery flow:
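```python
# Open the WAL and load the sequence number of the last checkpoint.
wal = WriteAheadLog("/var/data/wal")
last_checkpoint_seq = load_checkpoint_from_store()

# Fetch every PUT/DELETE logged after the checkpoint and re-apply it in order.
records = wal.replay(after_seq=last_checkpoint_seq)
for rec in records:
    if rec.op_type == "PUT":
        store.put(rec.key, rec.value)
    elif rec.op_type == "DELETE":
        store.delete(rec.key)

# Persist the store, then record a new checkpoint.
store.flush()
wal.checkpoint()
```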
The caller is responsible for applying the records idempotently — replay may return the same records across multiple calls if no new checkpoint or truncation occurs between them.
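
One way to meet the idempotency requirement is to make each operation an absolute write or a tolerant delete, so that re-applying the same sequence leaves the store unchanged. The sketch below assumes a plain dict as the store and tuples in place of WALRecord; apply_ops is a hypothetical helper, not part of the WAL API.

```python
from typing import Dict, Iterable, Optional, Tuple

Op = Tuple[str, str, Optional[str]]  # (op_type, key, value)


def apply_ops(store: Dict[str, Optional[str]], ops: Iterable[Op]) -> None:
    """Apply PUT/DELETE operations in order; safe to run more than once."""
    for op_type, key, value in ops:
        if op_type == "PUT":
            store[key] = value  # overwrite, never increment or append
        elif op_type == "DELETE":
            store.pop(key, None)  # tolerate keys that are already gone


store: Dict[str, Optional[str]] = {"x": "old"}
ops = [("PUT", "a", "1"), ("PUT", "x", "new"), ("DELETE", "a", None)]

apply_ops(store, ops)
apply_ops(store, ops)  # a second replay of the same records
print(store)  # {'x': 'new'}
```

Operations that depend on the current value (counters, appends) would not be safe to replay this way and would need their own deduplication, for example by tracking the last applied sequence number.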
Dependencies:
- _read_all_records: the internal iterator that does the actual file I/O and corruption handling.
- _read_record (module-level): binary deserialization and CRC verification of individual records.
- _wal_files: discovers and sorts WAL segment files by filename.
- threading.Lock: protects the flush but not the read phase.

The docstring says "skips uncommitted batches," but the implementation doesn't actually track batch boundaries. There's no batch-start marker in the format, so replay cannot distinguish a batch's PUT/DELETE records from individual writes. It returns all PUT/DELETE records regardless of whether a matching COMMIT exists. The inline comment acknowledges this explicitly. In practice, this means a crash mid-batch will replay the partial batch: the atomicity guarantee of append_batch only holds at the fsync level (all-or-nothing write to disk), not at the replay-filtering level.
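
The consequence of filtering on operation type alone can be seen in a small model. This sketch reduces records to (seq_num, op_type) tuples and uses a hypothetical data_ops function that mirrors only the two filters described for replay; it is not the real method.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (seq_num, op_type); keys and values omitted


def data_ops(entries: List[Entry], after_seq: int = 0) -> List[Entry]:
    """Keep PUT/DELETE entries above after_seq, ignoring COMMIT markers."""
    return [e for e in entries if e[0] > after_seq and e[1] in ("PUT", "DELETE")]


committed = [(1, "PUT"), (2, "PUT"), (3, "COMMIT")]
partial = [(1, "PUT"), (2, "PUT")]  # crash before the COMMIT was logged

# Both logs yield the same data operations, so the partial batch is replayed.
print(data_ops(committed) == data_ops(partial))  # True
```

Since the two logs are indistinguishable after filtering, a caller that needs batch atomicity at replay time cannot get it from the returned list alone.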