Function: replay in write-ahead-log/wal.py

Date: 2026-05-28

Time: 19:21

WriteAheadLog.replay

Purpose

replay recovers logged data operations after a crash or restart. It reads the WAL files on disk and returns all PUT and DELETE records whose sequence numbers are strictly greater than a given checkpoint sequence number. This is the core recovery mechanism: a consumer calls replay after a crash to re-apply operations that were logged but may not have been flushed to the primary data store.

Contract

Preconditions:

Postconditions:

Invariant: The returned list is a subset of what iterate() would return, filtered to data-bearing operations above the sequence threshold.

Parameters

| Parameter | Type | Default | Meaning |
|-----------|------|---------|---------|
| after_seq | int | 0 | Sequence number high-water mark. Only records strictly greater than this are returned. Pass 0 to replay the entire log; pass the sequence number from your last checkpoint to get only what's new. |

Return Value

List[WALRecord]: an in-memory list of all qualifying records. The caller gets a snapshot, not a lazy iterator, so the result can be large if the WAL is large. Each WALRecord contains seq_num, op_type, key, value, and checksum.
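To make the record shape concrete, here is a minimal sketch of what such a record might look like. The field types, the `make_record` and `is_valid` helpers, and the use of CRC-32 over a simple field concatenation are all assumptions for illustration; the actual layout in wal.py may differ.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WALRecord:
    """Hypothetical record layout; the field types are assumptions."""
    seq_num: int
    op_type: str            # "PUT", "DELETE", "COMMIT", or "CHECKPOINT"
    key: bytes
    value: Optional[bytes]  # assumed to be None when there is no payload
    checksum: int


def compute_checksum(seq_num: int, op_type: str, key: bytes,
                     value: Optional[bytes]) -> int:
    """CRC-32 over a joined encoding of the fields (an assumed encoding)."""
    payload = b"|".join([str(seq_num).encode(), op_type.encode(), key, value or b""])
    return zlib.crc32(payload)


def make_record(seq_num: int, op_type: str, key: bytes,
                value: Optional[bytes] = None) -> WALRecord:
    """Build a record whose checksum matches its contents."""
    return WALRecord(seq_num, op_type, key, value,
                     compute_checksum(seq_num, op_type, key, value))


def is_valid(rec: WALRecord) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    return rec.checksum == compute_checksum(rec.seq_num, rec.op_type, rec.key, rec.value)


rec = make_record(7, "PUT", b"user:1", b"alice")
```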

Algorithm

1. Flush buffered writes. Acquires self._lock and flushes the file descriptor. This ensures any append calls that returned before replay was called are visible on disk. The lock is released immediately — the read phase is lock-free.

2. Scan all WAL files. Delegates to readallrecords(), which iterates WAL files in sorted filename order, reading records sequentially from each. If any record fails CRC validation, readallrecords stops entirely (returns, not continues) — corruption truncates the replay at that point.

3. Filter by sequence number. Records with seq_num <= after_seq are skipped.

4. Filter by operation type. Only PUT and DELETE records pass through. COMMIT and CHECKPOINT records are control markers and are discarded.
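Steps 2 through 4 can be sketched as follows. This is an illustration under stated assumptions, not the code in wal.py: the `Rec` tuple, the `crc_ok` flag (a stand-in for a real CRC comparison), and the function names are hypothetical, WAL files are modeled as in-memory lists already in sorted order, and step 1 (flushing under the lock) is omitted.

```python
from typing import Iterable, Iterator, List, NamedTuple

DATA_OPS = {"PUT", "DELETE"}


class Rec(NamedTuple):
    seq_num: int
    op_type: str
    key: str
    crc_ok: bool = True  # stand-in for a real CRC comparison


def read_valid_prefix(files: Iterable[List[Rec]]) -> Iterator[Rec]:
    """Step 2: yield records file by file; stop entirely at the first bad CRC."""
    for records in files:
        for rec in records:
            if not rec.crc_ok:
                return  # not `continue`: nothing after the corruption is read
            yield rec


def replay_sketch(files: Iterable[List[Rec]], after_seq: int = 0) -> List[Rec]:
    """Steps 3 and 4: keep data-bearing records strictly above after_seq."""
    return [
        rec
        for rec in read_valid_prefix(files)
        if rec.seq_num > after_seq and rec.op_type in DATA_OPS
    ]


# Two "files" in sorted order; seq 5 is corrupt, so seq 6 is never reached.
files = [
    [Rec(1, "PUT", "a"), Rec(2, "COMMIT", ""), Rec(3, "DELETE", "a")],
    [Rec(4, "PUT", "b"), Rec(5, "PUT", "c", crc_ok=False), Rec(6, "PUT", "d")],
]
result = [r.seq_num for r in replay_sketch(files, after_seq=1)]
# result == [3, 4]
```

Record 2 is dropped as a control marker, record 1 falls at or below the threshold, and records 5 and 6 are lost to the truncation at the corrupt record.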

Side Effects

Error Handling

Usage Patterns

Typical crash-recovery flow:


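```python
# `store` and `load_checkpoint_from_store` are assumed to be provided by the application.
wal = WriteAheadLog("/var/data/wal")
last_checkpoint_seq = load_checkpoint_from_store()

# Re-apply every data-bearing record logged after the last checkpoint.
records = wal.replay(after_seq=last_checkpoint_seq)
for rec in records:
    if rec.op_type == "PUT":
        store.put(rec.key, rec.value)
    elif rec.op_type == "DELETE":
        store.delete(rec.key)

# Persist the recovered state before recording a new checkpoint.
store.flush()
wal.checkpoint()
```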

The caller is responsible for applying the records idempotently — replay may return the same records across multiple calls if no new checkpoint or truncation occurs between them.
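One way to meet the idempotency requirement is to treat PUT as an overwrite and DELETE as a tolerant removal, so that applying the same list twice leaves the store unchanged. The sketch below assumes a plain dict as the store and uses `(seq_num, op_type, key, value)` tuples in place of WALRecord objects; both are illustrative choices, not part of wal.py.

```python
from typing import Dict, List, Optional, Tuple

# (seq_num, op_type, key, value) tuples stand in for WALRecord objects.
Record = Tuple[int, str, str, Optional[str]]


def apply_records(store: Dict[str, Optional[str]], records: List[Record]) -> None:
    """Apply PUT/DELETE records in log order; safe to run more than once."""
    for _seq, op_type, key, value in records:
        if op_type == "PUT":
            store[key] = value    # overwriting with the same value changes nothing
        elif op_type == "DELETE":
            store.pop(key, None)  # tolerate a key that is already gone


records: List[Record] = [
    (1, "PUT", "a", "1"),
    (2, "PUT", "b", "2"),
    (3, "DELETE", "a", None),
]
store: Dict[str, Optional[str]] = {}
apply_records(store, records)
first_pass = dict(store)
apply_records(store, records)  # replaying the same list a second time
# store == first_pass == {"b": "2"}
```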

Dependencies

Notable Design Decision

The docstring says "skips uncommitted batches," but the implementation doesn't actually track batch boundaries. There's no batch-start marker in the format, so replay cannot distinguish a batch's PUT/DELETE records from individual writes. It returns all PUT/DELETE records regardless of whether a matching COMMIT exists. The inline comment acknowledges this explicitly. In practice, this means a crash mid-batch will replay the partial batch — the atomicity guarantee of append_batch only holds at the fsync level (all-or-nothing write to disk), not at the replay-filtering level.
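The consequence can be illustrated with two hypothetical logs that differ only in whether the batch's COMMIT record made it to disk. The tuple layout and the `data_ops` helper are assumptions for illustration; the helper mirrors only the operation-type filter described above.

```python
from typing import List, Tuple

Entry = Tuple[int, str, str]  # (seq_num, op_type, key), an assumed layout


def data_ops(log: List[Entry]) -> List[Entry]:
    """Keep only PUT and DELETE entries, as the replay filter does."""
    return [entry for entry in log if entry[1] in ("PUT", "DELETE")]


batch = [(1, "PUT", "a"), (2, "PUT", "b"), (3, "DELETE", "a")]
committed = batch + [(4, "COMMIT", "")]
uncommitted = list(batch)  # crash before the COMMIT record was written

# Both logs yield the same data records, so the filter cannot tell them apart.
same_output = data_ops(committed) == data_ops(uncommitted)
```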