Original source: write-ahead-log-wal-_recover_seq_num

Function: _recover_seq_num in write-ahead-log/wal.py

Date: 2026-05-29

Time: 06:40

`_recover_seq_num`: WAL Sequence Number Recovery

Purpose

This method scans every existing WAL file on disk to find the highest sequence number that was ever written. It exists to solve a specific crash-recovery problem: after a restart, the WAL needs to resume issuing sequence numbers that are strictly higher than any previously used. Without this, a restarted WAL could reuse sequence numbers, breaking the monotonicity guarantee that downstream consumers (replay, truncation) depend on.

It is called exactly once, during __init__, before any new writes occur.

Contract

Precondition: self.dir exists and self.walfiles() is functional (the directory was already created by os.makedirs earlier in __init__).
Postcondition: Returns an integer ≥ 0: the highest sequence number among the records that could be read across all WAL files. If no WAL files exist, or none contains a readable record, returns 0.
Invariant it establishes: After __init__ sets self._seq_num = self._recover_seq_num(), every subsequent append/appendbatch/checkpoint call increments self._seq_num before writing, so new records always get sequence numbers strictly greater than any sequence number recovered from disk.

Parameters

None (beyond self). All input comes from the filesystem via self.walfiles().

Return Value

An int: the maximum sequence number found, or 0 if no valid records exist. The caller (__init__) assigns this directly to self._seq_num. Because append does self._seq_num += 1 *before* writing, the first new record will be max_seq + 1.
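
The increment-before-write arithmetic can be illustrated with a toy stand-in for the real class. This is a minimal sketch, not the actual implementation: ToyWal, its recovered_max argument, and the in-memory records list are hypothetical names. Only the _seq_num attribute and the order of operations come from the description above.

```python
class ToyWal:
    """Hypothetical stand-in for the real WAL; records live in a list, not on disk."""

    def __init__(self, recovered_max):
        # In the real class this value would come from self._recover_seq_num().
        self._seq_num = recovered_max
        self.records = []

    def append(self, payload):
        # Increment first, then write, so every new record is > recovered_max.
        self._seq_num += 1
        self.records.append((self._seq_num, payload))
        return self._seq_num


wal = ToyWal(recovered_max=41)  # pretend 41 was the highest number found on disk
first = wal.append(b"a")        # 42
second = wal.append(b"b")       # 43
```

If the increment happened after the write instead, the first new record would reuse 41, which is exactly the collision the recovery scan is meant to prevent.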

Algorithm

1. Initialize max_seq = 0.

2. Get the sorted list of .wal files from the log directory.

3. For each file, open it in binary read mode.

4. Read records one at a time via readrecord(f):

If readrecord returns None → EOF or partial/truncated record at the tail. Move to the next file.
If readrecord raises ValueError → CRC mismatch (corruption). Stop scanning this file entirely and move to the next.
Otherwise, update max_seq if this record's sequence number is higher.

5. After all files are scanned, return max_seq.
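
The steps above can be sketched end to end as follows. The wire format here (8-byte sequence number, 4-byte payload length, 4-byte CRC32, then the payload) is an assumption made only so the example runs; the real format in wal.py may differ. The function names write_record, read_record, and recover_seq_num are likewise illustrative and are written as free functions rather than methods.

```python
import os
import struct
import tempfile
import zlib

# Hypothetical header layout, assumed for this sketch only:
# little-endian u64 seq number, u32 payload length, u32 CRC32 of the payload.
HEADER = struct.Struct("<QII")


def write_record(f, seq_num, payload):
    f.write(HEADER.pack(seq_num, len(payload), zlib.crc32(payload)))
    f.write(payload)


def read_record(f):
    """Return (seq_num, payload); None at EOF/truncation; raise ValueError on bad CRC."""
    header = f.read(HEADER.size)
    if len(header) < HEADER.size:
        return None  # clean EOF or a truncated header at the tail
    seq_num, length, crc = HEADER.unpack(header)
    payload = f.read(length)
    if len(payload) < length:
        return None  # truncated payload at the tail
    if zlib.crc32(payload) != crc:
        raise ValueError("CRC mismatch")
    return seq_num, payload


def recover_seq_num(wal_dir):
    max_seq = 0                                             # step 1
    names = sorted(n for n in os.listdir(wal_dir) if n.endswith(".wal"))  # step 2
    for name in names:
        with open(os.path.join(wal_dir, name), "rb") as f:  # step 3
            while True:                                     # step 4
                try:
                    record = read_record(f)
                except ValueError:
                    break  # corruption: abandon this file, keep scanning others
                if record is None:
                    break  # EOF or truncated tail
                max_seq = max(max_seq, record[0])
    return max_seq                                          # step 5


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "000001.wal"), "wb") as f:
        write_record(f, 1, b"a")
        write_record(f, 2, b"b")
    with open(os.path.join(d, "000002.wal"), "wb") as f:
        write_record(f, 3, b"c")
        f.write(b"\x00\x01")  # simulated torn write: partial header at the tail
    recovered = recover_seq_num(d)  # 3
```

The torn two-byte tail in the second file is treated as a truncated record, so the scan still reports the last complete record's sequence number.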

Side Effects

Read-only I/O against the WAL directory. No mutations to disk or object state — this is a pure scan that returns a value.

Error Handling

readrecord has two failure modes. This method responds to both the same way: it stops reading the current file and moves on to the next.

| Condition | readrecord behavior | _recover_seq_num response |
|---|---|---|
| EOF / truncated record | Returns None | Breaks inner loop, continues to next file |
| CRC mismatch | Raises ValueError | Breaks inner loop, continues to next file |

This is notably more forgiving than readall_records, which returns immediately on a ValueError and so skips all remaining files. The recovery scan treats corruption as file-local: it abandons the corrupted file but still reads subsequent files. This makes sense: during recovery you want the highest sequence number across the entire WAL, even if one file has a corrupted tail.
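
The difference between the two policies can be seen in a small in-memory model. This sketch models only the control flow: each "file" is a list of sequence numbers, and the CORRUPT sentinel stands in for a record whose CRC check would raise ValueError. The function names are hypothetical and neither function is the real implementation.

```python
CORRUPT = object()  # stands in for a record whose CRC check would raise ValueError


def scan_file_local(files):
    """Recovery-style policy: corruption ends the current file only."""
    max_seq = 0
    for records in files:
        for rec in records:
            if rec is CORRUPT:
                break  # abandon this file, continue with the next one
            max_seq = max(max_seq, rec)
    return max_seq


def scan_stop_all(files):
    """Stricter policy: the first corruption ends the whole scan."""
    max_seq = 0
    for records in files:
        for rec in records:
            if rec is CORRUPT:
                return max_seq  # give up on every remaining file
            max_seq = max(max_seq, rec)
    return max_seq


files = [[1, 2, CORRUPT, 4], [5, 6]]
file_local = scan_file_local(files)  # 6
stop_all = scan_stop_all(files)      # 2
```

Had recovery used the stricter policy here, the WAL would resume from 2 and reissue 5 and 6, which already exist in the second file.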

Assumption: any OSError from open() or f.read() (permissions, missing file) is *not* caught and will propagate up to the caller, aborting initialization.

Usage Patterns

Called once in WriteAheadLog.__init__:


```python
self._seq_num = 0
# ...
self._seq_num = self._recover_seq_num()
```

The initial self._seq_num = 0 is overwritten later in __init__, so it is effectively dead code; the recovery result is what matters. No other code calls this method.

Dependencies

self.walfiles(): provides the sorted file list. _recover_seq_num takes a global maximum, so file order does not affect its result; the sort order matters for openlatest and rotate, which are called immediately after.
readrecord(f): the module-level binary deserialization function. Handles the wire format parsing and CRC validation.
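
For illustration, a sorted file listing of the kind described above might look like the sketch below. The helper name list_wal_files and the zero-padded file names are assumptions; the source does not show how the real helper names or orders its files.

```python
import os
import tempfile


def list_wal_files(wal_dir):
    """Return full paths of *.wal files in lexicographic order."""
    names = sorted(n for n in os.listdir(wal_dir) if n.endswith(".wal"))
    return [os.path.join(wal_dir, n) for n in names]


with tempfile.TemporaryDirectory() as d:
    # Create empty files out of order, plus one non-WAL file that must be ignored.
    for name in ("000010.wal", "000002.wal", "notes.txt", "000001.wal"):
        open(os.path.join(d, name), "wb").close()
    ordered = [os.path.basename(p) for p in list_wal_files(d)]
    # ordered == ["000001.wal", "000002.wal", "000010.wal"]
```

Lexicographic order matches creation order here only because the assumed names are zero-padded; with unpadded names, "10.wal" would sort before "2.wal".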