Date: 2026-05-28
Time: 18:48
`_read_record`: WAL Binary Record Deserializer

`_read_record` is the inverse of `_encode_record`. It reads a single binary WAL record from a file stream and returns it as a `WALRecord` dataclass. Every read path in the WAL depends on this deserialization primitive: recovery, replay, truncation, and iteration all call it in a loop.
It is kept separate from the `WriteAheadLog` class because it is a stateless function over a file handle and does not depend on WAL instance state. That makes it reusable during normal operation and also during crash recovery, when the `WriteAheadLog` object is still being constructed.
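Precondition: `f` must be a binary-mode file object (opened with `"rb"`) positioned at a record boundary. Calling the function on a file positioned mid-record gives undefined results. The 4 bytes it reads as the length prefix are arbitrary, so the call usually ends in a spurious CRC mismatch or a premature `None`.

Postcondition: when the function returns a `WALRecord`, `f` has advanced past exactly one complete record. When it returns `None`, `f` is at EOF or positioned past a partial record (torn write). When it raises an exception, the position of `f` is undefined.

Invariant: any record produced by `_encode_record` and fully persisted to disk always round-trips through `_read_record` without a CRC error.

| Parameter | Type | Description |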
|-----------|------|-------------|
| `f` | binary file object | An open file handle in read-binary mode, positioned at the start of a record. It has no type annotation, and the function duck-types on `.read()`. |
No validation is performed on `f`. If you pass a text-mode file, `struct.unpack` raises `TypeError` because `.read()` returns `str` instead of `bytes`.
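The following minimal sketch runs step 1 on its own to show this failure mode. `read_length_prefix` is a hypothetical helper and is not part of the WAL module. It reads the 4-byte little-endian prefix and unpacks it, following the description of the function.

```python
import io
import struct
from typing import Optional


def read_length_prefix(f) -> Optional[int]:
    """Hypothetical helper mirroring step 1: read a 4-byte little-endian uint32."""
    prefix = f.read(4)
    if len(prefix) < 4:
        return None  # EOF or torn prefix
    (length,) = struct.unpack("<I", prefix)
    return length


# Binary stream: .read() returns bytes, so the unpack succeeds.
print(read_length_prefix(io.BytesIO(b"\x05\x00\x00\x00")))  # 5

# Text stream: .read() returns str, which struct.unpack rejects.
try:
    read_length_prefix(io.StringIO("abcd"))
except TypeError as exc:
    print("TypeError:", exc)
```

In this sketch the length check runs before the unpack. A text stream with fewer than 4 characters left therefore returns `None` and does not raise.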
Returns:

- `WALRecord` when it parses a complete, CRC-valid record.
- `None` on EOF (no bytes left) or on a torn write (partial length prefix or partial record body). This is the expected signal for "no more records."

Callers must tell `None` (benign end of data) apart from `ValueError` (corruption). Most callers treat `None` as "stop reading this file" and `ValueError` as "stop reading all files" (see `_read_all_records`).

The on-disk format per record is:
```
[4B length][4B crc][8B seq_num][1B op_type][4B key_len][key bytes][4B val_len][value bytes]
```
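One `struct` format can produce the fixed header of this layout. The sketch below defines a hypothetical `encode_record` built from the layout and from the CRC rule in step 7. It is not the project's actual `_encode_record`. It assumes the op type is a single byte (0 to 255) and that keys and values are UTF-8 strings.

```python
import struct
import zlib

# Fixed header inside the body: crc, seq_num, op_type, key_len (17 bytes, no padding).
HEADER = struct.Struct("<IQBi")


def encode_record(seq_num: int, op_type_byte: int, key: str, value: str) -> bytes:
    """Hypothetical encoder following the documented on-disk layout."""
    key_b = key.encode("utf-8")
    val_b = value.encode("utf-8")
    # CRC covers op_type_byte || key || value only (not seq_num or the lengths).
    crc = zlib.crc32(bytes([op_type_byte]) + key_b + val_b) & 0xFFFFFFFF
    body = (
        HEADER.pack(crc, seq_num, op_type_byte, len(key_b))
        + key_b
        + struct.pack("<i", len(val_b))
        + val_b
    )
    # The length prefix counts every byte after itself.
    return struct.pack("<I", len(body)) + body


rec = encode_record(1, 1, "ab", "xyz")
print(HEADER.size)                       # 17
print(len(rec))                          # 30 = 4 + 17 + 2 + 4 + 3
print(struct.unpack_from("<I", rec)[0])  # 26, the body length
```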
Step by step:
1. Read the length prefix (4 bytes, little-endian `uint32`). This is the byte count of everything *after* the length field itself. A short read here (0 to 3 bytes) means EOF or a torn length prefix → return `None`.
2. Read the record body (`record_length` bytes). A short read means a torn write (crash mid-record) → return `None`. This is the crash-tolerance mechanism: incomplete trailing records are silently discarded.
3. Unpack the fixed header from the start of the body: CRC (4B), sequence number (8B), operation type (1B), key length (4B). The format string `<IQBi` means little-endian with no padding: unsigned int, unsigned long long, unsigned char, signed int. The header is 17 bytes in total.
4. Extract the key: slice `key_len` bytes starting at offset 17 (4 + 8 + 1 + 4).
5. Extract the value length: unpack 4 bytes as a signed int at offset `17 + key_len`.
6. Extract the value: slice `val_len` bytes starting at offset `21 + key_len`.
7. Verify the CRC: compute CRC32 over `op_type_byte || key || value` (the same computation as `_encode_record`) and mask it to 32 bits. On Python 3, `zlib.crc32` already returns an unsigned 32-bit value, so the mask has no effect. If the result does not match the stored CRC, raise `ValueError`. Step 2 only catches bodies that are too short. This check catches bit rot and partial overwrites that leave the body at its full `record_length`.
8. Construct and return a `WALRecord`. The op type is resolved from byte to string via `OP_NAMES`, and the key and value are decoded as UTF-8.
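The eight steps combine into a decoder along the lines of the sketch below. The `WALRecord` field names, the `OP_NAMES` contents, and the companion `encode_record` used for the round trip are assumptions for illustration. Only the byte layout, the CRC rule, and the `None`/`ValueError` behavior come from the description above.

```python
import io
import struct
import zlib
from dataclasses import dataclass
from typing import BinaryIO, Optional

OP_NAMES = {1: "PUT", 2: "DELETE"}  # hypothetical op-type table
HEADER = struct.Struct("<IQBi")     # crc, seq_num, op_type, key_len: 17 bytes


@dataclass
class WALRecord:  # hypothetical field names
    seq_num: int
    op_type: str
    key: str
    value: str


def read_record(f: BinaryIO) -> Optional[WALRecord]:
    prefix = f.read(4)                               # step 1: length prefix
    if len(prefix) < 4:
        return None                                  # EOF or torn prefix
    (record_length,) = struct.unpack("<I", prefix)
    body = f.read(record_length)                     # step 2: record body
    if len(body) < record_length:
        return None                                  # torn write
    crc, seq_num, op_type_byte, key_len = HEADER.unpack_from(body, 0)  # step 3
    off = HEADER.size
    key_b = body[off:off + key_len]                  # step 4: key
    off += key_len
    (val_len,) = struct.unpack_from("<i", body, off)  # step 5: value length
    off += 4
    val_b = body[off:off + val_len]                  # step 6: value
    computed = zlib.crc32(bytes([op_type_byte]) + key_b + val_b) & 0xFFFFFFFF
    if computed != crc:                              # step 7: integrity check
        raise ValueError(f"CRC mismatch in record with seq_num={seq_num}")
    return WALRecord(                                # step 8: build the record
        seq_num=seq_num,
        op_type=OP_NAMES.get(op_type_byte, "UNKNOWN"),
        key=key_b.decode("utf-8"),
        value=val_b.decode("utf-8"),
    )


def encode_record(seq_num: int, op_type_byte: int, key: str, value: str) -> bytes:
    """Hypothetical inverse, used here only to build test input."""
    key_b, val_b = key.encode("utf-8"), value.encode("utf-8")
    crc = zlib.crc32(bytes([op_type_byte]) + key_b + val_b) & 0xFFFFFFFF
    body = HEADER.pack(crc, seq_num, op_type_byte, len(key_b)) + key_b
    body += struct.pack("<i", len(val_b)) + val_b
    return struct.pack("<I", len(body)) + body


buf = io.BytesIO(encode_record(7, 1, "k", "v") + encode_record(8, 2, "k2", ""))
print(read_record(buf))  # WALRecord(seq_num=7, op_type='PUT', key='k', value='v')
print(buf.tell())        # 27: advanced by 4 + record_length
print(read_record(buf))  # WALRecord(seq_num=8, op_type='DELETE', key='k2', value='')
print(read_record(buf))  # None: clean EOF
```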
Side effects: the function advances `f` by `4 + record_length` bytes on success, or by fewer bytes on a partial read. Reading from `f` is its only side effect.

Error handling:

| Condition | Behavior |
|-----------|----------|
| EOF before length prefix | Returns None |
| Torn write (partial body) | Returns None |
| CRC mismatch | Raises ValueError with sequence number |
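| Invalid UTF-8 in key/value | Raises `UnicodeDecodeError`, which is not caught here. Because it is a `ValueError` subclass, callers' `except ValueError` handlers treat it like corruption |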
| Unknown `op_type_byte` | Silently maps to `"UNKNOWN"` via `OP_NAMES.get()` |
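A compact, self-contained sketch can reproduce each row of the table. Its decoder follows the documented steps but returns a plain tuple instead of a `WALRecord`, and its `OP_NAMES` table is hypothetical. Its encoder takes raw bytes, which makes it possible to inject invalid UTF-8.

```python
import io
import struct
import zlib

OP_NAMES = {1: "PUT", 2: "DELETE"}  # hypothetical op-type table
HEADER = struct.Struct("<IQBi")     # crc, seq_num, op_type, key_len


def encode_record(seq_num, op_type_byte, key: bytes, value: bytes) -> bytes:
    crc = zlib.crc32(bytes([op_type_byte]) + key + value) & 0xFFFFFFFF
    body = HEADER.pack(crc, seq_num, op_type_byte, len(key)) + key
    body += struct.pack("<i", len(value)) + value
    return struct.pack("<I", len(body)) + body


def read_record(f):
    prefix = f.read(4)
    if len(prefix) < 4:
        return None  # EOF or torn prefix
    (n,) = struct.unpack("<I", prefix)
    body = f.read(n)
    if len(body) < n:
        return None  # torn body
    crc, seq, op, key_len = HEADER.unpack_from(body)
    key = body[17:17 + key_len]
    (val_len,) = struct.unpack_from("<i", body, 17 + key_len)
    value = body[21 + key_len:21 + key_len + val_len]
    if zlib.crc32(bytes([op]) + key + value) & 0xFFFFFFFF != crc:
        raise ValueError(f"CRC mismatch at seq_num={seq}")
    return seq, OP_NAMES.get(op, "UNKNOWN"), key.decode("utf-8"), value.decode("utf-8")


def attempt(data: bytes):
    """Return the decoded result, or the name of the exception raised."""
    try:
        return read_record(io.BytesIO(data))
    except Exception as exc:
        return type(exc).__name__


good = encode_record(42, 1, b"key", b"value")
flipped = bytearray(good)
flipped[-1] ^= 0xFF  # corrupt the last value byte

print(attempt(b""))                                 # None (EOF)
print(attempt(good[:2]))                            # None (torn prefix)
print(attempt(good[:-1]))                           # None (torn body)
print(attempt(bytes(flipped)))                      # ValueError (CRC mismatch)
print(attempt(encode_record(1, 99, b"k", b"v")))    # (1, 'UNKNOWN', 'k', 'v')
print(attempt(encode_record(1, 1, b"\xff", b"v")))  # UnicodeDecodeError
print(issubclass(UnicodeDecodeError, ValueError))   # True
```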
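Notable: nothing checks that `key_len` or `val_len` fits within `record_length`. In most cases an oversized length fails safe. Python does not raise on out-of-bounds slicing, so the slice comes back short and the CRC check rejects the record. The exception is a `key_len` so large that fewer than 4 bytes remain for `val_len` in step 5, assuming step 5 unpacks from the body buffer. Unpacking a short buffer raises `struct.error`. That is not a `ValueError` subclass, so the callers' handlers do not catch it.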
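Slicing is lenient and unpacking is strict, and both behaviors are easy to check in isolation. The snippet below uses a 10-byte stand-in buffer, not a real record body.

```python
import struct

body = b"0123456789"  # stand-in for a record body

# Slicing past the end never raises; the result is silently shorter.
print(body[8:8 + 100])  # b'89'

# A negative stop is counted from the end of the buffer, so a "negative length"
# can still produce a non-empty slice.
print(body[2:2 - 1])    # b''
print(body[2:2 - 5])    # b'23456'

# Unpacking needs the full 4 bytes; a short buffer raises struct.error.
try:
    struct.unpack_from("<i", body, 8)
except struct.error as exc:
    print("struct.error:", exc)

# struct.error does not derive from ValueError, so `except ValueError` misses it.
print(issubclass(struct.error, ValueError))  # False
```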
Every read-side method calls `_read_record` in the same loop pattern:

```python
while True:
    try:
        rec = _read_record(f)
        if rec is None:
            break  # EOF or torn tail: stop reading this file
        # process rec
    except ValueError:
        break  # or return, stopping all reads
```
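When the loop runs over several files, the caller has to choose a corruption policy. In the hypothetical sketch below, `iter_records` and its `stop_on_corruption` flag do not come from the source. The reader is passed in as a parameter so a fake reader can exercise the loop. The sketch models both policies described for the callers below. One stops everything on `ValueError`, as the all-records generator does. The other skips to the next file, as the startup sequence-number scan does.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

F = TypeVar("F")  # file-like handle
R = TypeVar("R")  # decoded record


def iter_records(
    files: Iterable[F],
    read_record: Callable[[F], Optional[R]],
    stop_on_corruption: bool = True,
) -> Iterator[R]:
    """Yield records from each file in order using the standard read loop."""
    for f in files:
        while True:
            try:
                rec = read_record(f)
            except ValueError:
                if stop_on_corruption:
                    return  # corruption: stop reading all remaining files
                break       # corruption: skip the rest of this file only
            if rec is None:
                break       # EOF or torn tail: move on to the next file
            yield rec


# Fake reader: each "file" is an iterator; the string "BAD" simulates a CRC error.
def fake_reader(it):
    item = next(it, None)
    if item == "BAD":
        raise ValueError("CRC mismatch")
    return item


def make_files():
    return [iter([1, 2]), iter([3, "BAD", 4]), iter([5])]


print(list(iter_records(make_files(), fake_reader)))                            # [1, 2, 3]
print(list(iter_records(make_files(), fake_reader, stop_on_corruption=False)))  # [1, 2, 3, 5]
```

In this sketch the `try` wraps only the read call. In the pattern above, the processing code sits inside the `try`, so a `ValueError` raised while processing a record would be treated as log corruption. Here that cannot happen.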
Callers include:

- `_recover_seq_num`: scans all files at startup to find the highest sequence number. It swallows `ValueError` per file, skipping corrupted tails.
- `_read_all_records`: a generator that yields all valid records. On `ValueError` it *returns*, which stops reading all remaining files. Corruption is treated as a hard stop.
- `truncate`: reads all records from a file, filters them by sequence number, and then rewrites the file.

Dependencies:

- `struct`: binary packing and unpacking with format strings.
- `zlib.crc32`: CRC32 checksum for integrity verification.
- `WALRecord`: the output dataclass.
- `OP_NAMES`: maps op-type bytes to human-readable strings.

There are no third-party dependencies. Everything comes from the standard library or the WAL module itself, and the function is self-contained and stateless.
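Implicit assumptions:

1. `f` is in binary mode and positioned at a record boundary. Nothing checks this at runtime.
2. `key_len` and `val_len` are non-negative. Both are unpacked as signed ints (`<i`) and never range-checked. A negative length makes the slice stop before its start, which usually yields an empty slice. If the stop index itself goes negative, Python counts it from the end of the buffer, and the slice may be non-empty. `_encode_record` never writes negative lengths, so such values can only come from corruption. The CRC check will usually reject the record, but no explicit check does.
3. Key and value are valid UTF-8. `.decode("utf-8")` raises `UnicodeDecodeError` otherwise, and there is no try/except around it.
4. The CRC covers `op_type` + key + value but not `seq_num`, `key_len`, or `val_len`. A corrupted sequence number is therefore never detected. Corrupted `key_len` or `val_len` values are usually caught indirectly, because they shift the key and value slices that feed the CRC. This matches the CRC computation in `_encode_record`, but it is a design tradeoff: sequence-number corruption is silent.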
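Assumption 4 can be checked directly. If you flip a bit inside the sequence-number field, the record still decodes. If you change `key_len`, the slices shift and the CRC check fails. The encoder and decoder here are compact hypothetical stand-ins that follow the documented layout. The byte offsets follow from that layout: the sequence number occupies bytes 8 to 15 of the record and the key length occupies bytes 17 to 20.

```python
import io
import struct
import zlib

HEADER = struct.Struct("<IQBi")  # crc, seq_num, op_type, key_len


def encode_record(seq_num, op, key: bytes, value: bytes) -> bytes:
    crc = zlib.crc32(bytes([op]) + key + value) & 0xFFFFFFFF
    body = HEADER.pack(crc, seq_num, op, len(key)) + key
    body += struct.pack("<i", len(value)) + value
    return struct.pack("<I", len(body)) + body


def read_record(f):
    prefix = f.read(4)
    if len(prefix) < 4:
        return None
    (n,) = struct.unpack("<I", prefix)
    body = f.read(n)
    if len(body) < n:
        return None
    crc, seq, op, key_len = HEADER.unpack_from(body)
    key = body[17:17 + key_len]
    (val_len,) = struct.unpack_from("<i", body, 17 + key_len)
    value = body[21 + key_len:21 + key_len + val_len]
    if zlib.crc32(bytes([op]) + key + value) & 0xFFFFFFFF != crc:
        raise ValueError(f"CRC mismatch at seq_num={seq}")
    return seq, key, value


original = encode_record(5, 1, b"k", b"v")

# Flip the lowest bit of seq_num (byte 8): 5 becomes 4 and nothing notices.
bad_seq = bytearray(original)
bad_seq[8] ^= 0x01
print(read_record(io.BytesIO(bytes(bad_seq))))  # (4, b'k', b'v')

# Rewrite key_len (bytes 17..20) from 1 to 0: the slices shift and the CRC fails.
bad_len = bytearray(original)
bad_len[17:21] = struct.pack("<i", 0)
try:
    read_record(io.BytesIO(bytes(bad_len)))
except ValueError as exc:
    print("ValueError:", exc)  # ValueError: CRC mismatch at seq_num=5
```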
Related code:

- `write-ahead-log/wal.py:_encode_record`: the encoding counterpart. Understanding its exact binary layout makes the offset arithmetic in `_read_record` obvious.
- `write-ahead-log/wal.py:WriteAheadLog.replay`: shows how `_read_record` results are filtered to implement committed-only replay semantics.
- `write-ahead-log/wal.py:WriteAheadLog.truncate`: a read-modify-write cycle that deserializes with `_read_record` and re-serializes with `_encode_record`, demonstrating the round-trip contract.
- `write-ahead-log/test_wal.py`: test cases that exercise torn writes, CRC corruption, and recovery scenarios.

Related concepts:

- `wal-crc-coverage-gap`: the CRC excludes `seq_num` from its computation. It is worth determining whether this is intentional or a latent bug.
- `wal-read-record-none-on-partial`: `_read_record` returns `None` (it does not raise) on partial or torn writes. This enables crash tolerance by silently discarding incomplete trailing records.
- `wal-crc-excludes-seq-num`: the CRC32 integrity check covers `op_type` + key + value but not `seq_num`, `key_len`, or `val_len`. Corruption of those fields goes undetected if the payload happens to match.
- `wal-record-format-length-prefixed`: each WAL record is prefixed with a 4-byte little-endian length that covers everything after itself. This lets the reader skip or validate entire records atomically.
- `wal-read-record-stateless`: `_read_record` is a module-level function with no dependency on `WriteAheadLog` instance state. It can therefore be used during construction (recovery) before the object is fully initialized.