Function: _read_record in write-ahead-log/wal.py

Date: 2026-05-28

Time: 18:48

_read_record: WAL Binary Record Deserializer

Purpose

_read_record is the inverse of _encode_record: it reads a single binary WAL record from a file stream and returns it as a WALRecord dataclass. It is the fundamental deserialization primitive that every read path in the WAL depends on: recovery, replay, truncation, and iteration all call it in a loop.

It lives outside the WriteAheadLog class because it depends only on the file handle it is given, not on WAL instance state. That makes it usable both in normal operation and during crash recovery, when the WAL object is still being constructed.

Contract

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| f | binary file object | An open file handle in read-binary mode, positioned at the start of a record. Not typed explicitly; the function duck-types on .read(). |

No validation is performed on f. Passing a text-mode file will raise TypeError from struct.unpack.
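
The text-mode failure can be reproduced in isolation. This is a minimal sketch, assuming the function passes the result of f.read(4) straight to struct.unpack; io.StringIO stands in for a file opened in text mode.

```python
import io
import struct

# A text-mode stream returns str from .read(), not bytes.
text_stream = io.StringIO("abcd")
chunk = text_stream.read(4)

try:
    struct.unpack("<I", chunk)  # struct requires a bytes-like object
    raised = None
except TypeError as exc:
    raised = exc

print(type(raised).__name__, raised)
```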

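Return Value

Returns a WALRecord on success, or None when the stream ends before a complete record can be read (EOF at the length prefix, or a torn write in the body). A CRC mismatch is reported by raising ValueError rather than through the return value.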

Algorithm

The on-disk format per record is:


[4B length][4B crc][8B seq_num][1B op_type][4B key_len][key bytes][4B val_len][value bytes]
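
To make the layout concrete, here is a hedged sketch of an encoder that produces it. The name encode_record_sketch, the use of zlib.crc32, and the sample values are assumptions for illustration; this is not the module's actual _encode_record.

```python
import struct
import zlib

HEADER_FMT = "<IQBi"  # crc, seq_num, op_type, key_len (little-endian, no padding)


def encode_record_sketch(seq_num: int, op_type: int, key: bytes, value: bytes) -> bytes:
    """Hypothetical encoder that lays bytes out in the format shown above."""
    # CRC32 over op_type byte + key + value, masked to 32 bits.
    crc = zlib.crc32(bytes([op_type]) + key + value) & 0xFFFFFFFF
    body = (
        struct.pack(HEADER_FMT, crc, seq_num, op_type, len(key))
        + key
        + struct.pack("<i", len(value))
        + value
    )
    # The length prefix counts everything after itself.
    return struct.pack("<I", len(body)) + body


record = encode_record_sketch(seq_num=7, op_type=1, key=b"user:1", value=b"alice")
print(struct.calcsize(HEADER_FMT), len(record))
```

With a 6-byte key and a 5-byte value, the body is 17 + 6 + 4 + 5 = 32 bytes, and the full record is 36 bytes once the 4-byte length prefix is included.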

Step by step:

1. Read length prefix (4 bytes, little-endian uint32). This is the byte count of everything *after* the length field itself. A short read here means EOF → return None.

2. Read record body (record_length bytes). A short read means a torn write (crash mid-record) → return None. This is the crash-tolerance mechanism: incomplete records are silently discarded.

3. Unpack fixed header from the record body: CRC (4B), sequence number (8B), operation type (1B), key length (4B). Format string <IQBi = little-endian unsigned-int, unsigned-long-long, unsigned-byte, signed-int.

4. Extract key — slice key_len bytes starting at offset 17 (4+8+1+4).

5. Extract value length — unpack 4 bytes as signed int at the current offset.

6. Extract value — slice val_len bytes.

7. Verify CRC: compute CRC32 over op_type_byte || key || value (same computation as _encode_record), mask to 32 bits. If it doesn't match the stored CRC, raise ValueError. This detects bit rot and partial overwrites that happened to write exactly record_length bytes.

8. Construct and return a WALRecord with the op type resolved from byte to string via OP_NAMES, and key/value decoded as UTF-8.
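
The eight steps can be sketched as follows. This is an illustrative reconstruction from the description, not the source of wal.py: the name read_record_sketch, the WALRecord field names, the contents of OP_NAMES, and the choice of zlib.crc32 are all assumptions.

```python
import io
import struct
import zlib
from dataclasses import dataclass
from typing import Optional

OP_NAMES = {1: "PUT", 2: "DELETE"}  # hypothetical mapping


@dataclass
class WALRecord:  # hypothetical field names
    seq_num: int
    op: str
    key: str
    value: str


def read_record_sketch(f) -> Optional[WALRecord]:
    # Step 1: length prefix; a short read means EOF.
    prefix = f.read(4)
    if len(prefix) < 4:
        return None
    (record_length,) = struct.unpack("<I", prefix)

    # Step 2: record body; a short read means a torn write.
    body = f.read(record_length)
    if len(body) < record_length:
        return None

    # Step 3: fixed 17-byte header.
    crc, seq_num, op_type, key_len = struct.unpack_from("<IQBi", body, 0)

    # Steps 4-6: key, value length, value.
    offset = 17
    key = body[offset:offset + key_len]
    offset += key_len
    (val_len,) = struct.unpack_from("<i", body, offset)
    offset += 4
    value = body[offset:offset + val_len]

    # Step 7: CRC over op_type byte + key + value.
    actual = zlib.crc32(bytes([op_type]) + key + value) & 0xFFFFFFFF
    if actual != crc:
        raise ValueError(f"CRC mismatch at seq_num {seq_num}")

    # Step 8: build the record.
    return WALRecord(
        seq_num,
        OP_NAMES.get(op_type, "UNKNOWN"),
        key.decode("utf-8"),
        value.decode("utf-8"),
    )


print(read_record_sketch(io.BytesIO(b"")))  # empty stream: EOF path
```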

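Side Effects

Reading advances the position of f by however many bytes were consumed, including on the paths that return None or raise. The function performs no writes and touches no WAL instance state.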

Error Handling

| Condition | Behavior |
|-----------|----------|
| EOF before length prefix | Returns None |
| Torn write (partial body) | Returns None |
| CRC mismatch | Raises ValueError with sequence number |
| Invalid UTF-8 in key/value | Raises UnicodeDecodeError (unhandled) |
| Unknown op_type_byte | Silently maps to "UNKNOWN" via OP_NAMES.get() |

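Notable: there's no protection against a key_len or val_len that exceeds record_length. A corrupted val_len makes the value slice run past the buffer boundary, which in practice just returns a short slice (Python doesn't raise on out-of-bounds slicing), and the CRC check will catch it. A corrupted key_len is less benign: if it pushes the offset so far that fewer than 4 bytes remain for the val_len unpack, struct raises struct.error before the CRC check runs. struct.error is not a ValueError, so it would not be caught by the caller loop shown under Usage Patterns.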
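
The Python behavior behind this note can be checked directly. The snippet is a sketch on a stand-in 10-byte buffer; it assumes the value length is read with struct at the offset that follows the key, as the algorithm steps describe.

```python
import struct

body = bytes(range(10))  # stand-in for a 10-byte record body

# Slicing past the end does not raise; it returns whatever bytes exist.
short_slice = body[8:8 + 100]

# Unpacking a 4-byte int where only 2 bytes remain does raise.
try:
    struct.unpack_from("<i", body, 8)
    unpack_error = None
except struct.error as exc:
    unpack_error = exc

print(len(short_slice))
print(repr(unpack_error))
print(issubclass(struct.error, ValueError))
```

The last line matters for callers: a loop that only catches ValueError would let this exception propagate.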

Usage Patterns

Every read-side method calls _read_record in the same loop pattern:


while True:
    try:
        rec = _read_record(f)
        if rec is None:
            break
        # process rec
    except ValueError:
        break  # or return, stopping all reads
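
One way to avoid repeating that loop is to wrap it in a generator. This is a sketch only: iter_records and toy_reader are hypothetical names, and the reader is passed in as a parameter so the example runs without the real module. toy_reader treats each byte as a record and raises ValueError on "!" to imitate a CRC failure.

```python
import io
from typing import Callable, Iterator


def iter_records(f, read_record: Callable) -> Iterator:
    """Yield records until EOF, a torn write (None), or a ValueError."""
    while True:
        try:
            rec = read_record(f)
        except ValueError:
            return  # stop at the first corrupt record
        if rec is None:
            return  # EOF or torn write
        yield rec


def toy_reader(f):
    """Stand-in reader: one byte per record, '!' simulates corruption."""
    b = f.read(1)
    if not b:
        return None
    if b == b"!":
        raise ValueError("corrupt record")
    return b.decode("ascii")


records = list(iter_records(io.BytesIO(b"ab!c"), toy_reader))
print(records)
```

Note that the record after the corrupt one is never read, which mirrors the stop-on-first-error behavior of the loop above.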

Callers include:

Dependencies

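No third-party dependencies. The function relies only on standard-library facilities (struct and a CRC32 routine) and on the module-level WALRecord and OP_NAMES definitions, and it keeps no state of its own.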

Assumptions Not Enforced by Types

1. f is in binary mode and positioned at a record boundary — no runtime check.

2. key_len and val_len are non-negative. Both are unpacked as signed ints (<i), so a negative value would produce an empty slice, pass the CRC check (if the original data was encoded that way), and silently succeed.

3. Key and value are valid UTF-8 — .decode("utf-8") will raise if they aren't, but there's no try/except around it.

4. The CRC covers op_type + key + value but not seq_num or key_len/val_len. A corrupted sequence number therefore goes undetected. This matches _encode_record's CRC computation but is a design tradeoff: sequence number corruption is silent.
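
Assumptions 2 and 4 can be illustrated with a few lines. The sketch below assumes zlib.crc32 as the CRC routine and uses hypothetical helpers (pack_body, crc_ok) with made-up values; it builds two record bodies that differ only in sequence number and shows that both pass the same CRC check.

```python
import struct
import zlib

op_type, key, value = 1, b"k", b"v"
crc = zlib.crc32(bytes([op_type]) + key + value) & 0xFFFFFFFF


def pack_body(seq_num: int) -> bytes:
    """Hypothetical helper: record body without the length prefix."""
    return (
        struct.pack("<IQBi", crc, seq_num, op_type, len(key))
        + key
        + struct.pack("<i", len(value))
        + value
    )


def crc_ok(body: bytes) -> bool:
    """Recompute the CRC the way the algorithm describes and compare."""
    stored, _seq, op, key_len = struct.unpack_from("<IQBi", body, 0)
    k = body[17:17 + key_len]
    (val_len,) = struct.unpack_from("<i", body, 17 + key_len)
    v = body[21 + key_len:21 + key_len + val_len]
    return (zlib.crc32(bytes([op]) + k + v) & 0xFFFFFFFF) == stored


original = pack_body(41)
tampered = pack_body(42)  # same payload, different sequence number

print(crc_ok(original), crc_ok(tampered))

# Assumption 2: a negative length yields an empty slice, not an error.
empty = b"abcdef"[2:2 + (-1)]
print(empty)
```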
