Function: WAL in log-structured-merge-tree/lsm.py

Date: 2026-05-29

Time: 08:09

WAL — Write-Ahead Log for Crash Recovery

1. Purpose

WAL is a write-ahead log that ensures durability of writes to the LSM tree's in-memory memtable. Every mutation (put or delete) is appended to the WAL *before* it's applied to the memtable. If the process crashes, the memtable (which lives only in RAM) is lost — but the WAL file on disk contains every operation since the last flush. On restart, replay() reconstructs the memtable from the WAL, so no acknowledged writes are lost.

This is the classic WAL pattern from DDIA Chapter 3: the log turns an in-memory structure into a durable one without paying the cost of writing a full sorted structure on every operation.

2. Contract

3. Parameters

_init_(self, path: str)

append(self, key: str, value: bytes)

replay(self) -> List[Tuple[str, bytes]]

No parameters. Returns all recoverable entries from the file at self._path.

4. Return Value

5. Algorithm

Binary Record Format

Each record is 4 fields, no separators:


[key_len: 4 bytes big-endian uint32]
[key: key_len bytes, UTF-8]
[value_len: 4 bytes big-endian uint32]
[value: value_len bytes]

append

1. Encode the key to UTF-8 bytes.

2. Write the key length as a 4-byte big-endian unsigned integer (>I).

3. Write the key bytes.

4. Write the value length as a 4-byte big-endian unsigned integer.

5. Write the value bytes.

6. Flush the userspace buffer to the OS. This does not call fsync(), so durability depends on OS write-back behavior.

replay

1. If the file doesn't exist, return empty.

2. Read the entire file into memory (data).

3. Walk through data with a manual cursor (pos), parsing each record:

4. Return all successfully parsed entries.

The boundary checks at every step make this crash-tolerant: if the process died mid-write, the incomplete trailing record is simply ignored.

truncate

1. Close the current file descriptor.

2. Reopen the file in write-binary mode ("wb"), which truncates it to zero bytes.

3. Close that descriptor.

4. Reopen in append-binary mode ("ab") for future writes.

This three-step dance is necessary because "wb" truncates on open, but subsequent appends need "ab" mode.

6. Side Effects

7. Error Handling

There is essentially no error handling. Specifically:

8. Usage Patterns

Within LSMTree:


# On startup — recover memtable from WAL
self._wal = WAL(os.path.join(data_dir, "wal.log"))
self._replay_wal()  # calls self._wal.replay()

# On every put/delete — log before memtable mutation
self._wal.append(key, v)
self._memtable[key] = v

# On flush — memtable becomes SSTable, WAL is reset
self._wal.truncate()

# On shutdown
self._wal.close()

The caller (LSMTree) is responsible for:

1. Calling append before mutating the memtable (write-ahead invariant).

2. Calling truncate after the SSTable is fully written (otherwise data is lost).

3. Calling close on shutdown.

9. Dependencies