Function: log_write in b-tree-storage-engine/btree.py

Date: 2026-05-29

Time: 08:23

WAL.log_write — Append a Page Write to the Write-Ahead Log

Purpose

logwrite records a pending page write to the WAL file before the actual data file is modified. This is the core of the WAL protocol: if the process crashes after logwrite but before (or during) the actual page write, WAL.recover can replay the logged entry to bring the data file back to a consistent state. Without this method, a crash mid-write could leave a half-written page in the data file — corrupting the B-tree.

Contract

Preconditions:

Postconditions:

Invariant: Entries in the WAL file are strictly ordered by monotonically increasing sequence numbers, with no gaps within a single session.

Parameters

| Parameter | Type | Description |

|-----------|------|-------------|

| page_num | int | The page number in the data file this write targets. Packed as a 4-byte big-endian unsigned int. |

| pagedata | bytes | The full page contents to be written. Length is encoded in the header, so variable-length data is supported at the WAL level (though callers always pass fixed pagesize buffers). |

Return Value

None. This is a side-effect-only method. The caller is responsible for subsequently calling WAL.commit to clear the log after the transaction completes.

Algorithm

1. Increment sequence counterself._seq += 1. This provides ordering and lets recover process entries in the order they were written.

2. Pack the header — 12 bytes total: sequence number (4B), page number (4B), data length (4B), all big-endian unsigned ints.

3. Compute and pack checksum — CRC32 of page_data only (not the header), stored as a 4-byte big-endian unsigned int. This detects torn writes during recovery.

4. Seek to end of fileseek(0, 2) positions the cursor at EOF so entries are appended, never overwriting earlier entries in the same transaction.

5. Write the entryheader + pagedata + checksum as a single write call. The entry format is: [seq:4][pagenum:4][datalen:4][pagedata:variable][crc32:4].

6. Flush + fsyncflush() pushes Python's userspace buffer to the OS, then os.fsync() forces the OS to flush to stable storage. This is what makes the durability guarantee real.

Side Effects

Error Handling

None. If struct.pack fails (wrong types), write fails (disk full), or fsync fails (I/O error), the exception propagates uncaught. This is appropriate — a WAL write failure means the transaction cannot be made durable and must not proceed. The caller (BTree.walwrite_page) does not catch these either, so a failed WAL write aborts the entire put/delete operation.

Usage Patterns

logwrite is never called directly by external code. It's called by BTree.walwritepage and BTree.walwrite_meta, which form a transaction bracket:


# Typical transaction pattern (from BTree.put):
self.wal.log_write(page_num, padded)   # log intent
self.pm.write_page(page_num, padded)   # apply to data file
# ... possibly more log_write + write_page pairs ...
self.wal.commit(self.pm)               # clear WAL, transaction complete

A single B-tree mutation (e.g., a split) may call log_write multiple times before a single commit. All logged entries are replayed atomically during recovery — there is no partial replay.

Dependencies

Assumptions Not Enforced by Types

1. pagedata is bytes, not strstruct.pack('>I', self.checksum(page_data)) would fail on str, but the caller is responsible for encoding.

2. Single-writer — there's no file locking. If two processes open the same WAL, _seq values will collide and appended entries could interleave, corrupting the log.

3. write is atomic for the payload size — the code does a single write(header + page_data + checksum). For typical 4KB pages, this is ~4KB + 16 bytes, which is generally atomic on local filesystems but not guaranteed by POSIX. A torn write here is detected by the checksum in recover, which skips corrupted entries.

4. The checksum covers only pagedata, not the header — a corrupted header (wrong datalen) could cause recover to read the wrong number of bytes, potentially misaligning all subsequent entries. The recovery loop handles this by bailing out when offset + data_len + 4 > len(data).

Topics to Explore

Beliefs