Function: commit in b-tree-storage-engine/btree.py

Date: 2026-05-28

Time: 18:52

WAL.commit — Finalize a WAL transaction by clearing the log

Purpose

commit marks the end of a write-ahead log transaction. Once all dirty pages have been written to the data file, this method syncs that file and then clears the WAL, signaling that the writes are durable and no recovery is needed on restart. It implements the "checkpoint" step of WAL-based crash safety: once the data file is known to be good, the log entries that produced it are no longer needed.
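A cleared WAL means "nothing to replay," so a restart path can decide whether recovery is needed just by checking whether the log is empty. Below is a minimal sketch. It assumes the WAL lives in a single file at a path passed in as `wal_path`; that parameter and the `needs_recovery` helper are hypothetical and not taken from the source.

```python
import os


def needs_recovery(wal_path: str) -> bool:
    """Return True if the WAL still holds entries that were never committed.

    Hypothetical helper: commit() truncates the WAL to zero bytes, so any
    non-empty log found at startup was left behind by an interrupted transaction.
    """
    # Treat a missing WAL file the same as an empty one.
    if not os.path.exists(wal_path):
        return False
    return os.path.getsize(wal_path) > 0
```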

Contract

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| page_manager | PageManager | The page I/O layer whose data file must be synced before the WAL is cleared. |

Return Value

None. This is a side-effect-only method.
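Structurally, commit only calls `sync()` on its argument, according to the Algorithm section. One hedged way to state that contract in type hints is a `typing.Protocol`. The name `Syncable` is hypothetical, and the real code may simply annotate the parameter as `PageManager`.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Syncable(Protocol):
    """The only capability commit() needs from its page_manager argument."""

    def sync(self) -> None:
        """Make every page written so far durable on disk."""
        ...
```

Any object with a `sync()` method satisfies this protocol, which makes it easy to pass a stub in tests.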

Algorithm

1. page_manager.sync(): Flush the data file's buffered writes and fsync it. This guarantees that every page written via PageManager.write_page is durable on disk, not just sitting in kernel buffers. This step *must* happen before the WAL is cleared. Otherwise a crash could leave the data file incomplete and the WAL empty, losing data with no way to recover it.

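2. self.f.seek(0) + self.f.truncate(0): Erase the WAL file contents entirely. truncate(0) discards all bytes. seek(0) moves the file position back to the start, because in Python truncate() leaves the position unchanged; without the seek, the next log record could be written past the end of the now-empty file. After this step the WAL is logically empty.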

3. self.f.flush() + os.fsync(self.f.fileno()): Force the truncated (empty) WAL to durable storage. Without this, the truncation might exist only in the OS cache while the on-disk WAL still holds the old entries. A crash could then "resurrect" those stale entries, and replaying them against already-committed data would corrupt it through double application.

4. self._seq = 0: Reset the sequence counter, which increases monotonically within a transaction, so that the next transaction's log entries start from 1.
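Step 1 depends on the page layer's `sync()` doing both halves of the job: pushing Python's buffer into the kernel and then fsyncing. The sketch below assumes a hypothetical `PageManager` that stores fixed-size pages in one file opened in `"r+b"` mode. `PAGE_SIZE`, `close`, and the exact `write_page` signature are illustrative assumptions, not the source's code.

```python
import os

PAGE_SIZE = 4096  # hypothetical; the source does not state a page size


class PageManager:
    """Hypothetical page I/O layer: fixed-size pages in a single data file."""

    def __init__(self, path: str) -> None:
        if not os.path.exists(path):
            open(path, "wb").close()  # create an empty data file
        self.f = open(path, "r+b")   # random-access read/write, no truncation

    def write_page(self, page_no: int, data: bytes) -> None:
        if len(data) != PAGE_SIZE:
            raise ValueError("page must be exactly PAGE_SIZE bytes")
        self.f.seek(page_no * PAGE_SIZE)
        self.f.write(data)  # may still sit in Python's buffer or the OS cache

    def sync(self) -> None:
        # Step 1 of commit: hand buffered writes to the kernel, then ask the
        # kernel to persist the data file to stable storage.
        self.f.flush()
        os.fsync(self.f.fileno())

    def close(self) -> None:
        self.f.close()
```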
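In step 2, the `seek(0)` matters for the next transaction, not for the truncation itself: Python's `truncate()` does not move the stream position. The demonstration below assumes the WAL is opened in `"r+b"` mode. The source does not show the open mode, and a file opened for appending would always write at the end regardless of position.

```python
import os
import tempfile


def size_after_rewrite(use_seek: bool) -> int:
    """Fill a file, truncate it, write one small record, and return the final size."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "r+b") as f:
            f.write(b"x" * 100)   # stale log content; position is now 100
            if use_seek:
                f.seek(0)         # rewind before discarding the bytes
            f.truncate(0)         # size becomes 0, position is unchanged
            f.write(b"new")       # the next transaction's first record
            f.flush()
        return os.path.getsize(path)
    finally:
        os.remove(path)
```

Without the rewind, the new record lands at offset 100, and the gap before it reads back as zero bytes that a replay routine might misparse.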
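Step 3 pairs `flush()` with `os.fsync()` because the two calls act on different layers. `flush()` moves data from Python's userspace buffer into the kernel, and `os.fsync()` only persists what the kernel already holds. The snippet below shows the first half of that on a temporary file rather than the real WAL.

```python
import os
import tempfile

with tempfile.TemporaryFile() as f:   # binary "w+b" file, buffered by Python
    f.write(b"log record")
    # The bytes are still in Python's buffer, so the kernel (and therefore
    # fsync) does not know about them yet.
    size_before_flush = os.fstat(f.fileno()).st_size
    f.flush()                         # hand the buffer to the kernel
    size_after_flush = os.fstat(f.fileno()).st_size
    os.fsync(f.fileno())              # ask the kernel to persist it to the device
```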

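Side Effects

Fsyncs the data file via page_manager.sync(), truncates the WAL file to zero bytes and fsyncs it, and resets self._seq to 0. The WAL's file position is left at offset 0.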

Error Handling

No exceptions are caught. If sync(), truncate(), or fsync() fails (e.g., disk full, I/O error), the exception propagates to the caller. This is the correct behavior — a failed commit should not silently succeed, since the WAL may still be needed for recovery.
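Because `sync()` runs first and nothing is caught, a failure there leaves the log untouched for recovery. The sketch below shows that behavior using a hypothetical `FailingPageManager` and a free-function copy of the commit steps. Neither exists in the source.

```python
import os
import tempfile


class FailingPageManager:
    """Hypothetical page manager whose sync() fails, e.g. on an I/O error."""

    def sync(self) -> None:
        raise OSError("simulated fsync failure on the data file")


def commit(wal_file, page_manager) -> None:
    """Same ordering as WAL.commit, written as a free function for brevity."""
    page_manager.sync()              # raises before the log is touched
    wal_file.seek(0)
    wal_file.truncate(0)
    wal_file.flush()
    os.fsync(wal_file.fileno())


with tempfile.TemporaryFile() as wal_file:
    wal_file.write(b"pending page image")
    wal_file.flush()
    raised = False
    try:
        commit(wal_file, FailingPageManager())
    except OSError:
        raised = True                # in the real code this reaches the caller
    wal_size_after_failure = os.fstat(wal_file.fileno()).st_size
```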

Usage Patterns

commit is called at the end of every mutating B-tree operation (put, delete, close):

```python
# In BTree.put, after all page writes are done:
self.wal.commit(self.pm)
```

The pattern is always: (1) log page writes via wal_write_page / wal_write_meta, (2) call commit to finalize. The caller must not interleave unrelated writes between log_write and commit. Because commit clears *all* log entries, any entries logged by the unrelated write would be discarded before that write's pages were synced.
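Both the pattern and the hazard of interleaving can be modeled in memory. This is a hedged sketch: `MemoryWAL`, `MemoryPageManager`, and `apply_pages` are hypothetical stand-ins. The real helpers named above may combine logging and writing in a single call.

```python
class MemoryPageManager:
    """Hypothetical in-memory stand-in: 'durable' holds what survived sync()."""

    def __init__(self) -> None:
        self.pages: dict[int, bytes] = {}
        self.durable: dict[int, bytes] = {}

    def write_page(self, page_no: int, data: bytes) -> None:
        self.pages[page_no] = data

    def sync(self) -> None:
        self.durable = dict(self.pages)


class MemoryWAL:
    """Hypothetical in-memory WAL: commit() drops every entry, not just one txn's."""

    def __init__(self) -> None:
        self.entries: list[tuple[int, bytes]] = []

    def log_write(self, page_no: int, data: bytes) -> None:
        self.entries.append((page_no, data))

    def commit(self, page_manager: MemoryPageManager) -> None:
        page_manager.sync()
        self.entries.clear()


def apply_pages(wal: MemoryWAL, pm: MemoryPageManager, pages: dict[int, bytes]) -> None:
    """The documented pattern: log each page, write it, then commit once."""
    for page_no, data in pages.items():
        wal.log_write(page_no, data)
    for page_no, data in pages.items():
        pm.write_page(page_no, data)
    wal.commit(pm)


# Hazard: operation B logs a page, then operation A commits. B's log entry is
# erased even though B's page was never written to the data file or synced.
wal, pm = MemoryWAL(), MemoryPageManager()
wal.log_write(7, b"B's page")
apply_pages(wal, pm, {1: b"A's page"})
```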

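Dependencies

The os module (for os.fsync), the PageManager passed as page_manager (for sync()), and the WAL's open file object self.f.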

Key Assumption

The ordering guarantee — data file fsync before WAL truncation fsync — is what makes crash recovery correct. The code assumes that os.fsync provides the barrier semantics described by POSIX: after fsync returns, the data is on stable storage. On some hardware (e.g., drives with volatile write caches that lie about flush completion), this assumption can be violated. The code does not use O_DIRECT or fdatasync; it trusts the OS and hardware to honor fsync.
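The code trusts `os.fsync`, and some storage engines ask for a stronger flush on platforms where that trust is weakest. On macOS, for example, the fsync(2) man page notes that the drive may still hold data in its own cache, and `fcntl(F_FULLFSYNC)` requests a full flush. Below is a hedged sketch of a drop-in helper; the name `full_sync` is hypothetical, and this is not what the source code does. Whether `fcntl` exposes `F_FULLFSYNC` depends on the platform, hence the fallback.

```python
import os

try:
    import fcntl  # POSIX only; absent on Windows
except ImportError:
    fcntl = None


def full_sync(fd: int) -> None:
    """fsync, upgraded to F_FULLFSYNC where the platform exposes it."""
    # getattr on None also returns the default, so this covers both the
    # missing-module and missing-constant cases.
    op = getattr(fcntl, "F_FULLFSYNC", None)
    if op is not None:
        try:
            fcntl.fcntl(fd, op)
            return
        except OSError:
            pass  # some file systems reject F_FULLFSYNC; fall back to fsync
    os.fsync(fd)
```

This narrows the gap on macOS but cannot help with a drive that ignores flush commands entirely.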

Beliefs