Topic: The WAL commit method calls page_manager.sync() then truncates the WAL; trace whether metadata is ever synced as part of this commit fence

Date: 2026-05-29

Time: 10:31

WAL commit and the Metadata Sync Question

The WAL's commit method lives at b-tree-storage-engine/btree.py:139:


def commit(self, page_manager):
    page_manager.sync()
    self._f.seek(0)
    self._f.truncate(0)
    self._f.flush()
    os.fsync(self._f.fileno())
    self._seq = 0

The commit fence has two phases:

1. Sync the data file: page_manager.sync() (line 103) calls flush() and then os.fsync() on the single data file (self.f). This forces every dirty page, including any metadata page writes, to stable storage.

2. Truncate the WAL: only after that fsync returns does the WAL truncate itself to zero length and fsync its own file descriptor.

Does metadata get synced?

Yes, but only implicitly. The PageManager uses a single file handle (self.f) for everything: data pages and the metadata page (page 0). When sync() calls flush() and then os.fsync(self.f.fileno()), the kernel writes all dirty data for that file to stable storage, which includes any metadata writes that reached the page cache earlier.

Here's the key: look at how metadata is written. PageManager._write_meta() (line 42) does a seek(0) + write() + flush(), but no fsync. The same is true for write_meta() (line 60), which delegates to _write_meta(). So during normal B-tree operations, such as splits, allocations via allocate_page() (line 87), and frees via free_page() (line 97), metadata is written to the OS page cache but not forced to disk.

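The metadata is only guaranteed to be on disk after an explicit fsync of the data file, which in the code traced here means page_manager.sync(), called at the start of WAL.commit(). The kernel may write the page back earlier on its own schedule, but nothing in the commit path can rely on that.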
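The difference between flush() and fsync() on a shared file handle can be sketched with a toy stand-in. This is not the code in btree.py: SketchPageManager, PAGE_SIZE, the fsync_calls counter, and the method bodies are assumptions made for illustration, and only the seek/write/flush and flush/fsync patterns come from the trace above.

```python
import os
import tempfile

PAGE_SIZE = 4096  # hypothetical page size for this sketch


class SketchPageManager:
    """Toy stand-in: one file holds page 0 (metadata) and all data pages."""

    def __init__(self, path):
        # "w+b" creates the file and opens it for binary read/write.
        self.f = open(path, "w+b")
        self.fsync_calls = 0  # illustration only: counts forced syncs

    def write_meta(self, payload: bytes):
        # Metadata lives in page 0: seek, write, flush. No fsync here.
        self.f.seek(0)
        self.f.write(payload.ljust(PAGE_SIZE, b"\x00"))
        self.f.flush()  # Python buffer -> OS page cache, not yet stable storage

    def write_page(self, page_no: int, payload: bytes):
        self.f.seek(page_no * PAGE_SIZE)
        self.f.write(payload.ljust(PAGE_SIZE, b"\x00"))
        self.f.flush()

    def sync(self):
        # A single fsync covers every dirty page of the file, page 0 included.
        self.f.flush()
        os.fsync(self.f.fileno())
        self.fsync_calls += 1

    def close(self):
        self.f.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "data.db")
    pm = SketchPageManager(path)
    pm.write_meta(b"root=1")
    pm.write_page(1, b"leaf")
    calls_before_sync = pm.fsync_calls  # nothing has been forced to disk yet
    pm.sync()
    calls_after_sync = pm.fsync_calls   # metadata and data synced together
    pm.close()
    return calls_before_sync, calls_after_sync, os.path.getsize(path)
```

In this sketch the metadata write and the data-page write both sit in the page cache until the one sync() call, which is the implicit metadata sync described above.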

The durability contract

The ordering is:


WAL.log_write()  →  fsync WAL entry
 ... (more writes) ...
WAL.commit()     →  fsync data file (including metadata page 0)
                 →  truncate + fsync WAL file

This is correct for crash recovery: if the process crashes between log_write and commit, the WAL entries survive and recover() replays them. If it crashes after the data-file fsync but before the WAL truncate, recovery replays already-applied writes. That replay is idempotent because write_page overwrites the same page numbers with the same contents.
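The idempotence argument can be checked with a small model. The sketch below assumes a WAL entry is a (page number, page bytes) pair and models the data file as a dict. Both are simplifications chosen for illustration, not the on-disk format used by btree.py, and replay() is a hypothetical helper rather than the real recover().

```python
def replay(pages: dict, wal_entries: list) -> dict:
    """Apply logged (page_no, data) writes in order; a later entry for a page wins."""
    for page_no, data in wal_entries:
        pages[page_no] = data  # overwrite in place, like a page write at a fixed offset
    return pages


def demo():
    wal = [(1, b"A"), (2, b"B"), (1, b"C")]

    # Crash before commit: the data file holds none of the logged writes.
    crashed_before_sync = replay({}, wal)

    # Crash after the data-file fsync but before the WAL truncate:
    # the writes are already on disk and get replayed a second time.
    already_applied = {1: b"C", 2: b"B"}
    crashed_after_sync = replay(dict(already_applied), wal)

    return crashed_before_sync, crashed_after_sync
```

Both crash points converge on the same page contents, which is why leaving a stale WAL behind is harmless while truncating it too early would not be.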

The gap

There is no separate metadata sync — metadata is flushed alongside all other pages in the single os.fsync() call. This works because PageManager uses one file for everything. If the design ever split metadata into a separate file, the commit fence would need an additional fsync for that file, and the ordering between the two fsyncs would matter.
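For illustration, here is a minimal sketch of what the fence might look like under that hypothetical split layout. The function commit_two_files, the file names, and the trace list are all invented for this example. The sketch syncs data before metadata, on the assumption that metadata should never point at pages that are not yet durable. The source does not say which order the real design would need, only that both fsyncs must complete before the WAL is truncated.

```python
import os
import tempfile


def commit_two_files(data_f, meta_f, wal_f, trace):
    """Hypothetical fence: both files are made durable before the WAL is cleared."""
    for name, f in (("data", data_f), ("meta", meta_f)):
        f.flush()
        os.fsync(f.fileno())
        trace.append(f"fsync {name}")
    # Only now is it safe to discard the log.
    wal_f.seek(0)
    wal_f.truncate(0)
    wal_f.flush()
    os.fsync(wal_f.fileno())
    trace.append("truncate+fsync wal")


def demo():
    d = tempfile.mkdtemp()
    trace = []
    with open(os.path.join(d, "data.db"), "w+b") as data_f, \
         open(os.path.join(d, "meta.db"), "w+b") as meta_f, \
         open(os.path.join(d, "wal.log"), "w+b") as wal_f:
        data_f.write(b"page")
        meta_f.write(b"root=1")
        wal_f.write(b"entry")
        commit_two_files(data_f, meta_f, wal_f, trace)
        wal_size = os.fstat(wal_f.fileno()).st_size
    return trace, wal_size
```

If the metadata fsync were skipped here, the WAL could be truncated while the metadata file still held unsynced changes, which is the failure mode the single-file design avoids for free.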
