Topic: The gap between _wal_files() and os.path.getsize() is a TOCTOU window; explore whether this matters in practice given the single-writer design.

Date: 2026-05-29

Time: 12:14

The TOCTOU Window in _open_latest

The race condition lives in _open_latest at write-ahead-log/wal.py:101-109:


def _open_latest(self):
    files = self._wal_files()                          # line 102: snapshot directory listing
    if files:
        last = files[-1]
        if os.path.getsize(last) < self._max_file_size:  # line 105: stat the file
            self._current_file = last
            self._fd = open(last, "ab")                   # line 107: open for append
            return
    self._rotate()

Three filesystem operations happen sequentially with no atomicity guarantee:

1. List the WAL directory (_wal_files() at line 102 calls os.listdir)

2. Stat the last file (os.path.getsize at line 105)

3. Open it for append (line 107)

Between steps 1 and 2, the file named by last could be deleted, for example by a concurrent truncate() call that removes fully consumed WAL segments (line 188 onwards). Between steps 2 and 3, the file could grow past _max_file_size, causing the writer to append to an already-oversized segment instead of rotating.
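
The first gap can be reproduced in a single thread by deleting the file between the listing and the stat. The following is a minimal sketch, not code from wal.py: the helper names (list_segments, simulate_stale_listing) and the ".wal" suffix are assumptions made for illustration, and os.remove stands in for whatever concurrent actor would delete the segment.

```python
import os
import tempfile


def list_segments(directory):
    """Return sorted full paths of *.wal files (hypothetical naming scheme)."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".wal")
    )


def simulate_stale_listing():
    """Reproduce the step 1 -> step 2 gap without any real concurrency."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "00000001.wal")
        with open(path, "wb") as f:
            f.write(b"record")

        files = list_segments(directory)  # step 1: snapshot the listing
        os.remove(files[-1])              # stand-in for a concurrent delete
        try:
            os.path.getsize(files[-1])    # step 2: stat a path that is gone
        except FileNotFoundError:
            return "stale"
        return "ok"


result = simulate_stale_listing()
```

The listing is only a snapshot: once it is taken, nothing ties the path in files[-1] to a file that still exists.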

Why it doesn't matter here

_open_latest only runs during __init__ (line 79). At construction time, the WAL instance hasn't been handed to any caller yet, so no thread can be calling append, truncate, or rotate concurrently against this same instance.

The threading lock protects all mutation paths. Every method that modifies WAL state acquires self._lock (line 76): append (line 151), append_batch (line 161), checkpoint (line 177), and truncate (line 185). Since _open_latest runs before the constructor returns, the lock can't be contended during that call.
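
The shape of that argument can be shown with a toy class. This is a sketch under stated assumptions, not the WriteAheadLog implementation: TinyLog, _recover, and hammer are hypothetical names, and the in-memory list stands in for the on-disk segments. It only illustrates the structure (unlocked work inside the constructor, locked work afterwards); it does not prove that the lock is necessary.

```python
import threading


class TinyLog:
    """Toy stand-in for a WAL; not the class discussed in this note."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = []
        # Runs unlocked: no other thread can hold a reference to this
        # instance until __init__ returns.
        self._recover()

    def _recover(self):
        self._records.append(b"recovered")

    def append(self, record):
        # After construction, every mutation goes through the lock.
        with self._lock:
            self._records.append(record)
            return len(self._records)


def hammer(log, count):
    """Append `count` records from the calling thread."""
    for i in range(count):
        log.append(b"r%d" % i)


log = TinyLog()
threads = [threading.Thread(target=hammer, args=(log, 100)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = len(log._records)  # 1 recovered record + 4 * 100 appended records
```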

The single-writer design means one WriteAheadLog instance per directory. There's no second instance that could be truncating files out from under the first. The test at write-ahead-log/test_wal.py:33-44 (test_crash_recovery) creates a second instance on the same directory, but it does so sequentially: the first instance isn't actively writing when the second opens.

Where it *would* matter

The design has no inter-process protection. If an external process (backup script, second application instance) deleted a WAL file between _wal_files() and os.path.getsize(), the getsize call would raise FileNotFoundError and crash the constructor. A production WAL would use advisory file locks (fcntl.flock or equivalent) on the directory to enforce single-writer semantics across processes, not just across threads.
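
A minimal sketch of such a lock follows, assuming a Unix-like system where the fcntl module is available. The DirectoryLock class and the ".wal.lock" file name are hypothetical, and the sketch locks a dedicated file inside the directory rather than the directory itself, which is a substitution made here for simplicity. Nothing like this exists in wal.py as described.

```python
import fcntl
import os
import tempfile


class DirectoryLock:
    """Exclusive advisory lock on a lock file inside a WAL directory."""

    def __init__(self, directory, name=".wal.lock"):  # file name is hypothetical
        self._path = os.path.join(directory, name)
        self._fd = None

    def acquire(self):
        fd = os.open(self._path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            # LOCK_NB makes a second writer fail fast instead of blocking.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            os.close(fd)
            raise
        self._fd = fd

    def release(self):
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None


def second_writer_is_rejected():
    """Return True if a second lock attempt on the same directory fails."""
    with tempfile.TemporaryDirectory() as directory:
        first = DirectoryLock(directory)
        second = DirectoryLock(directory)
        first.acquire()
        try:
            second.acquire()
        except BlockingIOError:
            return True
        finally:
            second.release()
            first.release()
        return False
```

The lock is advisory: it only constrains processes that also take it. A backup script that deletes segments without acquiring the lock would not be stopped, so the constructor would still need to tolerate FileNotFoundError.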

By contrast, _maybe_rotate (line 147) uses self._fd.tell() instead of os.path.getsize(), which avoids the TOCTOU entirely on the hot path: the file position is in-process state, not a filesystem query. This is the right pattern; _open_latest uses getsize because it inherits a file it didn't create, so it doesn't have a valid tell() position yet.
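
One way the startup path could narrow its own gap is to open the segment first and then read the size from the open descriptor, so the stat and the open refer to the same file. The sketch below is a possible hardening, not the existing behavior: open_if_under_limit and demo are hypothetical names, and the choice to return None (so the caller can rotate) is an assumption.

```python
import os
import tempfile


def open_if_under_limit(path, max_file_size):
    """Open `path` for append and size-check it through the descriptor.

    Returns a binary file object, or None if the file is missing or
    already at the limit (the caller would then rotate).
    """
    try:
        # No O_CREAT: a segment deleted after the listing raises here
        # instead of being silently recreated, as mode "ab" would do.
        fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    except FileNotFoundError:
        return None
    if os.fstat(fd).st_size >= max_file_size:
        os.close(fd)
        return None
    return os.fdopen(fd, "ab")


def demo():
    """Exercise the three outcomes: usable, full, and missing."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "00000001.wal")
        with open(path, "wb") as f:
            f.write(b"x" * 10)

        usable = open_if_under_limit(path, max_file_size=100)
        opened = usable is not None
        if usable is not None:
            usable.close()

        full = open_if_under_limit(path, max_file_size=10)
        os.remove(path)
        missing = open_if_under_limit(path, max_file_size=100)
        return opened, full is None, missing is None
```

This removes the window between the size check and the open, but another process could still append to the file afterwards, so it complements the single-writer contract rather than replacing it.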

Bottom line

The TOCTOU window is real but inert. The single-writer-per-directory contract, combined with _open_latest running only at construction time, means no concurrent mutation can hit the gap. It's a textbook example of a race condition that's benign by design invariant rather than by synchronization, which is fine for a reference implementation but would need hardening (directory-level flock) before use in production.

Beliefs