The gap between _wal_files() and os.path.getsize() is a TOCTOU (time-of-check to time-of-use) window; explore whether this matters in practice given the single-writer design.
Date: 2026-05-29
Time: 12:14
The race condition lives in _open_latest at write-ahead-log/wal.py:101-109:
def _open_latest(self):
    files = self._wal_files()  # line 102: snapshot directory listing
    if files:
        last = files[-1]
        if os.path.getsize(last) < self._max_file_size:  # line 105: stat the file
            self._current_file = last
            self._fd = open(last, "ab")  # line 107: open for append
            return
    self._rotate()
Three filesystem operations happen sequentially with no atomicity guarantee:
1. List the WAL directory (_wal_files() at line 102 calls os.listdir)
2. Stat the last file (os.path.getsize at line 105)
3. Open it for append (line 107)
Between steps 1 and 2, the file last could be deleted by a concurrent truncate() call that removes fully consumed WAL segments (line 188 onwards). Between steps 2 and 3, the file could grow past _max_file_size, causing the writer to append to an already-oversized segment instead of rotating.
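One way to narrow both gaps is to open the segment first and then inspect the open descriptor rather than the path. The sketch below is a hedged illustration, not the code in wal.py: open_latest_segment and its parameters are hypothetical names, and returning None stands in for "the caller should rotate". It also avoids a side effect of open(last, "ab"), which would silently recreate a segment that was deleted after the listing.

```python
import os


def open_latest_segment(files, max_file_size):
    """Return an append-mode handle on the last segment, or None to rotate.

    `files` is a sorted list of segment paths (a hypothetical stand-in for
    the directory listing). None means the caller should start a new segment.
    """
    if not files:
        return None
    last = files[-1]
    try:
        # No O_CREAT: a segment deleted after the listing raises here
        # instead of being recreated as an empty file.
        raw_fd = os.open(last, os.O_WRONLY | os.O_APPEND)
    except FileNotFoundError:
        return None
    # fstat reports on the file we actually hold open, not on whatever
    # the path happens to name at this moment.
    if os.fstat(raw_fd).st_size >= max_file_size:
        os.close(raw_fd)
        return None
    return os.fdopen(raw_fd, "ab")
```

This removes the deletion race and ties the size check to the opened file, but another process could still grow the file after the fstat call. Closing that last gap needs cross-process locking rather than a reordering of calls.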
_open_latest only runs during __init__ (line 79). At construction time, the WAL instance hasn't been handed to any caller yet, so no thread can be calling append, truncate, or rotate concurrently against this same instance.
The threading lock protects all mutation paths. Every method that modifies WAL state (append at line 151, append_batch at line 161, checkpoint at line 177, truncate at line 185) acquires self._lock (line 76). Since _open_latest runs before the constructor returns, the lock can't be contended.
The single-writer design means one WriteAheadLog instance per directory. There's no second instance that could be truncating files out from under the first. The test at write-ahead-log/test_wal.py:33-44 (test_crash_recovery) creates a second instance on the same directory, but it does so sequentially: the first instance isn't actively writing when the second opens.
The design has no inter-process protection. If an external process (backup script, second application instance) deleted a WAL file between _wal_files() and os.path.getsize(), the getsize call would raise FileNotFoundError and crash the constructor. A production WAL would use advisory file locks (fcntl.flock or equivalent) on the directory to enforce single-writer semantics across processes, not just across threads.
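A minimal sketch of such a directory-level lock, assuming a Unix platform where fcntl is available. The helper names and the ".wal.lock" file name are hypothetical and not part of wal.py. A real integration would also need the directory listing to skip the lock file.

```python
import fcntl
import os


def acquire_dir_lock(wal_dir, lock_name=".wal.lock"):
    """Take an exclusive, non-blocking advisory lock for `wal_dir`.

    Returns the open lock file; the caller keeps it open for the lifetime
    of the writer. Raises RuntimeError if the lock is already held.
    """
    lock_file = open(os.path.join(wal_dir, lock_name), "a")
    try:
        # LOCK_NB makes a held lock fail immediately instead of blocking.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        raise RuntimeError(
            f"WAL directory {wal_dir!r} is locked by another writer"
        ) from None
    return lock_file


def release_dir_lock(lock_file):
    """Drop the advisory lock and close the lock file."""
    fcntl.flock(lock_file, fcntl.LOCK_UN)
    lock_file.close()
```

The lock is advisory: it stops a second cooperating writer from opening the same directory, but a backup script that deletes files without taking the lock is not blocked by it.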
Similarly, _maybe_rotate (line 147) uses self._fd.tell() instead of os.path.getsize(), which avoids the TOCTOU entirely for the hot path: the file position is in-process state, not a filesystem query. This is the right pattern; _open_latest uses getsize because it inherits a file it didn't create, so it doesn't have a valid tell() position yet.
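The tell()-based check can be sketched as follows. needs_rotation and demo are hypothetical names for illustration, not the code at line 147. The demo also relies on an observation about CPython, where opening a file in append mode positions the handle at end of file, so tell() reflects the inherited size once the file is open. Treat that as implementation behavior rather than a documented guarantee.

```python
import os
import tempfile


def needs_rotation(fd, max_file_size):
    """Rotation check from in-process state: no path lookup, no stat call."""
    return fd.tell() >= max_file_size


def demo():
    with tempfile.TemporaryDirectory() as wal_dir:
        path = os.path.join(wal_dir, "segment-0001.wal")  # hypothetical name
        # Simulate a segment left behind by an earlier run.
        with open(path, "wb") as f:
            f.write(b"x" * 10)
        with open(path, "ab") as fd:
            inherited = fd.tell()   # offset right after the append-mode open
            fd.write(b"y" * 6)      # tell() counts buffered bytes as well
            return inherited, needs_rotation(fd, max_file_size=16)
```

Because the check reads the handle's own offset, nothing another process does to the directory entry between calls can change its answer.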
The TOCTOU window is real but inert. The single-writer-per-directory contract, combined with _open_latest running only at construction time, means no concurrent mutation can hit the gap. It's a textbook example of a race condition that's benign by design invariant rather than by synchronization, which is fine for a reference implementation but would need hardening (directory-level flock) before use in production.
wal-open-latest-init-only: _open_latest is called exclusively from __init__, so the TOCTOU window between _wal_files() and os.path.getsize() cannot be hit by concurrent WAL operations
wal-single-writer-thread-level: The WAL enforces single-writer via threading.Lock but has no inter-process locking mechanism (no flock/PID file), so the single-writer invariant holds only within a single OS process
wal-hot-path-avoids-getsize: _maybe_rotate uses self._fd.tell() instead of os.path.getsize(), avoiding filesystem stat calls and TOCTOU races on every write
wal-all-mutations-under-lock: Every WAL method that writes records or modifies files (append, append_batch, checkpoint, truncate) acquires self._lock before performing any I/O