commit method syncs the data file then truncates the WAL; explore whether the ordering guarantees are correct if a crash hits between sync and truncateDate: 2026-05-29
Time: 10:59
commit MethodThe grep for def commit|def recover|def replay returned zero matches in the WAL module. The WriteAheadLog class in write-ahead-log/wal.py has no commit method — and no other module imports the WAL (the grep for from.*wal import|import.*wal found only test files). The "sync data file, then truncate WAL" pattern described in the question does not exist in this codebase yet. It would live in a higher-level storage engine that composes this WAL with a data file.
That said, the WAL does expose the building blocks — truncate() and checkpoint() — and analyzing them reveals a real concern.
truncate Actually DoesStarting at write-ahead-log/wal.py:178, the truncate(uptoseq) method:
1. Flushes and fsyncs the current WAL file descriptor
2. Closes the fd
3. Iterates over every .wal file, keeping only records with seqnum > upto_seq
4. Either deletes the file (if no records survived) or rewrites it with only the kept records
The code was cut off at line 200, but the visible portion shows the critical structure.
If a caller were to implement the "sync data, then truncate WAL" pattern, the safety question comes down to: what happens on crash between those two steps?
This is actually the correct failure mode. If the data file is synced but the WAL is not yet truncated, recovery replays WAL records that have *already* been applied. As long as WAL replay is idempotent (PUT overwrites, DELETE removes), this produces the correct state — you just do redundant work.
If the WAL were truncated first, a crash before data sync would lose committed operations with no recovery path. The "sync first, truncate second" ordering is the textbook-correct approach.
fsync on Rewritten WAL FilesInside truncate() at line ~185–200, when the method rewrites WAL files with only the kept records, the observation was cut off before we can see whether those rewritten files are fsynced. If they aren't, a crash during truncation could leave a WAL file with partial content — the old data overwritten but the new data not yet durable. This would silently lose WAL records.
fsync After File DeletionWhen truncate removes empty WAL files, the directory entry removal isn't durable until the directory itself is fsynced. Without os.fsync(os.open(self.dir, os.ORDONLY)), a crash could "resurrect" a deleted WAL file, leading to stale replays. This is the same class of bug as the well-known ext4 rename-without-directory-fsync issue.
checkpoint Method Is Just a MarkerThe checkpoint() method at line 170 writes an OP_CHECKPOINT record and forces a sync, but it's purely a marker. It doesn't trigger any data flush or WAL truncation on its own — a caller must coordinate those steps.
The sync-then-truncate ordering described in the question would be correct if implemented, because the failure mode (crash between sync and truncate) is safe — it just replays already-applied operations. The real risks are in the truncate() implementation itself: the observations were truncated before we could verify whether rewritten WAL files are fsynced and whether directory entries are fsynced after deletion. Those are the crash-safety gaps worth auditing.