Date: 2026-05-29
Time: 11:21
Now I have a thorough picture. Here's the explanation.
ext4-mount-options-durability - How data=journal vs data=ordered vs data=writeback changes crash semantics and whether any mode compensates for missing directory fsync (none fully do)

write-ahead-log/wal.py:dosync - The three sync modes (sync/batch/none) and how fdatasync would slot in as an optimization; the central dispatch for all durability decisions

macos-f-fullfsync-semantics - How fcntl(fd, F_FULLFSYNC) differs from fsync on Darwin/APFS, and why the current codebase may have no durable writes on the development platform

log-structured-merge-tree/lsm.py:_flush - The most dangerous path: SSTable creation with no fsync on file or directory, followed by WAL truncation, creating a window where both can be lost

atomic-write-via-temp-rename - The write-to-temp/fsync/rename/fsync-dir pattern used by SQLite, PostgreSQL, and RocksDB to guarantee both content and existence durability; absent from this entire codebase

no-directory-fsync-anywhere - None of the 13 os.fsync calls in the codebase target a directory file descriptor; all fsync calls operate on data file descriptors only, leaving every file creation, rename, and deletion non-durable at the directory level

fsync-semantics-vary-by-filesystem - os.fsync() provides different effective guarantees on ext4 (data+metadata durable), XFS (durable but delayed allocation can create longer crash windows), and macOS/APFS (may not flush disk write cache without F_FULLFSYNC)

fdatasync-safe-for-all-append-paths - Every append-only file in the codebase (WAL segments, Bitcask data files) writes sequentially with monotonically increasing file size, making fdatasync a safe drop-in for fsync on those paths; in-place B-tree page overwrites benefit even more since file size doesn't change

posix-fsync-is-per-file-not-cross-file - POSIX fsync(fd) provides no ordering guarantees across different file descriptors; the WAL-before-data ordering relied upon by the B-tree (btree.py:137 then btree.py:105) is enforced only by the sequence of separate fsync calls, not by any single call

macos-fsync-not-durable - On the current platform (Darwin 24.4.0/APFS), os.fsync() may not flush the disk write cache; true durability requires fcntl(fd, F_FULLFSYNC), which is never used anywhere in the codebase
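
The write-to-temp/fsync/rename/fsync-dir pattern and the F_FULLFSYNC point noted above can be sketched in Python roughly as follows. This is a minimal illustration, not code from the codebase: the helper names full_sync, fsync_dir, and atomic_write are hypothetical, and the sketch assumes a POSIX platform where a directory can be opened read-only and passed to fsync.

```python
import fcntl
import os
import tempfile


def full_sync(fd: int) -> None:
    """Flush fd as far as the platform allows (hypothetical helper)."""
    # fcntl.F_FULLFSYNC is only defined on Darwin; elsewhere use plain fsync.
    if hasattr(fcntl, "F_FULLFSYNC"):
        try:
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
            return
        except OSError:
            pass  # some filesystems reject F_FULLFSYNC; fall back below
    os.fsync(fd)


def fsync_dir(dirname: str) -> None:
    """Sync a directory so that creations and renames inside it persist."""
    fd = os.open(dirname, os.O_RDONLY)
    try:
        full_sync(fd)
    finally:
        os.close(fd)


def atomic_write(path: str, data: bytes) -> None:
    """Write data to path via temp file, fsync, rename, then directory fsync."""
    dirname = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays
    # within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the kernel
            full_sync(f.fileno())  # make the file contents durable
        os.replace(tmp, path)      # atomically swap the new file into place
    except BaseException:
        try:
            os.unlink(tmp)         # clean up the temp file on any failure
        except FileNotFoundError:
            pass
        raise
    fsync_dir(dirname)             # make the rename itself durable
```

In a flush path such as lsm.py:_flush, a call like atomic_write(sstable_path, payload) would presumably need to return before the WAL is truncated, so that a crash cannot lose both the SSTable and the log. Whether that fits the existing control flow is an assumption that would have to be checked against the code.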