SQLITE_FCNTL_SYNC mechanism as a reference implementation of correct behavior
Date: 2026-05-29
Time: 11:23
sqlite-vfs-layer: SQLite's Virtual File System abstraction (os_unix.c) that encapsulates platform-specific sync behavior, including F_FULLFSYNC, fdatasync, and directory fsync, behind a single API

log-structured-merge-tree/lsm.py:_flush: The most dangerous path in the codebase: SSTable creation with no fsync on file or directory, followed by WAL truncation, creating a window where both can be lost

ext4-auto-da-alloc: How ext4's auto_da_alloc feature (since kernel 2.6.30) partially mitigates the rename-without-directory-fsync problem for replace-via-rename, but does *not* help for new file creation; the point is understanding when the filesystem saves you and when it doesn't

write-ahead-log/wal.py:_rotate: WAL rotation creates new files at line 119 whose directory entries are never fsynced; compare with SQLite's journal file creation protocol

macos-f-fullfsync-semantics: How fcntl(fd, F_FULLFSYNC) differs from fsync on Darwin/APFS, and why SQLite's VFS uses it while this codebase's 13 fsync calls may provide no power-loss durability on the current platform

sqlite-fsyncs-directories-after-file-ops: SQLite's Unix VFS calls fsync on the parent directory file descriptor after every file creation, rename, and journal/WAL lifecycle event, ensuring directory metadata reaches stable storage before dependent operations proceed

no-directory-fsync-anywhere: All 13 os.fsync() calls across the DDIA implementations target regular file descriptors; no code anywhere opens a directory file descriptor or calls fsync on one, leaving every file creation and rename non-durable at the directory level

sqlite-fcntl-sync-abstracts-platform-differences: SQLITE_FCNTL_SYNC is part of SQLite's VFS abstraction that separates sync policy from mechanism, allowing the same durability logic to use fsync, fdatasync, or F_FULLFSYNC depending on the platform, a layer entirely absent from this codebase

lsm-flush-then-truncate-is-compound-failure: The LSM tree's flush path creates an SSTable (no directory fsync) then truncates the WAL, creating a window where a crash loses both the WAL records and the SSTable's directory entry, resulting in silent data loss with no recovery signal

atomic-rename-is-not-durable-rename: POSIX guarantees that os.rename() is atomic (all-or-nothing at crash) but not durable; without fsyncing the parent directory after rename, a crash can silently revert the rename on ext4/XFS. Both Bitcask compaction paths rely on rename without the directory fsync
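
The entries above all come back to the same two missing steps: a platform-aware file sync and an fsync of the parent directory after a create or rename. A minimal Python sketch of both follows. The helper names full_sync, fsync_dir, and durable_write are hypothetical and do not exist in the codebase; the F_FULLFSYNC branch assumes the constant is exposed by the fcntl module on Darwin, with plain fsync as the fallback, and fsync_dir uses plain os.fsync on the directory as a simplification.

```python
import os
import sys

try:
    import fcntl  # POSIX only; absent on Windows
except ImportError:
    fcntl = None


def full_sync(fd):
    """Flush fd to stable storage, preferring F_FULLFSYNC on Darwin.

    Hypothetical helper: falls back to os.fsync if F_FULLFSYNC is
    unavailable or the filesystem rejects it.
    """
    if sys.platform == "darwin" and fcntl is not None and hasattr(fcntl, "F_FULLFSYNC"):
        try:
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
            return
        except OSError:
            pass  # some filesystems do not support it; fall through
    os.fsync(fd)


def fsync_dir(path):
    """fsync a directory so that entries created or renamed in it are durable."""
    flags = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0)
    fd = os.open(path, flags)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def durable_write(path, data):
    """Write data via temp file + rename, syncing the file and its parent directory."""
    parent = os.path.dirname(os.path.abspath(path))
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()                 # push Python's buffer to the OS
        full_sync(f.fileno())     # push file contents to stable storage
    os.replace(tmp, path)         # atomic, but not yet durable
    fsync_dir(parent)             # make the new directory entry durable
```

Under this sketch, a flush path like the one described for lsm.py:_flush would call durable_write for the SSTable and only truncate the WAL after it returns, so a crash cannot lose both the WAL records and the SSTable's directory entry.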