fdatasync skips unnecessary metadata updates (mtime) and can be 2-3x faster; explore whether switching would be safe here

Date: 2026-05-29
Time: 10:04

fdatasync vs fsync in Append-Only Storage Files

fsync flushes file data and all of the file's metadata (mtime, atime, permissions, size) to stable storage. fdatasync flushes file data and only the metadata required to read that data back: in practice the file size if it changed, but not mtime. For append-only workloads, fdatasync therefore provides the durability that matters while skipping the extra inode write for timestamps. On Linux ext4, this can eliminate a second disk barrier per sync, which can yield a 2-3x throughput improvement under heavy write loads. The gain is largest when the file size does not change between syncs, because a size change is metadata that fdatasync must still persist.
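The 2-3x figure depends on the filesystem, the device, and the write pattern, so it is worth measuring locally before switching. The following is a minimal timing sketch, not a rigorous benchmark: `time_appends` is a hypothetical helper, and the record size and write count are arbitrary assumptions.

```python
import os
import tempfile
import time

# os.fdatasync is missing on some platforms (e.g. macOS); fall back to fsync there.
fdatasync = getattr(os, "fdatasync", os.fsync)


def time_appends(sync_fn, n_writes=50, payload=b"x" * 128):
    """Write n_writes records to a fresh temp file, calling sync_fn(fd)
    after each one, and return the elapsed time in seconds."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(n_writes):
            os.write(fd, payload)   # sequential writes, so each one grows the file
            sync_fn(fd)             # force the write to stable storage
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling `time_appends(os.fsync)` and `time_appends(fdatasync)` on the target filesystem gives a rough comparison. Results on tmpfs or inside a VM with write caching may show little or no difference.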

Where fsync is used today

The codebase has three distinct sync profiles:

write-ahead-log/wal.py

The standalone WAL uses os.fsync() on nearly every write path. The _do_sync method at lines 126-133 is the central dispatch:
```python
# wal.py:126-133
def _do_sync(self, force: bool = False):
    if self._sync_mode == "sync" or force:
        self._fd.flush()
        os.fsync(self._fd.fileno())
    elif self._sync_mode == "batch":
        self._write_count += 1
        if self._write_count >= self._batch_sync_count:
            self._fd.flush()
            os.fsync(self._fd.fileno())
            self._write_count = 0
```
Additional os.fsync() calls appear in _rotate (line 115), truncate (line 184), checkpoint via _do_sync (lines 183-184), and the truncation rewrite loop (lines 208-209). On the write path this file is append-only: every write appends records, and rotation creates a new file. Strong candidate for fdatasync.
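A minimal sketch of what the same dispatch could look like with fdatasync, assuming the fallback to os.fsync on platforms that lack it. `SyncPolicy` and its `sync_fn` parameter are hypothetical stand-ins for illustration, not the actual wal.py class. The branching mirrors the snippet above, including the fact that a forced sync does not reset the batch counter.

```python
import os

# Assumption: fall back to os.fsync where os.fdatasync is unavailable (e.g. macOS).
_fdatasync = getattr(os, "fdatasync", os.fsync)


class SyncPolicy:
    """Hypothetical stand-in for the WAL's sync dispatch (not the real wal.py class)."""

    def __init__(self, fd, sync_mode="sync", batch_sync_count=10, sync_fn=_fdatasync):
        self._fd = fd                      # buffered file object opened for append
        self._sync_mode = sync_mode        # "sync", "batch", or anything else for no sync
        self._batch_sync_count = batch_sync_count
        self._write_count = 0
        self._sync_fn = sync_fn            # injectable so callers can swap or count syncs

    def _flush_and_sync(self):
        self._fd.flush()                   # userspace buffer -> OS page cache
        self._sync_fn(self._fd.fileno())   # page cache -> stable storage

    def do_sync(self, force=False):
        if self._sync_mode == "sync" or force:
            self._flush_and_sync()
        elif self._sync_mode == "batch":
            self._write_count += 1
            if self._write_count >= self._batch_sync_count:
                self._flush_and_sync()
                self._write_count = 0
```

Making the sync function injectable keeps the fsync/fdatasync choice in one place, which also makes it easy to keep os.fsync on rare paths such as truncation.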

hash-index-storage/bitcask.py

The _write_record method (lines 85-93) does:
```python
self.active_file.flush()
if self.sync_writes:
    os.fsync(self.active_file.fileno())
```
Bitcask data files are strictly append-only. The file is opened with "ab" mode (line 68). Safe for fdatasync.

b-tree-storage-engine/btree.py

The B-tree has two sync paths:
- WAL (log_write at line 138): append-only, so fdatasync is safe.
- PageManager (sync at line 112, close at line 117): does in-place page writes (write_page seeks to specific offsets). fdatasync still flushes the data for in-place writes and only skips non-essential metadata, so it is also safe here: the page file's size doesn't change during normal writes (pages are pre-allocated).

log-structured-merge-tree/lsm.py

The LSM WAL class (lines 13-63) only calls self._fd.flush() at line 26, with no os.fsync() or os.fdatasync() at all. This is a durability gap: flush() only moves data from Python's userspace buffer to the OS page cache, so a crash could still lose data. Before considering fdatasync, this file needs *any* sync call.

log-structured-hash-table/bitcask.py and event-sourcing-store/event_store.py

log-structured-hash-table/bitcask.py calls flush() at line 155 but no fsync. The event store (line 123) opens with "a" and writes JSON lines without any sync. These have the same durability gap as the LSM WAL.
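The gap described for these files comes down to the difference between flush() and a real sync. A minimal sketch of an append that closes it, assuming JSON-lines records as in the event store: `append_durably` is a hypothetical helper, and reopening the file on every call is a simplification that a real store would avoid by keeping the file open.

```python
import json
import os

# Assumption: fall back to os.fsync where os.fdatasync is unavailable (e.g. macOS).
_fdatasync = getattr(os, "fdatasync", os.fsync)


def append_durably(path, record):
    """Append one JSON line and return only after the data has been synced.

    Returns the number of bytes appended.
    """
    line = (json.dumps(record) + "\n").encode("utf-8")
    with open(path, "ab") as f:
        f.write(line)
        f.flush()                 # Python buffer -> OS page cache (not yet durable)
        _fdatasync(f.fileno())    # page cache -> stable storage
    return len(line)
```

Without the sync call, the function would return as soon as the data reached the page cache, which is the state the LSM WAL, the log-structured bitcask, and the event store are in today.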

Switching to fdatasync is safe for all append-only paths, with three caveats:
1. Platform availability: Python's os.fdatasync() exists on Linux but not on macOS/Darwin (the current platform). A portable implementation needs a fallback:
   ```python
   _fdatasync = getattr(os, 'fdatasync', os.fsync)
   ```
2. File creation requires directory sync: When creating new WAL segments (_rotate in wal.py:112-122) or new SSTable files, fdatasync on the file alone doesn't guarantee that the directory entry is durable. A crash between file creation and the next directory sync could leave the new file without a durable directory entry, so it may be missing after recovery. The current code doesn't fsync directories either, so this is a pre-existing gap, and switching to fdatasync doesn't make it worse.
3. Truncation paths can keep fsync: The WAL truncation in wal.py:190-219 rewrites files entirely and may shrink them. fdatasync persists size changes, so it would still be safe there, but truncation is rare enough that the performance benefit is negligible. Keeping fsync for truncation is the conservative choice.
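The directory-sync gap from the second caveat is independent of the fsync/fdatasync choice, but it is cheap to close. A minimal POSIX-only sketch follows; `fsync_dir` and `create_durable_file` are hypothetical helpers, not functions from the codebase, and opening a directory with os.open does not work on Windows.

```python
import os


def fsync_dir(dir_path):
    """fsync a directory so entries created or renamed in it become durable."""
    dir_fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def create_durable_file(path):
    """Create an empty file, then sync both the file and its parent directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        os.fsync(fd)              # persist the new file's inode
    finally:
        os.close(fd)
    # Persist the directory entry that points at the new file.
    fsync_dir(os.path.dirname(os.path.abspath(path)))
```

A rotation path could call something like `create_durable_file` once per new segment and then use fdatasync for the appends that follow, since the directory entry only needs to be synced when it changes.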
Replace os.fsync with os.fdatasync (with a fallback) in the append and sync paths of:
- write-ahead-log/wal.py:_do_sync (highest impact, called on every write in sync mode)
- hash-index-storage/bitcask.py:_write_record (called on every put/delete)
- b-tree-storage-engine/btree.py:WAL.log_write (called on every B-tree mutation)

Leave os.fsync in place for:

- PageManager.close() (final shutdown sync)

Fix the missing sync in log-structured-merge-tree/lsm.py:WAL.append and log-structured-hash-table/bitcask.py:_write_record. These should call at least fdatasync after flush().