Topic: The relationship between sync_writes, flush(), and fsync() — and what guarantees each level actually provides on different filesystems

Date: 2026-05-29

Time: 11:59

sync_writes, flush(), and fsync(): Three Levels of Durability

The Core Distinction

There are three levels of "making sure data is written," and this codebase uses all three — sometimes correctly, sometimes not:

| Level | Call | What it does | Survives process crash | Survives power loss |

|-------|------|-------------|----------------------|-------------------|

| 1 | write() only | Puts data in Python's userspace buffer | No | No |

| 2 | flush() | Pushes Python buffer → OS kernel page cache | Yes | No |

| 3 | flush() + os.fsync() | Pushes kernel page cache → physical disk | Yes | Yes |

The critical insight: flush() alone does NOT guarantee durability. It only moves data from your process into the kernel. If the machine loses power, the kernel's page cache is gone. Only fsync() forces the OS to write through to stable storage.

How Each Module Uses These Levels

Bitcask — The Cleanest Pattern

hash-index-storage/bitcask.py:82-88 shows the canonical two-tier approach:


self.active_file.flush()          # always: survive process crash
if self.sync_writes:
    os.fsync(self.active_file.fileno())  # optional: survive power loss

flush() is unconditional (line 86). fsync() is gated behind the syncwrites flag (line 87-88, default True). This gives callers a performance knob: tests pass syncwrites=False (every single test does this) because fsync is slow and tests don't need power-loss durability. Production code leaves it at True.

WAL — The Most Sophisticated

write-ahead-log/wal.py:125-133 implements three sync modes:


def _do_sync(self, force: bool = False):
    if self._sync_mode == "sync" or force:
        self._fd.flush()
        os.fsync(self._fd.fileno())
    elif self._sync_mode == "batch":
        self._write_count += 1
        if self._write_count >= self._batch_sync_count:
            self._fd.flush()
            os.fsync(self._fd.fileno())
            self._write_count = 0

Notice that _rotate (line 114-115) always does flush+fsync before closing the old file, regardless of mode. This is correct — you must ensure the old file is fully durable before writing to a new one, or a crash mid-rotation could lose data from both.

B-tree PageManager — flush Without fsync

b-tree-storage-engine/btree.py:46,54,80 — the PageManager calls flush() after every page write but does not call fsync(). The explicit sync() method (lines 104-105) is a separate call:


def sync(self):
    self._f.flush()
    os.fsync(self._f.fileno())

This is intentional: individual page writes during a B-tree operation go to the kernel (visible to a re-read) but aren't forced to disk until sync() is called after the WAL commits or at close() (lines 112-113, 143-144). The WAL itself (inner class, lines 136-137) does flush+fsync on every log_write — because the WAL is the durability mechanism, and the data file can be reconstructed from it.

LSM WAL — The Dangerous One

log-structured-merge-tree/lsm.py:26 — this WAL only calls flush():


def append(self, key: str, value: bytes):
    ...
    self._fd.flush()

No fsync() anywhere in the LSM WAL. This means the WAL data reaches the kernel page cache but is not guaranteed to survive a power failure. For a write-ahead log — whose entire purpose is crash recovery — this is a durability gap. A process crash is fine (the kernel still has the data), but a kernel panic or power loss could silently lose WAL entries that the application thought were committed.

Event Store — No Durability Guarantees At All

event-sourcing-store/eventstore.py:130-137persist_event opens, writes, and closes without any explicit flush or fsync. It relies on Python's context manager (with open(...)) to close the file, which calls close() → implicit flush(). No fsync(). Even the implicit flush only happens at the with block boundary. For an event store, this means events could be lost on either process crash or power loss.

Filesystem Caveats

Even fsync() doesn't mean the same thing everywhere:

None of these implementations address directory fsync or platform-specific flush semantics, which is acceptable for reference implementations but would be a concern in production.

Topics to Explore

Beliefs