Topic: How the three sync modes (sync, batch, none) trade latency for durability, and what "acknowledged" really means in each mode

Date: 2026-05-29

Time: 12:28

Sync Modes in the Write-Ahead Log: Trading Latency for Durability

The Core Mechanism: _do_sync

Everything hinges on one method at write-ahead-log/wal.py:125-133:


def _do_sync(self, force: bool = False):
    """Fsync based on sync mode."""
    if self._sync_mode == "sync" or force:
        self._fd.flush()
        os.fsync(self._fd.fileno())
    elif self._sync_mode == "batch":
        self._write_count += 1
        if self._write_count >= self._batch_sync_count:
            self._fd.flush()
            os.fsync(self._fd.fileno())
            self._write_count = 0

Notice what's *absent*: there is no elif self._sync_mode == "none" branch. If the mode is "none", neither flush() nor fsync() is called. The data sits in Python's userspace buffer or the OS page cache, and it reaches disk only when the kernel's writeback schedule or a clean process exit gets it there.

The Three Modes

"sync" — Every write is durable before return

Every call to append() (lines 148-149) triggers _do_sync(), which calls flush() followed by os.fsync(). The fsync syscall blocks until the kernel reports that the data has reached stable storage (disk platters or flash cells, past the volatile write cache, assuming the drive honors fsync correctly).

What "acknowledged" means: When append() returns your sequence number, that record is on disk. A power failure one nanosecond later won't lose it. This is the strongest guarantee and the slowest mode — each write pays the full disk I/O latency (typically 0.1–10ms for SSD, much worse for spinning rust).
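As a minimal sketch of what "acknowledged" means in "sync" mode, the snippet below writes one record and returns only after flush() and os.fsync() have completed. The helper name durable_append and the temporary file path are illustrative assumptions, not part of wal.py.

```python
import os
import tempfile

def durable_append(fd, payload: bytes) -> int:
    """Hypothetical helper: write payload and return only after fsync completes."""
    written = fd.write(payload)   # bytes land in Python's userspace buffer
    fd.flush()                    # buffer is handed to the kernel (OS page cache)
    os.fsync(fd.fileno())         # blocks until the kernel reports stable storage
    return written

path = os.path.join(tempfile.mkdtemp(), "demo.wal")
with open(path, "ab") as fd:
    n = durable_append(fd, b"record-1\n")
    # By the time the call returns, the file system already reports the bytes.
    size_after_ack = os.path.getsize(path)
```

The cost of this ordering is that every call waits for the storage device, which is why this mode is the slowest of the three.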

"batch" — Amortize fsync across N writes

The WAL counts writes and only fsyncs every batch_sync_count operations (default: 100, set at line 66). Between fsyncs, no flush() is issued either, so data accumulates in Python's userspace buffer and, once that buffer spills, in the OS page cache.

What "acknowledged" means: When append() returns, the data has been handed to Python's file object. It may still be in the userspace buffer or, at best, in the kernel page cache, and it may *not* be on disk yet. Up to batch_sync_count - 1 acknowledged records could be lost on a crash. The 100th write triggers the flush and fsync that make all 100 durable at once. This is a classic throughput optimization: you trade a bounded window of potential data loss for dramatically higher write throughput.
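The loss window can be checked with a small simulation of the counter logic. This is a sketch that mirrors only the "batch" branch of _do_sync; the function name simulate_batch is an assumption made for illustration, and no real I/O is performed.

```python
def simulate_batch(total_writes: int, batch_sync_count: int = 100):
    """Return (fsync_calls, unsynced_records) after total_writes single appends."""
    write_count = 0
    fsync_calls = 0
    for _ in range(total_writes):
        write_count += 1
        if write_count >= batch_sync_count:
            fsync_calls += 1      # stands in for flush() + os.fsync()
            write_count = 0
    # write_count is now the number of acknowledged but unsynced records
    return fsync_calls, write_count

print(simulate_batch(250))   # (2, 50)
print(simulate_batch(99))    # (0, 99)
print(simulate_batch(100))   # (1, 0)
```

The middle case is the worst one: 99 records acknowledged and none of them synced.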

"none" — No explicit durability

No flush(), no fsync(). Data is written to Python's internal buffer, and eventually to the OS page cache when the buffer fills or on close(). The OS may write it to disk whenever it feels like it.

What "acknowledged" means: When append() returns, the data is in your process's memory. It is *not* guaranteed to be in the kernel page cache, let alone on disk. A process crash (segfault, OOM kill) can lose data that never made it past Python's buffer. A kernel crash or power failure can lose data that made it to the page cache but not to disk. This mode is only appropriate for data you can afford to lose — e.g., a performance log, or a WAL in front of a system that has other durability mechanisms.
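The gap between "in your process's memory" and "in the kernel page cache" is easy to observe. The sketch below assumes the file is opened with Python's default buffering; the path and variable names are illustrative only.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "none-mode.wal")
fd = open(path, "ab")                        # default buffering: a userspace buffer of several KiB
fd.write(b"small record\n")                  # 13 bytes, far below the buffer size
size_before_flush = os.path.getsize(path)    # the OS has not seen the bytes yet
fd.flush()                                   # hand the buffer to the kernel (page cache, not necessarily disk)
size_after_flush = os.path.getsize(path)
fd.close()

print(size_before_flush, size_after_flush)   # 0 13
```

If the process were killed between the write() and the flush(), the record would vanish without the kernel ever having known about it.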

The Critical Exception: force=True

Both append_batch() (line 163) and checkpoint() (line 170) call _do_sync(force=True), which always flushes and fsyncs regardless of mode.

This is a deliberate design choice: the sync mode governs the durability of *individual* append() calls, but *transactional* operations (append_batch) and *recovery markers* (checkpoint) always get the full fsync treatment. The "none" and "batch" modes weaken single-record durability but preserve transactional durability.
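The effect of force=True can be sketched with a toy class that counts syncs instead of touching the disk. TinyWAL is a hypothetical stand-in, not the real class in wal.py: its _do_sync copies the branch structure shown earlier, while the bodies of append() and append_batch() are assumptions about how the real methods are shaped.

```python
import io

class TinyWAL:
    """Hypothetical stand-in for the WAL; counts syncs instead of calling os.fsync."""

    def __init__(self, sync_mode: str, batch_sync_count: int = 100):
        self._sync_mode = sync_mode
        self._batch_sync_count = batch_sync_count
        self._write_count = 0
        self._fd = io.BytesIO()       # in-memory file so the sketch needs no disk
        self.sync_calls = 0

    def _do_sync(self, force: bool = False):
        if self._sync_mode == "sync" or force:
            self.sync_calls += 1      # stands in for flush() + os.fsync()
        elif self._sync_mode == "batch":
            self._write_count += 1
            if self._write_count >= self._batch_sync_count:
                self.sync_calls += 1
                self._write_count = 0

    def append(self, record: bytes):
        self._fd.write(record)
        self._do_sync()               # durability depends on the mode

    def append_batch(self, records):
        for record in records:
            self._fd.write(record)
        self._do_sync(force=True)     # assumed: one forced sync for the whole batch

wal = TinyWAL("none")
for _ in range(5):
    wal.append(b"x")
syncs_after_appends = wal.sync_calls   # 0: "none" mode never syncs on append()
wal.append_batch([b"a", b"b", b"c"])
syncs_after_batch = wal.sync_calls     # 1: the forced sync ignores the mode
```

Because a real flush() and fsync() cover everything written to the file so far, a forced sync also makes any earlier unsynced append() records durable as a side effect.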

The Latency/Durability Spectrum

| Mode | append() durability on return | append_batch() durability on return | Latency per write | Risk window |
|------|-------------------------------|-------------------------------------|-------------------|-------------|
| "sync" | On disk | On disk | ~1 fsync | 0 records |
| "batch" | In process buffer or page cache | On disk | ~1/100 of an fsync (amortized, default batch size) | Up to 99 records (batch_sync_count - 1) |
| "none" | In process buffer | On disk | ~0 (memcpy) | All append() records since the last forced sync |

Topics to Explore

Beliefs