Topic: The sync_mode parameter and batch_sync_count determine how often fsync is called, which directly controls the window of vulnerability to torn writes

Date: 2026-05-29

Time: 08:18

How sync_mode and batch_sync_count Control the Torn Write Window

The Core Mechanism: _do_sync

The entire durability story lives in one method, _do_sync, at write-ahead-log/wal.py:125:


def _do_sync(self, force: bool = False):
    """Fsync based on sync mode."""
    if self._sync_mode == "sync" or force:
        self._fd.flush()
        os.fsync(self._fd.fileno())
    elif self._sync_mode == "batch":
        self._write_count += 1
        if self._write_count >= self._batch_sync_count:
            self._fd.flush()
            os.fsync(self._fd.fileno())
            self._write_count = 0

There are three modes, and each defines a different durability contract:

Mode 1: "sync" (default)

Every single append() call triggers flush() + os.fsync() (lines 127–128). The write hits stable storage before append returns. The vulnerability window is essentially zero — if the process crashes after append returns, the record is on disk. This is the safest mode and the default (wal.py:64: sync_mode: str = "sync").

Mode 2: "batch"

Writes accumulate in buffers between syncs (the file object's userspace buffer, if it has one, and the OS page cache), because batch mode calls neither flush() nor fsync() until the threshold is reached. The WAL tracks a counter (_write_count, initialized at line 69) and only syncs when the counter reaches _batch_sync_count (line 131: if self._write_count >= self._batch_sync_count). The default batch size is 100 (line 66). This means up to 99 records can be waiting without having been fsync'd. If the machine loses power during that window, those records are lost or partially written (torn).

The tradeoff is throughput: batching amortizes the cost of fsync (which can take milliseconds on spinning disks) across many writes. But the vulnerability window grows linearly with batch_sync_count.
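To make the linear growth concrete, here is a minimal sketch that replays only the counter logic from the batch branch, without touching a file. The function names unsynced_after and worst_case_window are hypothetical, and the model assumes no forced syncs happen in between.

```python
def unsynced_after(appends: int, batch_sync_count: int) -> int:
    """Return how many records are still un-fsync'd after `appends` writes.

    Mirrors the batch branch: increment per write, reset to 0 when the
    counter reaches batch_sync_count.
    """
    write_count = 0
    for _ in range(appends):
        write_count += 1
        if write_count >= batch_sync_count:
            write_count = 0  # this is where flush() + fsync() would run
    return write_count


def worst_case_window(batch_sync_count: int) -> int:
    """Largest number of records that can be unsynced at any moment."""
    return max(batch_sync_count - 1, 0)
```

With the default of 100, unsynced_after(99, 100) is 99 and unsynced_after(100, 100) drops back to 0, which is the sawtooth the "up to 99 records" figure describes.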

Mode 3: "none" (implicit)

If sync_mode is neither "sync" nor "batch", _do_sync does nothing unless force is set: no flush(), no fsync(). The data sits in userspace and OS buffers indefinitely. The vulnerability window extends until the OS decides to write dirty pages back, or until something else forces a sync. This is the fastest mode and the most dangerous.
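The difference between the three modes can be observed by counting fsync calls. The sketch below is a toy, not the real WAL: MiniWal, its fsync_calls counter, and count_fsyncs are hypothetical names, and the record payload is arbitrary. Only the branching inside _do_sync is copied from the method quoted above.

```python
import os
import tempfile


class MiniWal:
    """Toy log that reuses the _do_sync branching and counts fsync calls."""

    def __init__(self, path: str, sync_mode: str = "sync", batch_sync_count: int = 100):
        self._fd = open(path, "ab")
        self._sync_mode = sync_mode
        self._batch_sync_count = batch_sync_count
        self._write_count = 0
        self.fsync_calls = 0  # instrumentation only, not part of the original

    def _fsync(self):
        self._fd.flush()
        os.fsync(self._fd.fileno())
        self.fsync_calls += 1

    def _do_sync(self, force: bool = False):
        if self._sync_mode == "sync" or force:
            self._fsync()
        elif self._sync_mode == "batch":
            self._write_count += 1
            if self._write_count >= self._batch_sync_count:
                self._fsync()
                self._write_count = 0
        # any other mode falls through: no flush, no fsync

    def append(self, payload: bytes):
        self._fd.write(payload)
        self._do_sync()

    def close(self):
        self._fd.close()


def count_fsyncs(sync_mode: str, writes: int = 250, batch_sync_count: int = 100) -> int:
    """Append `writes` small records and report how many fsyncs happened."""
    with tempfile.TemporaryDirectory() as tmp:
        wal = MiniWal(os.path.join(tmp, "wal.log"), sync_mode, batch_sync_count)
        for i in range(writes):
            wal.append(b"record-%d\n" % i)
        calls = wal.fsync_calls
        wal.close()
        return calls
```

For 250 writes with the default batch size, this model performs 250 fsyncs in "sync" mode, 2 in "batch" mode, and 0 in "none" mode.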

Where force=True Bypasses the Mode

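Two operations always fsync regardless of mode: writing a COMMIT record and writing a CHECKPOINT record. Both pass force=True, so the first branch of _do_sync runs whatever sync_mode is set to.

This is a deliberate design: individual puts can tolerate batched durability, but transactional boundaries (COMMIT, CHECKPOINT) cannot.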
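The effect of force=True can be checked with a file-free model of the same branching. In this sketch SyncPolicy, put, and commit are hypothetical names chosen for illustration. _do_sync returns True wherever the quoted method would call flush() and os.fsync().

```python
class SyncPolicy:
    """File-free model of _do_sync: reports whether each call would fsync."""

    def __init__(self, sync_mode: str, batch_sync_count: int = 100):
        self._sync_mode = sync_mode
        self._batch_sync_count = batch_sync_count
        self._write_count = 0

    def _do_sync(self, force: bool = False) -> bool:
        """Return True where the real method would flush() and fsync()."""
        if self._sync_mode == "sync" or force:
            return True
        if self._sync_mode == "batch":
            self._write_count += 1
            if self._write_count >= self._batch_sync_count:
                self._write_count = 0
                return True
        return False

    def put(self) -> bool:
        # Hypothetical ordinary write: durability follows the configured mode.
        return self._do_sync()

    def commit(self) -> bool:
        # Hypothetical transactional boundary: always forces a sync.
        return self._do_sync(force=True)
```

One detail visible in the quoted code: the forced path does not reset _write_count, so in batch mode the next threshold is still counted from the last batch-triggered fsync rather than from the commit.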

How CRC Detects Torn Writes After the Fact

The sync_mode setting controls *prevention*: how much data is at risk. CRC32 checksums handle *detection*. Each record's checksum covers the op type, key, and value (line 30: crcdata = struct.pack("B", optypebyte) + key + value). On read, readrecord recomputes the CRC and raises ValueError on mismatch (lines 52–54). The test at testwal.py:44 (testcorruption) verifies this: it overwrites the last 5 bytes of a WAL file and confirms that replay recovers only the intact first record.

A partial/torn write produces either a short read (caught at lines 40–41 and 44–45, returning None) or a CRC mismatch. Either way, replay stops at the corruption boundary — it doesn't silently return bad data.
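The two failure paths can be sketched as follows. The record layout here (a 13-byte header holding CRC, op type, key length, and value length) is an assumption made for illustration, as are the names encode_record, read_record, and replay; the real on-disk format may differ. Only the checksum input, struct.pack("B", op) + key + value, is taken from the description above.

```python
import io
import struct
import zlib

# Assumed layout: crc32, op type, key length, value length (little-endian).
HEADER = struct.Struct("<IBII")


def encode_record(op: int, key: bytes, value: bytes) -> bytes:
    crc = zlib.crc32(struct.pack("B", op) + key + value)
    return HEADER.pack(crc, op, len(key), len(value)) + key + value


def read_record(stream):
    """Return (op, key, value), None on a short read, or raise ValueError."""
    header = stream.read(HEADER.size)
    if len(header) < HEADER.size:
        return None  # truncated header: torn write or clean end of log
    crc, op, key_len, val_len = HEADER.unpack(header)
    body = stream.read(key_len + val_len)
    if len(body) < key_len + val_len:
        return None  # truncated body
    key, value = body[:key_len], body[key_len:]
    if zlib.crc32(struct.pack("B", op) + key + value) != crc:
        raise ValueError("CRC mismatch")
    return op, key, value


def replay(data: bytes):
    """Collect records until the first short read or checksum failure."""
    stream, records = io.BytesIO(data), []
    while True:
        try:
            record = read_record(stream)
        except ValueError:
            break  # corrupted record: stop at the corruption boundary
        if record is None:
            break  # short read: stop at the torn tail
        records.append(record)
    return records
```

Truncating the final byte of a two-record log exercises the short-read path, and flipping a byte in the final value exercises the CRC path. In both cases replay returns only the first record.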

The Durability Spectrum

| Mode | fsync frequency | Records at risk | Use case |
|------|----------------|-----------------|----------|
| "sync" | Every write | 0 | Financial transactions, small write volume |
| "batch" (n=100) | Every 100 writes | Up to 99 | High-throughput logging, acceptable loss window |
| "none" | Never (by WAL) | All not yet written back by the OS | Testing, ephemeral data |

The batch_sync_count parameter lets you tune the batch window precisely: set it to 10 for a tighter window with moderate throughput, or 1000 for maximum throughput with a wider risk window.
