Topic: Determine whether write_count should reset on any fsync (including forced) or only on batch-threshold syncs (potential bug)

Date: 2026-05-29

Time: 08:23

write_count Reset Behavior: Potential Bug in Batch Sync Mode

The Bug

In write-ahead-log/wal.py:124-134, the _do_sync method has a subtle logic error when sync_mode == "batch" and force=True:


def _do_sync(self, force: bool = False):
    if self._sync_mode == "sync" or force:          # line 126
        self._fd.flush()
        os.fsync(self._fd.fileno())
    elif self._sync_mode == "batch":                 # line 129
        self._write_count += 1
        if self._write_count >= self._batch_sync_count:
            self._fd.flush()
            os.fsync(self._fd.fileno())
            self._write_count = 0                    # line 134

The if/elif structure makes the two branches mutually exclusive. When sync_mode == "batch" and force=True, the first branch fires (because force is true) and performs the fsync, but it never touches write_count. The elif branch, the only place where write_count is incremented or reset, is skipped entirely.

Where Forced Syncs Happen

Two callers pass force=True: append_batch and checkpoint.

Regular append calls (wal.py:149) use the default force=False, so in batch mode they go through the elif branch and correctly manage the counter.

Concrete Scenario

With batch_sync_count=100:

1. 50 individual append calls → write_count reaches 50

2. append_batch fires → forced fsync flushes everything to disk, but write_count stays at 50

3. 50 more append calls → write_count reaches 100, triggers a batch sync

That batch sync at step 3 fires after only 50 unsynced writes, not 100. The 50 writes from step 1 were already durable after the forced fsync in step 2, but they're still counted toward the threshold.
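The scenario can be reproduced without touching disk. The following is a minimal sketch, not the real class: BuggySyncCounter, append, forced_sync, unsynced, and sync_log are hypothetical names introduced for illustration, and the fsync is replaced by a log entry. Only the branching in _do_sync is copied from the listing above.

```python
class BuggySyncCounter:
    """Hypothetical stand-in for the WAL: same branching as _do_sync, no file I/O."""

    def __init__(self, batch_sync_count, sync_mode="batch"):
        self._sync_mode = sync_mode
        self._batch_sync_count = batch_sync_count
        self._write_count = 0
        self.unsynced = 0    # writes not yet covered by an fsync (illustrative)
        self.sync_log = []   # (kind, number of unsynced writes that sync flushed)

    def _fsync(self, kind):
        # Stand-in for flush() + os.fsync(): record what the sync covered.
        self.sync_log.append((kind, self.unsynced))
        self.unsynced = 0

    def _do_sync(self, force=False):
        # Same if/elif shape as the original, including the missing reset.
        if self._sync_mode == "sync" or force:
            self._fsync("forced" if force else "sync")
        elif self._sync_mode == "batch":
            self._write_count += 1
            if self._write_count >= self._batch_sync_count:
                self._fsync("threshold")
                self._write_count = 0

    def append(self):
        self.unsynced += 1
        self._do_sync()

    def forced_sync(self):
        # Assumed to model what append_batch does after writing its records.
        self._do_sync(force=True)


def run_scenario():
    wal = BuggySyncCounter(batch_sync_count=100)
    for _ in range(50):
        wal.append()
    wal.forced_sync()
    count_after_forced = wal._write_count
    for _ in range(50):
        wal.append()
    return count_after_forced, wal.sync_log


if __name__ == "__main__":
    print(run_scenario())
```

Under these assumptions the counter still reads 50 right after the forced sync, and the threshold sync that follows covers only 50 unsynced writes.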

Impact Assessment

This is a performance bug, not a correctness bug. Data is never lost — the forced sync guarantees durability, and the premature batch sync just adds an unnecessary extra fsync. In workloads that mix append with append_batch or checkpoint, batch mode will sync more frequently than the configured threshold intends.

The severity depends on workload shape. If forced syncs are rare relative to batch_sync_count, the effect is negligible. If they're frequent (e.g., many small batches), most of the writes counted toward the threshold have already been made durable by a forced sync. The counter keeps climbing from a stale base, so nearly every threshold-triggered fsync is redundant: with a counter that tracked only unsynced writes, the threshold would rarely be reached at all.
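To get a rough feel for how workload shape changes the cost, the counter can be modeled on its own. This is a back-of-the-envelope sketch under simplifying assumptions: count_threshold_syncs and its parameters are hypothetical, a forced sync is assumed to occur after every forced_every single appends, and the forced sync itself is assumed not to increment the counter (as in the force=True path).

```python
def count_threshold_syncs(total_appends, forced_every, batch_sync_count, reset_on_force):
    """Count threshold-triggered syncs for a simple periodic workload.

    reset_on_force=False models the current behavior (counter untouched by
    forced syncs); reset_on_force=True models a counter that is cleared.
    """
    write_count = 0
    threshold_syncs = 0
    for i in range(1, total_appends + 1):
        # A regular append in batch mode increments the counter.
        write_count += 1
        if write_count >= batch_sync_count:
            threshold_syncs += 1
            write_count = 0
        # Every forced_every appends, a forced sync happens.
        if i % forced_every == 0 and reset_on_force:
            write_count = 0
    return threshold_syncs


if __name__ == "__main__":
    for forced_every in (10, 1000):
        stale = count_threshold_syncs(1000, forced_every, 100, reset_on_force=False)
        cleared = count_threshold_syncs(1000, forced_every, 100, reset_on_force=True)
        print(forced_every, stale, cleared)
```

In this model, with a forced sync every 10 appends the stale counter still produces 10 threshold syncs per 1000 appends while a cleared counter produces none; with a forced sync only once per 1000 appends the two variants behave identically.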

The Fix

Reset write_count in the forced-sync path when in batch mode:


def _do_sync(self, force: bool = False):
    if self._sync_mode == "sync" or force:
        self._fd.flush()
        os.fsync(self._fd.fileno())
        if force and self._sync_mode == "batch":
            self._write_count = 0
    elif self._sync_mode == "batch":
        self._write_count += 1
        if self._write_count >= self._batch_sync_count:
            self._fd.flush()
            os.fsync(self._fd.fileno())
            self._write_count = 0

This ensures the counter reflects reality: after any fsync, there are zero unsynced writes pending.
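A regression check for the fix might look like the sketch below. It is not the project's test suite: MiniWAL, its append signature, and the fsync_calls counter are hypothetical and exist only to exercise the fixed _do_sync against a real temporary file.

```python
import os
import tempfile


class MiniWAL:
    """Hypothetical minimal WAL; only _do_sync mirrors the fixed listing."""

    def __init__(self, path, batch_sync_count, sync_mode="batch"):
        self._fd = open(path, "ab")
        self._sync_mode = sync_mode
        self._batch_sync_count = batch_sync_count
        self._write_count = 0
        self.fsync_calls = 0  # illustrative counter, not part of the original class

    def _fsync(self):
        self._fd.flush()
        os.fsync(self._fd.fileno())
        self.fsync_calls += 1

    def _do_sync(self, force=False):
        if self._sync_mode == "sync" or force:
            self._fsync()
            if force and self._sync_mode == "batch":
                self._write_count = 0  # the fix: nothing is pending after a forced sync
        elif self._sync_mode == "batch":
            self._write_count += 1
            if self._write_count >= self._batch_sync_count:
                self._fsync()
                self._write_count = 0

    def append(self, record, force=False):
        self._fd.write(record)
        self._do_sync(force)

    def close(self):
        self._fd.close()


def check_fixed_counter():
    with tempfile.TemporaryDirectory() as tmp:
        wal = MiniWAL(os.path.join(tmp, "wal.log"), batch_sync_count=100)
        try:
            for _ in range(50):
                wal.append(b"x\n")
            wal.append(b"batch\n", force=True)  # stands in for append_batch
            after_forced = wal._write_count
            for _ in range(50):
                wal.append(b"y\n")
            return after_forced, wal._write_count, wal.fsync_calls
        finally:
            wal.close()


if __name__ == "__main__":
    print(check_fixed_counter())
```

With the reset in place, the counter drops to 0 after the forced sync, climbs back to 50 after the next 50 appends, and only the single forced fsync has been issued.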

Contrast with Other Implementations

For comparison, hash-index-storage/bitcask.py:88 takes a simpler approach — it syncs on every write when sync_writes=True, with no batching mode at all. The LSM tree's WAL (log-structured-merge-tree/lsm.py:27) also does a flush() per write with no fsync batching. The batch-sync logic is unique to the WAL implementation, which is why this counter-management bug only surfaces here.
