Original source: write-ahead-log-wal-_do_sync

Function: _do_sync in write-ahead-log/wal.py

Date: 2026-05-29

Time: 08:30

`_do_sync`: Write-Ahead Log Fsync Policy

Purpose

_do_sync controls when buffered WAL writes are durably persisted to disk. It exists because of a fundamental tension in WAL design: calling fsync after every write guarantees durability but kills throughput, while skipping it risks data loss on a crash. The method encapsulates the durability policy so that callers (append, append_batch, checkpoint) don't need to know which sync strategy is active.

Contract

Preconditions:

self._fd is an open Python file object in append-binary mode ("ab"). No None check is performed: calling the method while self._fd is None raises AttributeError.
self._sync_mode is "sync" or "batch". Any other string is accepted without complaint and results in no sync at all (a silent "none" mode).

Postconditions:

In "sync" mode or when force=True: all buffered data is flushed to the OS and fsync'd to stable storage. The data survives a process crash, OS crash, or power loss (assuming the storage hardware honors fsync).
In "batch" mode without force: the write counter increments. If it hits the threshold, data is fsynced and the counter resets. Otherwise, data remains in the userspace/kernel buffer — not durable.
In any other mode (implicit "none"): nothing happens. Data stays in Python's write buffer until that buffer fills or the file is flushed or closed.

Invariant: self._write_count is always in [0, self._batch_sync_count - 1] after a non-forced call in batch mode (assuming self._batch_sync_count >= 1).

Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| force | bool | False | Bypasses the sync mode policy and forces an immediate fsync. Used by append_batch and checkpoint to guarantee atomicity/durability of critical records regardless of the configured mode. |

Edge case: If force=True and _sync_mode == "batch", the force path (_sync_mode == "sync" or force) takes precedence, and _write_count is neither incremented nor reset. The batch counter therefore carries over across forced syncs. This is harmless but slightly imprecise: the next batch-triggered sync may fire after fewer unsynced writes than the threshold.
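The carry-over can be seen in a small simulation. This is only a sketch of the counter logic described above for batch mode, with no file I/O; the function name simulate and its return format are hypothetical and not part of wal.py.

```python
def simulate(calls, batch_sync_count):
    """Replay a sequence of force flags in batch mode.

    Returns one (synced, write_count_after_call) tuple per call.
    Hypothetical helper: models only the counter, not the real I/O.
    """
    write_count = 0
    trace = []
    for force in calls:
        if force:
            # Forced path: a sync happens, the counter is left untouched.
            trace.append((True, write_count))
        else:
            write_count += 1
            synced = write_count >= batch_sync_count
            if synced:
                write_count = 0
            trace.append((synced, write_count))
    return trace


# Two ordinary writes, one forced sync, then one more ordinary write.
trace = simulate([False, False, True, False], batch_sync_count=3)
# trace == [(False, 1), (False, 2), (True, 2), (True, 0)]
```

The fourth call triggers a sync even though only one write has happened since the forced sync, which is the "earlier than expected" behavior.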

Return Value

None. This is a side-effect-only method.

Algorithm


```
1. If _sync_mode is "sync" OR force is True:
   → flush Python's internal buffer to the OS
   → fsync the file descriptor (block until hardware confirms write)
   → DONE

2. Else if _sync_mode is "batch":
   → increment _write_count
   → if _write_count >= _batch_sync_count:
       → flush + fsync (same as above)
       → reset _write_count to 0
   → DONE

3. Otherwise (implicit "none" mode):
   → do nothing (writes remain buffered)
```

The two-step flush() then fsync() is necessary because Python's io layer maintains its own buffer separate from the OS page cache. flush() pushes data from Python → kernel; fsync() pushes data from kernel → disk.
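The steps above could be written in Python roughly as follows. This is a minimal sketch under stated assumptions, not the actual wal.py: the class name MiniWAL, its constructor parameters, the helper _flush_and_fsync, and the sync_calls counter are hypothetical and exist only to make the example self-contained. The attribute names follow the ones used in this document.

```python
import os


class MiniWAL:
    """Hypothetical stand-in for the real WAL class; only the sync policy is modeled."""

    def __init__(self, path, sync_mode="batch", batch_sync_count=3):
        self._fd = open(path, "ab")      # append-binary, as the contract requires
        self._sync_mode = sync_mode
        self._batch_sync_count = batch_sync_count
        self._write_count = 0
        self.sync_calls = 0              # illustration only: counts real fsyncs

    def _flush_and_fsync(self):
        self._fd.flush()                 # Python buffer -> OS page cache
        os.fsync(self._fd.fileno())      # OS page cache -> storage device
        self.sync_calls += 1

    def _do_sync(self, force=False):
        if self._sync_mode == "sync" or force:
            self._flush_and_fsync()
        elif self._sync_mode == "batch":
            self._write_count += 1
            if self._write_count >= self._batch_sync_count:
                self._flush_and_fsync()
                self._write_count = 0
        # Any other mode is the implicit "none": nothing happens.

    def close(self):
        self._fd.close()
```

With batch_sync_count=3, two plain _do_sync() calls leave sync_calls at 0 and the third raises it to 1; a call with force=True always syncs.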

Side Effects

I/O: Calls self._fd.flush() and os.fsync(), which may block for milliseconds to seconds depending on the storage device and write queue depth.
State mutation: In batch mode, mutates self._write_count. This is the only mode with internal state changes.
No locking: The method itself doesn't acquire self._lock. It relies on its callers (append, append_batch, checkpoint) to hold the lock. This is a correctness requirement that isn't enforced by the method signature.
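The locking convention might look like the sketch below, where the caller takes the lock and _do_sync assumes it is already held. The class name LockedWAL is hypothetical, _do_sync is reduced to the always-sync path for brevity, and the use of threading.Lock (rather than, say, RLock) is an assumption; the source only says that a lock named self._lock exists.

```python
import os
import threading


class LockedWAL:
    """Hypothetical sketch: the caller, not _do_sync, owns the lock."""

    def __init__(self, path):
        self._fd = open(path, "ab")
        self._lock = threading.Lock()    # assumption: lock type is not stated in the source

    def _do_sync(self, force=False):
        # Simplified to the always-sync path; assumes self._lock is already held.
        self._fd.flush()
        os.fsync(self._fd.fileno())

    def append(self, data: bytes) -> None:
        with self._lock:                 # write and sync form one critical section
            self._fd.write(data)
            self._do_sync()

    def close(self) -> None:
        with self._lock:
            self._fd.close()
```

Holding the lock across both the write and the sync keeps another thread from interleaving its own write between them.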

Error Handling

No exceptions are caught. Both flush() and os.fsync() can raise OSError (disk full, I/O error, bad file descriptor), which will propagate directly to the caller. This is the correct behavior — a failed sync in a WAL is a critical error that the application must handle, not swallow.
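On the calling side, one way to handle the propagated OSError is to convert it into an explicit application-level failure. The sketch below is illustrative only: WALSyncError and durable_append are hypothetical names, and wal is assumed to be any object whose append(data) lets OSError from flush() or os.fsync() propagate.

```python
class WALSyncError(RuntimeError):
    """Hypothetical application-level error for a failed durability guarantee."""


def durable_append(wal, data: bytes) -> None:
    """Append via a WAL-like object and surface sync failures explicitly.

    `wal` is assumed to expose append(data) and to let OSError propagate.
    """
    try:
        wal.append(data)
    except OSError as exc:
        # Do not swallow or blindly retry: after a failed sync the record
        # must be treated as not durable.
        raise WALSyncError(f"WAL sync failed: {exc}") from exc
```

Chaining with "from exc" keeps the original OSError (and its errno) available to whatever code decides how to recover.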

Usage Patterns

Three call sites, each with a different durability need:


```python
# Single record: respects configured policy
def append(self, ...):
    self._fd.write(data)
    self._do_sync()           # might not actually sync in batch mode

# Atomic batch: must be durable
def append_batch(self, ...):
    self._fd.write(bytes(buf))
    self._do_sync(force=True) # always syncs, regardless of mode

# Checkpoint marker: must be durable
def checkpoint(self):
    self._fd.write(...)
    self._do_sync(force=True) # always syncs
```

The pattern is: individual writes tolerate deferred durability (you might lose the last few writes on crash), but batch commits and checkpoints must be durable immediately because downstream consumers rely on their presence to determine recovery boundaries.

Dependencies

os.fsync: Python's wrapper around POSIX fsync(2). On Linux, fsync(2) blocks until the device reports the data as written. On macOS, fsync(2) only hands the data to the drive, which may keep it in its own write cache; forcing it to stable storage requires an explicit fcntl(F_FULLFSYNC), which os.fsync does not issue. Durability on macOS is therefore a subtle portability assumption.
self._fd — a Python file object opened in "ab" mode via the built-in open().
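If the macOS gap matters, a stricter sync helper could be sketched as below. The function name full_fsync is hypothetical and is not something wal.py is known to contain. fcntl.F_FULLFSYNC is only defined on macOS builds of Python, and the fcntl module itself is missing on Windows, so both are probed defensively before falling back to os.fsync.

```python
import os

try:
    import fcntl                 # not available on Windows
except ImportError:
    fcntl = None


def full_fsync(fd: int) -> None:
    """Sync fd to stable storage, preferring F_FULLFSYNC where the platform defines it."""
    full = getattr(fcntl, "F_FULLFSYNC", None) if fcntl is not None else None
    if full is not None:
        try:
            fcntl.fcntl(fd, full)    # macOS: ask the drive to flush its own cache too
            return
        except OSError:
            pass                     # not supported on this filesystem; fall back
    os.fsync(fd)
```

A caller would pass the descriptor from the file object, for example full_fsync(self._fd.fileno()) after self._fd.flush().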

Assumptions Not Enforced by Types

1. _sync_mode is a free-form string, not an enum. Passing "SYNC" or "Sync" silently falls into the no-op branch. There's no validation at construction time either.

2. _fd is never None when this method is called. The truncate method sets _fd = None temporarily, and if _do_sync were called during that window, it would crash.

3. Callers hold self._lock. Thread safety depends entirely on caller discipline.

4. force=True in batch mode doesn't reset _write_count, which means the counter drifts. Not a bug, but a subtlety.
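The first assumption could be turned into a construction-time check. The following is a sketch of one possible approach, not a description of the existing code: SyncMode and parse_sync_mode are hypothetical names, and treating "none" as an explicit third mode is an assumption made for the example.

```python
from enum import Enum


class SyncMode(str, Enum):
    """Hypothetical enum for the modes discussed in this document."""
    SYNC = "sync"
    BATCH = "batch"
    NONE = "none"        # assumption: make the implicit no-op mode explicit


def parse_sync_mode(value: str) -> SyncMode:
    """Validate a configured mode up front instead of silently falling into the no-op branch."""
    try:
        return SyncMode(value)
    except ValueError:
        allowed = ", ".join(m.value for m in SyncMode)
        raise ValueError(
            f"unknown sync mode {value!r}; expected one of: {allowed}"
        ) from None
```

Because SyncMode mixes in str, existing comparisons such as mode == "sync" keep working, while a misspelled value like "SYNC" now fails loudly when the WAL is constructed.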