Date: 2026-05-29
Time: 08:37
The picture is more nuanced than "no WAL and no fsync" — some implementations have both, some have one, and some have neither. The gaps create distinct failure modes.
log-structured-merge-tree/lsm.py:26 — the WAL's append method calls self._fd.flush() but never os.fsync(). This is the most dangerous pattern in the codebase because it *looks* safe but isn't.
flush() moves data from Python's userspace buffer into the OS kernel's page cache. It does not force the kernel to write to the physical disk. On power loss or kernel panic, everything in the page cache is gone. A user calls put(), the WAL writes the entry (line 242), flush() succeeds, and then the machine loses power — the WAL entry never made it to disk. On recovery, replaywal() (line 230) replays what's on disk, and that write is silently missing.
Scenario: You insert 50 keys, the OS buffers them all, power fails. On restart the WAL replays 0 entries. The memtable is empty. Those 50 keys are gone with no indication anything was lost.
log-structured-hash-table/bitcask.py:155-157 — writerecord() calls self.activefile.flush() with no os.fsync(). This implementation has no WAL at all, which for an append-only log is fine by design (the data file *is* the log). But without fsync, the append-only property doesn't help:
def _write_record(self, key: str, value: bytes) -> int:
...
self._active_file.write(header + payload)
self._active_file.flush()
return offset
After put() returns (line 168), the in-memory index points to an offset that may not exist on disk yet. On crash and recovery, scansegment() reads up to the last complete record on disk. The index will be rebuilt without the lost records, but any caller who received a successful put() was told the write succeeded.
Scenario: Write key "order-123", get back a successful offset. Power fails. Restart calls _recover() (line 72), which scans segments and rebuilds the index. "order-123" is not on disk. get("order-123") returns None.
sstable-and-compaction/sstable.py:91-95 — SSTableWriter.finish() calls self._f.close() without any fsync() beforehand. Python's close() calls flush() implicitly but again, that's only to the page cache. If a crash occurs after finish() returns but before the OS flushes its buffers, you get a partially written SSTable on disk.
Scenario: LSM flush writes a new SSTable, truncates the WAL (line 314 of lsm.py), then power fails before the OS writes the SSTable to disk. On recovery, the WAL is empty (it was truncated) and the SSTable is corrupt or incomplete. Data that was in the memtable is lost permanently — it was removed from the WAL but never durably written to the SSTable.
The LSM _flush() method (line 303 of lsm.py) does at least three things: writes an SSTable, updates the SSTable list, and truncates the WAL. Without fsync between steps, a crash at any point leaves the system in an inconsistent state. The most dangerous ordering: WAL truncated first, SSTable not yet on disk.
b-tree-storage-engine/btree.py:107-108): The WAL.log_write() calls both flush() and os.fsync(). The PageManager.sync() (line 100) and close() (line 104) also fsync. This is the correct pattern.hash-index-storage/bitcask.py:92-93): Conditionally calls os.fsync() when sync_writes=True. Defaults to True.write-ahead-log/wal.py:117-123): Full fsync implementation with configurable sync modes ("sync", "batch", "none").| Implementation | WAL? | fsync? | Data loss window |
|---|---|---|---|
| B-Tree (btree.py) | Yes, with fsync | Yes | Minimal |
| Hash index (hash-index-storage/bitcask.py) | N/A (log is data) | Configurable | When sync_writes=False |
| LSM Tree (lsm.py) | Yes, no fsync | No | Any buffered write |
| Log-structured hash (log-structured-hash-table/bitcask.py) | None | No | Any buffered write |
| SSTable writer (sstable.py) | N/A | No | Flush-to-SSTable window |
The core problem isn't the absence of a WAL — it's that flush() without fsync() gives a false sense of durability. A developer reading self._fd.flush() might assume the data is safe. It isn't until os.fsync(fd.fileno()) returns.
log-structured-merge-tree/lsm.py:_flush — The flush-to-SSTable path where WAL truncation can race against durable SSTable writeslog-structured-hash-table/bitcask.py:_recover — How index rebuild on startup silently drops records that never made it to diskwrite-ahead-log/wal.py — The standalone WAL with proper fsync, batch sync, and rotation — the reference for how the other implementations should workb-tree-storage-engine/btree.py:WAL.commit — Correct commit protocol: fsync the data file, then truncate the WAL — compare with the LSM tree's orderingfsync-vs-flush-semantics — Understanding the OS page cache layer between flush() and physical disk writes, and how fdatasync differs from fsynclsm-wal-no-fsync — The LSM Tree WAL (lsm.py:26) calls flush() but never os.fsync(), making all WAL entries vulnerable to loss on power failure despite appearing persistedlog-hash-table-no-durability-guarantee — The log-structured hash table (log-structured-hash-table/bitcask.py:155) provides no durability guarantee: no WAL and no fsync means successful put() calls can silently lose data on crashsstable-wal-truncation-race — The LSM _flush() truncates the WAL after writing an SSTable but neither operation is fsynced, creating a window where both the WAL and SSTable data can be lost simultaneouslybtree-wal-is-reference-implementation — The B-Tree WAL (btree.py:107-108) is the only storage engine that correctly fsyncs both WAL entries and data pages, making it the only implementation with true crash safetybitcask-sync-writes-default-true — The hash-index Bitcask (hash-index-storage/bitcask.py:33) defaults sync_writes=True and conditionally fsyncs, making durability configurable — unlike the log-structured variant which has no fsync path at all