Topic: Audit which implementations call fsync after compaction writes vs only on normal writes, since this determines whether crash recovery is actually safe or just lucky

Date: 2026-05-29

Time: 08:41

Fsync Audit: Compaction Writes vs Normal Writes

This audit determines which implementations ensure compaction output is durable on disk (via fsync) versus merely flushed to OS buffers, and whether crash recovery guarantees are real or accidental.

Summary Table

| Implementation | Normal writes fsync? | Compaction fsync? | Crash-safe? |
|---|---|---|---|
| write-ahead-log/wal.py | Yes | N/A (no compaction) | Yes |
| b-tree-storage-engine/btree.py | Via WAL | N/A (page-based, no compaction) | Yes |
| hash-index-storage/bitcask.py | Conditional (sync_writes) | Unknown (code cut off) | Maybe |
| log-structured-merge-tree/lsm.py | No (flush only) | No (flush only) | No |
| log-structured-hash-table/bitcask.py | No (flush only) | Unknown (code cut off) | No |
| sstable-and-compaction/sstable.py | No | Unknown (code cut off) | No |

Detailed Findings

write-ahead-log/wal.py — Properly Durable

This is the gold standard in the codebase: every durability-critical path pairs flush() with os.fsync().

The WAL implementation reflects the fact that flush() alone only moves data from Python's userspace buffer to the OS page cache; it does not guarantee that the data reaches stable storage. Consistently following flush() with os.fsync() is what provides actual durability.
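As a minimal sketch of that pattern (not the actual code from wal.py; the helper name `durable_append` and the scratch file are made up for illustration), a durable append might look like this:

```python
import os
import tempfile


def durable_append(f, payload: bytes) -> None:
    """Append payload and block until the OS reports it written to the device."""
    f.write(payload)
    f.flush()             # Python userspace buffer -> OS page cache
    os.fsync(f.fileno())  # OS page cache -> storage device


# Usage: append one record to a scratch log file (hypothetical path).
log_path = os.path.join(tempfile.mkdtemp(), "example.wal")
with open(log_path, "ab") as log:
    durable_append(log, b"record-1\n")
```

Dropping the os.fsync() line leaves the record readable by other processes but not guaranteed to survive a power failure, which is exactly the gap this audit looks for.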

b-tree-storage-engine/btree.py — Safe via WAL Protocol

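The B-tree uses a two-phase approach: page writes are first logged to the WAL, and the data file itself is only made durable at commit time. The safety comes from the WAL: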

1. WAL.log_write() (line ~140) writes page data to the WAL and fsyncs it

2. WAL.commit() (line ~147) calls page_manager.sync() (which fsyncs the data file), then truncates the WAL and fsyncs that too

3. WAL.recover() (line ~155) replays logged writes and fsyncs

This is correct: individual page writes don't need to be durable because the WAL can replay them after a crash. The commit() call is the durability barrier.
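The three steps above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the code in btree.py: the class `MiniPagedStore`, the fixed `PAGE_SIZE`, and the record layout are all hypothetical, and only the ordering of fsync calls is meant to mirror the description.

```python
import os
import struct
import tempfile

PAGE_SIZE = 16                 # tiny page size, for illustration only
_HEADER = struct.Struct("<I")  # each WAL record: page number, then page bytes


def _sync(f):
    f.flush()
    os.fsync(f.fileno())


class MiniPagedStore:
    """Hypothetical stand-in for a page manager plus its WAL."""

    def __init__(self, directory):
        self.wal_path = os.path.join(directory, "pages.wal")
        self.data_path = os.path.join(directory, "pages.db")
        for path in (self.wal_path, self.data_path):
            open(path, "ab").close()  # create both files if missing

    def write_page(self, page_no, page):
        assert len(page) == PAGE_SIZE
        # Step 1: the log record is made durable first.
        with open(self.wal_path, "ab") as wal:
            wal.write(_HEADER.pack(page_no) + page)
            _sync(wal)
        # The page write itself is NOT fsynced; the WAL can replay it.
        with open(self.data_path, "r+b") as data:
            data.seek(page_no * PAGE_SIZE)
            data.write(page)

    def commit(self):
        # Step 2: fsync the data file, then truncate the WAL and fsync that.
        with open(self.data_path, "r+b") as data:
            _sync(data)
        with open(self.wal_path, "r+b") as wal:
            wal.truncate(0)
            _sync(wal)

    def recover(self):
        # Step 3: replay every complete record, then fsync the data file.
        record_size = _HEADER.size + PAGE_SIZE
        replayed = 0
        with open(self.wal_path, "rb") as wal, open(self.data_path, "r+b") as data:
            while True:
                record = wal.read(record_size)
                if len(record) < record_size:
                    break  # end of log, or a torn trailing record
                (page_no,) = _HEADER.unpack_from(record)
                data.seek(page_no * PAGE_SIZE)
                data.write(record[_HEADER.size:])
                replayed += 1
            _sync(data)
        return replayed


store = MiniPagedStore(tempfile.mkdtemp())
store.write_page(1, b"B" * PAGE_SIZE)
replayed = store.recover()  # as if restarting before commit: the write is replayed
store.commit()              # durability barrier; the WAL is now empty
```

The design point is that only the WAL append and the commit pay the fsync cost, while ordinary page writes stay cheap.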

log-structured-merge-tree/lsm.py — Not Durable at All

This is the most concerning finding. The LSM tree's internal WAL (class WAL, line 13) only calls self._fd.flush() at line 26 — never os.fsync(). Grep confirms zero os.fsync calls anywhere in this file.

This means a crash at any point can lose data. The WAL is supposed to be the recovery mechanism, but since it never fsyncs, a power failure can lose WAL entries that were "flushed" to OS buffers but never reached disk. Compaction is doubly dangerous: if the process crashes mid-compaction, the new merged SSTable may be partially written (or entirely in page cache), old SSTables may already be deleted, and the WAL has already been truncated.

Verdict: crash recovery here is purely lucky — it works only because the OS eventually flushes dirty pages, and the tests never simulate actual power loss.

hash-index-storage/bitcask.py — Conditionally Safe for Writes, Unknown for Compaction

Normal writes in writerecord() (line ~97-99):


```python
self.active_file.flush()
if self.sync_writes:
    os.fsync(self.active_file.fileno())
```

The sync_writes flag (defaulting to True in __init__, line 31) makes normal writes durable. But the compact() method starts at line ~213 and was cut off in the observations, so we cannot confirm whether compaction output is fsynced.

log-structured-hash-table/bitcask.py — Not Durable

writerecord() (line ~157) calls only self.activefile.flush() — no fsync, no conditional sync option. The compaction code was cut off in observations, but there are zero os.fsync calls in the grep results for this file.

sstable-and-compaction/sstable.py — Not Durable

SSTableWriter.finish() (line ~89) closes the file after writing the index and footer, but never calls flush() or os.fsync(). The file close will flush Python buffers but does not guarantee data reaches disk. Compaction strategy code was cut off at line 200.
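For comparison, a finish step that does make the file contents durable could look like the sketch below. The class `DurableTableWriter` and its record format are hypothetical and are not taken from sstable.py; the only point is the flush, fsync, close ordering.

```python
import os
import tempfile


class DurableTableWriter:
    """Hypothetical writer; not the SSTableWriter from sstable.py."""

    def __init__(self, path):
        self.path = path
        self._f = open(path, "wb")

    def add(self, key: bytes, value: bytes) -> None:
        self._f.write(key + b"=" + value + b"\n")

    def finish(self) -> None:
        self._f.write(b"#footer\n")  # stand-in for the real index and footer
        self._f.flush()              # drain Python's buffer into the page cache
        os.fsync(self._f.fileno())   # force the page cache out to the device
        self._f.close()              # close() alone performs only the flush step


table_path = os.path.join(tempfile.mkdtemp(), "table-0001.sst")
writer = DurableTableWriter(table_path)
writer.add(b"a", b"1")
writer.finish()
```

This covers the file's contents only. Making the new directory entry durable additionally needs an fsync on the parent directory, which matters once old files are deleted.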

The Core Problem

The dangerous pattern is:

1. Write new compacted file (data in OS page cache only)

2. Delete old segment files

3. Crash before OS flushes the new file to disk

4. On recovery: old files gone, new file empty or corrupt
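One common way to avoid this sequence is to make the new file durable before anything is deleted. The sketch below assumes a POSIX system (fsync on a directory descriptor is not available on Windows); the function names `fsync_dir` and `compact_segments` and the segment file names are made up for illustration and do not come from any of the audited files.

```python
import os
import tempfile


def fsync_dir(directory):
    """Persist directory entries (creates, renames, unlinks). POSIX only."""
    fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def compact_segments(directory, old_names, merged_name, merged_bytes):
    tmp_path = os.path.join(directory, merged_name + ".tmp")
    final_path = os.path.join(directory, merged_name)

    # 1. Write the merged output under a temporary name and fsync its contents.
    with open(tmp_path, "wb") as out:
        out.write(merged_bytes)
        out.flush()
        os.fsync(out.fileno())

    # 2. Atomically publish it under its final name and persist the rename.
    os.replace(tmp_path, final_path)
    fsync_dir(directory)

    # 3. Only now is it safe to delete the old segments.
    for name in old_names:
        os.remove(os.path.join(directory, name))
    fsync_dir(directory)


workdir = tempfile.mkdtemp()
for name, body in (("seg-1", b"a=1\n"), ("seg-2", b"a=2\n")):
    with open(os.path.join(workdir, name), "wb") as f:
        f.write(body)
compact_segments(workdir, ["seg-1", "seg-2"], "seg-3", b"a=2\n")
```

With this ordering, a crash at any point should leave either the old segments, or the complete new segment, or both on disk, but never neither.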

Of the implementations that compact, only hash-index-storage/bitcask.py (with sync_writes=True) makes normal writes durable, and its compaction path is still unverified. write-ahead-log/wal.py and b-tree-storage-engine/btree.py are properly durable but perform no compaction. The LSM tree, log-structured hash table, and SSTable implementations are all vulnerable.

Observation Gaps

The compaction methods for three implementations (hash-index-storage/bitcask.py:compact, log-structured-hash-table/bitcask.py:compact, sstable-and-compaction/sstable.py compaction strategies) were cut off at line 200. A complete audit requires reading these methods to confirm whether they fsync compaction output before deleting old segments.
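To close these gaps without relying on truncated output, the per-method check could be automated. The following is a rough sketch using Python's ast module; the function `fsync_report` and the sample source are invented for illustration, and the check only sees direct `os.fsync(...)` calls, so an fsync hidden behind a helper function would be reported as missing.

```python
import ast


def fsync_report(source: str) -> dict:
    """Map each function name in `source` to whether its body calls os.fsync."""
    report = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            report[node.name] = any(
                isinstance(call, ast.Call)
                and isinstance(call.func, ast.Attribute)
                and call.func.attr == "fsync"
                and isinstance(call.func.value, ast.Name)
                and call.func.value.id == "os"
                for call in ast.walk(node)
            )
    return report


# Made-up sample source, not taken from any audited file.
sample = '''
import os

def write_record(f, data):
    f.write(data)
    f.flush()

def compact(f, data):
    f.write(data)
    f.flush()
    os.fsync(f.fileno())
'''
report = fsync_report(sample)
```

Running it over each file's full source and looking up the compaction entries would answer the open "Unknown" cells in the summary table; methods that share a name across classes would collide in this simple dictionary and would need qualified names.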