flush() or compact() runs concurrently with range_scan()
Date: 2026-05-28
Time: 19:07
I have enough from the observations to provide a thorough analysis. Let me also check the range_scan and compact methods from the provided data — the LSM source was cut off at line 200, so I'm missing the critical methods. However, the test file and grep results give me line numbers and behavioral context to work with.
log-structured-merge-tree/lsm.py:range_scan - The full merge logic (line 272) determines exactly how memtable and SSTable iterators are combined; the cut-off source hides whether it takes a snapshot or reads lazily.
log-structured-merge-tree/lsm.py:flush - Whether flush() (line 303) clears the memtable before or after appending the new SSTable determines whether the race window causes data loss or duplication.
log-structured-merge-tree/lsm.py:compact - The compaction at line 319 needs inspection to see whether it deletes files inline or defers deletion, which changes the severity of the file-read race.
mvcc-snapshot-isolation - DDIA's discussion of MVCC and snapshot isolation explains the principled solution: readers operate on an immutable view while writers create new versions.
sstable-and-compaction/sstable.py - The separate SSTable module (438 lines) has its own range_scan at line 229 and its own compaction strategies; comparing its approach to the LSM tree's reveals whether it handles concurrency differently.

lsm-no-synchronization - LSMTree in lsm.py has no locking, atomic swaps, or other synchronization primitives; all shared state (memtable, sstables) is mutated in place without protection.
flush-mutates-sstable-list - flush() both clears self.memtable and appends to self._sstables, creating a window where data exists in neither location if a concurrent reader checks between the two mutations.
compact-deletes-old-files - compact() removes old SSTable files from disk after merging, so a concurrent reader holding a reference to a deleted SSTable either keeps reading stale data through an already-open handle (on Unix) or fails when it tries to open the missing file; on Windows, deleting a file that is still open typically fails instead.
range-scan-merges-all-sources - range_scan() combines results from both the memtable and all SSTables in self._sstables, making it sensitive to mutations of either data structure during iteration.
tests-are-single-threaded - The entire test suite (test_lsm.py, 188 lines) runs all operations sequentially with no concurrency, so these races are never exercised.
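
The snapshot idea noted under mvcc-snapshot-isolation can be sketched as follows. This is a minimal illustration, not the code in lsm.py: the class name SnapshotLSM, the in-memory tuples standing in for SSTable files, the inclusive scan bounds, and the two locks are all assumptions made for the example. flush() publishes the new run and the empty memtable inside one critical section, so a reader never sees the data in neither place, and range_scan() captures its view under the lock and merges outside it.

```python
import threading


class SnapshotLSM:
    """Hypothetical in-memory stand-in for LSMTree (not the code in lsm.py)."""

    def __init__(self):
        self._lock = threading.Lock()          # guards the two fields below
        self._compact_lock = threading.Lock()  # allows one compaction at a time
        self._memtable = {}
        self._sstables = ()  # immutable tuple of immutable sorted runs, oldest first

    def put(self, key, value):
        with self._lock:
            self._memtable[key] = value

    def flush(self):
        with self._lock:
            if not self._memtable:
                return
            run = tuple(sorted(self._memtable.items()))
            # Publish the new run and swap in an empty memtable in the same
            # critical section: the data is always visible in one of the two.
            self._sstables = self._sstables + (run,)
            self._memtable = {}

    def compact(self):
        with self._compact_lock:
            with self._lock:
                old = self._sstables  # snapshot of the runs to merge
            if len(old) < 2:
                return
            merged = {}
            for run in old:  # oldest first, so newer values overwrite older ones
                merged.update(run)
            new_run = tuple(sorted(merged.items()))
            with self._lock:
                # Replace only the merged prefix; runs flushed meanwhile are kept.
                self._sstables = (new_run,) + self._sstables[len(old):]

    def range_scan(self, lo, hi):
        """Return sorted (key, value) pairs with lo <= key <= hi (assumed inclusive)."""
        with self._lock:
            mem = dict(self._memtable)  # private copy of the mutable memtable
            runs = self._sstables       # the tuple itself is never mutated
        result = {}
        for run in runs:                # oldest run first
            for key, value in run:
                if lo <= key <= hi:
                    result[key] = value
        for key, value in mem.items():  # memtable holds the newest values
            if lo <= key <= hi:
                result[key] = value
        return sorted(result.items())
```

With real SSTable files, the replaced files would also need their deletion deferred until no reader snapshot still refers to them (for example via reference counting); the sketch sidesteps this because its runs live in memory and are reclaimed by the garbage collector.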