Date: 2026-05-29
Time: 13:45
The short answer: this codebase does not implement LevelDB's Version reference counting, so that mechanism cannot be explained from the code here. Below is what's present, what's missing, and why it matters.
The LSM tree implementation in log-structured-merge-tree/lsm.py has a working compaction path. At line 319, compact() merges all SSTables into one, and at lines 316-317 compaction triggers automatically once the SSTable count reaches a threshold:
if len(self._sstables) >= self._compaction_threshold:
    self.compact()
The SSTable class (line 69 of lsm.py) reads from files on disk with a sparse index, and compaction produces a new merged SSTable. But there is no concurrency control around the SSTable list. When compact() runs, it mutates self._sstables directly — any concurrent reader iterating that same list would see inconsistent state.
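To make the hazard concrete, here is a minimal single-threaded sketch that simulates the interleaving by hand. It is not the code in lsm.py: ToyStore, read_table, and demo are hypothetical names, the "key=value" file format is invented for illustration, and the sketch assumes compaction deletes its input files, which the observations above do not confirm.

```python
import os
import tempfile


class ToyStore:
    """Hypothetical stand-in for an LSM store; not the code in lsm.py."""

    def __init__(self, directory):
        self.directory = directory
        self._sstables = []  # paths of on-disk tables, oldest first
        self._next_id = 0

    def _write_table(self, items):
        """Write one sorted 'key=value' line per entry and return the path."""
        path = os.path.join(self.directory, f"table-{self._next_id}.sst")
        self._next_id += 1
        with open(path, "w") as f:
            for key, value in sorted(items.items()):
                f.write(f"{key}={value}\n")
        return path

    def flush(self, items):
        self._sstables.append(self._write_table(items))

    def compact(self):
        """Merge every table into one, then delete the inputs (assumed)."""
        old_paths = self._sstables
        merged = {}
        for path in old_paths:  # oldest first, so newer values win
            merged.update(read_table(path))
        self._sstables = [self._write_table(merged)]
        for path in old_paths:
            os.remove(path)


def read_table(path):
    """Load a whole table file into a dict."""
    with open(path) as f:
        return dict(line.rstrip("\n").split("=", 1) for line in f)


def demo():
    with tempfile.TemporaryDirectory() as directory:
        store = ToyStore(directory)
        store.flush({"a": "1"})
        store.flush({"b": "2"})

        reader_view = list(store._sstables)  # a reader begins its scan here
        store.compact()                      # compaction runs mid-scan

        try:
            read_table(reader_view[0])       # the reader resumes
        except FileNotFoundError:
            return "reader failed: table deleted by compaction"
        return "reader succeeded"
```

Calling demo() walks through the failure: the reader captured a list of table paths, compaction removed those files, and the reader's next open fails. Nothing in the store tells compaction that someone still needs the old files.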
In real LevelDB, this problem is solved by a Version object — an immutable snapshot of "which SSTable files exist at each level." The key pieces absent from this codebase:
1. No Version class — The grep for class.*Version found only VersionedValue (dynamo.py:13, vectorclock.py:82) and Version (mvccdatabase.py:9), none of which represent an SSTable manifest.
2. No reference counting — The grep for refcount|refcount|refs|retain|release found only lock-related release() methods (fencing_tokens.py:38, lamport.py:153), not lifecycle management for read snapshots.
3. No VersionSet or current pointer — The grep for currentversion|snapshot|version found MVCC snapshots (ssidatabase.py:75) and dynamo versioning (dynamo.py:82), but nothing that tracks "which set of SSTables is the current view."
In LevelDB's design, the pattern works like this:
1. Each Version carries a ref_count.
2. The VersionSet holds a current_ pointer to the latest Version.
3. When a reader starts a Get() or iterator, it calls Ref() on the current Version, incrementing the count.
4. When the reader finishes, it calls Unref(), decrementing. If the count hits zero, the Version is destroyed, and SSTable files that no live Version references can be deleted.
5. Compaction builds a new Version and installs it as current_. The old Version stays alive as long as any reader holds a reference.
This is why it matters for read consistency: a reader that started before compaction continues to see the old set of SSTables, so files aren't deleted out from under it. Without this, you'd get one of two problems:
1. File-not-found errors — compaction deletes an SSTable that a concurrent reader is scanning
2. Inconsistent reads — a reader sees a mix of pre- and post-compaction state
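The Ref/Unref pattern described above can be sketched in Python as follows. This is a simplified illustration, not LevelDB's C++ code and not anything present in this codebase: the names Version, VersionSet, acquire, release, install, and the delete_file callback are all assumptions made for the example. One deliberate simplification is that files are deleted directly from the unref path, whereas LevelDB runs a separate obsolete-file sweep.

```python
import threading


class Version:
    """Immutable snapshot of the live SSTable file names (hypothetical)."""

    def __init__(self, files):
        self.files = tuple(files)
        self.refs = 0


class VersionSet:
    """Tracks the current Version and deletes files once nothing needs them."""

    def __init__(self, files, delete_file):
        self._lock = threading.Lock()
        self._delete_file = delete_file  # callback, e.g. os.remove
        self._live = []                  # every Version with refs > 0
        self._current = None
        self.install(files)

    def install(self, files):
        """Publish a new Version, as compaction would after merging."""
        with self._lock:
            new = Version(files)
            new.refs = 1  # the reference held by the 'current' pointer
            self._live.append(new)
            old, self._current = self._current, new
            obsolete = self._unref_locked(old) if old is not None else []
        for name in obsolete:
            self._delete_file(name)

    def acquire(self):
        """Reader entry point: pin the current Version."""
        with self._lock:
            self._current.refs += 1
            return self._current

    def release(self, version):
        """Reader exit point: unpin, then delete files nobody references."""
        with self._lock:
            obsolete = self._unref_locked(version)
        for name in obsolete:
            self._delete_file(name)

    def _unref_locked(self, version):
        """Drop one reference; return files that became unreferenced."""
        version.refs -= 1
        if version.refs > 0:
            return []
        self._live.remove(version)
        still_needed = {f for v in self._live for f in v.files}
        return [f for f in version.files if f not in still_needed]


# Usage: a reader pins the old Version across a compaction.
deleted = []
versions = VersionSet(["1.sst", "2.sst"], deleted.append)
pinned = versions.acquire()      # reader starts
versions.install(["3.sst"])      # compaction publishes a new Version
files_seen_by_reader = pinned.files  # still the pre-compaction set
versions.release(pinned)         # reader finishes; old files are now deleted
```

The point of the design is that deletion is driven by the last release rather than by compaction itself, so compaction never has to know which readers are in flight.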
The current lsm.py implementation at line 319 simply does self._sstables = [merged], which is fine for single-threaded use (as the tests in testlsm.py demonstrate) but would break under concurrency.
Interestingly, snapshot-isolation/mvccdatabase.py:9 has a Version class, and write-skew-detection/ssidatabase.py:59-67 implements multi-version visibility rules — the same *idea* (readers see a consistent point-in-time view) applied at the row level rather than the file level. LevelDB's Version refcounting is MVCC for the storage engine's metadata.