VersionSet and VersionEdit to see how production systems track which SSTables exist at which levels with key rangesDate: 2026-05-29
Time: 09:01
In LevelDB, the core challenge is: which SSTable files currently exist, at which levels, covering which key ranges? This is metadata that changes every time a compaction runs (files are created and deleted) or a memtable flushes (a new L0 file appears). LevelDB solves this with three concepts:
1. FileMetaData — per-file record: file number, file size, smallest key, largest key
2. VersionEdit — a delta describing what changed: "add file X to level 2, delete file Y from level 1"
3. VersionSet — the full current state (which files at which levels), reconstructed by replaying a log of VersionEdit records from a MANIFEST file
This is essentially event sourcing for file metadata — the MANIFEST is an append-only log of edits, and the current Version is the materialized view.
The sstable-and-compaction/sstable.py file defines SSTableMetadata (line 34–40):
@dataclass
class SSTableMetadata:
filepath: str
min_key: str
max_key: str
entry_count: int
file_size: int
level: int
timestamp: float
This is the closest analog to LevelDB's FileMetaData. It tracks the key range (minkey, maxkey) and which level the file belongs to. The SSTableWriter.finish() method (line 89–105) produces this metadata when an SSTable is finalized — note that level is hardcoded to 0 at creation time (line 104), which matches LevelDB's behavior where freshly flushed files always land at L0.
The SSTableReader reconstructs this metadata on open by reading the first and last entries to determine minkey and maxkey (lines 142–165). This is a simplification — LevelDB stores these in the MANIFEST rather than re-scanning files.
The test file (sstable-and-compaction/test_sstable.py, lines 108–122) shows leveled compaction in action:
mgr = CompactionManager(lcs_dir, strategy='leveled', l0_compaction_trigger=2)
# ... add 3 L0 SSTables ...
result = mgr.run_compaction()
assert result[0].level == 1 # compacted output promoted to L1
The CompactionManager maintains the current set of SSTables and their levels. When compaction runs, it merges overlapping files and promotes the result to the next level. This is the operational equivalent of a VersionEdit that says "delete these L0 files, add this new L1 file" — but it's done imperatively rather than through a logged edit.
The log-structured-merge-tree/lsm.py implementation takes a simpler approach. Its LSMTree class (line 200+) maintains a flat list of SSTables (self._sstables) with no level concept. Compaction (line 319) merges all SSTables into one, rather than doing level-by-level promotion:
compaction_threshold (line 204, 208) triggers compaction when the SSTable count gets too highif len(self.sstables) >= self.compaction_thresholdcompact() method (line 319) does a single all-to-one mergeThis is closer to a size-tiered strategy with a single tier — there's no VersionSet tracking which files belong to which level because there are no levels.
The key gap is durability of the version state. In LevelDB:
1. A VersionEdit is written to the MANIFEST *before* the compaction output is visible — this is crash-safe
2. On recovery, the VersionSet replays the MANIFEST to reconstruct which files exist at which levels
3. Old Version objects are kept alive via reference counting so concurrent readers see a consistent snapshot (MVCC for file metadata)
Neither implementation here has:
Version objects that let in-progress reads survive a compactionThe sstable-and-compaction module has the right data structures (SSTableMetadata with level, minkey, maxkey) but manages them in-memory only. The lsm.py module doesn't even track levels, using sequence numbers (seq in the SSTable class, line 74) for ordering instead.
LevelDB's VersionSet/VersionEdit pattern exists because SSTable management has the same problem as any database: you need atomic, durable state transitions for metadata. The MANIFEST is essentially a write-ahead log for the file catalog. This codebase correctly implements the data-plane side (SSTable format, sparse indexes, merge logic) but simplifies away the metadata-plane — which is fine for learning the algorithms, but is exactly the gap you'd need to close for a production system.