Date: 2026-05-29
Time: 07:00
The short answer: this codebase does not implement bottommost-level tracking. Both implementations use simplified tombstone removal strategies that sidestep the problem entirely. This is a meaningful gap worth understanding, because it highlights exactly why production systems need the concept.
log-structured-merge-tree/lsm.py takes the simplest approach: its compact() method merges *all* SSTables into a single output file. Since there's nothing below the result, every tombstone is safe to drop. You can see this at lines 295–299 — after merging, it filters out tombstones unconditionally:
# Build result, excluding tombstones (line 295)
...
if v != TOMBSTONE: (line 299)
This works because "merge everything" is trivially the bottommost compaction — there are no older SSTables that might still hold the key below.
sstable-and-compaction/sstable.py has a leveled compaction strategy with maxlevels = 7 (line 309) and iterates levels at lines 386 and 420. However, tombstone removal is controlled by an explicit removetombstones boolean passed to mergesstables (visible in the test at test_sstable.py:44):
merged = merge_sstables([reader2, reader], merged_path, remove_tombstones=True)
The caller decides. There's no logic that inspects whether the compaction output lands at the deepest occupied level.
In LevelDB and RocksDB, the rule is: a tombstone can only be dropped if there is no older copy of that key in any level below the compaction output. The mechanism has three parts:
1. Level metadata tracking: Each SSTable knows its level assignment. The VersionSet / VersionStorageInfo maintains which SSTables exist at each level and their key ranges.
2. Bottommost detection at compaction time: When the compaction picker selects input files, it checks whether any SSTable at a *lower* level overlaps the key range being compacted. If no lower level has overlapping files, this compaction is "bottommost" for those keys.
3. Per-key decision during merge: Even within a "bottommost" compaction, the decision is per-key. A tombstone for key K is only dropped if no SSTable at levels L+1 through Lmax contains K in its key range. RocksDB tracks this with BottommostLevelCompaction options and the isbottommost_level flag on CompactionJob.
If you drop a tombstone too early — say, during a compaction at level 2 while an older PUT for the same key still sits at level 4 — the deleted key *reappears*. The tombstone was the only thing suppressing it. This is a data correctness bug, not a performance issue.
sstable.py: The full CompactionManager leveled compaction logic is cut off. The actual merge/compaction execution and tombstone handling during leveled compaction aren't visible.VersionSet equivalent: There's no manifest or version tracking that maps SSTables to levels with key-range metadata, which is the foundation for the bottommost check in LevelDB/RocksDB.