Topic: RocksDB's BottommostLevelCompaction enum and CompactionJob::isbottommostlevel flag show the production implementation of this concept

Date: 2026-05-29

Time: 09:02

Bottommost-Level Compaction: From Reference Implementation to Production

The Concept in This Codebase

This codebase cannot directly explain RocksDB's BottommostLevelCompaction enum or the CompactionJob::isbottommostlevel flag: it contains no RocksDB code, and a grep for "bottom" returned no relevant hits. The reference implementations here do, however, demonstrate the *underlying problem* that those RocksDB constructs solve.

Where the Concept Surfaces Implicitly

The key behavior lives in the compaction logic across two files:

log-structured-merge-tree/lsm.py:340 — During compaction, tombstones are removed unconditionally:


# Remove tombstones during compaction

This is the simple case: the LSM tree has a flat list of SSTables (self._sstables), and compact() merges *all* of them into one. Since there's nothing beneath the merged output, it's always safe to drop tombstones. Every compaction is implicitly a "bottommost level" compaction.
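A minimal sketch of that flat case follows. It is not the code in lsm.py: the `TOMBSTONE` sentinel, the `compact_all` name, and the use of plain dicts as stand-ins for SSTables are all assumptions made for illustration.

```python
# Hypothetical sentinel marking a deleted key (not taken from lsm.py).
TOMBSTONE = object()


def compact_all(sstables):
    """Merge every table (ordered oldest to newest) into a single table.

    Because the output replaces all inputs, nothing older can sit beneath
    it, so tombstones are dropped unconditionally.
    """
    merged = {}
    for table in sstables:
        merged.update(table)  # newer tables overwrite older entries
    ordered = sorted(merged.items(), key=lambda kv: kv[0])
    return {k: v for k, v in ordered if v is not TOMBSTONE}


older = {"a": 1, "b": 2}
newer = {"b": TOMBSTONE, "c": 3}
result = compact_all([older, newer])
print(result)
```

Here "b" disappears entirely: both its old value and the tombstone that deleted it are gone, and no stale copy remains anywhere to resurface.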

In sstable-and-compaction/sstable.py, the CompactionManager introduces *leveled compaction*, where SSTables are organized into levels (line 1 of the file; exercised at lines 108–120 of the tests). The test at testsstable.py:117 shows runcompaction() promoting L0 SSTables to level 1 (result[0].level == 1). The mergesstables function, as used at testsstable.py:43, accepts a remove_tombstones parameter (passed there as True), which makes tombstone survival an explicit choice of the caller.

Why This Matters: The Problem RocksDB's Flags Solve

In a leveled compaction scheme, you cannot safely remove tombstones during compaction *unless* you're compacting at the bottommost level. If you drop a tombstone at level 1, an older copy of that key might still exist at level 2 — and with the tombstone gone, the deleted key silently reappears.
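The resurrection scenario can be reproduced in a few lines. This is a toy model under stated assumptions: levels are plain dicts, reads search from the shallowest level down, and the `read` and `compact_level` helpers are hypothetical names, not functions from this codebase.

```python
# Hypothetical sentinel marking a deleted key.
TOMBSTONE = object()


def read(key, levels):
    """Search levels from shallowest (newest) to deepest (oldest)."""
    for level in levels:
        if key in level:
            value = level[key]
            return None if value is TOMBSTONE else value
    return None


def compact_level(level, drop_tombstones):
    """Rewrite one level, optionally discarding its tombstones."""
    if not drop_tombstones:
        return dict(level)
    return {k: v for k, v in level.items() if v is not TOMBSTONE}


level1 = {"user:7": TOMBSTONE}   # newer: the key was deleted
level2 = {"user:7": "alice"}     # older: a stale copy still exists

safe = read("user:7", [compact_level(level1, False), level2])
unsafe = read("user:7", [compact_level(level1, True), level2])
print(safe, unsafe)
```

With the tombstone kept, the read stops at level 1 and reports the key as deleted. With the tombstone dropped, the read falls through to level 2 and returns the stale value.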

This reference implementation sidesteps the problem in two ways:

1. lsm.py has no levels: compact() (triggered at lsm.py:316 when len(self._sstables) >= self.compaction_threshold) merges everything, so tombstone removal is always safe.

2. sstable.py exposes remove_tombstones as a caller-controlled flag rather than deriving it from level metadata. The test at testsstable.py:43 passes remove_tombstones=True explicitly during the merge; nothing reasons automatically about whether deeper levels might hold stale data.
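The caller-controlled approach in the second item might look roughly like the sketch below. The function name `merge_tables`, its signature, and the choice of `None` as the tombstone marker are assumptions for illustration; the actual merge function in sstable.py may differ.

```python
# Assumption for this sketch: a deletion is stored as the value None.
TOMBSTONE = None


def merge_tables(tables, remove_tombstones):
    """Merge tables given newest first; each is a list of (key, value) pairs.

    The caller, not the merge, decides whether tombstones survive.
    """
    latest = {}
    for table in tables:
        for key, value in table:
            latest.setdefault(key, value)  # the first (newest) entry wins
    ordered = sorted(latest.items(), key=lambda kv: kv[0])
    return [
        (k, v) for k, v in ordered
        if not (remove_tombstones and v is TOMBSTONE)
    ]


newest = [("a", TOMBSTONE), ("c", 3)]
oldest = [("a", 1), ("b", 2)]

kept = merge_tables([newest, oldest], remove_tombstones=False)
dropped = merge_tables([newest, oldest], remove_tombstones=True)
print(kept)
print(dropped)
```

The merge itself has no view of the level structure, so passing `remove_tombstones=True` is only correct when the caller already knows no older data lies beneath the output.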

In RocksDB, BottommostLevelCompaction is an enum, set through CompactRangeOptions for manual compactions, that controls *whether* the engine rewrites files already sitting in the deepest level (an expensive step that is mainly needed for space reclamation), and CompactionJob::isbottommostlevel is a runtime flag that tells the compaction job "you're at the bottom, so it's safe to drop tombstones and perform other cleanup." These two constructs automate what this codebase handles manually or avoids entirely.
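To make the enum's role concrete, here is a simplified Python model of the decision it drives. The four value names (kSkip, kIfHaveCompactionFilter, kForce, kForceOptimized) are RocksDB's; the Python class, the `should_rewrite_bottommost` helper, and its two-argument signature are hypothetical and compress the real logic considerably.

```python
from enum import Enum


class BottommostMode(Enum):
    """Illustrative mirror of RocksDB's BottommostLevelCompaction values."""
    SKIP = "kSkip"
    IF_HAVE_COMPACTION_FILTER = "kIfHaveCompactionFilter"
    FORCE = "kForce"
    FORCE_OPTIMIZED = "kForceOptimized"


def should_rewrite_bottommost(mode, has_compaction_filter):
    """Hypothetical helper: should a manual compaction rewrite bottommost files?"""
    if mode is BottommostMode.SKIP:
        return False
    if mode is BottommostMode.IF_HAVE_COMPACTION_FILTER:
        return has_compaction_filter
    # FORCE and FORCE_OPTIMIZED both rewrite. This sketch ignores how the
    # optimized variant avoids recompacting files it has just produced.
    return True


decisions = {
    mode.value: should_rewrite_bottommost(mode, has_compaction_filter=False)
    for mode in BottommostMode
}
print(decisions)
```

The model only captures the yes/no outcome per mode; it says nothing about how RocksDB schedules or executes the resulting work.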

The Gap

The missing piece in these implementations is level-aware tombstone safety. Neither implementation tracks whether a compaction's output sits above older data that its tombstones are still needed to shadow. A production system like RocksDB must answer: "Does any level below me contain keys that overlap with this compaction's key range?" If yes, tombstones must be preserved. The isbottommostlevel flag encodes exactly that answer.
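One way to close that gap is to derive the flag from level metadata instead of trusting the caller. The sketch below assumes each table records its level and its smallest and largest key; the `Table` class and both helper functions are hypothetical and are not RocksDB's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical per-table metadata: level plus inclusive key bounds."""
    level: int
    min_key: str
    max_key: str


def overlaps(table, lo, hi):
    """True when the table's key range intersects the inclusive range [lo, hi]."""
    return table.min_key <= hi and lo <= table.max_key


def can_drop_tombstones(output_level, lo, hi, tables):
    """True when no table deeper than output_level overlaps [lo, hi]."""
    return not any(
        t.level > output_level and overlaps(t, lo, hi) for t in tables
    )


tables = [Table(2, "a", "f"), Table(2, "m", "z"), Table(3, "x", "z")]

gap_range = can_drop_tombstones(1, "g", "k", tables)      # falls between deeper tables
covered_range = can_drop_tombstones(1, "c", "h", tables)  # overlaps a level-2 table
deepest = can_drop_tombstones(3, "a", "z", tables)        # nothing lies below level 3
print(gap_range, covered_range, deepest)
```

A range check like this is conservative: overlapping ranges do not prove that the deeper table holds the same key, only that it might, which is enough reason to keep the tombstone.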

Beliefs