Date: 2026-05-29
Time: 08:39
lcscompact — Leveled Compaction Strategylcscompact executes one round of leveled compaction (LCS), the compaction strategy used by systems like LevelDB and RocksDB. It maintains the core LCS invariant: within each level (L1+), SSTable key ranges don't overlap, and each level has a size budget that grows exponentially. When a level exceeds its budget, one SSTable is pushed down to the next level by merging it with all overlapping SSTables there.
It exists as the counterpart to stcscompact (size-tiered), giving CompactionManager two pluggable compaction strategies behind a common interface.
Preconditions:
self._sstables contains valid SSTableReader instances with .level set correctly.self.strategy == "leveled" (caller dispatches via runcompaction).Postconditions:
self._sstables is mutated in place: the input SSTables in the merge set are removed and replaced by a single merged SSTable at the next level..level is set to the target level.Invariant maintained: After an L0 compaction, all L0 SSTables are consumed. After an L(N) compaction, the selected SSTable and its overlapping neighbors in L(N+1) are replaced by a single SSTable at L(N+1).
None beyond self. All configuration comes from instance attributes:
| Attribute | Role | Default |
|-----------|------|---------|
| l0trigger | Number of L0 SSTables that triggers L0→L1 compaction | 4 |
| maxlevels | Upper bound on level iteration (exclusive) | 7 |
| levelbase_size | Size budget for L1 in bytes | 10 MB |
| _fanout | Multiplier per level (L(N) budget = base × fanout^(N-1)) | 10 |
List[SSTableReader] — the newly created SSTables from this compaction round.
The caller (run_compaction) returns this directly. No error signaling via return value.
The method has two distinct phases, tried in order:
L0 is special — SSTables there can have overlapping key ranges (they come straight from memtable flushes).
1. Check if len(L0) >= l0_trigger. If not, skip to Phase 2.
2. Collect all L0 SSTables — every one participates.
3. For each L0 SSTable, find all L1 SSTables whose key ranges overlap with it (via _overlapping). Accumulate these by id() into a set to deduplicate (one L1 SSTable may overlap multiple L0 SSTables).
4. Build the merge set: all of L0 + the overlapping subset of L1.
5. K-way merge them into a single new SSTable, assign it level = 1.
6. Remove all input SSTables from self._sstables, add the merged one.
7. Return immediately — only one compaction per call.
Iterate levels bottom-up:
1. Sum the file sizes of all SSTables at this level.
2. Compare against levelmax_size(lvl) (= base × fanout^(lvl-1)).
3. If over budget, pick the SSTable with the most overlap with the next level. This is a heuristic — by picking the most-overlapping SSTable, the compaction rewrites the maximum amount of data that's already "in the way," reducing future compaction I/O.
4. Find all SSTables in lvl+1 that overlap with the chosen one.
5. Merge them into a single SSTable at lvl+1.
6. Remove inputs, add output, return immediately.
If no level is over budget, return [].
self._sstables: removes consumed SSTables, appends the merged one.merge_sstables writes a new SSTable file. Old files are not deleted from disk — only removed from the in-memory list. This is a potential resource leak if the caller doesn't handle cleanup.self.counter incremented via next_path() for unique file naming..level on the new SSTableReader — mutating a field that isn't part of the constructor.Essentially none. The method assumes:
self._sstables is consistent (no duplicates, all readers valid).mergesstables succeeds. A disk-full or I/O error would propagate as an unhandled exception, leaving self.sstables in a partially modified state (some SSTables already removed but no merged one added).max() on level_ssts won't receive an empty sequence — this is safe because the total > threshold check implies at least one SSTable exists, but worth noting.
manager = CompactionManager(data_dir, strategy="leveled")
manager.add_sstable(flushed_table)
while manager.needs_compaction():
new_tables = manager.run_compaction() # dispatches to _lcs_compact
The caller is expected to call run_compaction in a loop because each invocation handles at most one level. If L0 triggers compaction and the result pushes L1 over budget, a second call is needed to cascade L1→L2.
| Dependency | Purpose |
|------------|---------|
| merge_sstables() | K-way merge of multiple SSTables into one, deduplicating by key (newest timestamp wins) |
| SSTableReader | The .level attribute, .metadata() for key ranges and file sizes, .scan() used internally by merge |
| _overlapping() | Key-range intersection test between one SSTable and a list of candidates |
| getlevels() | Groups self._sstables by .level into a dict |
| levelmax_size() | Computes the size budget for a given level |
| nextpath() | Generates a unique file path for the compaction output |