Function: lcscompact in sstable-and-compaction/sstable.py

Date: 2026-05-29

Time: 08:39

lcscompact — Leveled Compaction Strategy

Purpose

lcscompact executes one round of leveled compaction (LCS), the compaction strategy used by systems like LevelDB and RocksDB. It maintains the core LCS invariant: within each level (L1+), SSTable key ranges don't overlap, and each level has a size budget that grows exponentially. When a level exceeds its budget, one SSTable is pushed down to the next level by merging it with all overlapping SSTables there.

It exists as the counterpart to stcscompact (size-tiered), giving CompactionManager two pluggable compaction strategies behind a common interface.

Contract

Preconditions:

Postconditions:

Invariant maintained: After an L0 compaction, all L0 SSTables are consumed. After an L(N) compaction, the selected SSTable and its overlapping neighbors in L(N+1) are replaced by a single SSTable at L(N+1).

Parameters

None beyond self. All configuration comes from instance attributes:

| Attribute | Role | Default |

|-----------|------|---------|

| l0trigger | Number of L0 SSTables that triggers L0→L1 compaction | 4 |

| maxlevels | Upper bound on level iteration (exclusive) | 7 |

| levelbase_size | Size budget for L1 in bytes | 10 MB |

| _fanout | Multiplier per level (L(N) budget = base × fanout^(N-1)) | 10 |

Return Value

List[SSTableReader] — the newly created SSTables from this compaction round.

The caller (run_compaction) returns this directly. No error signaling via return value.

Algorithm

The method has two distinct phases, tried in order:

Phase 1: L0 Compaction

L0 is special — SSTables there can have overlapping key ranges (they come straight from memtable flushes).

1. Check if len(L0) >= l0_trigger. If not, skip to Phase 2.

2. Collect all L0 SSTables — every one participates.

3. For each L0 SSTable, find all L1 SSTables whose key ranges overlap with it (via _overlapping). Accumulate these by id() into a set to deduplicate (one L1 SSTable may overlap multiple L0 SSTables).

4. Build the merge set: all of L0 + the overlapping subset of L1.

5. K-way merge them into a single new SSTable, assign it level = 1.

6. Remove all input SSTables from self._sstables, add the merged one.

7. Return immediately — only one compaction per call.

Phase 2: Level N Compaction (L1 through L_max-1)

Iterate levels bottom-up:

1. Sum the file sizes of all SSTables at this level.

2. Compare against levelmax_size(lvl) (= base × fanout^(lvl-1)).

3. If over budget, pick the SSTable with the most overlap with the next level. This is a heuristic — by picking the most-overlapping SSTable, the compaction rewrites the maximum amount of data that's already "in the way," reducing future compaction I/O.

4. Find all SSTables in lvl+1 that overlap with the chosen one.

5. Merge them into a single SSTable at lvl+1.

6. Remove inputs, add output, return immediately.

If no level is over budget, return [].

Side Effects

Error Handling

Essentially none. The method assumes:

Usage Patterns


manager = CompactionManager(data_dir, strategy="leveled")
manager.add_sstable(flushed_table)

while manager.needs_compaction():
    new_tables = manager.run_compaction()  # dispatches to _lcs_compact

The caller is expected to call run_compaction in a loop because each invocation handles at most one level. If L0 triggers compaction and the result pushes L1 over budget, a second call is needed to cascade L1→L2.

Dependencies

| Dependency | Purpose |

|------------|---------|

| merge_sstables() | K-way merge of multiple SSTables into one, deduplicating by key (newest timestamp wins) |

| SSTableReader | The .level attribute, .metadata() for key ranges and file sizes, .scan() used internally by merge |

| _overlapping() | Key-range intersection test between one SSTable and a list of candidates |

| getlevels() | Groups self._sstables by .level into a dict |

| levelmax_size() | Computes the size budget for a given level |

| nextpath() | Generates a unique file path for the compaction output |