Function: CompactionManager in sstable-and-compaction/sstable.py

Date: 2026-05-29

Time: 08:58

CompactionManager

Purpose

CompactionManager is the orchestrator that decides when and how to merge SSTables together to reclaim space and bound read amplification. It implements two classic compaction strategies from Chapter 3 of DDIA:

The class exists to decouple the compaction policy from the SSTable storage format itself (SSTableWriter/SSTableReader).

Contract

Preconditions:

Postconditions:

Invariants:

Parameters

| Parameter | Type | Default | Meaning |

|-----------|------|---------|---------|

| data_dir | str | required | Directory where merged SSTable files are written |

| strategy | str | "sizetiered" | Either "sizetiered" or anything else (treated as leveled) |

| min_threshold | int | 4 | STCS: minimum SSTables in a size bucket to trigger compaction |

| size_thresholds | List[int] | [1M, 10M, 100M] | STCS: byte boundaries between tiers — files <1MB are tier 0, <10MB tier 1, etc. |

| l0compactiontrigger | int | 4 | LCS: number of L0 SSTables that triggers flush to L1 |

| levelbasesize | int | 10000000 | LCS: max total bytes for level 1 |

| fanout | int | 10 | LCS: size multiplier between levels (levelnmax = base * fanout^(n-1)) |

| max_levels | int | 7 | LCS: highest level to consider |

Edge case: strategy is compared with == against "size_tiered" — any other string silently selects leveled compaction. There's no validation.

Return Value

run_compaction() returns List[SSTableReader] — the newly created merged SSTables. Returns an empty list if no compaction was needed. The caller can use this to know what was produced, but the manager has already updated its internal state.

needs_compaction() returns bool — a pure check with no side effects.

Algorithm

Size-Tiered Compaction

1. Bucket each SSTable by file size into tiers using sizethresholds. A 500KB file goes to tier 0, a 5MB file to tier 1, etc.

2. Check if any bucket has >= min_threshold SSTables.

3. Merge all SSTables in each qualifying bucket into a single new SSTable via merge_sstables (k-way merge by key, newest timestamp wins on duplicates).

4. Remove the inputs from _sstables, append the output. Note: STCS processes all qualifying buckets in one call, unlike LCS.

Leveled Compaction

Two distinct paths:

L0 → L1 compaction (takes priority):

1. When >= l0_trigger SSTables accumulate in L0, collect them all.

2. Find every L1 SSTable whose key range overlaps with any L0 SSTable.

3. Merge the entire set (all L0 + overlapping L1) into a single SSTable assigned to level 1.

Level N → N+1 compaction:

1. Walk levels 1 through maxlevels - 1. Find the first level where total file size exceeds levelbase_size * fanout^(level-1).

2. Pick the SSTable in that level with the most overlap with the next level (maximizing the number of overlapping SSTables — this is a heuristic to reduce future compaction work).

3. Merge that SSTable plus all overlapping SSTables from level N+1 into a single new SSTable assigned to level N+1.

4. Return immediately — only one level is compacted per call.

Side Effects

Error Handling

There is essentially none. The class relies on:

Usage Patterns

Typical usage in a write-ahead-log + memtable system:


mgr = CompactionManager("/data/sst", strategy="size_tiered", min_threshold=4)

# After flushing a memtable to disk:
reader = SSTableReader("/data/sst/new_table.sst")
mgr.add_sstable(reader)

# Periodically or after flush:
while mgr.needs_compaction():
    new_tables = mgr.run_compaction()
    # Optionally: delete old files, update manifest, etc.

Caller obligations:

Dependencies

| Dependency | Usage |

|-----------|-------|

| merge_sstables (same module) | K-way merge of multiple SSTables into one |

| SSTableReader (same module) | Read access to SSTable files, provides metadata(), scan(), and .level |

| SSTableWriter (via merge_sstables) | Creates the output SSTable |

| os.path.join | Path construction for output files |

| time.time | Timestamp in output filenames (collision avoidance) |

Topics to Explore

Beliefs