Function: stcscompact in sstable-and-compaction/sstable.py

Date: 2026-05-29

Time: 12:35

stcscompact — Size-Tiered Compaction Strategy (STCS)

Purpose

This is the core compaction routine for the size-tiered compaction strategy. It finds groups of similarly-sized SSTables that have accumulated past a threshold, merges each group into a single larger SSTable, and updates the manager's SSTable list. The goal: reduce the number of SSTables the system must search during reads, while grouping files by size so that small, recently-flushed tables merge with peers rather than being immediately folded into a massive file.

Contract

Preconditions:

Postconditions:

Invariant:

Parameters

None beyond self. All configuration comes from instance state:

| Field | Role |

|-------|------|

| self._sstables | The live set of SSTables managed by this instance |

| self.minthreshold | Minimum bucket size to trigger a merge (default 4) |

| self.sizethresholds | Tier boundaries used by stcsbuckets()gettier() |

Return Value

List[SSTableReader] — the newly created merged SSTables, one per compacted bucket. Empty list if no bucket met the threshold. The caller doesn't need to register these; the method already appends them to self._sstables.

Algorithm

1. Bucket SSTables by size tier. Calls stcsbuckets(), which groups every SSTable by its tier index (determined by comparing filesize against self.size_thresholds). Tier 0 is the smallest files, tier N is the largest.

2. Iterate each bucket. For each (tier, bucket) pair:

3. Register new SSTables. Extend self._sstables with the merged results.

4. Return the list of newly created SSTables.

Side Effects

Error Handling

Essentially none. Several failure modes propagate as unhandled exceptions:

Usage Patterns

Called exclusively through runcompaction() when the strategy is "sizetiered":


mgr = CompactionManager(data_dir, strategy="size_tiered", min_threshold=4)
# ... add SSTables over time ...
if mgr.needs_compaction():
    new = mgr.run_compaction()

Typically invoked after memtable flushes — each flush produces a small L0 SSTable, and once enough accumulate in the same size tier, they get merged. The merged result is larger, so it lands in a higher tier, where it eventually merges with other similarly-sized files. This creates a natural hierarchy of increasingly large SSTables over time.

Dependencies

| Dependency | Role |

|------------|------|

| stcsbuckets() / gettier() | Classify SSTables into size tiers |

| merge_sstables() | K-way sorted merge with timestamp-based deduplication |

| nextpath() | Generate unique output file paths |

| SSTableReader | Returned by merge_sstables, appended to the managed list |

Assumptions Not Enforced by Types

1. self._sstables contains no duplicates. list.remove() removes the first match — if an SSTable appears twice, only one copy is removed, leaving the list inconsistent.

2. SSTables are internally sorted. mergesstables assumes each input yields entries in key order. Nothing prevents adding an unsorted SSTable via addsstable().

3. Old files are cleaned up externally. The method orphans input files on disk.

4. Concurrent access is unsafe. The method mutates self.sstables while iterating buckets derived from it. Safe only because stcsbuckets() creates a snapshot (new dict of lists), but concurrent calls to addsstable() during compaction could corrupt state.

Beliefs