Date: 2026-05-28
Time: 19:01
merge_sstables — K-Way Merge of Sorted String Tablesmerge_sstables implements the core operation behind SSTable compaction: merging multiple sorted, immutable SSTable files into a single new SSTable. This is the fundamental building block that both size-tiered (STCS) and leveled (LCS) compaction strategies delegate to.
It exists because LSM-tree storage engines accumulate SSTables over time. Reads degrade as the number of SSTables grows (each must be checked), so compaction periodically merges them to reduce the count and reclaim space from obsolete or deleted entries.
Preconditions:
SSTableReader in readers must contain entries sorted by key (this is an SSTable invariant, not checked here).output_path must be a writable path; the file will be created or overwritten.output_path (merging a file into itself would be undefined).Postconditions:
remove_tombstones=True, entries where value is None (deletes) are dropped entirely.Invariants during execution:
last_key tracks the most recently written key to deduplicate.| Parameter | Type | Description |
|-----------|------|-------------|
| readers | List[SSTableReader] | The SSTables to merge. Can be empty (produces an empty output). Order doesn't matter — the heap handles sorting. |
| output_path | str | Filesystem path for the merged SSTable. Parent directory must exist. |
| remove_tombstones | bool | When True, tombstones (deletion markers with value=None) are excluded from the output. Only safe during compaction when no older SSTables reference the same keys — otherwise the tombstone would be lost and the deleted key would "reappear." Defaults to False. |
Returns an SSTableReader opened on the newly written file. The caller can use it immediately for reads. The caller is responsible for:
CompactionManager._sstables list.The algorithm is a k-way merge using Python's heapq min-heap:
1. Initialization — Seed the heap with the first entry from each reader's scan() iterator. Each heap element is a 5-tuple:
(key, -timestamp, source_index, entry, iterator)
key is the primary sort key (lexicographic, ascending).-timestamp is negated so that among entries with the same key, the newest sorts first (smallest negative = largest original timestamp).source_index is a tiebreaker to avoid comparing SSTableEntry objects (which aren't orderable). Without this, two entries with the same (key, -timestamp) would cause a TypeError.2. Main loop — Pop the smallest element from the heap:
key == last_key): This entry has the same key as the one we just wrote but a lower timestamp (or same timestamp from a different source). Skip it — the winning entry was already emitted. Advance that source's iterator.remove_tombstones and entry.value is None): Skip deleted entries. Advance the iterator.After processing each entry, pushnext pulls the next entry from that source's iterator into the heap, maintaining the invariant of one heap entry per active source.
3. Finalization — Call writer.finish() to write the sparse index and footer, then return a reader on the new file.
Key insight: Because each SSTable is already sorted by key, and the heap always yields the globally smallest (key, -timestamp), we get a correct merge in O(N log K) time where N is total entries and K is the number of input SSTables.
output_path. Each input SSTable is opened for reading via scan().CompactionManager.stcscompact and lcscompact) handles removing old readers from the active list, but notably does not delete the old files from disk either.This function does not catch or handle errors explicitly:
outputpath is not writable, SSTableWriter.init_ raises OSError/PermissionError. The file may be left in a partial state.readentry inside scan() may return None early (truncating that source) or raise struct.error / UnicodeDecodeError.StopIteration from exhausted iterators is caught silently (correct behavior — an empty SSTable simply contributes nothing).Called exclusively by CompactionManager:
# Size-tiered: merge a bucket of similarly-sized SSTables
merged = merge_sstables(bucket, self._next_path())
# Leveled: merge one SSTable with its overlapping neighbors in the next level
merged = merge_sstables(merge_set, self._next_path())
merged.level = lvl + 1
The caller always sets .level on the returned reader afterward (for leveled compaction). merge_sstables itself doesn't concern itself with levels.
Neither call site passes remove_tombstones=True, so tombstones are currently preserved through compaction. In a production system, tombstones would be removed during major compaction (when merging the oldest level) but retained otherwise to prevent ghost reads from older SSTables.
heapq (stdlib) — min-heap for k-way merge.SSTableWriter — writes the merged output in the binary SSTable format.SSTableReader — reads each input via scan() and wraps the output file.SSTableEntry — the dataclass flowing through the merge; must have .key, .value, .timestamp.