Function: _overlapping in sstable-and-compaction/sstable.py

Date: 2026-05-29

Time: 12:36

CompactionManager._overlapping

Purpose

This method finds which SSTables from a candidate set have key ranges that overlap with a given SSTable. It's the core spatial query behind leveled compaction — when compacting an SSTable from level N into level N+1, you need to know which level N+1 SSTables share key space with it, because those are the ones that must participate in the merge to maintain the leveled compaction invariant (no overlapping key ranges within a single level).

Contract

Parameters

| Parameter | Type | Description |

|-----------|------|-------------|

| sst | SSTableReader | The SSTable whose key range defines the "query window." In practice, this is the SSTable being promoted from level N. |

| candidates | List[SSTableReader] | SSTables to test against. In practice, these are all SSTables at the target level (N+1). |

Edge cases: If candidates is empty, returns []. If sst contains a single key (minkey == maxkey), the test still works correctly — it finds any candidate whose range contains that key.

Return Value

A List[SSTableReader] — the subset of candidates that overlap. Can be empty (no overlap), or equal to the entire candidates list (full overlap). The caller uses this to build a merge set for compaction (lcscompact).

Algorithm

The overlap check uses the standard interval overlap test. Two ranges [Amin, Amax] and [Bmin, Bmax] overlap if and only if Amin <= Bmax AND Amax >= Bmin. Equivalently, they do *not* overlap when one is entirely before or after the other.

Step by step:

1. Fetch sst's metadata once (avoiding repeated calls).

2. For each candidate c, fetch its metadata and test: c.minkey <= sst.maxkey AND c.maxkey >= sst.minkey.

3. Collect all candidates that pass the test.

The comparison uses Python's default string ordering (lexicographic/byte-order), which matches the sorted order used by SSTableWriter.add.

Side Effects

None. This is a pure query — no I/O, no mutation, no state changes. metadata() constructs a new SSTableMetadata dataclass from cached fields on each call, but that's allocation-only.

Error Handling

No explicit error handling. If a reader is in a bad state (e.g., corrupted file that failed at construction), the error would have been raised during SSTableReader._init_, not here. The method assumes all readers were constructed successfully.

Usage Patterns

Called in two contexts within lcscompact:

1. L0 compaction: For each L0 SSTable, find which L1 SSTables overlap. L0 SSTables can have overlapping ranges with each other (that's why L0 exists), so this is called once per L0 SSTable.

2. Level N compaction: First used to *pick* which SSTable to compact (the one with the most overlapping SSTables in level N+1 — a heuristic to reduce future compaction work). Then called again to get the actual merge set.


# Picking the best candidate to compact
best = max(level_ssts, key=lambda s: len(self._overlapping(s, next_level)))
# Getting its merge partners
overlapping_next = self._overlapping(best, next_level)

Note the double call for best — once inside max and once after. This is O(n*m) where n = level size, m = next level size.

Dependencies

Assumptions Not Enforced by Types

1. Key ordering is consistent: The method assumes lexicographic string comparison matches the sort order used when building SSTables. If keys were sorted with a custom comparator, this overlap test would give wrong results.

2. Metadata reflects actual content: minkey and maxkey are derived during SSTableReader._init_ by scanning entries. If the file were truncated or corrupted after the reader was constructed (e.g., by a concurrent process), the metadata would be stale — but this is an unusual failure mode.

3. Keys are non-empty: The SSTableReader defaults minkey and maxkey to "" for empty SSTables. An empty SSTable passed as sst would match any candidate whose min_key is "", which could produce spurious overlaps. In practice this doesn't happen because empty SSTables aren't fed into compaction.

Topics to Explore

Beliefs