Function: range_scan in log-structured-merge-tree/lsm.py

Date: 2026-05-28

Time: 18:26

LSMTree.range_scan

Purpose

range_scan performs a multi-source merge query across all layers of the LSM tree — active memtable, immutable memtables, and on-disk SSTables — to return a consistent, deduplicated snapshot of key-value pairs within a key range. It exists because an LSM tree scatters data across multiple structures at different ages, and a range query must reconcile all of them, respecting write ordering so that newer writes shadow older ones and tombstones suppress deleted keys.

Contract

Preconditions:

Postconditions:

Invariants relied upon:

Parameters

| Parameter | Type | Description |

|-----------|------|-------------|

| start_key | str | Inclusive lower bound of the scan range. Must be a valid string comparable to stored keys via Python's default string ordering. |

| endkey | str | Exclusive upper bound. Keys equal to endkey are not returned. |

Edge cases: If startkey == endkey, the range is empty. If no keys fall within the range across any layer, the result is [].

Return Value

List[Tuple[str, str]] — a list of (key, value) pairs sorted by key. Values are decoded from bytes to UTF-8 strings. The caller receives a complete snapshot; there is no cursor or iterator to manage.

If every matching key has been deleted (tombstoned), the result is an empty list — the caller cannot distinguish "no keys in range" from "all keys deleted."

Algorithm

The method uses a priority-based last-writer-wins merge:

1. Initialize a merged dict mapping each key to (priority, value_bytes). Priority is an integer counter incremented per source, oldest sources first.

2. Scan SSTables (oldest → newest). For each SSTable, call sst.scan(startkey, endkey) which yields (key, valuebytes) for keys in [startkey, end_key). Each result overwrites any prior entry in merged for that key, because the priority is higher. This is the critical insight: since sources are visited oldest-first and merged[k] = (priority, v) is a plain overwrite, the last write to any key naturally wins.

3. Scan immutable memtables (oldest → newest). Uses SortedDict.irange() with bounds (True, False) meaning inclusive-start, exclusive-end. Same overwrite logic.

4. Scan the active memtable (newest of all). Same irange pattern, highest priority.

5. Build the result list. Iterate merged in sorted key order. Skip any entry whose value is TOMBSTONE (the empty bytes sentinel b""). Decode surviving values to UTF-8 strings.

The priority value is tracked but only the *value* paired with it matters — the priority field itself is never compared or used for tie-breaking. Its only role is documentation: the overwrite order guarantees correctness, and the priority integer is vestigial (a leftover from a design that might have used max() instead of overwrite).

Side Effects

None. This is a pure read operation. It does not modify the memtable, WAL, SSTables, or any tree state. SSTable scans open and close file handles internally (sst.scan uses open within a with block).

Error Handling

No explicit error handling. Possible exceptions:

Usage Patterns


tree = LSMTree("/tmp/data")
tree.put("user:001", "alice")
tree.put("user:002", "bob")
tree.put("user:003", "charlie")
results = tree.range_scan("user:001", "user:003")
# [("user:001", "alice"), ("user:002", "bob")]

Callers use this for prefix scans (e.g., all keys starting with "user:") by computing an appropriate end_key. The caller must understand that the result is a point-in-time snapshot with no isolation guarantees — concurrent writes may or may not be visible.

Dependencies