Function: scanrangeforkey in log-structured-merge-tree/lsm.py

Date: 2026-05-29

Time: 13:18

SSTable.scanrangeforkey

Purpose

This is the low-level linear scan that resolves a key lookup within a bounded region of an SSTable file. The sparse index narrows a lookup to a small byte range (typically covering ~16 entries), and scanrangeforkey does the final sequential walk through that range to find (or rule out) the target key.

It exists because SSTables use a sparse index — not every key has an index entry. The sparse index gets you close; this method finishes the job.

Contract

Preconditions:

Postconditions:

Invariant: The file is not modified. The method is read-only and stateless.

Parameters

| Parameter | Type | Meaning |

|-----------|------|---------|

| key | str | The key to search for |

| start | int | Byte offset where scanning begins (inclusive) |

| end | int | Byte offset where scanning stops (exclusive) — the cursor must stay below this |

start is typically from the sparse index entry. end is either the next sparse index entry's offset or the footer start offset (for the last segment).

Return Value

Tuple[bool, Optional[bytes]]

The caller does not need to handle exceptions from this method under normal operation — malformed data causes readentry to return None, which triggers a clean exit.

Algorithm

1. Open the file at self.path in binary read mode.

2. Seek to start.

3. Loop while the file cursor is before end:

4. Fall through to return (False, None).

The early-exit on k > key turns worst-case from O(interval) to average O(interval/2) for misses, and makes cache-unfriendly sequential I/O shorter.

Side Effects

Error Handling

There is no explicit exception handling. The method relies on:

If the file is missing or unreadable, open() raises FileNotFoundError or OSError, which propagates to the caller unhandled.

Assumption not enforced by the type system: start and end must be valid byte offsets pointing to entry boundaries. Arbitrary offsets will cause readentry to parse garbage, likely returning None and silently reporting the key as not found.

Usage Patterns

Called exclusively by SSTable.get, which computes start and end from the sparse index:


def get(self, key):
    # ... bisect sparse index to find idx ...
    start_off = self._sparse_index[idx][1]
    end_off = self._sparse_index[idx + 1][1] if idx + 1 < len(...) else self._footer_start()
    return self._scan_range_for_key(key, start_off, end_off)

The caller never needs to validate the return — the tuple is always well-formed.

Dependencies

Beliefs