Date: 2026-05-29
Time: 13:18
SSTable.scanrangeforkeyThis is the low-level linear scan that resolves a key lookup within a bounded region of an SSTable file. The sparse index narrows a lookup to a small byte range (typically covering ~16 entries), and scanrangeforkey does the final sequential walk through that range to find (or rule out) the target key.
It exists because SSTables use a sparse index — not every key has an index entry. The sparse index gets you close; this method finishes the job.
Preconditions:
self.path is a valid SSTable file on diskstart and end are byte offsets that bracket a contiguous run of length-prefixed entries written by SSTable.write[start, end) are sorted by key in ascending lexicographic order (guaranteed by the SSTable write path, which receives entries from a SortedDict)Postconditions:
(True, value_bytes) if the key exists in the scanned range — including tombstones (TOMBSTONE = b"")(False, None) if the key is not in the rangeInvariant: The file is not modified. The method is read-only and stateless.
| Parameter | Type | Meaning |
|-----------|------|---------|
| key | str | The key to search for |
| start | int | Byte offset where scanning begins (inclusive) |
| end | int | Byte offset where scanning stops (exclusive) — the cursor must stay below this |
start is typically from the sparse index entry. end is either the next sparse index entry's offset or the footer start offset (for the last segment).
Tuple[bool, Optional[bytes]]
(True, v) — key found; v is the raw value bytes. The caller (SSTable.get) must check whether v == TOMBSTONE to distinguish a live value from a deletion marker.(False, None) — key not present in this range.The caller does not need to handle exceptions from this method under normal operation — malformed data causes readentry to return None, which triggers a clean exit.
1. Open the file at self.path in binary read mode.
2. Seek to start.
3. Loop while the file cursor is before end:
readentry(f), which decodes a (keylen, key, valuelen, value) record.readentry returns None (truncated/corrupt data), break.key, return (True, value) immediately.key, break early — since entries are sorted, the key cannot appear later. This is the critical optimization: it avoids scanning the remainder of the range.4. Fall through to return (False, None).
The early-exit on k > key turns worst-case from O(interval) to average O(interval/2) for misses, and makes cache-unfriendly sequential I/O shorter.
self.path. The file handle is scoped to the with block and closed on exit.self.There is no explicit exception handling. The method relies on:
readentry returning None for short reads (graceful degradation on truncated files)with block ensuring the file handle is closed even if an unexpected exception propagatesIf the file is missing or unreadable, open() raises FileNotFoundError or OSError, which propagates to the caller unhandled.
Assumption not enforced by the type system: start and end must be valid byte offsets pointing to entry boundaries. Arbitrary offsets will cause readentry to parse garbage, likely returning None and silently reporting the key as not found.
Called exclusively by SSTable.get, which computes start and end from the sparse index:
def get(self, key):
# ... bisect sparse index to find idx ...
start_off = self._sparse_index[idx][1]
end_off = self._sparse_index[idx + 1][1] if idx + 1 < len(...) else self._footer_start()
return self._scan_range_for_key(key, start_off, end_off)
The caller never needs to validate the return — the tuple is always well-formed.
self.readentry(f) — static method that decodes one (key, value) record from a file handle using struct.unpack with big-endian 4-byte length prefixes.struct (for the entry format), os/file I/O.scan-range-sorted-order-assumption — scanrangeforkey assumes entries between start and end are sorted by key ascending; it breaks early on k > key, so unsorted data causes silent false negativesscan-range-tombstone-transparent — scanrangeforkey returns tombstone values (b"") as found; it does not interpret them — tombstone filtering is the caller's responsibilityscan-range-no-state-mutation — scanrangeforkey is a pure read operation with no side effects on SSTable instance statescan-range-corrupt-data-silent — If readentry encounters truncated data mid-range, scanrangeforkey silently returns (False, None) rather than raising an error