Original source: b-tree-storage-engine-btree-_search

Function: _search in b-tree-storage-engine/btree.py

Date: 2026-05-29

Time: 10:35

`BTree._search` — Point lookup traversal from root to leaf

Purpose

_search is the recursive core of B-tree point lookups. It traverses from an internal node down to the leaf level, reading one page per level, and returns the value associated with a key or None if the key doesn't exist. It exists to separate the traversal logic from the public get() method, which handles metadata reads and counter resets.

Contract

Preconditions:

page_num refers to a valid, allocated page containing either a serialized leaf (when depth == 1) or a serialized internal node (when depth > 1)
depth accurately reflects the distance from this node to the leaf level — the tree is perfectly balanced, so every path from root to leaf has the same depth
key is a string (the serialization functions decode keys as UTF-8 strings)

Postconditions:

Returns bytes (the stored value) if the key exists, None otherwise
The tree is unmodified — this is a read-only operation

Invariant assumed: The tree is balanced. depth is decremented by exactly 1 at each recursive call, and depth == 1 always means "this page is a leaf." If the metadata height is wrong (e.g., after a corrupted write), the method will deserialize a page using the wrong format and produce garbage or crash.

Parameters

| Parameter | Type | Meaning |

|-----------|------|---------|

| pagenum | int | Page number to read from the data file. Multiplied by pagesize to get the byte offset. |

| key | str | The lookup key. Must be a string — bisectleft/bisectright compare it against deserialized string keys. |

| depth | int | Levels remaining to the leaf. 1 = this page is a leaf; >1 = internal node. Caller sets this to the tree's height from metadata. |

Edge cases: If depth is 0 or negative, the method falls through to the internal-node branch and tries to deserialize whatever is at page_num as an internal node — no guard exists. This can't happen in normal operation because get() reads the height from metadata (minimum 1).

Return Value

bytes — the value associated with the key, if found in the leaf
None — the key does not exist in the subtree

The caller (get()) passes this through directly, so the public API also returns bytes | None.

Algorithm

1. Read the page — self.pm.readpage(pagenum) fetches pagesize bytes from offset pagenum * pagesize in the data file. This increments pagesread.

2. Leaf case (depth == 1):

Deserialize the page as a leaf, extracting sorted keys, values, and a nextsibling pointer (discarded here — only rangescan follows sibling chains).
Use bisectleft(keys, key) to find the insertion point. bisectleft is correct here because we want the *leftmost* position where key could appear — if keys[idx] == key, that's our match.
Check idx < len(keys) and keys[idx] == key. Both conditions are necessary: bisect_left can return len(keys) if key is larger than all stored keys.
Return the value at that index, or None.

3. Internal case (depth > 1):

Deserialize as an internal node, extracting separator ikeys and children pointers.
Use bisectright(ikeys, key) to pick the child. bisectright is the correct choice: for an internal node with keys [k0, k1, ...] and children [c0, c1, c2, ...], child ci covers keys in the range [k{i-1}, ki). bisectright ensures that a key equal to a separator goes to the *right* child, matching the split logic in _insert where the right page keeps the median key.
Recurse into children[idx] with depth - 1.

Total I/O: Exactly height page reads — one per tree level. For a tree with N keys and branching factor B, that's O(log_B(N)).

Side Effects

Increments self.pm.pagesread by 1 for each level traversed. The caller get() calls resetcounters() before the search, so after get() returns, pm.pages_read equals the tree height.
File I/O: Reads from the data file via self.pm.f.seek() and self.pm.f.read(). No writes.

Error Handling

None explicit. Possible failures:

struct.unpack errors if the page data is corrupted or truncated — these propagate as struct.error
IndexError if children[idx] is out of bounds — would indicate a corrupted internal node where the children array doesn't have len(keys) + 1 entries
I/O errors from read_page — propagate as OSError

All errors are uncaught and bubble to the caller. This is appropriate — there's no meaningful recovery from a corrupted page mid-search.

Usage Patterns

Only called from get():


def get(self, key):
    self.pm.reset_counters()
    root, height, _, _, _ = self._read_meta()
    return self._search(root, key, height)

The caller reads metadata to obtain the root page and height, resets I/O counters, then delegates. No other method calls search — findleaf is a separate traversal used by rangescan that only locates the leaf page number without checking for a specific key.

Dependencies

| Dependency | Role |

|------------|------|

| self.pm.read_page() | Page-level I/O from PageManager |

| deserializeleaf() | Unpacks leaf binary format → (keys, values, next_sibling) |

| deserializeinternal() | Unpacks internal binary format → (keys, children) |

| bisectleft / bisectright | stdlib binary search from bisect module |

Key Design Choice: `bisectleft` vs `bisect``right`

The method uses bisectleft for leaves and bisectright for internal nodes. This is intentional and matches the split strategy in insert: when a leaf splits, the median key is *copied up* to the parent and also stays in the right leaf as its first key. So bisectright at internal nodes routes an exact-match key to the right child (where that key lives), and bisect_left at the leaf finds the exact position.