Function: _search in b-tree-storage-engine/btree.py

Date: 2026-05-29

Time: 10:35

BTree._search — Point lookup traversal from root to leaf

Purpose

_search is the recursive core of B-tree point lookups. It traverses from an internal node down to the leaf level, reading one page per level, and returns the value associated with a key or None if the key doesn't exist. It exists to separate the traversal logic from the public get() method, which handles metadata reads and counter resets.

Contract

Preconditions:

Postconditions:

Invariant assumed: The tree is balanced. depth is decremented by exactly 1 at each recursive call, and depth == 1 always means "this page is a leaf." If the metadata height is wrong (e.g., after a corrupted write), the method will deserialize a page using the wrong format and produce garbage or crash.

Parameters

| Parameter | Type | Meaning |

|-----------|------|---------|

| pagenum | int | Page number to read from the data file. Multiplied by pagesize to get the byte offset. |

| key | str | The lookup key. Must be a string — bisectleft/bisectright compare it against deserialized string keys. |

| depth | int | Levels remaining to the leaf. 1 = this page is a leaf; >1 = internal node. Caller sets this to the tree's height from metadata. |

Edge cases: If depth is 0 or negative, the method falls through to the internal-node branch and tries to deserialize whatever is at page_num as an internal node — no guard exists. This can't happen in normal operation because get() reads the height from metadata (minimum 1).

Return Value

The caller (get()) passes this through directly, so the public API also returns bytes | None.

Algorithm

1. Read the pageself.pm.readpage(pagenum) fetches pagesize bytes from offset pagenum * pagesize in the data file. This increments pagesread.

2. Leaf case (depth == 1):

3. Internal case (depth > 1):

Total I/O: Exactly height page reads — one per tree level. For a tree with N keys and branching factor B, that's O(log_B(N)).

Side Effects

Error Handling

None explicit. Possible failures:

All errors are uncaught and bubble to the caller. This is appropriate — there's no meaningful recovery from a corrupted page mid-search.

Usage Patterns

Only called from get():


def get(self, key):
    self.pm.reset_counters()
    root, height, _, _, _ = self._read_meta()
    return self._search(root, key, height)

The caller reads metadata to obtain the root page and height, resets I/O counters, then delegates. No other method calls searchfindleaf is a separate traversal used by rangescan that only locates the leaf page number without checking for a specific key.

Dependencies

| Dependency | Role |

|------------|------|

| self.pm.read_page() | Page-level I/O from PageManager |

| deserializeleaf() | Unpacks leaf binary format → (keys, values, next_sibling) |

| deserializeinternal() | Unpacks internal binary format → (keys, children) |

| bisectleft / bisectright | stdlib binary search from bisect module |

Key Design Choice: bisectleft vs bisectright

The method uses bisectleft for leaves and bisectright for internal nodes. This is intentional and matches the split strategy in insert: when a leaf splits, the median key is *copied up* to the parent and also stays in the right leaf as its first key. So bisectright at internal nodes routes an exact-match key to the right child (where that key lives), and bisect_left at the leaf finds the exact position.