Date: 2026-05-29
Time: 10:35
BTree._search — Point lookup traversal from root to leaf_search is the recursive core of B-tree point lookups. It traverses from an internal node down to the leaf level, reading one page per level, and returns the value associated with a key or None if the key doesn't exist. It exists to separate the traversal logic from the public get() method, which handles metadata reads and counter resets.
Preconditions:
page_num refers to a valid, allocated page containing either a serialized leaf (when depth == 1) or a serialized internal node (when depth > 1)depth accurately reflects the distance from this node to the leaf level — the tree is perfectly balanced, so every path from root to leaf has the same depthkey is a string (the serialization functions decode keys as UTF-8 strings)Postconditions:
bytes (the stored value) if the key exists, None otherwiseInvariant assumed: The tree is balanced. depth is decremented by exactly 1 at each recursive call, and depth == 1 always means "this page is a leaf." If the metadata height is wrong (e.g., after a corrupted write), the method will deserialize a page using the wrong format and produce garbage or crash.
| Parameter | Type | Meaning |
|-----------|------|---------|
| pagenum | int | Page number to read from the data file. Multiplied by pagesize to get the byte offset. |
| key | str | The lookup key. Must be a string — bisectleft/bisectright compare it against deserialized string keys. |
| depth | int | Levels remaining to the leaf. 1 = this page is a leaf; >1 = internal node. Caller sets this to the tree's height from metadata. |
Edge cases: If depth is 0 or negative, the method falls through to the internal-node branch and tries to deserialize whatever is at page_num as an internal node — no guard exists. This can't happen in normal operation because get() reads the height from metadata (minimum 1).
bytes — the value associated with the key, if found in the leafNone — the key does not exist in the subtreeThe caller (get()) passes this through directly, so the public API also returns bytes | None.
1. Read the page — self.pm.readpage(pagenum) fetches pagesize bytes from offset pagenum * pagesize in the data file. This increments pagesread.
2. Leaf case (depth == 1):
keys, values, and a nextsibling pointer (discarded here — only rangescan follows sibling chains).bisectleft(keys, key) to find the insertion point. bisectleft is correct here because we want the *leftmost* position where key could appear — if keys[idx] == key, that's our match.idx < len(keys) and keys[idx] == key. Both conditions are necessary: bisect_left can return len(keys) if key is larger than all stored keys.None.3. Internal case (depth > 1):
ikeys and children pointers.bisectright(ikeys, key) to pick the child. bisectright is the correct choice: for an internal node with keys [k0, k1, ...] and children [c0, c1, c2, ...], child ci covers keys in the range [k{i-1}, ki). bisectright ensures that a key equal to a separator goes to the *right* child, matching the split logic in _insert where the right page keeps the median key.children[idx] with depth - 1.Total I/O: Exactly height page reads — one per tree level. For a tree with N keys and branching factor B, that's O(log_B(N)).
self.pm.pagesread by 1 for each level traversed. The caller get() calls resetcounters() before the search, so after get() returns, pm.pages_read equals the tree height.self.pm.f.seek() and self.pm.f.read(). No writes.None explicit. Possible failures:
struct.unpack errors if the page data is corrupted or truncated — these propagate as struct.errorIndexError if children[idx] is out of bounds — would indicate a corrupted internal node where the children array doesn't have len(keys) + 1 entriesread_page — propagate as OSErrorAll errors are uncaught and bubble to the caller. This is appropriate — there's no meaningful recovery from a corrupted page mid-search.
Only called from get():
def get(self, key):
self.pm.reset_counters()
root, height, _, _, _ = self._read_meta()
return self._search(root, key, height)
The caller reads metadata to obtain the root page and height, resets I/O counters, then delegates. No other method calls search — findleaf is a separate traversal used by rangescan that only locates the leaf page number without checking for a specific key.
| Dependency | Role |
|------------|------|
| self.pm.read_page() | Page-level I/O from PageManager |
| deserializeleaf() | Unpacks leaf binary format → (keys, values, next_sibling) |
| deserializeinternal() | Unpacks internal binary format → (keys, children) |
| bisectleft / bisectright | stdlib binary search from bisect module |
bisectleft vs bisectrightThe method uses bisectleft for leaves and bisectright for internal nodes. This is intentional and matches the split strategy in insert: when a leaf splits, the median key is *copied up* to the parent and also stays in the right leaf as its first key. So bisectright at internal nodes routes an exact-match key to the right child (where that key lives), and bisect_left at the leaf finds the exact position.