Function: _insert in b-tree-storage-engine/btree.py

Date: 2026-05-28

Time: 19:13

BTree._insert — Recursive B-Tree Insertion with Split Propagation

Purpose

_insert is the recursive core of B-tree insertion. It walks down from the given node to the correct leaf, inserts the key-value pair, and handles node splits on the way back up via return values. The caller (put) uses the return value to decide whether to update metadata, grow the tree height, or do nothing.

This design separates the recursive tree-walking logic from the top-level bookkeeping (metadata updates, WAL commits, root splits), keeping the recursion focused on a single concern: "insert into this subtree and tell me what happened."

Contract

Preconditions:

Postconditions:

Invariants maintained:

Parameters

| Parameter | Type | Description |

|-----------|------|-------------|

| page_num | int | Page number of the current node in the data file |

| key | str | The key to insert. Compared lexicographically via bisectleft/bisectright |

| value | bytes | The value to store. Opaque to the tree; serialized as-is |

| depth | int | Height of the subtree rooted at this node. 1 means leaf; >1 means internal. Decremented on each recursive call |

Return Value

The return type is a three-way discriminated union — the caller must check the type to determine what happened:

| Return | Meaning | Caller obligation |

|--------|---------|-------------------|

| None | Key existed; value was updated in-place | No metadata change needed |

| 'inserted' | New key inserted; no split | Increment total_keys in metadata |

| (midkey, newpagenum) | Node split occurred | Insert midkey into the parent, pointing to newpagenum as the new right child. If this is the root, create a new root |

Algorithm

Leaf case (depth == 1):

1. Read and deserialize the page into keys, values, and next_sib (sibling pointer).

2. Binary search (bisect_left) to find the insertion point idx.

3. Key exists (keys[idx] == key): overwrite values[idx], rewrite the page, return None.

4. Key is new: insert key and value at position idx.

5. No overflow (len(keys) <= max_keys): serialize and write the page, return 'inserted'.

6. Overflow — split:

Internal case (depth > 1):

1. Deserialize the internal node into ikeys and children.

2. Use bisectright to find which child subtree the key belongs in. (bisectright ensures equal keys go to the right child, which is the standard B-tree convention for internal routing.)

3. Recurse into children[idx] with depth - 1.

4. No split from child (None or 'inserted'): propagate the result unchanged.

5. Child split: the child returned (midkey, newchild).

6. No overflow: serialize and write the updated internal node, return 'inserted'.

7. Overflow — split internal:

The difference between leaf and internal splits matters: leaves copy mid_key (it stays in the right leaf for lookups), while internal nodes promote it (it's removed from both children).

Side Effects

Error Handling

_insert does not catch any exceptions. Possible failures:

There's no rollback on partial failure. If the process crashes mid-insert (after writing some pages but before WAL commit), recovery replays the WAL, which re-applies all logged writes. But if allocatepage updated metadata before the WAL was committed, the metadata page could be inconsistent — this is a known gap in the WAL coverage within insert.

Usage Patterns

_insert is called exactly once, from put:


result = self._insert(root, key, value, height)

The caller then dispatches on the return value:

Callers must never call _insert without subsequently committing the WAL — leaving uncommitted WAL entries would cause them to be replayed on next startup.

Dependencies

Beliefs