Function: put in b-tree-storage-engine/btree.py

Date: 2026-05-29

Time: 07:37

BTree.put(key, value) — Insert or Update a Key-Value Pair

Purpose

put is the primary write path for the B-tree storage engine. It inserts a new key-value pair or updates an existing one, handling three distinct outcomes: in-place updates, simple insertions, and root splits that grow the tree height. All writes go through the WAL (write-ahead log) for crash safety — if the process dies mid-operation, the WAL can replay or discard the partial write on recovery.

Contract

Preconditions:

Postconditions:

Invariants preserved:

Parameters

| Parameter | Type | Description |

|-----------|------|-------------|

| key | str | The lookup key. Encoded to UTF-8 for storage. No length limit enforced independently — the page-size check catches oversized keys. |

| value | bytes | The payload. Stored as raw bytes. Treated as opaque by the tree. |

Return Value

Returns None in all cases. This is a fire-and-forget write — success is implicit (no exception = committed). There is no way for the caller to distinguish "inserted new" from "updated existing" without a prior get.

Algorithm

Step 1 — Size validation. Encodes the key and computes the worst-case entry size: page header (3B) + next-sibling pointer (4B) + key-length prefix (2B) + key bytes + value-length prefix (2B) + value bytes. If this exceeds page_size, the entry can never fit in any leaf, so it raises immediately. Note: this is a conservative lower bound — it checks whether a leaf with *only this one entry* would fit, not whether it fits alongside existing entries.

Step 2 — Read metadata. Fetches the root page number, tree height, total key count, next-free-page pointer, and free-list head. Resets I/O counters first for per-operation stats tracking.

Step 3 — Recursive insert. Calls insert(root, key, value, height), which traverses from root to the target leaf, performing the insert and handling splits bottom-up. insert returns one of three values:

Step 4 — Handle each case:

Step 5 — WAL commit. In all three paths, self.wal.commit(self.pm) syncs the data file (fsync) then truncates the WAL to zero. This is the atomic durability boundary — once the WAL is truncated, the writes are committed.

Side Effects

Error Handling

Usage Patterns


tree = BTree('/tmp/mydb')
tree.put('user:1001', b'{"name": "Alice"}')
tree.put('user:1001', b'{"name": "Alice", "active": true}')  # update
tree.close()

Callers are expected to:

Dependencies

Notable Assumptions

1. Single-writer assumption. The metadata re-read pattern (readmeta() after _insert) assumes no concurrent writer has modified it. There are no locks or CAS operations.

2. value is bytes. Not enforced by a type check — passing a string silently produces incorrect behavior or a crash in serialization.

3. Page size is sufficient for metadata + at least one entry. The size check validates the entry alone, but doesn't account for the case where max_keys might be computed to a value that can't actually hold that many entries of this size.

4. allocatepage mutates metadata as a side effect. This is why put re-reads metadata after insert returns — the nextfree and freehead fields may have changed during splits.

Beliefs