Function: deserializeleaf in b-tree-storage-engine/btree.py

Date: 2026-05-29

Time: 07:45

deserializeleaf — Leaf Page Deserializer

Purpose

deserializeleaf parses a raw byte buffer (a fixed-size disk page) back into the logical components of a B-tree leaf node: the sorted keys, their associated values, and the page number of the next sibling leaf. It's the inverse of serializeleaf and is the only way the B-tree reads leaf data after loading a page from disk.

Contract

Preconditions:

Postconditions:

Invariant: The returned keys are in sorted order — but only because serializeleaf and the insert logic maintain that invariant. This function does not verify sort order.

Parameters

| Parameter | Type | Description |

|-----------|------|-------------|

| data | bytes | A full page buffer, typically pagesize bytes (default 4096), read via PageManager.readpage() |

No edge-case handling for truncated or corrupt data — if the buffer is too short, struct.unpack will raise struct.error.

Return Value


(keys: list[str], values: list[bytes], next_sibling: int)

The caller must handle the empty case (keys == []) — this occurs for a freshly created or fully-deleted leaf.

Algorithm


1. Unpack the 3-byte page header (HEADER_FMT = '>BH'):
   - Byte 0: page type (discarded into `_`)
   - Bytes 1-2: num_keys (big-endian unsigned short)

2. Read the next 4 bytes as the next-sibling pointer (big-endian unsigned int).

3. For each of the `num_keys` entries, sequentially:
   a. Read 2 bytes → key length (klen)
   b. Read klen bytes → key (decode as UTF-8)
   c. Read 2 bytes → value length (vlen)
   d. Read vlen bytes → value (kept as raw bytes)
   e. Append to keys[] and values[]

4. Return (keys, values, next_sib)

The wire format per entry is: [klen:2B][key:klenB][vlen:2B][value:vlenB] — a classic length-prefixed TLV without the type tag. Maximum key or value size is 65,535 bytes (2^16 - 1), though in practice the page size (default 4096) constrains this much further.

Side Effects

None. Pure function — no I/O, no mutation, no state changes. It operates entirely on the byte buffer passed in.

Error Handling

No explicit error handling. The function will raise:

All of these propagate uncaught. The function assumes well-formed pages — reasonable given the WAL provides crash safety and only serializeleaf produces the data.

Usage Patterns

Every B-tree operation that touches leaf data calls this function:

The typical call pattern is:


data = self.pm.read_page(page_num)
keys, values, next_sib = _deserialize_leaf(data)

Callers always pair this with PageManager.readpage(). After mutation, they re-serialize with serialize_leaf and write back through the WAL.

Dependencies

No external dependencies.

Beliefs