Date: 2026-05-29
Time: 07:45
deserializeleaf — Leaf Page Deserializerdeserializeleaf parses a raw byte buffer (a fixed-size disk page) back into the logical components of a B-tree leaf node: the sorted keys, their associated values, and the page number of the next sibling leaf. It's the inverse of serializeleaf and is the only way the B-tree reads leaf data after loading a page from disk.
Preconditions:
data is a bytes object at least HEADER_SIZE + 4 bytes long (3-byte header + 4-byte sibling pointer).data encodes page type LEAF (1). The function reads but discards this byte — it trusts the caller to have already verified the page type via pagetype().num_keys field in the header accurately reflects how many key-value entries follow.Postconditions:
(keys, values, nextsibling) where len(keys) == len(values) == numkeys.keys are decoded Python str objects; values remain as bytes.nextsibling is an unsigned 32-bit page number, or 0xFFFFFFFF (NOSIBLING) if this is the rightmost leaf.Invariant: The returned keys are in sorted order — but only because serializeleaf and the insert logic maintain that invariant. This function does not verify sort order.
| Parameter | Type | Description |
|-----------|------|-------------|
| data | bytes | A full page buffer, typically pagesize bytes (default 4096), read via PageManager.readpage() |
No edge-case handling for truncated or corrupt data — if the buffer is too short, struct.unpack will raise struct.error.
(keys: list[str], values: list[bytes], next_sibling: int)
keys — Sorted list of UTF-8 string keys.values — Corresponding raw byte values, positionally aligned with keys.nextsibling — Page number of the next leaf to the right, forming a singly-linked list across all leaves. NOSIBLING (0xFFFFFFFF) marks the end.The caller must handle the empty case (keys == []) — this occurs for a freshly created or fully-deleted leaf.
1. Unpack the 3-byte page header (HEADER_FMT = '>BH'):
- Byte 0: page type (discarded into `_`)
- Bytes 1-2: num_keys (big-endian unsigned short)
2. Read the next 4 bytes as the next-sibling pointer (big-endian unsigned int).
3. For each of the `num_keys` entries, sequentially:
a. Read 2 bytes → key length (klen)
b. Read klen bytes → key (decode as UTF-8)
c. Read 2 bytes → value length (vlen)
d. Read vlen bytes → value (kept as raw bytes)
e. Append to keys[] and values[]
4. Return (keys, values, next_sib)
The wire format per entry is: [klen:2B][key:klenB][vlen:2B][value:vlenB] — a classic length-prefixed TLV without the type tag. Maximum key or value size is 65,535 bytes (2^16 - 1), though in practice the page size (default 4096) constrains this much further.
None. Pure function — no I/O, no mutation, no state changes. It operates entirely on the byte buffer passed in.
No explicit error handling. The function will raise:
struct.error — if data is too short for any struct.unpack call (truncated page).UnicodeDecodeError — if a key contains invalid UTF-8 bytes.IndexError — if data is shorter than the offsets implied by the length prefixes (corrupt length fields).All of these propagate uncaught. The function assumes well-formed pages — reasonable given the WAL provides crash safety and only serializeleaf produces the data.
Every B-tree operation that touches leaf data calls this function:
_search (via get) — find a key by binary search on the deserialized keys_insert (via put) — deserialize, insert into the lists, re-serialize_delete — deserialize, remove from the lists, re-serializerange_scan — walk the sibling chain, deserializing each leaf_iter_ — same leaf-chain walkminkey / maxkey — deserialize to read the first/last keyThe typical call pattern is:
data = self.pm.read_page(page_num)
keys, values, next_sib = _deserialize_leaf(data)
Callers always pair this with PageManager.readpage(). After mutation, they re-serialize with serialize_leaf and write back through the WAL.
struct (stdlib) — all binary unpackingHEADERFMT ('>BH'), HEADERSIZE (3 bytes). These are shared with serializeleaf, deserializeinternal, and serializeinternal, forming the common page-header contract.No external dependencies.
leaf-wire-format-is-header-sibling-then-kv-pairs — Leaf page binary layout is: 3-byte header (type + num_keys), 4-byte next-sibling pointer, then N length-prefixed key-value pairs, each with 2-byte key length, key bytes, 2-byte value length, value bytes.leaf-keys-are-utf8-values-are-raw-bytes — Deserialization decodes keys as UTF-8 strings but leaves values as raw bytes; this asymmetry means keys must be valid UTF-8 while values are opaque.max-key-or-value-size-is-65535-bytes — Key and value lengths are stored as >H (unsigned 16-bit), capping individual key or value size at 65,535 bytes, though page size is the practical limit.deserialize-leaf-is-pure-and-trusts-input — The function performs no validation of page type, sort order, or buffer length; it assumes the WAL and serialize functions guarantee well-formed pages.sibling-pointer-enables-sequential-scan-without-parent-traversal — The nextsibling field in each leaf forms a singly-linked list across all leaves, allowing rangescan and _iter_ to walk leaves left-to-right without re-descending from the root.