Function: serializeleaf in b-tree-storage-engine/btree.py

Date: 2026-05-28

Time: 19:15

serializeleaf — Leaf Page Serializer

Purpose

serializeleaf converts an in-memory representation of a B-tree leaf node (lists of keys and values) into a compact binary format suitable for writing to a fixed-size disk page. It's the write half of the leaf serialization pair — deserializeleaf is its inverse.

This exists because the B-tree operates on in-memory lists during mutations (insert, delete, update) but persists data as raw bytes in a page-oriented file. Every time a leaf changes, the engine rebuilds the binary page from scratch rather than doing in-place byte surgery.

Contract

Preconditions:

Postconditions:

Invariant: The order of entries in the output matches the order of keys/values — the function does not sort.

Parameters

| Parameter | Type | Description |

|-----------|------|-------------|

| keys | list[str \| bytes] | Sorted key sequence. Strings are UTF-8 encoded; bytes pass through as-is. |

| values | list[bytes] | Corresponding values. Must be raw bytes — no encoding is applied. |

| next_sibling | int | Page number of the next leaf in the linked list. Defaults to 0xFFFFFFFF (sentinel meaning "no next leaf"). |

Edge cases: Empty keys/values is valid — produces a 7-byte buffer (header + sibling pointer) with zero entries. This is used during tree initialization (writeempty_leaf).

Return Value

A bytes object — the raw page content (unpadded). The caller (walwritepagePageManager.writepage) pads it to page_size with null bytes before writing to disk.

Algorithm

1. Pack the header (3 bytes): page type LEAF (1 byte, value 0x01) + number of keys as big-endian unsigned 16-bit (>BH).

2. Pack the sibling pointer (4 bytes): big-endian unsigned 32-bit (>I).

3. For each key-value pair:

4. Return the accumulated buffer.

The wire format per entry is: [keylen: 2B][key: variable][vallen: 2B][val: variable] — a standard length-prefixed TLV layout without the type.

Side Effects

None. Pure function — no I/O, no mutation, no state changes.

Error Handling

No explicit error handling. Will raise implicitly if:

Usage Patterns

Called in three contexts:

1. Leaf insert/update (_insert, depth==1): after modifying the in-memory key/value lists, reserializes the whole leaf.

2. Leaf split (insert): serializes both left and right halves after splitting, with the right page inheriting the old nextsibling and the left page pointing to the new right page.

3. Leaf delete (_delete): reserializes after removing an entry; also used when relinking sibling pointers during empty-leaf cleanup.

Callers always pass the result to walwrite_page, never directly to PageManager.

Dependencies

No external dependencies.

Assumptions Not Enforced

1. len(keys) == len(values)zip silently drops extras, corrupting the num_keys count in the header.

2. Total serialized size ≤ pagesize — overflow is truncated silently by PageManager.writepage, corrupting the page.

3. Values are bytes — no type check or encoding; passing a str value crashes at concatenation.

4. Key/value length < 2^16struct.pack('>H', ...) will raise, but no friendly error message.

5. Keys are already sorted — the function preserves order but doesn't verify it; an unsorted input produces a valid-looking but logically broken leaf (binary search in _search would fail).

Beliefs