Function: serializeinternal in b-tree-storage-engine/btree.py

Date: 2026-05-29

Time: 07:46

serializeinternal — Serialize a B-Tree Internal Node to Binary

Purpose

serializeinternal converts an in-memory representation of a B-tree internal (non-leaf) node into a compact binary byte sequence suitable for writing to a fixed-size page on disk. Internal nodes don't store values — they store keys that act as separators and pointers (page numbers) to child pages. This function is the counterpart to deserializeinternal, and together they form the serialization boundary between the in-memory tree operations and the page-based storage layer.

Contract

Preconditions:

Postconditions:

Parameters

| Parameter | Type | Description |

|-----------|------|-------------|

| keys | list[str \| bytes] | Separator keys in sorted order. Determines which child subtree a search descends into. |

| children | list[int] | Page numbers of child nodes. children[i] contains all keys < keys[i]; children[i+1] contains all keys >= keys[i]. |

Edge cases:

Return Value

bytes — the serialized page data, not yet padded to page size. The caller (walwritepagePageManager.writepage) pads with null bytes via .ljust(page_size, b'\x00').

Algorithm


1. Pack a 3-byte header: page type (INTERNAL = 0) as uint8, key count as uint16 big-endian.
2. Pack children[0] as a 4-byte big-endian uint32 — the leftmost child pointer.
3. For each key at index i:
   a. Encode the key to UTF-8 bytes if it's a str (no-op if already bytes).
   b. Pack a 2-byte key length prefix (uint16), then the raw key bytes.
   c. Pack children[i+1] as a 4-byte uint32 — the child pointer to the right of this key.
4. Return the accumulated buffer.

The binary layout is:


[type:1B][num_keys:2B][child_0:4B] ([key_len:2B][key_data:varB][child_i+1:4B])...

This interleaved layout mirrors the logical structure: start with the leftmost child, then alternate (separator key, right child). It's the standard on-disk format for B-tree internal nodes described in DDIA Chapter 3.

Side Effects

None. This is a pure function — no I/O, no mutations, no state changes.

Error Handling

No exceptions are intentionally raised or caught. The function trusts its callers to uphold the B-tree structural invariant.

Usage Patterns

Called in three places within BTree:

1. put → root split (btree.py:213): creates a new root internal node with one key and two children (the old root and the new page from the split).

2. _insert → internal node update after child split (btree.py:249): re-serializes an internal node after inserting a promoted key and new child pointer.

3. _delete → internal node update after child removal (btree.py:286): re-serializes after removing a key and child pointer when an empty leaf is cleaned up.

The caller obligation is to ensure the key/children invariant holds and that the resulting byte sequence doesn't exceed page_size.

Dependencies

| Dependency | Usage |

|------------|-------|

| struct (stdlib) | Binary packing via struct.pack with big-endian format strings |

| HEADER_FMT ('>BH') | 3-byte header format: 1 byte type + 2 byte key count |

| INTERNAL (0) | Page type constant written into the header |

No external dependencies. The function uses only Python stdlib and module-level constants.

Design Notes

The key length is stored as uint16 ('>H'), capping individual keys at 65535 bytes. The key count in the header is also uint16, capping an internal node at 65535 keys — far beyond what would fit in a typical 4KB page. The actual limit is enforced by self.maxkeys in BTree.insert.

The function does not include any checksum or page-level integrity check — that responsibility falls to the WAL layer, which wraps each page write with a CRC32.

Topics to Explore

Beliefs