Date: 2026-05-29
Time: 10:50
When a page allocator needs to track which pages are free, it has two choices: maintain a separate data structure (an array, bitmap, or external list), or store the bookkeeping *inside the free pages themselves*. This codebase uses the second approach — an intrusive free list — where each freed page's own bytes hold the pointer to the next free page.
This works because a free page isn't being used for anything. Its 4096 bytes are wasted space. So the allocator repurposes a few of those bytes to thread a singly-linked list through the free pages.
The implementation lives in PageManager in b-tree-storage-engine/btree.py.
Line 16 defines the metadata layout:
# root_page(4B), height(4B), total_keys(4B), next_free_page(4B), free_list_head(4B)
The fifth field, freelisthead, is the page number of the first free page. It's the entry point into the linked list. If no pages are free, it holds the sentinel NO_SIBLING (0xFFFFFFFF, line 12).
def free_page(self, page_num):
root, height, total_keys, next_free, free_head = self.read_meta()
# Write a free-list node: header + pointer to old head
data = struct.pack(HEADER_FMT, 0, 0) + struct.pack('>I', free_head)
self.write_page(page_num, data)
self.write_meta(root, height, total_keys, next_free, page_num)
This is a classic stack push. It:
1. Reads the current head of the free list (free_head)
2. Overwrites the freed page's contents with a minimal header (type=0, num_keys=0) followed by a 4-byte pointer to the old head — this is the metadata stored inside the free block
3. Updates the metadata page so freelisthead now points to the newly freed page
After this, the freed page's bytes look like:
[1B type=0] [2B num_keys=0] [4B next_free_pointer] [... rest is padding ...]
The key insight: the nextfreepointer lives at bytes 3–6 of the freed page itself. No external table needed.
def allocate_page(self):
root, height, total_keys, next_free, free_head = self.read_meta()
if free_head != NO_SIBLING:
page_num = free_head
page_data = self.read_page(page_num)
new_head = struct.unpack('>I', page_data[HEADER_SIZE:HEADER_SIZE+4])[0]
self.write_meta(root, height, total_keys, next_free, new_head)
return page_num
page_num = next_free
self.write_meta(root, height, total_keys, page_num + 1, free_head)
return page_num
This is a stack pop. It:
1. Checks if the free list is non-empty (freehead != NOSIBLING)
2. Reads the free page to extract the next pointer from pagedata[HEADERSIZE:HEADER_SIZE+4] — pulling the allocator metadata back out of the block
3. Updates the metadata page so freelisthead advances to the next free page
4. Returns the reclaimed page number
If the list is empty, it falls back to bumping next_free — extending the file.
Zero overhead for free tracking. The free list consumes no additional pages or memory. Each free page is its own list node. If you have 1000 free pages, you have a 1000-node linked list that occupies exactly those 1000 pages — no extra storage.
O(1) allocate and free. Both operations are stack push/pop: read the head, follow one pointer, update the head. Compare this to scanning a bitmap, which is O(n) in the number of pages.
The tradeoff is fragmentation. A free list has no notion of contiguity. If you free pages 5, 10, and 15, subsequent allocations return them in LIFO order (15, 10, 5) regardless of physical layout. Bitmap-based allocators can search for contiguous runs, which matters for sequential I/O. This implementation accepts that tradeoff — B-tree access patterns are already random I/O by nature.
This is the same technique used by malloc implementations (glibc's ptmalloc stores forward/backward pointers in freed chunks), Linux kernel page allocators (buddy system free lists), and production databases like PostgreSQL (which threads free pages into a free space map). The principle is universal: if a block isn't holding user data, its bytes are available for the allocator's own bookkeeping.
b-tree-storage-engine/btree.py:allocate_page — Trace the two allocation paths (free list reuse vs. file extension) and when each triggersfree-list-fragmentation — How LIFO reuse ordering affects page locality and what strategies (e.g., address-ordered free lists, buddy systems) mitigate itcrash-safety-of-free-list — The current free_page writes the page then updates metadata in two non-atomic steps — explore what happens if a crash occurs between themb-tree-storage-engine/btree.py:freepage — Called from delete_entry (line 472) during B-tree key deletion; follow the deletion path to see when pages become reclaimablebitmap-vs-freelist-allocators — Compare this linked-list approach to bitmap-based allocators (e.g., ext4 block groups) for understanding the contiguity vs. simplicity tradeofffree-list-is-intrusive-singly-linked — The free page list is a singly-linked list where each free page stores the next-free pointer at bytes HEADERSIZE to HEADERSIZE+4 within the page itselfallocate-page-prefers-reuse — allocate_page checks the free list before extending the file, so freed pages are always reused before new pages are appendedfree-list-is-lifo — freepage pushes onto the head and allocatepage pops from the head, making reuse order LIFO (most recently freed page is allocated first)free-list-head-persisted-in-page-zero — The free list head pointer is stored as the fifth field of the metadata page (page 0), making it durable across restartsfree-page-not-atomic-with-meta-update — free_page performs two separate writes (page content then metadata) without WAL protection, creating a window where a crash could leak a page