Function: PageManager in b-tree-storage-engine/btree.py

Date: 2026-05-29

Time: 11:02

PageManager — Fixed-Size Page I/O Layer

Purpose

PageManager is the lowest-level storage abstraction in this B-tree implementation. It treats a single file as an array of fixed-size pages (default 4096 bytes — one OS page / one disk sector on most systems) and provides read/write access by page number. It also manages page allocation and a free list so that deleted pages can be reclaimed without compacting the file.

This exists to separate the concerns of *how bytes get to disk* from *what the B-tree structure looks like*. The BTree class works entirely in terms of page numbers and serialized byte buffers; PageManager handles the file seeks, padding, and space management.

Contract

Preconditions:

Postconditions:

Invariants:

Parameters

_init(self, filepath, page_size=4096)

| Parameter | Type | Description |

|-----------|------|-------------|

| file_path | str | Path to the data file. Created if absent. |

| page_size | int | Bytes per page. Must be consistent across the file's lifetime — there is no on-disk record of page size, so reopening with a different value silently corrupts reads. |

Return Values

Algorithm

Initialization (_init_):

1. Open the file in r+b (existing) or w+b (new).

2. If new: write metadata at page 0 with root=1, height=1, totalkeys=0, nextfree=2, freehead=NOSIBLING. Then write an empty leaf at page 1. The tree starts as a single empty leaf.

Page allocation (allocate_page):

1. Read current metadata.

2. If freehead != NOSIBLING — a previously freed page exists:

3. Otherwise, bump next_free by 1 and return the old value. This extends the file on the next write.

Page freeing (free_page):

1. Read current metadata to get the current free_head.

2. Overwrite the freed page with a minimal header (type=0, numkeys=0) followed by a 4-byte pointer to the old freehead. This makes the freed page a node in the free list.

3. Update metadata so free_head points to this newly freed page.

This is a classic intrusive singly-linked free list — the free list nodes are stored *inside* the freed pages themselves, requiring no additional bookkeeping space.

Side Effects

Error Handling

Usage Patterns

PageManager is always used through BTree, never directly by application code:


# BTree.__init__ creates and owns the PageManager
self.pm = PageManager(data_path, page_size)

# Typical lifecycle inside BTree:
data = self.pm.read_page(page_num)       # read
self.pm.write_page(page_num, new_data)   # write
new_page = self.pm.allocate_page()       # grow
self.pm.free_page(old_page)              # reclaim
self.pm.sync()                           # durability barrier
self.pm.close()                          # shutdown

Caller obligations:

Dependencies

No external dependencies. Pure stdlib.

Assumptions Not Enforced

1. Page size consistency: Nothing on disk records what page_size was used. Reopening with a different value silently produces corrupt reads.

2. Single-writer: No file locking. Two processes opening the same file will corrupt it.

3. flush() ≠ durability: Most writes call flush() but not fsync(). On crash, recently written pages may be lost. The BTree class compensates via the WAL, but PageManager alone does not guarantee durability.

4. Page number validity: readpage and writepage accept any int — there's no bounds check against next_free. Reading an unallocated page returns zeros; writing one silently extends the file.

5. Free list integrity: If the file is corrupted such that a free-list pointer forms a cycle, allocate_page will loop forever, reusing the same pages.

Beliefs