Function: _maybe_rotate in hash-index-storage/bitcask.py

Date: 2026-05-29

Time: 11:43

_maybe_rotate: Active File Rotation Guard

Purpose

_maybe_rotate enforces the maximum file size invariant in a Bitcask storage engine. Bitcask appends every write to a single "active" data file, and without rotation that file would grow without bound. This method checks whether the active file has reached max_file_size and, if so, closes it and opens a fresh one with the next sequential ID.

It exists to keep individual data files bounded in size, which matters for two reasons: (1) compaction operates only on immutable (non-active) files, so rotation is what creates compaction candidates, and (2) bounded files keep per-file reads (including mmap, if used) manageable and limit crash-recovery work, since only the active file can end in a partially written record that must be checked.
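To make reason (1) concrete, here is a minimal sketch of how the set of compaction candidates follows from rotation. It assumes data files are named `<id>.data` (the naming scheme is not specified in this document), and `compaction_candidates` is a hypothetical helper, not part of bitcask.py. Every rotation moves one more file out of the active role and into the candidate list.

```python
import os
import re
import tempfile

# Hypothetical naming scheme (assumption): each data file is '<id>.data'.
DATA_FILE = re.compile(r"^(\d+)\.data$")


def compaction_candidates(directory, active_file_id):
    """Return IDs of immutable data files: every data file except the active one."""
    ids = []
    for name in os.listdir(directory):
        match = DATA_FILE.match(name)
        if match and int(match.group(1)) != active_file_id:
            ids.append(int(match.group(1)))
    return sorted(ids)


with tempfile.TemporaryDirectory() as tmp:
    for file_id in (0, 1, 2):  # state after two rotations
        open(os.path.join(tmp, f"{file_id}.data"), "wb").close()
    candidates = compaction_candidates(tmp, active_file_id=2)

print(candidates)  # [0, 1]
```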

Contract

Parameters

None — this is a zero-argument instance method. All state is read from self.

Return Value

None. This is a guard/side-effect method. The caller doesn't need to handle a return value — the rotation either happened or it didn't.

Algorithm

1. Check file position: self.active_file.tell() returns the current write cursor, which equals the file size since the file is opened in append mode.

2. Compare against threshold: If the position is >= self.max_file_size, proceed with rotation.

3. Close the current file: self.active_file.close() flushes and closes the write handle. The read handle for this file ID remains open in self.file_handles. This is intentional, since existing KeyEntry records still point to it.

4. Increment file ID: self.active_file_id += 1 picks the next sequential ID. There is no gap detection; the method assumes sequential IDs are safe.

5. Open new active file: _open_active_file() creates a new .data file, opens it for appending, and registers a read handle in self.file_handles.
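The five steps can be sketched as a small self-contained class. This is an illustration under stated assumptions, not the code in bitcask.py: `MiniBitcask` is a hypothetical stand-in that holds only the state the method touches, and the `<id>.data` file naming and the shape of `file_handles` (file ID to read handle) are assumed. `close()` is a hypothetical helper added so the sketch cleans up after itself.

```python
import os
import tempfile


class MiniBitcask:
    """Hypothetical stand-in holding only the state _maybe_rotate touches.

    Assumed (not taken from bitcask.py): data files are named '<id>.data'
    inside `directory`, and file_handles maps file ID -> read handle.
    """

    def __init__(self, directory, max_file_size):
        self.directory = directory
        self.max_file_size = max_file_size
        self.active_file_id = 0
        self.file_handles = {}
        self._open_active_file()

    def _open_active_file(self):
        path = os.path.join(self.directory, f"{self.active_file_id}.data")
        self.active_file = open(path, "ab")  # creates the file if missing
        self.active_file.seek(0, 2)          # position at EOF so tell() == size
        self.file_handles[self.active_file_id] = open(path, "rb")

    def _maybe_rotate(self):
        # Steps 1-2: in append mode the write cursor equals the file size.
        if self.active_file.tell() >= self.max_file_size:
            self.active_file.close()         # Step 3: flush + close the write handle
            self.active_file_id += 1         # Step 4: next sequential ID, no gap check
            self._open_active_file()         # Step 5: new active file + read handle

    def close(self):
        self.active_file.close()
        for handle in self.file_handles.values():
            handle.close()


with tempfile.TemporaryDirectory() as tmp:
    db = MiniBitcask(tmp, max_file_size=8)
    db._maybe_rotate()                       # empty file: below the cap, no rotation
    id_before = db.active_file_id            # 0
    db.active_file.write(b"0123456789")      # 10 bytes, past the 8-byte cap
    db._maybe_rotate()                       # rotates to file 1
    id_after = db.active_file_id             # 1
    open_ids = sorted(db.file_handles)       # [0, 1]: the old read handle stays open
    db.close()
```

Note that the read handle for file 0 survives the rotation, which matches step 3: KeyEntry offsets into the old file must stay readable.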

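Side Effects

When the active file has reached max_file_size, the method closes the current write handle (flushing any buffered bytes), increments self.active_file_id, creates a new .data file on disk through _open_active_file(), and registers a read handle for that file in self.file_handles. Below the threshold it does nothing.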

Error Handling

None. If _open_active_file fails (e.g., permission error, disk full), the exception propagates uncaught. By then self.active_file has already been closed and self.active_file_id incremented, leaving the store in an inconsistent state: self.active_file may still reference the closed handle, and any later tell() or write on it raises ValueError. This is a deliberate simplicity tradeoff in a reference implementation; production Bitcask engines (like Riak's) handle this more carefully.
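One way to avoid the inconsistent state is to do everything that can fail before mutating the store, then commit. The sketch assumes a store exposing `active_file`, `active_file_id`, `file_handles`, and a `directory` holding `<id>.data` files; `rotate_safely`, `directory`, and the naming scheme are hypothetical, and this is not how bitcask.py or Riak actually does it. A `SimpleNamespace` stands in for the store in the usage lines.

```python
import os
import tempfile
from types import SimpleNamespace


def rotate_safely(store):
    """Hypothetical variant: open the next file before touching store state."""
    store.active_file.flush()                # surface disk-full errors first
    next_id = store.active_file_id + 1
    path = os.path.join(store.directory, f"{next_id}.data")
    new_write = open(path, "ab")             # may raise: store is still untouched
    try:
        new_read = open(path, "rb")
    except OSError:
        new_write.close()                    # an empty next_id file may remain on disk
        raise
    # Commit point: swap in the new file, then retire the old write handle.
    old = store.active_file
    store.active_file = new_write
    store.active_file_id = next_id
    store.file_handles[next_id] = new_read
    old.close()


with tempfile.TemporaryDirectory() as tmp:
    first = open(os.path.join(tmp, "0.data"), "ab")
    store = SimpleNamespace(directory=tmp, active_file=first,
                            active_file_id=0, file_handles={})
    rotate_safely(store)
    ok_id = store.active_file_id             # 1
    store.directory = os.path.join(tmp, "missing")  # force open() to fail
    try:
        rotate_safely(store)
    except FileNotFoundError:
        pass
    failed_id = store.active_file_id         # still 1: nothing was committed
    still_open = not store.active_file.closed
    store.active_file.close()
    for handle in store.file_handles.values():
        handle.close()
```

The tradeoff is a few extra lines and a possible stray empty file in exchange for a store that remains writable after a failed rotation.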

Usage Patterns

Called as a guard at the top of every mutating operation:


```python
def put(self, key, value):
    self._maybe_rotate()        # ensure room before writing
    offset, size, ts = self._write_record(key, value)
    ...

def delete(self, key):
    self._maybe_rotate()        # tombstones are records too
    self._write_record(key, "")
    ...
```

The caller obligation is simple: call _maybe_rotate() before _write_record(). Because the check runs before the write, not during it, a record can still push the file past max_file_size; the next mutating operation then triggers rotation. The size limit is therefore a soft cap, not a hard one: a file can overshoot it by less than the size of the last record written to it.
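The soft-cap arithmetic can be replayed without any I/O. In this sketch, `simulate_sizes` is a hypothetical helper that applies the same pre-write check to a sequence of record sizes and returns the final size of each file produced.

```python
def simulate_sizes(record_sizes, max_file_size):
    """Replay a pre-write size check; return the final size of every file."""
    files = [0]
    for size in record_sizes:
        if files[-1] >= max_file_size:  # the _maybe_rotate check, before writing
            files.append(0)             # rotate: start a new, empty file
        files[-1] += size               # the whole record lands in the current file
    return files


sizes = simulate_sizes([40, 40, 40, 40], max_file_size=100)
print(sizes)  # [120, 40]
```

The first file stops at 120 bytes: the check before the third record saw 80 < 100, so that record was written in full. The overshoot (20 bytes) is less than that record's 40 bytes, and only the fourth record triggers rotation.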

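Dependencies

The method reads self.active_file, self.max_file_size, and self.active_file_id, and delegates file creation to self._open_active_file(), which also updates self.file_handles.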

Assumptions Not Enforced by Types

1. tell() accuracy on append files: The code assumes tell() on an "ab" file returns the file's total size. This relies on the seek(0, 2) call in _open_active_file and on tell() not lagging behind writes that are still buffered. CPython's buffered writer includes unflushed bytes in tell(), so this holds in practice, but it is an implementation detail that the code never checks.

2. No concurrent writers: Nothing prevents two threads from both passing the size check and writing simultaneously. Bitcask's original design assumes a single writer process.

3. Sequential ID availability: The method blindly increments active_file_id without checking whether a file with that ID already exists on disk. Compaction renumbers files, so a collision is possible in principle, but compact() updates active_file_id to avoid it.
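Assumption 1 can be checked directly with a throwaway file. The snippet below creates a 10-byte file, reopens it in "ab" mode, seeks to the end as the text describes, and writes 5 more bytes without flushing. The file name `0.data` is only illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "0.data")
    with open(path, "wb") as f:
        f.write(b"x" * 10)                 # pre-existing 10-byte data file

    with open(path, "ab") as active:
        size_after_seek = active.seek(0, 2)   # the explicit seek to EOF
        active.write(b"y" * 5)                # sits in the write buffer, unflushed
        size_buffered = active.tell()         # tell() counts buffered bytes too
        size_on_disk = os.path.getsize(path)  # may lag until the buffer is flushed

print(size_after_seek, size_buffered)  # 10 15
```

So tell() tracks the logical file size, including bytes not yet on disk, which is what the size check needs. What tell() cannot see is bytes appended by another process, which is one more reason the design depends on a single writer.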