Date: 2026-05-29
Time: 11:43
_maybe_rotate: Active File Rotation Guard
_maybe_rotate enforces the maximum-file-size invariant in a Bitcask storage engine. Bitcask appends every write to a single "active" data file, and without rotation that file would grow without bound. This method checks whether the active file has reached max_file_size and, if so, closes it and opens a fresh one with the next sequential ID.
It exists to keep individual data files bounded in size, which matters for two reasons: (1) compaction operates on immutable (non-active) files, so rotation is what creates compaction candidates, and (2) bounded file sizes keep mmap/read operations practical and bound crash-recovery time, since only the active file needs a full scan and rotation caps its size.
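Point (1) can be illustrated with a tiny hypothetical helper (compaction_candidates is not part of the documented code): every data file except the active one is immutable, so each rotation freezes one more file and makes it eligible for compaction.

```python
def compaction_candidates(file_ids, active_file_id):
    """Hypothetical helper: IDs of immutable data files, i.e. every file but the active one."""
    return sorted(fid for fid in file_ids if fid != active_file_id)


# Before rotation file 2 is active; after rotation file 3 is active and file 2 is frozen.
print(compaction_candidates({0, 1, 2}, active_file_id=2))     # → [0, 1]
print(compaction_candidates({0, 1, 2, 3}, active_file_id=3))  # → [0, 1, 2]
```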
Preconditions: self.active_file is an open file handle in append-binary mode ("ab"), and self.active_file_id is its corresponding ID. self.max_file_size is a positive integer.
Postconditions: if the active file's size was >= max_file_size, the old file is closed, active_file_id is incremented by 1, and a new active file is opened via _open_active_file(). The old file becomes immutable (no further writes). If the size was below the threshold, nothing changes.
Parameters: none. This is a zero-argument instance method; all state is read from self.
Returns: None. This is a guard/side-effect method, so the caller has no return value to handle; the rotation either happened or it didn't.
1. Check file position: self.active_file.tell() returns the current write cursor, which equals the file size since the file is opened in append mode.
2. Compare against threshold: If the position is >= self.max_file_size, proceed with rotation.
3. Close the current file: self.active_file.close() flushes and closes the write handle. The read handle for this file ID intentionally remains open in self.file_handles, since existing KeyEntry records still point to it.
4. Increment file ID: self.active_file_id += 1 picks the next sequential ID. There's no gap detection; it assumes sequential IDs are safe.
5. Open new active file: _open_active_file() creates a new .data file, opens it for appending, and registers a read handle in self.file_handles.
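The five steps can be sketched in a minimal, self-contained class. This is an illustration under stated assumptions, not the documented implementation: the class name MiniBitcask, the zero-padded naming in _data_path, and the close() and demo() helpers are hypothetical; only the attribute names and the order of operations follow the steps above.

```python
import os
import tempfile


class MiniBitcask:
    """Hypothetical minimal store: only the state that _maybe_rotate touches."""

    def __init__(self, directory, max_file_size=10 * 1024 * 1024):
        self.directory = directory
        self.max_file_size = max_file_size
        self.file_handles = {}          # file ID -> read handle ("rb")
        self.active_file_id = 0
        self.active_file = None
        self._open_active_file()

    def _data_path(self, file_id):
        # Hypothetical naming scheme: zero-padded ID plus ".data".
        return os.path.join(self.directory, f"{file_id:09d}.data")

    def _open_active_file(self):
        path = self._data_path(self.active_file_id)
        self.active_file = open(path, "ab")          # creates the file if missing
        self.active_file.seek(0, os.SEEK_END)        # tell() now reports the file size
        self.file_handles[self.active_file_id] = open(path, "rb")

    def _maybe_rotate(self):
        # Steps 1-2: the append cursor doubles as the file size.
        if self.active_file.tell() >= self.max_file_size:
            self.active_file.close()    # step 3: the "rb" handle stays in file_handles
            self.active_file_id += 1    # step 4: next sequential ID, no gap detection
            self._open_active_file()    # step 5: new .data file plus its read handle

    def close(self):
        self.active_file.close()
        for handle in self.file_handles.values():
            handle.close()


def demo():
    with tempfile.TemporaryDirectory() as d:
        store = MiniBitcask(d, max_file_size=8)
        store.active_file.write(b"1234567")
        store._maybe_rotate()           # 7 < 8 bytes: no rotation
        id_before = store.active_file_id
        store.active_file.write(b"8")
        store._maybe_rotate()           # 8 >= 8 bytes: rotate to file 1
        result = (id_before, store.active_file_id,
                  sorted(store.file_handles), os.path.getsize(store._data_path(0)))
        store.close()
        return result


print(demo())  # → (0, 1, [0, 1], 8)
```

The old file's read handle is still registered after rotation, which is what lets existing KeyEntry records keep resolving against file 0.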
Side effects: on rotation, the method closes one file handle and opens two (one "ab" for writing, one "rb" for reading via _open_active_file). It mutates self.active_file, self.active_file_id, and self.file_handles, and creates a new .data file on disk.
Error handling: none. If _open_active_file fails (e.g., permission error, disk full), the exception propagates uncaught. By then self.active_file has already been closed and active_file_id incremented, leaving the store in an inconsistent state. This is a deliberate simplicity tradeoff in a reference implementation; production Bitcask engines (like Riak's) handle this more carefully.
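One way to avoid the inconsistent state is to perform every fallible step before mutating anything: flush the old file, open both handles for the new file, and only then swap the attributes and close the old writer. The sketch below is a hypothetical variant (SaferRotatingStore, its file naming, and demo() are assumptions), not how the reference implementation or Riak actually behave. The demo forces the failure by placing a directory where the next data file should go.

```python
import os
import tempfile


class SaferRotatingStore:
    """Hypothetical store whose rotation leaves state untouched if opening fails."""

    def __init__(self, directory, max_file_size):
        self.directory = directory
        self.max_file_size = max_file_size
        self.active_file_id = 0
        self.active_file = open(self._data_path(0), "ab")
        self.active_file.seek(0, os.SEEK_END)
        self.file_handles = {0: open(self._data_path(0), "rb")}

    def _data_path(self, file_id):
        return os.path.join(self.directory, f"{file_id:09d}.data")

    def _maybe_rotate(self):
        if self.active_file.tell() < self.max_file_size:
            return
        self.active_file.flush()            # surface write errors before any state change
        next_id = self.active_file_id + 1
        path = self._data_path(next_id)
        new_writer = open(path, "ab")       # may raise: nothing has been mutated yet
        try:
            new_reader = open(path, "rb")
        except OSError:
            new_writer.close()              # an empty file may remain on disk; harmless
            raise
        # Commit point: every fallible step has succeeded.
        old_writer = self.active_file
        self.active_file = new_writer
        self.active_file_id = next_id
        self.file_handles[next_id] = new_reader
        old_writer.close()

    def close(self):
        self.active_file.close()
        for handle in self.file_handles.values():
            handle.close()


def demo():
    with tempfile.TemporaryDirectory() as d:
        store = SaferRotatingStore(d, max_file_size=4)
        store.active_file.write(b"abcd")
        os.mkdir(store._data_path(1))       # force open() of file 1 to fail
        try:
            store._maybe_rotate()
            failed = False
        except OSError:
            failed = True
        after_failure = (failed, store.active_file_id, store.active_file.closed)
        os.rmdir(store._data_path(1))
        store._maybe_rotate()               # succeeds once the obstacle is gone
        final_id = store.active_file_id
        store.close()
        return after_failure, final_id


print(demo())  # → ((True, 0, False), 1)
```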
Called as a guard at the top of every mutating operation:
    def put(self, key, value):
        self._maybe_rotate()  # ensure room before writing
        offset, size, ts = self._write_record(key, value)
        ...

    def delete(self, key):
        self._maybe_rotate()  # tombstones are records too
        self._write_record(key, "")
        ...
The caller obligation is simple: call _maybe_rotate() before _write_record(). Because the check runs before the write, not during it, a record can still push the file past max_file_size; the next operation then triggers rotation. The size limit is therefore a soft cap, not a hard one: a file can end up larger than max_file_size by less than the size of the last record written to it.
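The soft-cap arithmetic can be checked with a small simulation. simulate_rotation is a hypothetical pure function that models only file sizes: it applies the same pre-write check as _maybe_rotate and returns the final size of each data file.

```python
def simulate_rotation(record_sizes, max_file_size):
    """Hypothetical model: final size of each data file when the check runs before each write."""
    file_sizes = [0]
    for size in record_sizes:
        if file_sizes[-1] >= max_file_size:  # same pre-write check as _maybe_rotate
            file_sizes.append(0)             # rotate: start a new, empty file
        file_sizes[-1] += size               # a record is never split across files
    return file_sizes


print(simulate_rotation([4, 4, 4, 4, 4], max_file_size=10))  # → [12, 8]
print(simulate_rotation([3, 100], max_file_size=10))         # → [103]
```

With a 10-byte cap and 4-byte records the first file ends at 12 bytes. A single 100-byte record written into a 3-byte file overshoots the cap by 93 bytes, which is why the overshoot is bounded by the record size rather than by a fixed amount.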
Dependencies:
self._open_active_file(): handles the mechanics of creating and registering the new file.
self.active_file: a Python file object opened in "ab" mode, where .tell() reflects total bytes written.
self.max_file_size: set once in __init__, defaults to 10 MiB.
Assumptions:
1. tell() accuracy on append files: The code assumes tell() on an "ab" file returns the file's total size. This relies on the seek(0, 2) call in _open_active_file and on the OS not buffering in a way that makes tell() lag behind actual writes. This works on POSIX but is technically implementation-dependent.
2. No concurrent writers: Nothing prevents two threads from both passing the size check and writing simultaneously. Bitcask's original design assumes a single writer process.
3. Sequential ID availability: The method blindly increments active_file_id without checking whether a file with that ID already exists on disk. After compaction (which renumbers files), this could collide, although compact() updates active_file_id to avoid this.
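The three assumptions can be tightened in one hypothetical variant, sketched here under stated assumptions (the class GuardedRotator, its append() method, DATA_NAME, and the zero-padded file naming are illustrative, not part of the documented code). An explicit seek to end of file backs up tell(), a threading.Lock makes the size check and the write a single critical section, and the next file ID skips any ID already present on disk. Note that threading.Lock only serializes threads inside one process; separate writer processes would need OS-level locking (for example fcntl.flock on POSIX).

```python
import os
import re
import tempfile
import threading

DATA_NAME = re.compile(r"^(\d+)\.data$")  # hypothetical file naming scheme


class GuardedRotator:
    """Hypothetical variant addressing assumptions 1-3; not the documented implementation."""

    def __init__(self, directory, max_file_size):
        self.directory = directory
        self.max_file_size = max_file_size
        self.lock = threading.Lock()                  # assumption 2: one writer at a time
        self.active_file_id = self._next_free_id(-1)  # 0 in an empty directory
        self.active_file = self._open(self.active_file_id)

    def _path(self, file_id):
        return os.path.join(self.directory, f"{file_id:09d}.data")

    def _next_free_id(self, current_id):
        # Assumption 3: pick an ID above the current one and above every ID on disk.
        on_disk = [int(m.group(1)) for name in os.listdir(self.directory)
                   if (m := DATA_NAME.match(name))]
        return max([current_id, *on_disk]) + 1

    def _open(self, file_id):
        handle = open(self._path(file_id), "ab")
        handle.seek(0, os.SEEK_END)   # assumption 1: tell() starts at the file size
        return handle

    def _maybe_rotate(self):
        # Caller must hold self.lock so the check and the following write cannot interleave.
        if self.active_file.tell() >= self.max_file_size:
            self.active_file.close()
            self.active_file_id = self._next_free_id(self.active_file_id)
            self.active_file = self._open(self.active_file_id)

    def append(self, payload):
        with self.lock:               # check + write form one critical section
            self._maybe_rotate()
            self.active_file.write(payload)

    def close(self):
        with self.lock:
            self.active_file.close()


def demo():
    with tempfile.TemporaryDirectory() as d:
        rotator = GuardedRotator(d, max_file_size=10)
        # Simulate a file created elsewhere (say, by compaction) at the next sequential ID.
        open(rotator._path(1), "wb").close()

        def worker():
            for _ in range(250):
                rotator.append(b"x")

        threads = [threading.Thread(target=worker) for _ in range(4)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        rotator.close()
        return {name: os.path.getsize(os.path.join(d, name))
                for name in sorted(os.listdir(d))}


sizes = demo()
print(sizes["000000001.data"], sum(sizes.values()), len(sizes))  # → 0 1000 101
```

The pre-existing file 1 is skipped rather than appended to, and because the lock covers both the size check and the write, every rotated file holds exactly max_file_size one-byte records.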