`append_batch` writes a buffer that pushes past `max_file_size`: does the batch stay atomic, or can it split?
Date: 2026-05-29
Time: 08:09
The batch stays atomic. It can never split across files.
The key is the order of operations inside `append_batch` (`write-ahead-log/wal.py:153`):
```python
def append_batch(self, operations: List[Tuple[str, str, str]]) -> int:
    with self._lock:
        buf = bytearray()
        for op_type, key, value in operations:
            self._seq_num += 1
            buf.extend(_encode_record(...))
        self._seq_num += 1
        commit_seq = self._seq_num
        buf.extend(_encode_record(commit_seq, OP_COMMIT, b"", b""))
        self._fd.write(bytes(buf))   # 1. write entire batch
        self._do_sync(force=True)    # 2. fsync
        self._maybe_rotate()         # 3. THEN check rotation
        return commit_seq
```
Three things guarantee atomicity:
1. All operation records plus the COMMIT marker are assembled into one `bytearray` and then written with a single `self._fd.write(bytes(buf))` call. There is no per-record rotation check inside the loop, so the buffer's size is irrelevant: however large it grows, all of it goes into whichever file is current when the write starts.
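A toy placement model makes the difference concrete. It is a sketch with no real I/O: `place_records`, the record sizes, and the per-record-check policy are all invented for illustration, and only the `check_per_record=False` branch corresponds to what `append_batch` does.

```python
from typing import List


def place_records(sizes: List[int], start_size: int, max_file_size: int,
                  check_per_record: bool) -> List[int]:
    """Return the file index that each record of one batch lands in.

    check_per_record=True models a hypothetical rotation check inside the loop;
    check_per_record=False models append_batch: the whole buffer goes to the
    file that is current when the single write happens.
    """
    file_idx, cur = 0, start_size
    placement = []
    for size in sizes:
        if check_per_record and cur >= max_file_size:
            file_idx, cur = file_idx + 1, 0   # rotate in the middle of the batch
        placement.append(file_idx)
        cur += size
    return placement


batch = [40, 40, 40, 16]  # three ops plus a 16-byte COMMIT record (made-up sizes)
print(place_records(batch, start_size=60, max_file_size=100, check_per_record=True))
# → [0, 1, 1, 1]
print(place_records(batch, start_size=60, max_file_size=100, check_per_record=False))
# → [0, 0, 0, 0]
```

With the check inside the loop, the first operation stays in file 0 while the rest of the batch, including COMMIT, spills into file 1. Writing the buffer in one piece keeps every record in the same file.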
2. `_maybe_rotate()` (line 136) runs only after the write and the sync have completed:
```python
def _maybe_rotate(self):
    if self._fd and self._fd.tell() >= self._max_file_size:
        self._rotate()
```
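This means a WAL file can grow beyond `max_file_size` when a batch pushes it past the limit: the size cap is soft, not hard. Rotation happens only after the whole batch has been written and synced, at the end of that same `append_batch` call, so the next operation is the first one to land in the new file.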
3. The entire method runs under `self._lock`, so no other thread can slip a rotation or a competing write in between the batch's records.
The single-record `append` at line 141 follows the same write-then-rotate pattern, but for individual records the ordering is less consequential, since each record is self-contained. The batch case is where this ordering actually matters for correctness.
By contrast, Bitcask (`hash-index-storage/bitcask.py:169`) calls `_maybe_rotate()` *before* each put, so consecutive puts can land in different files. That is fine for Bitcask because it doesn't need batch semantics: each key-value pair is independently addressable through the in-memory keydir.
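For contrast, a rotate-before-put store can be sketched the same way. `MiniBitcask` is hypothetical and heavily simplified: it stores only raw values (no per-entry header or key on disk), and its keydir maps each key to a `(file_id, offset, length)` triple.

```python
import os
import tempfile
from typing import Dict, Tuple


class MiniBitcask:
    """Toy Bitcask-style store: rotation is checked BEFORE each put."""

    def __init__(self, directory: str, max_file_size: int):
        self._dir = directory
        self._max_file_size = max_file_size
        self._file_id = 0
        self._fd = open(self._path(0), "ab")
        self.keydir: Dict[str, Tuple[int, int, int]] = {}  # key -> (file_id, offset, length)

    def _path(self, file_id: int) -> str:
        return os.path.join(self._dir, f"data-{file_id:06d}.log")

    def _maybe_rotate(self) -> None:
        if self._fd.tell() >= self._max_file_size:
            self._fd.close()
            self._file_id += 1
            self._fd = open(self._path(self._file_id), "ab")

    def put(self, key: str, value: str) -> None:
        self._maybe_rotate()                  # check happens before the write
        offset = self._fd.tell()
        data = value.encode()
        self._fd.write(data)
        self._fd.flush()
        self.keydir[key] = (self._file_id, offset, len(data))

    def get(self, key: str) -> str:
        file_id, offset, length = self.keydir[key]
        with open(self._path(file_id), "rb") as f:
            f.seek(offset)
            return f.read(length).decode()

    def close(self) -> None:
        self._fd.close()


with tempfile.TemporaryDirectory() as d:
    db = MiniBitcask(d, max_file_size=10)
    for k in ("a", "b", "c"):
        db.put(k, k * 8)                      # 8-byte values
    placement = {k: db.keydir[k][0] for k in "abc"}
    value_c = db.get("c")
    db.close()
print(placement, value_c)  # → {'a': 0, 'b': 0, 'c': 1} cccccccc
```

Keys `a` and `b` share the first file; the check before `c` sees that file at 16 bytes and rotates, and `get("c")` still finds the value because the keydir records which file it lives in.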
If the process crashes *during* `self._fd.write(bytes(buf))`, after some bytes have reached disk but before the COMMIT record is written, recovery code should treat the batch as incomplete. The COMMIT record (`OP_COMMIT`) at the end of the buffer serves as the durability marker: if it is absent, the batch is discarded during replay. This is the standard WAL pattern from DDIA Chapter 3.
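A replay loop that respects the COMMIT marker might look like the following. The record layout and opcodes are assumptions of this sketch, not the repository's format, and a short final record is treated as a torn write; real logs typically also carry a per-record checksum to detect corruption, which is omitted here.

```python
import struct
from typing import Dict, List, Tuple

OP_PUT, OP_COMMIT = 1, 2           # hypothetical opcodes
_HDR = struct.Struct(">QBII")      # seq, op, key_len, value_len (assumed layout)


def encode(seq: int, op: int, key: bytes = b"", value: bytes = b"") -> bytes:
    return _HDR.pack(seq, op, len(key), len(value)) + key + value


def replay(data: bytes) -> Dict[bytes, bytes]:
    """Apply only batches whose COMMIT record is present; drop a trailing partial batch."""
    state: Dict[bytes, bytes] = {}
    pending: List[Tuple[bytes, bytes]] = []
    pos = 0
    while pos + _HDR.size <= len(data):
        _seq, op, klen, vlen = _HDR.unpack_from(data, pos)
        body = pos + _HDR.size
        end = body + klen + vlen
        if end > len(data):            # torn record at the tail: stop reading
            break
        if op == OP_COMMIT:
            state.update(pending)      # batch is complete: apply it
            pending.clear()
        else:
            pending.append((data[body:body + klen], data[body + klen:end]))
        pos = end
    return state                       # anything left in `pending` is discarded


committed = encode(1, OP_PUT, b"a", b"1") + encode(2, OP_COMMIT)
partial = encode(3, OP_PUT, b"b", b"2") + encode(4, OP_PUT, b"c", b"3")
log = committed + partial + encode(5, OP_COMMIT)[:-5]  # crash mid-write: COMMIT torn
recovered = replay(log)
print(recovered)  # → {b'a': b'1'}
```

Only the first batch survives: the second batch's operations were fully written, but its COMMIT record was cut off, so they stay in `pending` and are dropped.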
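Because `_open_latest` (line 100) checks `os.path.getsize(last) < self._max_file_size` on startup, a file that a batch pushed past the limit (for example, after a crash between the sync and the rotation) causes a new file to be started on the next open. The overshoot is therefore bounded by one batch: since each write is followed by a rotation check, a file exceeds the cap by less than the size of the last batch written to it.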
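The startup check can be sketched like this. `open_latest`, the `wal-NNNNNN.log` naming, and the flat directory layout are assumptions for illustration, not the repository's actual `_open_latest`.

```python
import os
import tempfile


def open_latest(directory: str, max_file_size: int):
    """Reopen the newest WAL file if it is under the cap; otherwise start the next one.

    Returns (file object opened for appending, path). Naming is hypothetical.
    """
    names = sorted(n for n in os.listdir(directory)
                   if n.startswith("wal-") and n.endswith(".log"))
    if names:
        last = os.path.join(directory, names[-1])
        if os.path.getsize(last) < max_file_size:
            return open(last, "ab"), last          # keep appending to it
        next_id = int(names[-1][4:-4]) + 1         # "wal-000000.log" -> 0 -> 1
    else:
        next_id = 0
    path = os.path.join(directory, f"wal-{next_id:06d}.log")
    return open(path, "ab"), path


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "wal-000000.log"), "wb") as f:
        f.write(b"x" * 212)                        # a batch pushed this file past the cap
    fd, path = open_latest(d, max_file_size=64)
    fd.close()
    after_overshoot = os.path.basename(path)       # a fresh file is started
    fd, path = open_latest(d, max_file_size=64)
    fd.close()
    reopened = os.path.basename(path)              # the new, still-empty file is reused
print(after_overshoot, reopened)  # → wal-000001.log wal-000001.log
```

The oversized file is never appended to again; the next open starts `wal-000001.log`, and later opens keep reusing that file until it too reaches the cap.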