Date: 2026-05-29
Time: 08:02
WriteAheadLog.checkpoint()
checkpoint writes a special marker record into the WAL that says "all data up to this point has been flushed to the primary storage." It exists so that during recovery the system knows which WAL records can be safely skipped: anything before the most recent checkpoint is already durable in the main data store. This is the classic WAL checkpoint pattern from DDIA Chapter 3: without checkpoints the WAL grows without bound, and truncation needs a safe cutoff point.
Preconditions: the WAL must be open (self._fd is not None). There's no guard against calling this on a closed WAL; it will raise AttributeError.
Postconditions: a CHECKPOINT record with a new, unique sequence number is durably written to disk. The caller can later pass this sequence number to truncate() to discard everything at or before it.
Parameters: None. The method takes no arguments; it unconditionally records "now" as a checkpoint boundary.
Returns int — the sequence number assigned to this checkpoint record. The caller is expected to store this somewhere (e.g., alongside the flushed state of the primary storage) so it can later be passed to truncate(uptoseq) or replay(after_seq).
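One way a caller might store the returned sequence number is in a small side file that is replaced atomically. The sketch below covers only the caller's side and is not part of the WAL: save_checkpoint_seq, load_checkpoint_seq, and the 8-byte file format are hypothetical names and choices made for illustration.

```python
import os
import struct
import tempfile


def save_checkpoint_seq(path, seq):
    """Write seq to a temp file, fsync it, then rename it over the target."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(struct.pack(">Q", seq))  # 8-byte big-endian unsigned int
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in a single rename step, so a reader sees
    # either the old value or the new one, never a half-written file.
    os.replace(tmp_path, path)


def load_checkpoint_seq(path):
    """Return the stored sequence number, or 0 if none was ever saved."""
    try:
        with open(path, "rb") as f:
            return struct.unpack(">Q", f.read(8))[0]
    except FileNotFoundError:
        return 0


with tempfile.TemporaryDirectory() as d:
    seq_path = os.path.join(d, "checkpoint.seq")
    before = load_checkpoint_seq(seq_path)
    save_checkpoint_seq(seq_path, 42)
    after = load_checkpoint_seq(seq_path)
```

On recovery, the loaded value would be the natural argument for replay(after_seq). A stricter version would also fsync the containing directory after the rename; that step is omitted here for brevity.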
1. Acquire the lock — serializes against concurrent append, append_batch, truncate, and other checkpoint calls.
2. Increment seqnum — claims the next sequence number. This is the same counter used by all record types, so checkpoint records participate in the global total order.
3. Encode and write — calls encoderecord with OP_CHECKPOINT, empty key and value. The record still gets a CRC, length header, and full binary framing — same format as any other record, just with zero-length payloads.
4. Force sync — calls dosync(force=True), which unconditionally flushes the userspace buffer and calls os.fsync(). The force=True bypasses the batch-sync counter — a checkpoint *must* be durable before the caller acts on it.
5. Maybe rotate — if the current WAL file now exceeds maxfile_size, opens a new segment file. This happens after the sync, so the checkpoint is guaranteed durable before rotation.
6. Return the sequence number — the caller uses this as the "safe to discard up to" marker.
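The six steps above can be sketched as a toy class. This is a minimal illustration under stated assumptions, not the actual implementation: MiniWAL, the underscore-prefixed helper names, the op code value, the segment file naming, and the record framing are all invented here, and the batch-sync counter is left out.

```python
import os
import struct
import tempfile
import threading
import zlib

OP_CHECKPOINT = 4  # assumed op code; the real value is not given in the text


class MiniWAL:
    """Toy WAL with just enough machinery to show the checkpoint steps."""

    def __init__(self, directory, max_file_size=1 << 20):
        self._dir = directory
        self._max_file_size = max_file_size
        self._lock = threading.Lock()
        self._seq_num = 0
        self._segment = 0
        self._fd = open(self._segment_path(), "ab")

    def _segment_path(self):
        return os.path.join(self._dir, f"wal-{self._segment:06d}.log")

    def _encode_record(self, seq, op, key, value):
        # Assumed framing: [crc32][body length][seq, op, key len, value len, key, value]
        body = struct.pack(">QBII", seq, op, len(key), len(value)) + key + value
        return struct.pack(">II", zlib.crc32(body), len(body)) + body

    def _do_sync(self, force=False):
        # This toy always syncs, so `force` has no effect here.
        self._fd.flush()
        os.fsync(self._fd.fileno())

    def _maybe_rotate(self):
        if os.fstat(self._fd.fileno()).st_size > self._max_file_size:
            self._fd.close()
            self._segment += 1
            self._fd = open(self._segment_path(), "ab")

    def checkpoint(self):
        with self._lock:                    # 1. acquire the lock
            self._seq_num += 1              # 2. claim the next sequence number
            seq = self._seq_num
            record = self._encode_record(seq, OP_CHECKPOINT, b"", b"")
            self._fd.write(record)          # 3. encode and write
            self._do_sync(force=True)       # 4. force sync
            self._maybe_rotate()            # 5. rotate only after the sync
            return seq                      # 6. return the sequence number

    def close(self):
        with self._lock:
            self._fd.close()
            self._fd = None


with tempfile.TemporaryDirectory() as d:
    wal = MiniWAL(d, max_file_size=30)  # tiny limit so rotation is visible
    first = wal.checkpoint()
    second = wal.checkpoint()
    segments = sorted(os.listdir(d))
    wal.close()
```

With this framing each checkpoint record is 25 bytes, so the second checkpoint pushes the first segment past the 30-byte limit and a second segment file is opened after the sync.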
Side effects:
- Disk I/O: one write followed by an fsync. This is the expensive part: fsync blocks until the kernel confirms the data is on stable storage.
- Increments seqnum (permanently; there's no rollback).
- May replace currentfile and _fd if rotation occurs.
There is no explicit error handling. If fd.write() or os.fsync() raises an OSError (disk full, I/O error), it propagates to the caller. The with self.lock block releases the lock as the exception unwinds, but seqnum has already been incremented, leaving a gap in the sequence. This is safe (gaps don't break replay) but worth knowing.
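The failure behavior described here can be reproduced with a small in-memory stand-in. GapDemo and its fail_next hook are hypothetical and exist only to simulate a disk error; the point is that the lock is released on the exception path while the sequence number stays consumed.

```python
import threading


class GapDemo:
    """In-memory stand-in: increments the counter before a write that may fail."""

    def __init__(self):
        self._lock = threading.Lock()
        self._seq_num = 0
        self.records = []        # stands in for the on-disk log
        self.fail_next = False   # test hook that simulates a disk error

    def checkpoint(self):
        with self._lock:
            self._seq_num += 1   # incremented before the write, never rolled back
            seq = self._seq_num
            if self.fail_next:
                self.fail_next = False
                raise OSError("simulated disk full")
            self.records.append(seq)
            return seq


demo = GapDemo()
first = demo.checkpoint()

demo.fail_next = True
try:
    demo.checkpoint()            # consumes a sequence number, writes nothing
except OSError:
    pass

# The with-block released the lock during unwinding, so it can be taken again.
lock_was_released = demo._lock.acquire(blocking=False)
if lock_was_released:
    demo._lock.release()

third = demo.checkpoint()
```

After this run the log holds sequence numbers 1 and 3, with 2 missing. A caller that sees the OSError can retry; the retry simply receives a later sequence number.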
Typical usage pairs checkpoint with truncate:
    # After flushing memtable/SSTable to disk:
    seq = wal.checkpoint()
    # Persist `seq` alongside the flushed state
    wal.truncate(seq)  # discard everything up to and including the checkpoint
The caller is responsible for ensuring the primary storage is actually durable *before* calling checkpoint — otherwise recovery could skip records that weren't actually persisted. The WAL itself doesn't enforce this ordering; it's a protocol obligation.
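Because the WAL does not enforce this ordering, a caller may want to wrap it in a single helper so the steps cannot be reordered by accident. The following is a sketch under assumptions: flush_then_checkpoint, RecordingWAL, and the two callbacks are hypothetical names, and RecordingWAL only records call order instead of touching disk.

```python
class RecordingWAL:
    """Hypothetical stand-in that logs calls rather than writing records."""

    def __init__(self, events):
        self._events = events
        self._seq_num = 0

    def checkpoint(self):
        self._seq_num += 1
        self._events.append("checkpoint")
        return self._seq_num

    def truncate(self, up_to_seq):
        self._events.append(f"truncate({up_to_seq})")


def flush_then_checkpoint(wal, flush_primary, persist_seq):
    """Run the protocol in the only safe order."""
    flush_primary()           # primary storage must be durable first
    seq = wal.checkpoint()    # only then mark the WAL prefix as skippable
    persist_seq(seq)          # remember the boundary for recovery
    wal.truncate(seq)         # finally discard the covered records
    return seq


events = []
seq = flush_then_checkpoint(
    RecordingWAL(events),
    flush_primary=lambda: events.append("flush"),
    persist_seq=lambda s: events.append(f"persist({s})"),
)
```

If flush_primary raises, the helper exits before checkpoint() is called, so no checkpoint record can claim durability that the primary storage does not have.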
- encoderecord: shared binary encoder for all record types
- dosync: abstraction over sync modes (sync, batch, none)
- mayberotate: file segment management
- threading.Lock: concurrency control
- os.fsync: kernel-level durability guarantee
The method notably does *not* interact with replay() or truncate() directly; the coordination happens through the sequence number returned to the caller.
- checkpoint-force-syncs: checkpoint() always calls fsync regardless of the WAL's sync_mode setting, because checkpoint durability is a correctness requirement, not a performance knob.
- checkpoint-shares-sequence-space: checkpoint records consume sequence numbers from the same monotonic counter as PUT, DELETE, and COMMIT records, maintaining a single total order across all operations.
- checkpoint-empty-payload: checkpoint records carry zero-length key and value fields; the record's significance is entirely in its op-type and sequence number.
- checkpoint-filtered-from-replay: replay() only returns PUT and DELETE records, so checkpoint records are never visible to recovery consumers. They serve as truncation boundaries, not data.
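The shared sequence space and the replay filter can be illustrated over an in-memory list of records. This is a sketch, not the real replay(): the tuple layout and string op codes are assumptions, as is the reading of after_seq as a strict lower bound.

```python
OP_PUT = "PUT"
OP_DELETE = "DELETE"
OP_COMMIT = "COMMIT"
OP_CHECKPOINT = "CHECKPOINT"


def replay(records, after_seq=0):
    """Yield only PUT and DELETE records with a sequence number above after_seq."""
    for seq, op, key, value in records:
        if seq > after_seq and op in (OP_PUT, OP_DELETE):
            yield seq, op, key, value


# The checkpoint takes sequence number 3 from the same counter as the data records.
log = [
    (1, OP_PUT, b"a", b"1"),
    (2, OP_PUT, b"b", b"2"),
    (3, OP_CHECKPOINT, b"", b""),
    (4, OP_DELETE, b"a", b""),
    (5, OP_PUT, b"c", b"3"),
]

recovered = list(replay(log, after_seq=3))
everything = list(replay(log))
```

Replaying from the checkpoint yields only records 4 and 5, and a full replay yields records 1, 2, 4, and 5: the checkpoint occupies a slot in the order but never reaches the consumer.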