Function: checkpoint in write-ahead-log/wal.py

Date: 2026-05-29

Time: 08:02

WriteAheadLog.checkpoint()

Purpose

checkpoint writes a special marker record into the WAL that says "all data up to this point has been flushed to the primary storage." During recovery, the marker tells the system which WAL records can be safely skipped: anything before the most recent checkpoint is already durable in the main data store. This is the classic WAL checkpoint pattern from Designing Data-Intensive Applications (DDIA), Chapter 3: without checkpoints the WAL grows without bound, and truncation needs a safe cutoff point.
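A minimal sketch of how recovery might use the marker, assuming the records have already been decoded into tuples. The Record type, the opcode values, and records_to_replay are hypothetical names chosen for illustration; they are not taken from wal.py.

```python
from typing import NamedTuple

OP_PUT = 1         # hypothetical opcode value
OP_CHECKPOINT = 3  # hypothetical opcode value


class Record(NamedTuple):
    seq: int
    op: int
    key: bytes
    value: bytes


def records_to_replay(records: list[Record]) -> list[Record]:
    """Return only the data records that follow the most recent checkpoint."""
    last_checkpoint_seq = 0
    for rec in records:
        if rec.op == OP_CHECKPOINT:
            last_checkpoint_seq = max(last_checkpoint_seq, rec.seq)
    # Everything at or before the checkpoint is assumed durable elsewhere.
    return [
        rec for rec in records
        if rec.seq > last_checkpoint_seq and rec.op != OP_CHECKPOINT
    ]


log = [
    Record(1, OP_PUT, b"a", b"1"),
    Record(2, OP_PUT, b"b", b"2"),
    Record(3, OP_CHECKPOINT, b"", b""),
    Record(4, OP_PUT, b"c", b"3"),
]
pending = records_to_replay(log)
```

With the sample log above, only the record written after the checkpoint remains to be replayed. A log with no checkpoint at all is replayed in full.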

Contract

Parameters

None. The method takes no arguments — it unconditionally records "now" as a checkpoint boundary.

Return Value

Returns int — the sequence number assigned to this checkpoint record. The caller is expected to store this somewhere (e.g., alongside the flushed state of the primary storage) so it can later be passed to truncate(uptoseq) or replay(after_seq).

Algorithm

1. Acquire the lock — serializes against concurrent append, append_batch, truncate, and other checkpoint calls.

2. Increment seqnum — claims the next sequence number. This is the same counter used by all record types, so checkpoint records participate in the global total order.

3. Encode and write — calls encoderecord with OP_CHECKPOINT, empty key and value. The record still gets a CRC, length header, and full binary framing — same format as any other record, just with zero-length payloads.

4. Force sync — calls dosync(force=True), which unconditionally flushes the userspace buffer and calls os.fsync(). The force=True bypasses the batch-sync counter — a checkpoint *must* be durable before the caller acts on it.

5. Maybe rotate — if the current WAL file now exceeds maxfile_size, opens a new segment file. This happens after the sync, so the checkpoint is guaranteed durable before rotation.

6. Return the sequence number — the caller uses this as the "safe to discard up to" marker.
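The six steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in wal.py: the SketchWAL class, its attribute and helper names, the opcode value, the segment file naming, and the frame layout (length, CRC32, then a fixed header plus key and value) are all hypothetical.

```python
import os
import struct
import tempfile
import threading
import zlib

OP_CHECKPOINT = 3  # hypothetical opcode value

# Hypothetical body header: seq (u64), op (u8), key length (u32), value length (u32)
HEADER = struct.Struct("<QBII")


def encode_record(seq: int, op: int, key: bytes, value: bytes) -> bytes:
    """Frame one record as: body length, CRC32 of the body, body."""
    body = HEADER.pack(seq, op, len(key), len(value)) + key + value
    return struct.pack("<II", len(body), zlib.crc32(body)) + body


class SketchWAL:
    def __init__(self, directory: str, max_file_size: int = 1 << 20) -> None:
        self.directory = directory
        self.max_file_size = max_file_size
        self.lock = threading.Lock()
        self.seq = 0
        self.segment = 0
        self.fd = open(self._segment_path(), "ab")

    def _segment_path(self) -> str:
        return os.path.join(self.directory, f"wal-{self.segment:06d}.log")

    def _sync(self) -> None:
        self.fd.flush()             # drain the userspace buffer
        os.fsync(self.fd.fileno())  # ask the OS to write it to stable storage

    def _maybe_rotate(self) -> None:
        if self.fd.tell() > self.max_file_size:
            self.fd.close()
            self.segment += 1
            self.fd = open(self._segment_path(), "ab")

    def checkpoint(self) -> int:
        with self.lock:                # step 1: serialize writers
            self.seq += 1              # step 2: claim the next sequence number
            seq = self.seq
            # step 3: same framing as any record, with empty key and value
            self.fd.write(encode_record(seq, OP_CHECKPOINT, b"", b""))
            self._sync()               # step 4: always sync, never batched
            self._maybe_rotate()       # step 5: rotate only after the sync
            return seq                 # step 6

    def close(self) -> None:
        self.fd.close()


with tempfile.TemporaryDirectory() as tmp:
    wal = SketchWAL(tmp)
    first = wal.checkpoint()
    second = wal.checkpoint()
    size_on_disk = os.path.getsize(os.path.join(tmp, "wal-000000.log"))
    wal.close()
```

In this layout each checkpoint record occupies 25 bytes (8 bytes of length and CRC plus a 17-byte header with no payload), which shows that a checkpoint costs a full frame even though its key and value are empty.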

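Side Effects

Advances the shared seqnum counter, appends one checkpoint record to the current WAL file, forces a flush and os.fsync() of that file, and may open a new segment file if the current one now exceeds maxfile_size.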

Error Handling

There is no explicit error handling. If fd.write() or os.fsync() raises an OSError (disk full, I/O error), the exception propagates to the caller. It is raised inside the with self.lock block, so the lock is released as the exception unwinds, but seqnum has already been incremented, which leaves a gap in the sequence. This is safe (gaps don't break replay) but worth knowing.
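The failure mode can be reproduced in isolation. The sketch below uses a hypothetical FailingFile stand-in whose write always raises, and a stripped-down GapDemoWAL class that keeps only the lock, the counter, and the write. Neither name comes from wal.py.

```python
import threading


class FailingFile:
    """Stand-in for a file object whose writes always fail (e.g. disk full)."""

    def write(self, data: bytes) -> int:
        raise OSError(28, "No space left on device")


class GapDemoWAL:
    def __init__(self, fd) -> None:
        self.lock = threading.Lock()
        self.seq = 0
        self.fd = fd

    def checkpoint(self) -> int:
        with self.lock:
            self.seq += 1                        # number is claimed before the write
            self.fd.write(b"checkpoint-record")  # raises; the with block releases the lock
            return self.seq


wal = GapDemoWAL(FailingFile())
try:
    wal.checkpoint()
    failed = False
except OSError:
    failed = True

lock_was_released = not wal.lock.locked()
seq_after_failure = wal.seq  # claimed but never written: a gap in the sequence
```

A caller that wants to retry can simply call checkpoint again. The retry receives the next sequence number, and the skipped one never appears in the log.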

Usage Patterns

Typical usage pairs checkpoint with truncate:


# After flushing memtable/SSTable to disk:
seq = wal.checkpoint()
# Persist `seq` alongside the flushed state
wal.truncate(seq)  # discard everything up to and including the checkpoint

The caller is responsible for ensuring the primary storage is actually durable *before* calling checkpoint — otherwise recovery could skip records that weren't actually persisted. The WAL itself doesn't enforce this ordering; it's a protocol obligation.
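One way a caller might honor that ordering is sketched below, assuming the primary state is small enough to write as a single file. The helpers write_durably and flush_and_checkpoint, the StubWAL stand-in, and the file names are hypothetical. For brevity the sketch also omits the fsync of the containing directory that a full durability protocol would include after the rename.

```python
import json
import os
import tempfile


def write_durably(path: str, payload: bytes) -> None:
    """Write payload via a temp file, fsync it, then rename it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic replacement of the destination


class StubWAL:
    """Hypothetical stand-in exposing only checkpoint() and truncate()."""

    def __init__(self) -> None:
        self.seq = 41
        self.truncated_up_to = None

    def checkpoint(self) -> int:
        self.seq += 1
        return self.seq

    def truncate(self, seq: int) -> None:
        self.truncated_up_to = seq


def flush_and_checkpoint(wal, data_path: str, meta_path: str, state: dict) -> int:
    # 1. Make the primary data durable first.
    write_durably(data_path, json.dumps(state).encode())
    # 2. Only then record the checkpoint in the WAL.
    seq = wal.checkpoint()
    # 3. Persist the checkpoint sequence number next to the flushed state.
    write_durably(meta_path, json.dumps({"checkpoint_seq": seq}).encode())
    # 4. Older WAL records are no longer needed for recovery.
    wal.truncate(seq)
    return seq


with tempfile.TemporaryDirectory() as tmp:
    wal = StubWAL()
    meta_path = os.path.join(tmp, "meta.json")
    seq = flush_and_checkpoint(wal, os.path.join(tmp, "data.json"), meta_path, {"k": "v"})
    with open(meta_path, "rb") as f:
        stored = json.loads(f.read())
```

Truncation comes last on purpose: if the process crashes between any two steps, the WAL still holds every record that the primary storage might be missing.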

Dependencies

The method notably does *not* interact with replay() or truncate() directly — the coordination happens through the sequence number returned to the caller.

Beliefs