Original source: write-ahead-log-wal-truncate

Function: truncate in write-ahead-log/wal.py

Date: 2026-05-28

Time: 18:46

`WriteAheadLog.truncate`

Purpose

truncate is the WAL's garbage collection mechanism. After a database checkpoint (where WAL contents have been flushed to the main data file), records that are now durable elsewhere no longer need crash-recovery protection. This method removes those records, reclaiming disk space and keeping replay times short.

In DDIA terms, this is loosely analogous to "log compaction", although nothing is merged here: obsolete records are simply discarded. Without it, the WAL grows without bound.

Contract

Precondition: upto_seq should be a sequence number that the caller knows is safely persisted elsewhere (e.g., after a successful checkpoint/compaction of the main data store). Passing a sequence number that covers records not yet persisted elsewhere loses those records permanently.
Postcondition: No record with seqnum <= upto_seq remains in any WAL file. WAL files that become empty are deleted entirely. The WAL is ready for new appends.
Invariant: The lock is held for the entire operation, so no concurrent reads or writes can interleave.

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| upto_seq | int | Inclusive upper bound. Every record with a sequence number at or below this value is discarded. Passing 0 is a no-op (no records have seq 0). Passing currentseqnum() deletes everything. |

truncate does not validate upto_seq: passing a value beyond currentseqnum() silently deletes every record without raising an error.
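Because the method itself accepts any bound, a caller may want to check the value first. The following is a minimal sketch of such a caller-side guard; `check_truncate_bound` is a hypothetical helper, not part of the WAL module, and it assumes the caller can obtain the latest sequence number as a plain integer.

```python
def check_truncate_bound(upto_seq: int, current_seq: int) -> int:
    """Return upto_seq unchanged if it is a plausible truncation bound.

    Hypothetical caller-side guard; the WAL's truncate performs no such check.
    """
    if upto_seq < 0:
        raise ValueError(f"upto_seq must be non-negative, got {upto_seq}")
    if upto_seq > current_seq:
        # A bound past the newest record would silently wipe the whole WAL.
        raise ValueError(
            f"upto_seq {upto_seq} is beyond the latest sequence number {current_seq}"
        )
    return upto_seq
```

A caller would pass the result straight to truncate, with the second argument taken from the WAL's current sequence number.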

Return Value

None. Success is silent. The caller infers correctness by the absence of exceptions.

Algorithm

Step 1 — Close the active file descriptor.

Flush pending writes and fsync to ensure every buffered record hits disk before we start rewriting files. The fd is set to None so the rewrite loop doesn't conflict with the open append handle.
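As an illustration of Step 1, here is a minimal sketch that assumes the active handle is an ordinary buffered Python file object. The helper name `close_durably` and the file name are hypothetical; the real method may structure this differently.

```python
import os
import tempfile


def close_durably(fd):
    """Flush Python's buffer, fsync to disk, then close the handle."""
    fd.flush()             # push the user-space buffer to the OS
    os.fsync(fd.fileno())  # ask the OS to push its cache to the device
    fd.close()
    return None            # caller assigns this back to clear its handle


# Usage: append some bytes, then close before any rewrite begins.
path = os.path.join(tempfile.mkdtemp(), "000001.wal")  # hypothetical file name
handle = open(path, "ab")
handle.write(b"record-bytes")
handle = close_durably(handle)
```

Returning None lets the caller clear its reference in one assignment, mirroring the way the fd is set to None before the rewrite loop starts.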

Step 2 — Iterate every .wal file in sorted order.

For each file:

1. Read all records, collecting those with seqnum > upto_seq into kept.

2. If kept is empty, the entire file is obsolete — delete it with os.remove.

3. If kept is non-empty, rewrite the file in place ("wb" mode) with only the surviving records, then flush + fsync to ensure durability.

Step 3 — Reopen for appending.

open_latest() either reopens the last surviving file (if under maxfilesize) or creates a new one via _rotate().

The key detail: this is not an atomic operation across files. If the process crashes mid-truncate, some files may be fully truncated while others are untouched. As long as the crash falls between files rather than in the middle of a rewrite, the data is still consistent: the surviving records are still valid, and the next truncate call will finish the job.
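The rewrite loop in Step 2 can be sketched as follows. The record layout (`HEADER`), the helper names `encode_record`, `read_record`, and `truncate_files`, and the omission of locking, closing, and reopening are all assumptions made for illustration; the real module's binary format and signatures may differ.

```python
import os
import struct
import zlib

# Hypothetical layout: 8-byte seqnum, 4-byte payload length, 4-byte CRC32.
HEADER = struct.Struct(">QII")


def encode_record(seqnum: int, payload: bytes) -> bytes:
    return HEADER.pack(seqnum, len(payload), zlib.crc32(payload)) + payload


def read_record(f):
    """Return (seqnum, payload), None at clean EOF, or raise ValueError."""
    header = f.read(HEADER.size)
    if not header:
        return None
    if len(header) < HEADER.size:
        raise ValueError("truncated header")
    seqnum, length, crc = HEADER.unpack(header)
    payload = f.read(length)
    if len(payload) < length or zlib.crc32(payload) != crc:
        raise ValueError("corrupt record")
    return seqnum, payload


def truncate_files(directory: str, upto_seq: int) -> None:
    """Drop every record with seqnum <= upto_seq from the .wal files."""
    names = sorted(n for n in os.listdir(directory) if n.endswith(".wal"))
    for name in names:
        path = os.path.join(directory, name)
        kept = []
        with open(path, "rb") as f:
            while True:
                try:
                    record = read_record(f)
                except ValueError:
                    break  # corruption ends the trustworthy part of the file
                if record is None:
                    break  # clean end of file
                if record[0] > upto_seq:
                    kept.append(record)
        if not kept:
            os.remove(path)  # nothing survives: delete the whole file
            continue
        with open(path, "wb") as f:  # in-place rewrite, not crash-atomic
            for seqnum, payload in kept:
                f.write(encode_record(seqnum, payload))
            f.flush()
            os.fsync(f.fileno())
```

Each file is handled independently, which is why a crash between iterations leaves some files truncated and others untouched.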

Side Effects

Disk I/O: Reads, rewrites, and potentially deletes multiple WAL files. Each surviving file gets an fsync.
File descriptor lifecycle: Closes the current fd before rewriting, then reopens via open_latest(). Any external reference to self._fd would become stale (though the lock prevents concurrent access).
State mutation: self._fd and self.currentfile are both updated as a consequence of open_latest().

Error Handling

CRC mismatches / corrupt records: readrecord raises ValueError on CRC failure. The except ValueError: break stops reading that file — records after the corruption point are silently lost, even if they have seqnum > upto_seq. This is a deliberate design choice: corruption marks the end of trustworthy data.
Filesystem errors: OSError from file operations (open, os.remove, os.fsync) propagates uncaught. If the process crashes after deleting some files but before rewriting others, the WAL is left partially truncated. This is recoverable: a deleted file held no readable record above upto_seq, so under the precondition everything it contained is already durable elsewhere.
No rollback: If rewriting a file fails mid-write (disk full, permissions), the file may be left truncated or empty. There's no temp-file-then-rename pattern here.
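For comparison, the temp-file-then-rename pattern that this method does not use could look like the sketch below. It is an alternative technique, not the current implementation; `atomic_rewrite` and the `.tmp` suffix are hypothetical names chosen for illustration.

```python
import os


def atomic_rewrite(path: str, data: bytes) -> None:
    """Replace the contents of path so a crash leaves either old or new bytes."""
    tmp_path = path + ".tmp"  # hypothetical suffix; same directory, same filesystem
    with open(tmp_path, "wb") as tmp:
        tmp.write(data)
        tmp.flush()
        os.fsync(tmp.fileno())  # new bytes are durable before the swap
    os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    if os.name == "posix":
        # fsync the directory so the rename itself survives a crash
        dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

With this approach a disk-full or permission error leaves the original file intact, at the cost of temporarily needing space for both copies. A leftover temporary file does not end in .wal, so a scan for .wal files would skip it.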

Usage Patterns

Typical call site:


```python
# After successfully checkpointing the database state
checkpoint_seq = wal.checkpoint()
# ... flush main data store to disk ...
wal.truncate(checkpoint_seq)
```

Caller obligations:

1. Only truncate what's durable elsewhere. This is the caller's responsibility — truncate doesn't verify that records have been applied.

2. Expect truncate to block append. The internal lock makes concurrent calls safe, but truncate holds it for its full duration, so all writes stall while files are rewritten. This can be significant for large WALs.

Dependencies

| Dependency | Usage |
|------------|-------|
| os | File operations: remove, fsync, makedirs |
| struct | Binary record encoding via encoderecord |
| zlib | CRC32 checksums inside encoderecord / readrecord |
| threading.Lock | Mutual exclusion with append, replay, etc. |
| readrecord | Module-level function for deserializing binary WAL records |
| encoderecord | Module-level function for serializing records back to binary |
| open_latest | Instance method to resume appending after rewrite |
| walfiles | Instance method returning sorted list of WAL file paths |