Function: truncate_before in unbundled-database/unbundleddatabase.py

Date: 2026-05-29

Time: 06:39

WriteAheadLog.truncate_before

Purpose

This is a log compaction method. It discards WAL entries older than a given LSN (Log Sequence Number), reclaiming memory once those entries are no longer needed, typically after all consumers (derived systems) have processed past that point. It is the WAL equivalent of a checkpoint: once you know every downstream system has consumed through LSN *n*, entries before *n* are dead weight.

Contract

Parameters

| Parameter | Type | Meaning |
|-----------|------|---------|
| lsn | int | The cutoff. Entries with lsn strictly less than this value are discarded. Entries with lsn equal to or greater are kept. |

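Edge cases: an empty log, or a cutoff at or below the smallest LSN in the log, removes nothing and returns 0. A cutoff greater than the largest LSN in the log removes every entry.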

Return Value

Returns an int: the count of entries removed. The caller can use it for logging or metrics but has no obligation to act on it.

Algorithm

1. Snapshot the current entry count (before).

2. Rebuild _entries via list comprehension, keeping only entries where e.lsn >= lsn.

3. Return the difference: before - len(self._entries).
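
The three steps above can be sketched as follows. This is a minimal illustration, not the file's actual source: the LogEntry dataclass, its payload field, the append helper, and the _next_lsn counter are assumptions added so the example runs on its own. Only _entries, lsn, and truncate_before come from the description.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    # Hypothetical entry shape; the description only requires an `lsn` attribute.
    lsn: int
    payload: bytes = b""


class WriteAheadLog:
    def __init__(self):
        self._entries = []      # kept in ascending LSN order by append()
        self._next_lsn = 1      # hypothetical counter, not from the original

    def append(self, payload):
        # Hypothetical helper: assign the next sequential LSN and store the entry.
        entry = LogEntry(self._next_lsn, payload)
        self._entries.append(entry)
        self._next_lsn += 1
        return entry.lsn

    def truncate_before(self, lsn):
        before = len(self._entries)                                  # step 1
        self._entries = [e for e in self._entries if e.lsn >= lsn]   # step 2
        return before - len(self._entries)                           # step 3


wal = WriteAheadLog()
for _ in range(5):
    wal.append(b"record")              # LSNs 1 through 5

removed = wal.truncate_before(3)       # discards LSNs 1 and 2
remaining = [e.lsn for e in wal._entries]
```

Here removed is 2 and the log retains LSNs 3, 4, and 5. The entry whose lsn equals the cutoff is kept.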

This is O(n) in the number of entries because it scans the entire list. Since entries are ordered by LSN (they are appended sequentially), a bisect plus slice would be O(log n + k), where k is the number of entries retained, but the simplicity here is appropriate for a reference implementation.

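Side Effects

Per the algorithm above, the method replaces _entries with a new list that holds only the retained entries.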

Error Handling

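None. The method raises no exceptions of its own and guards against neither kind of unusual input. A negative LSN silently removes nothing, assuming LSNs are never negative. A value that cannot be ordered against an int, such as a string or None, raises a TypeError from the comparison operator as soon as it is compared with an entry.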
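
The sketch below demonstrates both unguarded behaviours and shows one way a caller could add validation. The WriteAheadLog class here is a minimal stand-in built from the algorithm description, and checked_truncate_before is a hypothetical wrapper, not something the original provides. Whether a negative LSN should be rejected at all is a design choice the original leaves open.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    # Hypothetical stand-in; only the `lsn` attribute matters here.
    lsn: int


class WriteAheadLog:
    # Minimal stand-in for the documented class, unguarded as described.
    def __init__(self, lsns):
        self._entries = [LogEntry(n) for n in lsns]

    def truncate_before(self, lsn):
        before = len(self._entries)
        self._entries = [e for e in self._entries if e.lsn >= lsn]
        return before - len(self._entries)


def checked_truncate_before(wal, lsn):
    # Hypothetical wrapper: validate the cutoff, then delegate.
    if isinstance(lsn, bool) or not isinstance(lsn, int):
        raise TypeError(f"lsn must be an int, got {type(lsn).__name__}")
    if lsn < 0:
        raise ValueError(f"lsn must be non-negative, got {lsn}")
    return wal.truncate_before(lsn)


wal = WriteAheadLog(range(1, 6))                # LSNs 1 through 5

unguarded_negative = wal.truncate_before(-1)    # every lsn >= -1, nothing removed

try:
    wal.truncate_before("3")                    # int >= str is unsupported
    unguarded_error = None
except TypeError as exc:
    unguarded_error = exc

guarded_removed = checked_truncate_before(wal, 3)   # discards LSNs 1 and 2
```

In the unguarded string case the comparison fails before _entries is reassigned, so the log is left unchanged.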

Usage Patterns

Typical usage is checkpoint-driven truncation:


```python
# Find the lowest position across all consumers
# (the loop variable is named `system` to avoid shadowing the stdlib `sys` module)
min_position = min(system.position for system in derived_systems)
# Safe to discard everything consumers have already processed
removed = wal.truncate_before(min_position)
```

The caller is responsible for ensuring no consumer still needs entries below the cutoff. Truncating too aggressively means a consumer that falls behind cannot catch up: it would need a full rebuild from the storage engine instead of incremental replay.
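
One way to keep that responsibility in a single place is a small helper that computes the cutoff from the slowest consumer, sketched below. The DerivedSystem class, the checkpoint function, and the stand-in WriteAheadLog are hypothetical. The sketch assumes position holds the LSN of the next entry a consumer still needs, and it chooses to keep everything when no consumers are registered; the original specifies neither.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    # Hypothetical stand-in; only the `lsn` attribute matters here.
    lsn: int


@dataclass
class DerivedSystem:
    # Hypothetical consumer; `position` is assumed to be the next LSN it needs.
    name: str
    position: int


class WriteAheadLog:
    # Minimal stand-in built from the algorithm description.
    def __init__(self, lsns):
        self._entries = [LogEntry(n) for n in lsns]

    def truncate_before(self, lsn):
        before = len(self._entries)
        self._entries = [e for e in self._entries if e.lsn >= lsn]
        return before - len(self._entries)


def checkpoint(wal, derived_systems):
    # With no consumers there is no known-safe cutoff, so keep everything
    # (min() on an empty sequence would otherwise raise ValueError).
    if not derived_systems:
        return 0
    # The slowest consumer determines how much of the log must be retained.
    cutoff = min(system.position for system in derived_systems)
    return wal.truncate_before(cutoff)


wal = WriteAheadLog(range(1, 11))               # LSNs 1 through 10
consumers = [
    DerivedSystem("search-index", position=8),
    DerivedSystem("cache", position=4),         # lagging consumer pins the log
]

removed = checkpoint(wal, consumers)            # cutoff is 4
oldest_kept = wal._entries[0].lsn
```

Because the lagging consumer still needs LSN 4, only LSNs 1 through 3 are discarded, even though the other consumer has moved on to LSN 8.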

Dependencies