Date: 2026-05-29
Time: 10:09
MVCCDatabase.garbage_collectThis is the MVCC vacuum — it reclaims storage by removing version records that no active transaction can ever read. In an MVCC system, every write creates a new Version object rather than overwriting in place, so the version list for a key grows without bound unless something prunes it. This method is that pruning pass.
It exists because snapshot isolation requires keeping old versions alive as long as any transaction might need them for its consistent snapshot. Once every transaction that could see a version has ended, the version is dead weight.
Preconditions: None enforced. Can be called at any time, even during active transactions. Does not require (or acquire) any lock — this is safe only because the implementation is single-threaded.
Postconditions:
Version objects were dropped.Invariant preserved: After GC, isvisible(tx, v) still returns the same result for every remaining version v and every active transaction tx. GC never removes a version that an active transaction could read.
None. It operates entirely on instance state.
Returns int — the total number of Version objects removed across all keys. The caller can use this to decide whether GC was productive (e.g., log it, or skip the next GC cycle if removed == 0).
The method processes each key independently:
versions = [v for v in versions if v.created_by not in self._aborted]
Versions from aborted transactions are invisible to everyone (per isvisible), so they're unconditionally dropped. This is always safe.
When active_txs is empty, no snapshot references exist, so only the "current truth" matters.
It iterates the version list looking for the last version where:
created_by is committed, ANDBecause it iterates forward and overwrites latest, it picks the *last* such version in list order — which is the newest, since versions are appended chronologically.
If a latest surviving version is found, everything else is discarded. If *all* versions are committed-deleted, they're all removed and the key is deleted from _versions.
With active transactions, the method must preserve any version that *any* active transaction might read. The key insight is the mintxid — the oldest active transaction:
min_tx_id = min(t.tx_id for t in active_txs)
If a version was committed *and* superseded (deleted or replaced) before mintxid, then *every* active transaction sees the superseding version, not this one. It's safe to remove.
Two removal conditions:
1. Explicitly deleted: deletedby is committed and < mintx_id. The deletion is visible to all active transactions, so the deleted version is invisible to all of them.
2. Zombie (superseded): The version is committed, not explicitly deleted, but a *newer* committed version exists (also < mintxid). The newer version shadows this one for every active transaction. The any(...) check scans committedbeforeoldest to find such a superseding version.
Everything else goes into to_keep.
If versions remain, the list is replaced. If empty, the key is deleted from _versions entirely to avoid accumulating empty lists.
Mutates self.versions — removes entries and replaces version lists. Does not touch committed, aborted, transactions, or any Transaction object. This is a pure storage compaction pass.
None. No exceptions are raised or caught. The method assumes committed, aborted, and versions are in a consistent state. If they're not (e.g., a txid appears in both committed and aborted), the behavior is silently wrong — no invariant checks.
Typically called between transaction batches or on a periodic schedule:
db = MVCCDatabase()
# ... run some transactions, commit/abort them ...
removed = db.garbage_collect()
Caller obligations: The caller should ensure no concurrent modifications to _versions while GC runs. In this single-threaded implementation that's trivially satisfied, but it would be a problem in a concurrent setting.
GC is idempotent — calling it twice with no intervening state changes removes 0 versions the second time.
Relies entirely on instance state:
self._versions — the version store being compactedself._transactions — to find active transactionsself.committed / self.aborted — to classify transaction outcomesVersion.createdby, Version.deletedby — to determine version provenanceNo external modules. No I/O.
1. Versions are appended in tx_id order. The no-active-transactions path picks latest by iterating forward, so the last match wins. If versions were out of order, this would pick the wrong "latest."
2. committedbeforeoldest is computed but partially unused. The variable is built and used in the zombie check's any(...), but it's not used for the explicit-deletion check. This is correct but means the variable serves a narrower purpose than its name suggests.
3. deletedby semantics are overloaded. A version with deletedby set could mean either an explicit delete() call or a soft-delete from a conflicting write. GC treats both the same, which is correct.
4. Single-threaded execution. There is no locking. If garbagecollect ran concurrently with read or write, you'd get races on the versions dict and its lists.