Topic: Compare the two implementations: MVCCDatabase uses active_at_start sets while SSIDatabase uses timestamp ordering. Why the different approaches?

Date: 2026-05-29

Time: 10:10

Why active_at_start Sets vs. Timestamp Ordering

The difference comes down to when writes become visible in the shared store, which in turn dictates what metadata each transaction needs to reconstruct its snapshot.

MVCCDatabase: Eager Writes Require Active-Set Tracking

MVCCDatabase writes versions immediately into the shared versions dict, tagged only with the creating transaction's ID (mvcc_database.py:148–155). At the moment of the write, the transaction hasn't committed yet, so the store contains a mix of committed, uncommitted, and aborted versions.

This creates an ambiguity that transaction IDs (which only record start order) can't resolve on their own. Consider: transaction 3 starts, then transaction 5 starts, then transaction 3 commits. When transaction 5 later reads, transaction 3 is committed and has a lower ID, yet transaction 5 must not see its writes, because transaction 3 was still active when transaction 5 took its snapshot. Without active_at_start, transaction 5 has no way to distinguish "committed before my snapshot" from "started before me but committed after my snapshot."

The visibility rule at mvcc_database.py:82–89 shows all three conditions:


if created_by in self._committed:
    if created_by not in tx.active_at_start:   # wasn't in-flight at my start
        if created_by < tx.tx_id:              # started before me
            created_visible = True

The active_at_start set is the mechanism that captures the state of the world at snapshot time. It's populated eagerly at begin_transaction (mvcc_database.py:66–68) and then never changes: it's a frozen record of which transactions were in flight.
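The scenario above can be replayed in a minimal sketch. TinyMVCC, Tx, begin, commit, and naive_visible are hypothetical names invented for illustration, not the contents of mvcc_database.py; only the three-part visibility test mirrors the excerpt. The sketch uses transactions 2 and 3 where the prose uses 3 and 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    tx_id: int
    active_at_start: frozenset  # IDs that were in flight when this tx began


class TinyMVCC:
    """Hypothetical stand-in; models only the commit/active bookkeeping."""

    def __init__(self):
        self._next_id = 1
        self._active = set()
        self._committed = set()

    def begin(self):
        # Freeze the set of in-flight transactions at snapshot time.
        tx = Tx(self._next_id, frozenset(self._active))
        self._next_id += 1
        self._active.add(tx.tx_id)
        return tx

    def commit(self, tx):
        self._active.discard(tx.tx_id)
        self._committed.add(tx.tx_id)

    def created_visible(self, created_by, tx):
        # Same three conditions as the excerpt, combined with `and`.
        return (created_by in self._committed
                and created_by not in tx.active_at_start
                and created_by < tx.tx_id)

    def naive_visible(self, created_by, tx):
        # What you get without the active set: committed + lower ID.
        return created_by in self._committed and created_by < tx.tx_id


db = TinyMVCC()
t1 = db.begin()
db.commit(t1)      # committed before t3 exists
t2 = db.begin()    # still in flight when t3 begins
t3 = db.begin()
db.commit(t2)      # commits after t3's snapshot

sees_t1 = db.created_visible(t1.tx_id, t3)
sees_t2 = db.created_visible(t2.tx_id, t3)
naive_sees_t2 = db.naive_visible(t2.tx_id, t3)
```

Here sees_t2 is False while naive_sees_t2 is True: the frozen set is the only thing separating the two cases.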

SSIDatabase: Deferred Writes Make Timestamps Sufficient

SSIDatabase takes the opposite approach: writes are buffered in tx.writes (ssi_database.py:152–155) and only materialized into store at commit time, stamped with the commit_timestamp (ssi_database.py commit path). The store therefore contains only committed data, and each version carries the exact timestamp at which it was committed.

This makes visibility trivial: _visible_value at ssi_database.py:63–73 just finds the latest version where commit_ts <= snapshot_ts:


for commit_ts, value, tx_id in versions:
    if commit_ts <= snapshot_ts and (best is None or commit_ts > best[0]):
        best = (commit_ts, value, tx_id)

No active-set tracking is needed. If a version is in the store, it's committed. If its commit timestamp is at or before your snapshot timestamp, it's visible, and the latest such version wins. The total ordering of commit timestamps resolves the ambiguity that the MVCC implementation needs the active set for.
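As a rough, standalone illustration of that lookup, the loop from the excerpt can be wrapped in a function and run against a small version list. The function name, its signature, and the sample data are assumptions made for this sketch; the real method in ssi_database.py may be shaped differently.

```python
def visible_value(versions, snapshot_ts):
    """Return the value of the latest version committed at or before snapshot_ts.

    `versions` is assumed to be a list of (commit_ts, value, tx_id) tuples
    holding committed data only; returns None if nothing is visible yet.
    """
    best = None
    for commit_ts, value, tx_id in versions:
        if commit_ts <= snapshot_ts and (best is None or commit_ts > best[0]):
            best = (commit_ts, value, tx_id)
    return None if best is None else best[1]


# Hypothetical history for one key: three committed versions.
history = [(10, "a", 1), (20, "b", 2), (30, "c", 3)]

at_25 = visible_value(history, 25)  # version 30 is too new
at_20 = visible_value(history, 20)  # boundary case: <= includes ts 20
at_5 = visible_value(history, 5)    # nothing committed yet
```

The boundary case matters: because the comparison is `<=`, a version committed exactly at the snapshot timestamp is visible.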

Why Each Approach Fits Its Module

The choice isn't arbitrary; it aligns with what each module is demonstrating.

The tradeoff: active_at_start sets grow with concurrency (one entry per in-flight transaction) but allow uncommitted versions to live in the shared store, since visibility is decided per version at read time. Timestamp ordering reduces the visibility test to a single comparison per version, but it requires all writes to be deferred until commit, which means a transaction can't see its own writes without consulting its buffer first (ssi_database.py:93–97).
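The buffer-first read can be sketched as follows. TinyDeferredStore and BufferedTx are hypothetical names, the logical clock is an assumption, and SSI's conflict detection is deliberately left out; this only shows deferred writes and the read-your-own-writes check.

```python
from dataclasses import dataclass, field


@dataclass
class BufferedTx:
    tx_id: int
    snapshot_ts: int
    writes: dict = field(default_factory=dict)  # private, uncommitted writes


class TinyDeferredStore:
    """Hypothetical sketch: writes reach the shared store only at commit."""

    def __init__(self):
        self.store = {}    # key -> [(commit_ts, value, tx_id)], committed only
        self._clock = 0    # logical timestamp source (an assumption)
        self._next_id = 1

    def begin(self):
        tx = BufferedTx(self._next_id, self._clock)
        self._next_id += 1
        return tx

    def write(self, tx, key, value):
        tx.writes[key] = value  # buffered; invisible to everyone else

    def read(self, tx, key):
        if key in tx.writes:    # consult own buffer before the store
            return tx.writes[key]
        best = None
        for commit_ts, value, _ in self.store.get(key, []):
            if commit_ts <= tx.snapshot_ts and (best is None or commit_ts > best[0]):
                best = (commit_ts, value)
        return None if best is None else best[1]

    def commit(self, tx):
        self._clock += 1        # one commit timestamp for all of tx's writes
        for key, value in tx.writes.items():
            self.store.setdefault(key, []).append((self._clock, value, tx.tx_id))
        return self._clock


db = TinyDeferredStore()
writer = db.begin()
db.write(writer, "x", 1)
own_read = db.read(writer, "x")        # served from the buffer

reader = db.begin()                    # snapshot taken before the commit
before_commit = db.read(reader, "x")   # store is still empty
db.commit(writer)
after_commit = db.read(reader, "x")    # commit_ts 1 > snapshot_ts 0

late_reader = db.begin()               # snapshot taken after the commit
late_read = db.read(late_reader, "x")
```

Without the buffer check in read, own_read would come back as None, because nothing reaches the store until commit.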
