Function: prune in vector-clocks/vector_clock.py

Date: 2026-05-29

Time: 14:09

VectorClock.prune(max_nodes) — Bounded Vector Clock Truncation

Purpose

prune caps the size of a vector clock by discarding entries with the lowest counters, keeping only the max_nodes most-active nodes. This exists because vector clocks grow unboundedly as new nodes participate in the system — every unique node ID that ever increments the clock adds an entry. In long-running systems with node churn (nodes joining and leaving), the clock can grow without limit. Pruning trades causal accuracy for bounded space: you lose the ability to detect causality relationships involving the dropped nodes, but the clock stays a fixed size.

This is the strategy described in DDIA and used by systems like Riak — when the clock gets too large, drop the least-active entries on the assumption that nodes with low counters are the least relevant to ongoing causal ordering.

Contract

Parameters

| Parameter | Type | Description |

|-----------|------|-------------|

| max_nodes | int | Upper bound on the number of node entries to retain. If the clock already has this many or fewer entries, it's returned as-is (copied). |

Edge cases: If maxnodes >= len(self.clock), no pruning occurs. If multiple nodes share the same counter value, which ones survive the cut is determined by Python's sorted stability (preserving original dict iteration order among ties), but this is not a guarantee the caller should rely on.

Return Value

A new VectorClock instance. The caller gets back a clock that may have lost causal information — any subsequent compare() or dominates() call against another clock that references pruned nodes will behave as though those nodes were never seen (counter = 0 via get()'s default). This can cause false concurrency or false dominance.

Algorithm

1. Short-circuit: If the clock already fits within max_nodes, copy the internal dict into a new VectorClock and return. This avoids the sort.

2. Sort: Convert _clock.items() to a list sorted by counter value descending. Nodes that have been incremented the most sort first.

3. Truncate: Slice the first max_nodes entries and construct a new VectorClock from them. The remaining entries are silently dropped.

Side Effects

None. The method is pure — no mutation of self, no I/O, no state changes. Consistent with the class's immutable design (every mutating operation returns a new instance).

Error Handling

No explicit error handling. Relies on Python built-ins raising naturally:

Usage Patterns

Pruning is typically applied periodically or at replication boundaries — e.g., before sending a vector clock over the wire, or when a store detects the clock has grown past a threshold. A typical call site might look like:


vc = vc.increment(node_id)
if len(vc._clock) > MAX_VC_SIZE:
    vc = vc.prune(MAX_VC_SIZE)

The caller must accept that pruning is lossy. After pruning, causal relationships involving dropped nodes become undetectable, which can lead to false conflict detection (concurrent when it should be ordered) or, worse, silent overwrites (ordered when it should be concurrent).

Dependencies

Only the VectorClock class itself (constructor and sorted built-in). No external modules.

Beliefs