Date: 2026-05-29
Time: 11:48
MerkleTree.diffdiff identifies which leaf positions differ between two Merkle trees. This is the core operation that makes Merkle trees useful for anti-entropy in distributed systems: two replicas each build a tree over their data, then diff the roots. If roots match, all data is identical — no further communication needed. If they don't, the diff drills down to find exactly which leaves diverged, transferring only O(log N) hashes instead of comparing every record.
Preconditions:
paddedsize (i.e., the same power-of-2 array dimension). This is enforced with a ValueError.Postconditions:
Invariant: If self.roothash == other.roothash, the result is always an empty list. The recursive helper short-circuits at the root and never descends.
| Parameter | Type | Description |
|-----------|------|-------------|
| other | MerkleTree | The tree to compare against. Must have been constructed with the same number of padded slots (same power-of-2 rounding). |
Note: two trees can have different leafcount values and still be diffable, as long as their paddedsize matches. For example, a tree with 5 leaves and one with 7 leaves both pad to 8, so they're comparable. The extra slots in the 5-leaf tree will have EMPTY_HASH, and those positions will show up as diffs if the 7-leaf tree has data there.
List[int] — indices into the original data blocks where the two trees disagree. Sorted in ascending order (as a consequence of the left-to-right DFS in diffrecursive). The caller is responsible for interpreting what the indices mean — diff doesn't return the actual data, just the positions.
1. Guard: Reject trees with different paddedsize — their array layouts are incompatible.
2. Recursive descent via diffrecursive(other, 0, result), starting at the root (index 0):
self.hashes[idx] == other.hashes[idx], the entire subtree matches. Return immediately — this is the pruning that gives O(k log N) performance where k is the number of differing leaves.idx is at or past leafstart (i.e., paddedsize - 1), we've hit a leaf. Append the leaf's logical index (idx - leafstart) to result.2*idx + 1 (left) and 2*idx + 2 (right).3. Filter padding: The recursion may report diffs in padding positions (indices ≥ both leafcount values). The final list comprehension strips these out using max(self.leafcount, other.leafcount) as the upper bound. It uses max, not min — if tree A has 5 leaves and tree B has 7, positions 5 and 6 are real diffs (B has data that A doesn't), so they should be reported.
None. The method is purely read-only — it doesn't mutate either tree. The result list is local to the call.
ValueError: Raised when paddedsize differs between trees. This is the only explicit error. The method does not catch or swallow any exceptions.diffrecursive — this is safe because both trees have identically-sized hashes arrays (guaranteed by the paddedsize check) and the recursion stays within [0, 2*padded_size - 2].Typical use is through KeyRangeMerkleTree.diff_keys, which translates leaf indices back to key names:
tree_a = KeyRangeMerkleTree(replica_a_pairs)
tree_b = KeyRangeMerkleTree(replica_b_pairs)
stale_keys = tree_a.diff_keys(tree_b) # internally calls self._tree.diff(other._tree)
Caller obligation: the two trees must represent the same key-space partitioned identically. If replicas use different sort orders or different key sets, the positional indices are meaningless. KeyRangeMerkleTree handles this by sorting keys before construction.
diffrecursive: private helper that does the actual traversal. Accesses hashes arrays on both trees and padded_size on self.[0, paddedsize-2], leaves at [paddedsize-1, 2*padded_size-2], children of node i at 2i+1 and 2i+2. This is a standard implicit binary heap layout.leafcount is assumed to be ≤ paddedsize on both trees. Violated values would cause the filter to include padding slots or exclude real leaves.max in the filter assumes that a leaf index valid in *either* tree represents real data. If one tree has fewer leaves, those positions in the shorter tree are EMPTY_HASH padding — but the diff correctly flags them because the other tree has actual data there.merkle-diff-requires-same-padded-size — diff raises ValueError if the two trees have different paddedsize, meaning both must round to the same power of 2merkle-diff-filters-padding — The diff result excludes padding indices beyond max(self.leafcount, other.leafcount), so only real data positions are returnedmerkle-diff-is-pure — diff and diffrecursive are read-only; they mutate neither tree and produce no side effectsmerkle-diff-order-is-ascending — Differing leaf indices are returned in ascending order as a consequence of left-to-right DFS traversalmerkle-diff-asymmetric-leaf-counts — Two trees with different leafcount but the same paddedsize can be diffed; positions where one tree has data and the other has EMPTY_HASH padding are reported as diffs