Original source: topic-merkle-proof-security-model

Topic: How Merkle proofs provide O(log N) verification and what trust assumptions the verifier must satisfy (authentic root hash)

Date: 2026-05-29

Time: 11:47

How Merkle Proofs Provide O(log N) Verification

The Core Idea

A Merkle proof lets you verify that a specific data block belongs to a dataset without having the entire dataset — you only need the root hash and a small set of sibling hashes. The proof size and verification cost are both O(log N) because the tree is a complete binary tree, and verification only walks one path from leaf to root.

Proof Generation: `getproof` (merkletree.py:139–160)

The proof is a list of sibling hashes collected while walking from the target leaf up to the root:


while arr_idx > 0:
    parent = (arr_idx - 1) // 2
    if arr_idx == 2 * parent + 1:
        sibling_idx = 2 * parent + 2
        siblings.append((self._hashes[sibling_idx], "right"))
    else:
        sibling_idx = 2 * parent + 1
        siblings.append((self._hashes[sibling_idx], "left"))
    arr_idx = parent

At each level, the algorithm records the hash of the other child of the current node's parent, along with its position ("left" or "right"). The position matters because H(A || B) ≠ H(B || A) — hash concatenation is order-dependent.

For a tree with N leaves, the padded size is the next power of 2, and the height is log₂(paddedsize). The loop executes exactly height times, producing exactly height sibling entries. With 4 leaves, height == 2 (line 97–103), so proofs contain 2 siblings — confirmed by the test at testmerkle_tree.py:50:


assert len(proof.siblings) == 2  # height=2

Proof Verification: `verifyproof` (merkletree.py:162–173)

Verification recomputes the root hash using only the data, the sibling list, and the claimed root:


current = _sha256(data)
if current != proof.leaf_hash:
    return False
for sibling_hash, direction in proof.siblings:
    if direction == "left":
        current = _sha256((sibling_hash + current).encode())
    else:
        current = _sha256((current + sibling_hash).encode())
return current == proof.root_hash

The verifier:

1. Hashes the raw data and checks it matches the claimed leaf hash.

2. Walks up the tree: at each level, combines the current hash with the sibling hash in the correct order, producing the parent hash.

3. After height iterations, compares the computed value against proof.root_hash.

This is O(log N) — exactly one hash per tree level, no recursion, no searching.

Why O(log N) and Not O(N)

Without a Merkle tree, verifying membership requires hashing all N blocks to recompute the root. With a proof, you only need the log₂(N) sibling hashes along the single path from the leaf to the root. Everything else is already summarized by those siblings. The array-backed structure (merkletree.py:57–75) makes this concrete: the tree has 2 * paddedsize - 1 total nodes, but verification touches only height of them.

The Trust Assumption: Authentic Root Hash

The critical line is the final comparison in verify_proof:


return current == proof.root_hash

The roothash comes from the MerkleProof object itself (merkletree.py:44), which was generated by the prover. The verifier must already possess a trusted root hash obtained through an independent channel. If the verifier simply trusts whatever proof.root_hash the prover sends, a malicious prover can construct a valid proof for any data against a fabricated root.

In practice, root hashes are:

Published in a block header (blockchains)
Exchanged during an anti-entropy protocol (Dynamo-style replication — exactly the use case suggested by this module's docstring: "data verification and anti-entropy")
Signed by a trusted party

The verifyproof static method is deliberately detached from any MerkleTree instance — it takes raw data and a MerkleProof struct. This reflects the real-world separation: the verifier doesn't have the tree, only the root hash and the proof. But the API as written bundles roothash inside MerkleProof, so the caller must separately validate that proof.roothash matches a trusted value. The test at testmerkle_tree.py:49 confirms against the tree's own root, but in production the root would come from elsewhere.

Topics to Explore

[function] merkle-tree/merkle_tree.py:diff — Uses the same tree structure for O(log N) anti-entropy diffing between replicas, the primary distributed-systems use case
[function] merkle-tree/merkletree.py:updateleaf — Shows how a single leaf change triggers O(log N) rehashing up to root, maintaining proof validity after mutations
[general] proof-of-non-inclusion — This implementation only supports inclusion proofs; sorted Merkle trees can prove a key is *absent*, which is worth understanding
[file] merkle-tree/merkle_tree.py — The KeyRangeMerkleTree class (lines 200+) wraps the tree for key-value anti-entropy, bridging the gap to Dynamo-style replication
[general] second-preimage-attacks — Why the concatenation order in verify_proof matters for security, and why some implementations use a domain separator byte for leaf vs. internal nodes

Beliefs

merkle-proof-size-is-log-n — A Merkle proof contains exactly height sibling hashes, where height = log₂(nextpowerof2(leafcount)), making proof size and verification O(log N)
verify-proof-is-stateless — verify_proof is a static method that requires no tree instance — only raw data and a MerkleProof — reflecting the real-world constraint that verifiers don't hold the full dataset
root-hash-trust-is-external — verifyproof checks the recomputed hash against proof.roothash but does not authenticate the root itself; the caller must obtain a trusted root through an independent channel
sibling-direction-matters — Proof siblings are tagged "left" or "right" because SHA-256 concatenation is order-dependent; swapping the order produces a different hash and verification fails
padding-leaves-use-empty-hash — Non-power-of-2 leaf counts are padded with EMPTY_HASH (SHA-256 of empty bytes), which are included in proof paths but do not represent real data