Topic: How Merkle proofs provide O(log N) verification and what trust assumptions the verifier must satisfy (authentic root hash)

Date: 2026-05-29

Time: 11:47

How Merkle Proofs Provide O(log N) Verification

The Core Idea

A Merkle proof lets you verify that a specific data block belongs to a dataset without having the entire dataset — you only need the root hash and a small set of sibling hashes. The proof size and verification cost are both O(log N) because the tree is a complete binary tree, and verification only walks one path from leaf to root.

Proof Generation: getproof (merkletree.py:139–160)

The proof is a list of sibling hashes collected while walking from the target leaf up to the root:


while arr_idx > 0:
    parent = (arr_idx - 1) // 2
    if arr_idx == 2 * parent + 1:
        sibling_idx = 2 * parent + 2
        siblings.append((self._hashes[sibling_idx], "right"))
    else:
        sibling_idx = 2 * parent + 1
        siblings.append((self._hashes[sibling_idx], "left"))
    arr_idx = parent

At each level, the algorithm records the hash of the other child of the current node's parent, along with its position ("left" or "right"). The position matters because H(A || B) ≠ H(B || A) — hash concatenation is order-dependent.

For a tree with N leaves, the padded size is the next power of 2, and the height is log₂(paddedsize). The loop executes exactly height times, producing exactly height sibling entries. With 4 leaves, height == 2 (line 97–103), so proofs contain 2 siblings — confirmed by the test at testmerkle_tree.py:50:


assert len(proof.siblings) == 2  # height=2

Proof Verification: verifyproof (merkletree.py:162–173)

Verification recomputes the root hash using only the data, the sibling list, and the claimed root:


current = _sha256(data)
if current != proof.leaf_hash:
    return False
for sibling_hash, direction in proof.siblings:
    if direction == "left":
        current = _sha256((sibling_hash + current).encode())
    else:
        current = _sha256((current + sibling_hash).encode())
return current == proof.root_hash

The verifier:

1. Hashes the raw data and checks it matches the claimed leaf hash.

2. Walks up the tree: at each level, combines the current hash with the sibling hash in the correct order, producing the parent hash.

3. After height iterations, compares the computed value against proof.root_hash.

This is O(log N) — exactly one hash per tree level, no recursion, no searching.

Why O(log N) and Not O(N)

Without a Merkle tree, verifying membership requires hashing all N blocks to recompute the root. With a proof, you only need the log₂(N) sibling hashes along the single path from the leaf to the root. Everything else is already summarized by those siblings. The array-backed structure (merkletree.py:57–75) makes this concrete: the tree has 2 * paddedsize - 1 total nodes, but verification touches only height of them.

The Trust Assumption: Authentic Root Hash

The critical line is the final comparison in verify_proof:


return current == proof.root_hash

The roothash comes from the MerkleProof object itself (merkletree.py:44), which was generated by the prover. The verifier must already possess a trusted root hash obtained through an independent channel. If the verifier simply trusts whatever proof.root_hash the prover sends, a malicious prover can construct a valid proof for any data against a fabricated root.

In practice, root hashes are:

The verifyproof static method is deliberately detached from any MerkleTree instance — it takes raw data and a MerkleProof struct. This reflects the real-world separation: the verifier doesn't have the tree, only the root hash and the proof. But the API as written bundles roothash inside MerkleProof, so the caller must separately validate that proof.roothash matches a trusted value. The test at testmerkle_tree.py:49 confirms against the tree's own root, but in production the root would come from elsewhere.

Topics to Explore

Beliefs