table/merger.cc implements the canonical heap-based k-way merge; comparing it to this Python impl reveals what production correctness requires (direction reversal, snapshot reads, seek optimization)Date: 2026-05-29
Time: 07:32
I'll work with the observations already gathered.
leveldb-merger.cc — Read LevelDB's actual table/merger.cc source to see the direction enum, FindSmallest/FindLargest helper methods, and how it rebuilds the heap on direction changelog-structured-merge-tree/lsm.py:rangescan — The full rangescan method (around lines 265–300) to see how the priority-based dict merge works and why it doesn't scale to large datasetssstable-and-compaction/sstable.py — The compaction merge logic (lines 200+) likely uses heapq.merge with timestamps, showing a more sophisticated merge than the LSM tree's dict approachwrite-skew-detection/ssidatabase.py — The visiblevalue and snapshot methods (lines 59–80) demonstrate the snapshot isolation that a production merge iterator would needrocksdb-merge-operator — RocksDB extends LevelDB's merge with a MergeOperator that supports partial merges during compaction — the next step beyond what both implementations showlsm-range-scan-materializes-all — LSMTree.range_scan (lsm.py ~lines 275–298) loads all matching entries from all SSTables into a dict before returning, making it O(total entries) regardless of range sizelsm-no-snapshot-isolation — The LSM tree implementation has no sequence numbers or snapshot mechanism; concurrent flushes during a range scan can produce inconsistent readslsm-forward-only-iteration — SSTable iterators (scan, scan_all) support only forward iteration; there is no Prev() or reverse scan capabilitysstable-has-timestamps-lsm-ignores-them — SSTableEntry in sstable.py carries a timestamp field, but lsm.py's SSTable format uses only structural ordering (SSTable sequence number) for conflict resolutionlsm-compaction-uses-heap-merge — Compaction (lsm.py ~line 323) uses heapq-based k-way merge with prev_key deduplication, while range scan uses a simpler dict-based merge — two different strategies in the same codebase