block_size affects the sparse index and binary search within blocks — the default of 4 in tests is suspiciously small and worth understandingDate: 2026-05-29
Time: 08:57
The observations cut off mid-function at line 200. I have enough from the observations to explain the core mechanism, but I'll note what's missing.
sstable-and-compaction/sstable.py:SSTableReader.get — The binary search + block scan was cut off at line 200; read the full implementation to see how it handles boundary conditions and the transition from index lookup to linear scanlog-structured-merge-tree/lsm.py:scanrangeforkey — The linear scan within a block, including the early-exit optimization when the current key exceeds the targetsstable-and-compaction/sstable.py:SSTableReader.range_scan — Range scans use the same sparse index for the starting position but have different termination logic; worth comparing with point lookupsindex-size-vs-scan-cost-benchmarking — Run the testlargedataset test with different block_size values (4, 16, 64, 256) and measure actual lookup latency to see where the O(log(N/B) + B) tradeoff inflectslog-structured-merge-tree/lsm.py — Compare how LSMTree wires sparseindexinterval through to SSTable construction (line 203-207) and whether the interval is configurable per-level or globalblock-size-controls-index-density — blocksize (or sparseindexinterval) determines that every Nth entry is recorded in the sparse index; the index contains exactly ceil(entrycount / block_size) entriessstable-lookup-is-two-phase — Point lookups in both SSTable implementations first binary-search the sparse index to identify a block, then linearly scan entries within that block's byte rangetest-block-size-differs-from-default — Tests universally use block_size=4 while the SSTableWriter default is 64; this ensures tests exercise multi-block index paths with small datasetstwo-sstable-implementations-same-pattern — lsm.py (using sparseindexinterval=16 and bisect) and sstable.py (using block_size=64 and manual binary search) implement the same sparse-index-with-block-scan pattern with different defaults and naming