Date: 2026-05-29
Time: 12:32
Both LSM tree implementations in this repo trigger compaction synchronously and unconditionally — the exact opposite of time-window-aware scheduling.
log-structured-merge-tree/lsm.py
The LSMTree constructor at lines 202-208 accepts a compaction_threshold (default 4). Compaction fires inline during writes:
# lines 315-317
if len(self._sstables) >= self._compaction_threshold:
    self.compact()
This means a put() or delete() call that happens to tip the SSTable count over the threshold will block the caller while compact() runs a full k-way merge (line 319, line 323). There is no notion of "now is a bad time": the write that trips the threshold simply pays the cost.
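One way to take the merge off the write path is to hand it to a worker thread. The sketch below is a toy stand-in, not the repo's LSMTree: BackgroundCompactingStore, flush(), get(), sstable_count(), and close() are hypothetical names, each "SSTable" is a plain dict, and tombstones are not modelled. It only illustrates the hand-off: the write that crosses the threshold signals the worker and returns.

```python
import threading


class BackgroundCompactingStore:
    """Toy stand-in for an LSM tree (hypothetical; not the repo's LSMTree)."""

    def __init__(self, compaction_threshold=4):
        self._compaction_threshold = compaction_threshold
        self._sstables = []                 # oldest first; each "SSTable" is a dict
        self._lock = threading.Lock()
        self._wakeup = threading.Event()
        self._stopped = False
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def flush(self, memtable):
        """Register a flushed memtable as a new SSTable without compacting inline."""
        with self._lock:
            self._sstables.append(dict(memtable))
            over_threshold = len(self._sstables) >= self._compaction_threshold
        if over_threshold:
            self._wakeup.set()              # hand off to the worker and return

    def get(self, key):
        with self._lock:
            for table in reversed(self._sstables):   # newest table wins
                if key in table:
                    return table[key]
        return None

    def sstable_count(self):
        with self._lock:
            return len(self._sstables)

    def _run(self):
        while True:
            self._wakeup.wait()
            self._wakeup.clear()
            if self.sstable_count() >= self._compaction_threshold:
                self._compact()
            if self._stopped:
                return

    def _compact(self):
        """Merge a snapshot of the tables; only the worker thread calls this."""
        with self._lock:
            snapshot = list(self._sstables)
        merged = {}
        for table in snapshot:              # oldest first, so newer values override
            merged.update(table)
        with self._lock:
            # Tables flushed during the merge sit after the snapshot; keep them.
            self._sstables = [merged] + self._sstables[len(snapshot):]

    def close(self):
        """Stop the worker after it finishes any pending compaction."""
        self._stopped = True
        self._wakeup.set()
        self._worker.join()
```

The merge itself runs outside the lock, so flush() and get() only wait for the brief snapshot and swap steps. A background thread alone does not stop the merge from competing for disk bandwidth; it only removes the stall from the triggering write.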
sstable-and-compaction/sstable.py
The CompactionManager (visible in tests at test_sstable.py:51-56) uses the same pattern: needs_compaction() checks a count threshold (min_threshold=2 for size-tiered, l0_compaction_trigger=2 for leveled), and run_compaction() executes immediately when called. The strategy selection (size-tiered vs. leveled) controls *which* SSTables merge, but not *when* the merge happens relative to system load.
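Because the manager already separates "is a merge due" from "do the merge", a "when" policy can be layered on top without touching the strategy code. The sketch below assumes only that the wrapped object exposes needs_compaction() and run_compaction() with no arguments; the real signatures in sstable.py may differ. CompactionScheduler, FakeManager, is_good_time, and is_urgent are hypothetical names introduced for illustration.

```python
from typing import Callable


class CompactionScheduler:
    """Decides *when* to compact; the wrapped manager decides *which* tables."""

    def __init__(
        self,
        manager,
        is_good_time: Callable[[], bool],
        is_urgent: Callable[[], bool] = lambda: False,
    ):
        self._manager = manager
        self._is_good_time = is_good_time   # e.g. low load or off-peak window
        self._is_urgent = is_urgent         # e.g. backlog close to a stall limit
        self.deferred = 0                   # how many due compactions were postponed

    def maybe_compact(self) -> bool:
        """Run a due compaction only if the policy allows it. Returns True if it ran."""
        if not self._manager.needs_compaction():
            return False
        if not (self._is_good_time() or self._is_urgent()):
            self.deferred += 1
            return False
        self._manager.run_compaction()
        return True


class FakeManager:
    """Stand-in for the real manager: tracks only a table count (assumption)."""

    def __init__(self, table_count: int, min_threshold: int = 2):
        self.table_count = table_count
        self.min_threshold = min_threshold

    def needs_compaction(self) -> bool:
        return self.table_count >= self.min_threshold

    def run_compaction(self) -> None:
        self.table_count = 1                # pretend everything merged into one table


# Usage: defer while the system is busy, compact once it is idle.
busy = {"value": True}
manager = FakeManager(table_count=3)
scheduler = CompactionScheduler(manager, is_good_time=lambda: not busy["value"])
scheduler.maybe_compact()                   # deferred, table_count stays 3
busy["value"] = False
scheduler.maybe_compact()                   # runs, table_count drops to 1
```

The is_urgent override matters because deferring indefinitely lets the table count grow without bound; a real policy would need some backlog limit past which compaction runs regardless of load.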
Kleppmann's discussion of LSM compaction highlights a fundamental tension: compaction competes with foreground reads and writes for disk I/O bandwidth. In production systems like Cassandra and RocksDB, this manifests as:
1. Write stalls — if compaction can't keep up, the number of L0 SSTables grows, reads slow down (more files to check), and eventually writes must be throttled or blocked.
2. Latency spikes — a large compaction running during peak traffic causes p99 latency to spike because the disk is saturated with merge I/O.
Time-window-based scheduling addresses this by deferring non-urgent compaction to off-peak hours (e.g., overnight), rate-limiting compaction I/O during peak periods, or using separate I/O priorities. Cassandra's compaction_throughput_mb_per_sec and RocksDB's rate_limiter are real-world examples.
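Two of these ideas, an off-peak window and a throughput cap, fit in a few lines. The sketch below is illustrative only and does not reproduce the Cassandra or RocksDB implementations: in_off_peak_window, TokenBucket, and delay_for are hypothetical names, the 01:00 to 05:00 window is an arbitrary assumption, and the clock is injected so the behaviour can be checked without sleeping.

```python
from datetime import time as dtime


def in_off_peak_window(now, start=dtime(1, 0), end=dtime(5, 0)):
    """True if `now` (a datetime.time) falls in [start, end).

    Handles windows that wrap past midnight, e.g. 22:00 to 06:00.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


class TokenBucket:
    """Caps compaction I/O at roughly `rate_bytes_per_sec`, allowing short bursts."""

    def __init__(self, rate_bytes_per_sec, capacity_bytes, clock):
        self.rate = rate_bytes_per_sec
        self.capacity = capacity_bytes
        self.clock = clock                  # callable returning seconds as a float
        self.tokens = capacity_bytes        # start with a full bucket
        self.last = clock()

    def _refill(self):
        now = self.clock()
        earned = (now - self.last) * self.rate
        self.tokens = min(self.capacity, self.tokens + earned)
        self.last = now

    def delay_for(self, nbytes):
        """Reserve `nbytes` and return how long the caller should sleep first.

        The balance may go negative; that debt is what later callers wait out.
        """
        self._refill()
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate


# Usage with a fake clock: 100 bytes/s, bursts of up to 200 bytes.
fake_now = [0.0]
bucket = TokenBucket(100, 200, clock=lambda: fake_now[0])
first = bucket.delay_for(200)               # burst fits in the bucket
second = bucket.delay_for(50)               # bucket is empty, caller must wait
```

In a merge loop, the compactor would call delay_for() with the size of each chunk it is about to write and sleep for the returned duration, passing time.monotonic as the clock. The window check could feed a scheduling policy, for example by lifting the rate cap only when in_off_peak_window() is true.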
Neither implementation has any of the following, all of which would be needed for production-viable compaction scheduling:
- A search for time|window|peak|load|schedule across the repo (greptimewindow) returns zero hits in either LSM module.
- compact() methods run synchronously in the caller's thread. A production system would run compaction on a background thread with configurable priority.
- lsm.py:316 is a hard gate: threshold met → compact now.

For a teaching implementation, immediate compaction is fine: it demonstrates the merge logic clearly. But any team porting this to production must understand that the *scheduling policy* around compaction is as important as the merge algorithm itself. Without it:
This is a case where the algorithm is correct but the operational envelope is absent.
- log-structured-merge-tree/lsm.py:compact - The synchronous k-way merge that blocks writers; understanding its I/O pattern shows why scheduling matters
- sstable-and-compaction/sstable.py:CompactionManager - Implements strategy selection (size-tiered vs. leveled) but not scheduling; compare how Cassandra layers scheduling on top of strategy
- rocksdb-rate-limiter - RocksDB's rate_limiter is the canonical example of throttling compaction I/O to protect foreground latency
- cassandra-compaction-windows - Cassandra's TimeWindowCompactionStrategy and compaction_throughput_mb_per_sec show how time-awareness is layered into a real LSM system
- log-structured-merge-tree/test_lsm.py - Tests verify compaction correctness but never test behavior under concurrent load, which is where scheduling becomes observable

- lsm-compaction-is-synchronous - LSMTree.compact() in lsm.py:319 runs synchronously in the caller's thread, blocking all reads and writes until the merge completes
- compaction-trigger-is-count-only - Both implementations trigger compaction solely on SSTable count thresholds (compaction_threshold at lsm.py:208, min_threshold/l0_compaction_trigger in sstable.py), with no load, time, or I/O awareness
- no-compaction-rate-limiting - Neither implementation throttles compaction I/O throughput, meaning a large merge will saturate disk bandwidth without regard to concurrent foreground operations
- compaction-strategy-vs-scheduling-decoupled - The sstable.py CompactionManager separates *which* SSTables to merge (strategy) from the caller's decision of *when* to merge, but neither the manager nor any caller implements scheduling logic