{"results":[{"id":"binary-formats-rigid-across-entire-storage-stack","text":"The entire storage stack uses rigid binary formats that preclude both forward evolution and post-corruption recovery: WAL records are contiguously packed with no block alignment or version negotiation preventing resync after mid-file corruption, and SSTables lack per-entry checksums and efficient skip structures — neither layer can be upgraded in place or self-repaired after partial damage.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"bloom-compaction-rebuild-not-patch","text":"For SSTable compaction, rebuilding a fresh `BloomFilter` during the merge write pass is simpler and more correct than incrementally patching a `CountingBloomFilter` with `remove()`; compaction already pays O(N) I/O so filter construction adds only constant-factor overhead","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"bloom-filter-not-integrated","text":"Neither LSM tree implementation (`lsm.py` or `sstable.py`) references or uses the bloom filter module; they exist as independent DDIA concept demonstrations with zero cross-module imports","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"bloom-serialization-footer-ready","text":"`BloomFilter.to_bytes` packs a 12-byte header (`m`, `k`, `count` as three little-endian uint32s) followed by the raw bit array, matching the pattern used by SSTable footers for sparse indices","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"both-storage-paradigms-hit-scalability-walls","text":"Both storage paradigms in the reference implementations exhibit fundamental scalability constraints: the hash index requires all keys in RAM (making dataset size directly bound by available memory with no spill-to-disk fallback), while the LSM tree scans every SSTable on negative lookups because the correctly-implemented Bloom filter module is never wired into the read path.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compact-deletes-old-sstable-files","text":"`compact()` removes old SSTable files from disk after merging, meaning any concurrent reader holding a reference to a deleted SSTable reads stale data (on Unix) or crashes (on Windows)","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compact-newest-wins-by-seq","text":"During compaction, the entry with the highest SSTable sequence number wins for each key; sequence numbers are per-SSTable (all entries in one SSTable share the same seq), not per-entry","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compact-no-concurrency-safety","text":"`compact()` mutates `self._sstables` without locking; concurrent `_flush` or `get` calls during compaction can produce incorrect state or lose newly flushed SSTables","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compact-purges-tombstones","text":"Tombstones (`b\"\"`) are permanently removed during compaction and never written to the output SSTable; deleted keys disappear entirely after merge","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-buffers-all-entries","text":"`lsm.py:compact` collects all merged entries into an in-memory list before writing the output SSTable, requiring O(n) memory proportional to total data rather than O(k) proportional to number of input SSTables","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-crash-can-resurrect-deleted-keys","text":"A crash during LSM compaction after tombstones are stripped from the merged output but before old SSTables are deleted can resurrect previously-deleted keys, because the tombstone that suppressed them no longer exists in any surviving SSTable.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-deletes-before-reader-release","text":"`compact()` deletes old SSTable files immediately after replacing `self._sstables`, with no mechanism to defer deletion until active readers holding references to the old list finish their iterations","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-is-explicit-not-background","text":"Both the LSM and SSTable modules trigger compaction via explicit synchronous method calls (`compact()` / `run_compaction()`), not background threads — removing the write-amplification pressure that motivates least-overlap selection in production systems.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-manager-never-deletes-old-files","text":"`CompactionManager` removes compacted SSTables from the in-memory `_sstables` list but never deletes their underlying files on disk; cleanup is the caller's responsibility.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-manager-never-removes-tombstones","text":"The `CompactionManager` in `sstable-and-compaction/sstable.py` always calls `merge_sstables` with the default `remove_tombstones=False`, making it conservatively correct but causing unbounded tombstone accumulation","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-not-atomic","text":"`LSMTree.compact()` performs a multi-step file swap (write new SSTable, update in-memory list, delete old files) with no mechanism to make the transition atomic across crashes","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-strategy-vs-scheduling-decoupled","text":"The `sstable.py` `CompactionManager` separates *which* SSTables to merge (size-tiered vs. leveled strategy) from *when* to merge, but neither the manager nor any caller implements scheduling, rate-limiting, or load-aware deferral logic.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-threshold-controls-overlap-window","text":"The `compaction_threshold` parameter (`lsm.py:204`) directly controls how many overlapping SSTables can accumulate before compaction, setting the worst-case missing-key probe count to `threshold - 1`","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-triggered-by-sstable-count","text":"LSM compaction runs automatically when `len(self._sstables) >= self._compaction_threshold` (default 4), triggered at the end of `_flush` after the new SSTable is registered","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"counting-bloom-remove-needs-drop-tracking","text":"Using `CountingBloomFilter.remove()` during compaction would require `merge_sstables` to report which keys were discarded, which the current implementation does not do — duplicates and tombstones are silently skipped in the merge loop","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null}],"count":142,"limit":20,"offset":0}