{"results":[{"id":"anti-entropy-detects-but-cannot-fully-resolve-divergence","text":"Anti-entropy can precisely locate divergent key ranges via Merkle tree diffs but cannot fully reconcile them: tombstone semantics differ at every layer (empty-bytes sentinel in LSM, preserved-by-default in merge, replication-convergence-dependent in distributed), preventing consistent cross-replica resolution of deleted keys.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"bloom-filter-not-integrated","text":"Neither LSM tree implementation (`lsm.py` or `sstable.py`) references or uses the bloom filter module; they exist as independent DDIA concept demonstrations with zero cross-module imports","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"both-storage-paradigms-hit-scalability-walls","text":"Both storage paradigms in the reference implementations exhibit fundamental scalability constraints: the hash index requires all keys in RAM (making dataset size directly bound by available memory with no spill-to-disk fallback), while the LSM tree scans every SSTable on negative lookups because the correctly-implemented Bloom filter module is never wired into the read path.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-buffers-all-entries","text":"`lsm.py:compact` collects all merged entries into an in-memory list before writing the output SSTable, requiring O(n) memory proportional to total data rather than O(k) proportional to number of input SSTables","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-crash-can-resurrect-deleted-keys","text":"A crash during LSM compaction after tombstones are stripped from the merged output but before old SSTables are deleted can resurrect previously-deleted keys, because the tombstone that suppressed them no longer exists in any surviving SSTable.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-is-explicit-not-background","text":"Both the LSM and SSTable modules trigger compaction via explicit synchronous method calls (`compact()` / `run_compaction()`), not background threads — removing the write-amplification pressure that motivates least-overlap selection in production systems.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-not-atomic","text":"`LSMTree.compact()` performs a multi-step file swap (write new SSTable, update in-memory list, delete old files) with no mechanism to make the transition atomic across crashes","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-threshold-controls-overlap-window","text":"The `compaction_threshold` parameter (`lsm.py:204`) directly controls how many overlapping SSTables can accumulate before compaction, setting the worst-case missing-key probe count to `threshold - 1`","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"compaction-triggered-by-sstable-count","text":"LSM compaction runs automatically when `len(self._sstables) >= self._compaction_threshold` (default 4), triggered at the end of `_flush` after the new SSTable is registered","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"crash-failure-paths-systematically-untested","text":"Crash and failure recovery paths are systematically excluded from the test suite: the WAL has no tests for truncated records or CRC mismatches, LSM crash testing covers only WAL replay and ignores compaction crashes entirely, and SSI write-skew tests exist only in standalone tester files outside the default pytest runner — the most critical correctness scenarios have the least test coverage.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"ddia-pure-python-stdlib","text":"All implementations are Python; the storage engine modules use only stdlib except for `sortedcontainers` in the LSM tree — no frameworks or production infrastructure.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"hash-index-read-is-single-seek","text":"A Bitcask `get()` does one dict lookup plus one positioned disk read (O(1)), while an LSM-tree `get()` may search the memtable then multiple SSTables from newest to oldest (O(log N) per level)","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"lsm-and-sstable-have-no-checksums","text":"Neither `log-structured-merge-tree/lsm.py` nor `sstable-and-compaction/sstable.py` compute or verify any checksums; a single bit-flip in a length-prefix field causes cascading misframing of all subsequent records","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"lsm-compact-no-atomic-rename","text":"The LSM tree's `compact()` method does not use `os.rename` or `os.replace`; grep for atomic rename operations returns zero matches in the LSM module, meaning the compaction output is written directly to its final path with no atomic swap.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"lsm-compact-removes-tombstones-safely","text":"The LSM Tree's `compact()` method removes tombstones because it performs a full merge of all SSTables, guaranteeing no surviving SSTable can contain a superseded live value","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"lsm-compaction-blocks-callers","text":"`LSMTree.compact()` runs synchronously in the caller's thread, meaning a `put()` or `delete()` that triggers the SSTable count threshold will block until the full k-way merge completes.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"lsm-compaction-deletes-without-fsync-barrier","text":"The LSM tree's `compact()` calls `os.remove(sst.path)` without an explicit `fsync` on the new SSTable or its parent directory beforehand, meaning the new file's data may not be durable when the old file is unlinked","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"lsm-compaction-duplicates-safe","text":"A crash during LSM compaction produces duplicate entries (old and merged SSTables both present) rather than data loss, because newer SSTables take read precedence.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"lsm-compaction-is-full-merge","text":"`compact()` merges all SSTables into a single new SSTable (size-tiered, single-level), not incremental or leveled — simpler but with higher space amplification.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"lsm-compaction-last-writer-wins","text":"Both LSM implementations resolve key conflicts during compaction by keeping only the newest value; older values are unconditionally discarded with no user-defined merge logic","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null}],"count":83,"limit":20,"offset":0}