{"results":[{"id":"add-node-same-owner-guard","text":"The `old_owner != node_id` guard at `consistent_hashing.py:38` skips transfer recording when a new vnode's successor is another vnode of the same node, avoiding double-counting already-claimed arcs.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"avro-schema-canonical-equality","text":"`Schema(\"int\")` and `Schema({\"type\": \"int\"})` are equal and hash-equal, enabling consistent set/dict usage regardless of whether the schema was constructed from a string or dict form","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"bitcask-compact-not-crash-safe","text":"Hash-index-storage compaction is non-atomic with no error handling: a crash between deleting old files and completing the rewrite can leave the store in an unrecoverable state with no rollback mechanism.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"bitcask-crc-validated-on-read-only","text":"In the log-structured hash table, CRC32 validation occurs only during `get()` and `_scan_segment()`; writes compute and store the CRC but never verify it back, so a corrupted write is not caught until read time","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"bitcask-header-is-12-bytes","text":"The log-structured hash table's `HEADER_FMT = \"!III\"` produces a fixed 12-byte header (three big-endian uint32s: CRC, key_size, value_size) at `log-structured-hash-table/bitcask.py:10-11`; total record size is always `12 + len(key) + len(value)`","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"bitcask-no-range-query-support","text":"Neither Bitcask implementation has any range query, ordered iteration, or prefix scan capability; the hash-based keydir only supports exact-key point lookups","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"bloom-and-counting-share-hash-function","text":"BloomFilter and CountingBloomFilter use the identical `_hashes()` double-hashing scheme (MD5-based, lines 9-14), producing the same bit/counter positions for identical parameters — so a BloomFilter and CountingBloomFilter with the same `m` and `k` will agree on membership queries for the same insertions.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"bloom-double-hashing","text":"`_hashes()` uses Kirschner-Mitzenmacher double hashing from a single MD5 digest, producing k positions as `(h1 + i*h2) % m` without k independent hash functions","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"bloom-filter-deterministic-hashing","text":"The Bloom filter hash function is deterministic (not seeded with randomness), so two filters with identical parameters produce identical bit arrays for identical inputs — enabling equality comparison and reproducible serialization","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"bloom-filter-double-hashing","text":"`BloomFilter` generates k bit positions via Kirschner-Mitzenmacker double hashing: one MD5 digest is split into two 64-bit values h1 and h2, then positions are computed as `(h1 + i*h2) % m` rather than using k independent hash functions.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"bloom-filter-optimal-sizing-textbook","text":"`BloomFilter` computes `bit_count` as `ceil(-n * ln(p) / ln(2)^2)` and `hash_count` as `round((m/n) * ln(2))`, matching the standard optimal formulas from Bloom filter literature","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"bloom-filter-sizing-is-production-ready","text":"The Bloom filter implementation uses textbook-optimal bit array and hash count formulas, making it ready for production integration.","truth_value":"OUT","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"bloom-filter-union-requires-identical-params","text":"`BloomFilter.union()` raises `ValueError` if the two filters differ in bit array size (m) or hash count (k), since bitwise OR is only meaningful when bit positions map to the same hash space.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"bloom-hashes-no-m-zero-guard","text":"`_hashes` does not guard against `m=0`, which causes an unhandled `ZeroDivisionError` in the `(h1 + i * h2) % m` computation","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"bloom-hashes-positions-not-unique","text":"The `k` positions returned by `_hashes` are not guaranteed distinct — duplicate positions can occur when `m` is small relative to `k`, and all callers (add, contains, remove) tolerate this via idempotent bit/counter operations","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"bloom-md5-split-64bit","text":"`_hashes` splits the 128-bit MD5 digest at the midpoint into two 64-bit little-endian unsigned integers `h1` and `h2`, used as base position and step size for double hashing","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"both-storage-paradigms-hit-scalability-walls","text":"Both storage paradigms in the reference implementations exhibit fundamental scalability constraints: the hash index requires all keys in RAM (making dataset size directly bound by available memory with no spill-to-disk fallback), while the LSM tree scans every SSTable on negative lookups because the correctly-implemented Bloom filter module is never wired into the read path.","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"broadcast-hash-join-requires-small-side-in-memory","text":"`BroadcastHashJoin` loads the entire small dataset into a hash table at construction time via `_build_hash_table`, so the small side must fit in a single mapper's memory","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"broadcast-join-loads-small-side-at-construction","text":"`BroadcastHashJoin` receives the small dataset at construction time and builds a hash table; the large dataset is streamed through `.join()`, reflecting the asymmetric API where the small side must be available upfront","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"cbf-remove-can-introduce-false-negatives","text":"Unlike standard Bloom filters which never have false negatives, `CountingBloomFilter.remove()` can cause false negatives when two items share a hash position — removing one decrements the shared counter, potentially making the other appear absent","truth_value":"IN","justification_count":0,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""}],"count":120,"limit":20,"offset":0}