Date: 2026-05-29
Time: 06:55
I have enough from the observations to do a thorough comparison. Let me work through the three pillars of the Bitcask paper against these two implementations.
hash-index-storage/bitcask.py:compact - The full merge implementation (lines 194–300+) is the most paper-faithful part of the codebase; trace how it decides which records are stale and how it atomically switches from old to new files
log-structured-hash-table/bitcask.py:compact - Compare this implementation's compaction (beyond line 200) against the hash-index-storage version, particularly how it handles the index update and old segment cleanup without timestamps in the index
tombstone-semantics - The empty-string tombstone in hash-index-storage means you literally cannot store an empty value; compare with the dedicated sentinel in log-structured-hash-table and consider what the paper recommends (a flag in the record header)
log-structured-hash-table/test_bitcask.py - The partial-write recovery and CRC corruption tests (lines ~147–180) exercise crash scenarios that hash-index-storage has no coverage for
concurrent-merge-safety - Neither implementation handles concurrent readers during merge; the paper describes an atomic switchover where old files are deleted only after the keydir is updated; trace whether the compact() methods maintain this invariant or have a window where reads could fail
hash-index-keydir-has-four-fields - The hash-index-storage KeyEntry stores (file_id, offset, size, timestamp), matching the paper's keydir entry shape, while log-structured-hash-table stores only (filepath, offset)
hint-files-automatic-only-in-hash-index - Only hash-index-storage/bitcask.py generates hint files automatically during compaction; log-structured-hash-table requires an explicit createhintfiles() call
no-crc-in-hash-index-storage - The hash-index-storage record format (<dII + key + value) contains no checksum; data corruption is undetectable on read
empty-string-is-tombstone - In hash-index-storage/bitcask.py, deletes write an empty-string value and get() returns None for empty values (line 181), making it impossible to store a legitimate empty-string value
compact-skips-active-file - Both implementations exclude the active/current file from compaction, matching the paper's requirement that only immutable files are merged
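
To make the tombstone and checksum points concrete, here is a minimal sketch of a record layout that would address both gaps noted for hash-index-storage. It is not taken from either repository: the layout (CRC32, timestamp, flags byte, key size, value size), the names `encode_record`, `decode_record`, and `FLAG_TOMBSTONE` are all assumptions made for illustration. The idea is that a header flag marks deletes, so an empty value stays storable, and a CRC over the rest of the record makes corruption or a partial write detectable on read.

```python
import struct
import time
import zlib

# Hypothetical layout (not from either implementation):
#   crc32 | timestamp | flags | key_size | value_size | key | value
CRC = struct.Struct("<I")       # checksum of everything that follows it
BODY = struct.Struct("<dBII")   # timestamp, flags, key_size, value_size
FLAG_TOMBSTONE = 0x01           # assumed flag bit marking a delete


def encode_record(key, value=b"", tombstone=False, timestamp=None):
    """Serialize one record; the tombstone lives in the header, not the value."""
    ts = time.time() if timestamp is None else timestamp
    flags = FLAG_TOMBSTONE if tombstone else 0
    body = BODY.pack(ts, flags, len(key), len(value)) + key + value
    return CRC.pack(zlib.crc32(body)) + body


def decode_record(buf):
    """Return (key, value, is_tombstone, timestamp) or raise ValueError."""
    header_size = CRC.size + BODY.size
    if len(buf) < header_size:
        raise ValueError("truncated record header")
    (stored_crc,) = CRC.unpack_from(buf, 0)
    ts, flags, key_size, value_size = BODY.unpack_from(buf, CRC.size)
    end = header_size + key_size + value_size
    if len(buf) < end:
        raise ValueError("truncated record payload")
    # Verify the checksum before trusting the key or value bytes.
    if zlib.crc32(buf[CRC.size:end]) != stored_crc:
        raise ValueError("CRC mismatch")
    key = buf[header_size:header_size + key_size]
    value = buf[header_size + key_size:end]
    return key, value, bool(flags & FLAG_TOMBSTONE), ts
```

With a layout like this, `get()` could tell "key deleted" apart from "key present with an empty value" by checking the flag instead of the value length, and a recovery scan could stop at the first record that fails the length or CRC check.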