Date: 2026-05-28
Time: 18:55
I'll work from the test file content you provided, which reveals enough about the implementation to explain it thoroughly.
log-structured-hash-table/bitcask.py - The implementation: how the append-only log, in-memory hash index, CRC framing, and compaction actually work
log-structured-hash-table/bitcask.py:compact - The compaction algorithm that rewrites live entries into new segments and updates the index atomically
log-structured-hash-table/testertestbitcask.py - The meta-test layer: how this repo validates that the test suite itself is correct and comprehensive
hash-index-storage/bitcask.py - The sibling hash-index-storage implementation; compare the two Bitcask variants to understand what design tradeoffs each makes
bitcask-paper - The original Bitcask paper from Riak describes the hint file format, merge process, and keydir rebuild strategy that this implementation models
bitcask-tombstone-semantics - BitcaskStore.delete() writes a tombstone record; get() returns None for tombstoned keys, and compact() removes both the original entry and the tombstone
bitcask-crc-per-record - Every record has a CRC32 in its header (HEADER_FMT includes the checksum field), and get() raises CorruptionError on mismatch rather than returning corrupt data
bitcask-partial-write-safe - On startup, the store skips incomplete records at segment tail (header present but payload truncated) without raising errors or losing previously committed data
bitcask-auto-compact-threshold - When the number of segments exceeds autocompactthreshold, compaction is triggered automatically during put() operations, reducing segment count
bitcask-index-is-memory-only - The in-memory index dict maps keys to (segmentpath, offset) tuples and is rebuilt from disk on startup; hint files accelerate this rebuild by avoiding full segment scans
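
To make the tombstone, CRC, partial-write, and compaction behaviors above concrete, here is a minimal sketch over an in-memory byte string rather than real segment files. It is not the code in bitcask.py. The header layout (`SKETCH_HEADER_FMT`), the use of a value length of -1 as the tombstone marker, and the helper names `encode_record`, `scan_segment`, `read_value`, and `compact` are all assumptions made for illustration. `CorruptionError` is redefined here as a stand-in, and the index maps keys to a bare offset instead of a (segmentpath, offset) tuple.

```python
import struct
import zlib
from typing import Dict, Optional

# Hypothetical layout: CRC32, key length, value length (big-endian).
# The real HEADER_FMT in bitcask.py may differ.
SKETCH_HEADER_FMT = ">IIi"
SKETCH_HEADER_SIZE = struct.calcsize(SKETCH_HEADER_FMT)
TOMBSTONE_LEN = -1  # assumed marker: a value length of -1 means "deleted"


class CorruptionError(Exception):
    """Stand-in for the store's checksum-mismatch error."""


def encode_record(key: bytes, value: Optional[bytes]) -> bytes:
    """Frame one record; value=None encodes a tombstone."""
    val_len = TOMBSTONE_LEN if value is None else len(value)
    lengths = struct.pack(">Ii", len(key), val_len)
    body = key + (value or b"")
    # The checksum covers everything after the CRC field itself.
    crc = zlib.crc32(lengths + body)
    return struct.pack(">I", crc) + lengths + body


def scan_segment(data: bytes) -> Dict[bytes, int]:
    """Rebuild a key -> offset index, stopping at a truncated tail."""
    index: Dict[bytes, int] = {}
    offset = 0
    while offset + SKETCH_HEADER_SIZE <= len(data):
        _crc, key_len, val_len = struct.unpack_from(SKETCH_HEADER_FMT, data, offset)
        end = offset + SKETCH_HEADER_SIZE + key_len + max(val_len, 0)
        if end > len(data):
            break  # header present but payload truncated: ignore the tail
        key_start = offset + SKETCH_HEADER_SIZE
        key = data[key_start:key_start + key_len]
        if val_len == TOMBSTONE_LEN:
            index.pop(key, None)  # a tombstone hides any earlier entry
        else:
            index[key] = offset  # later records win over earlier ones
        offset = end
    return index


def read_value(data: bytes, offset: int) -> bytes:
    """Return the value at offset, verifying the CRC first."""
    crc, key_len, val_len = struct.unpack_from(SKETCH_HEADER_FMT, data, offset)
    end = offset + SKETCH_HEADER_SIZE + key_len + max(val_len, 0)
    if zlib.crc32(data[offset + 4:end]) != crc:
        raise CorruptionError(f"checksum mismatch at offset {offset}")
    return data[offset + SKETCH_HEADER_SIZE + key_len:end]


def compact(data: bytes) -> bytes:
    """Rewrite only live records, dropping stale values and tombstones."""
    out = bytearray()
    for key, offset in scan_segment(data).items():
        out += encode_record(key, read_value(data, offset))
    return bytes(out)


if __name__ == "__main__":
    log = encode_record(b"a", b"1") + encode_record(b"b", b"2")
    log += encode_record(b"a", None)          # delete "a"
    log += encode_record(b"c", b"3")[:-1]     # simulate a crash mid-write
    print(scan_segment(log))                  # only "b" should remain indexed
    print(len(log), "->", len(compact(log)))  # compaction shrinks the log
```

In this sketch the truncated final record is silently ignored during the scan, the deleted key disappears from the index, and compaction emits a log containing only the surviving record. A real store would also have to swap segment files and the index together, which this in-memory version does not attempt.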