Original source: log-structured-hash-table-test_bitcask

File: log-structured-hash-table/test_bitcask.py

Date: 2026-05-28

Time: 18:55

These notes are based on the test file alone, which reveals enough about the implementation to describe its behavior.

Topics to Explore

[file] log-structured-hash-table/bitcask.py — The implementation: how the append-only log, in-memory hash index, CRC framing, and compaction actually work
[function] log-structured-hash-table/bitcask.py:compact — The compaction algorithm that rewrites live entries into new segments and updates the index atomically
[file] log-structured-hash-table/testertestbitcask.py — The meta-test layer: how this repo validates that the test suite itself is correct and comprehensive
[file] hash-index-storage/bitcask.py — The sibling hash-index-storage implementation — compare the two Bitcask variants to understand what design tradeoffs each makes
[general] bitcask-paper — The original Bitcask paper from Basho (the company behind Riak) describes the hint file format, merge process, and keydir rebuild strategy that this implementation models
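
As a rough illustration of the compaction topic listed above, the sketch below merges segments in memory, keeps only the newest live value per key, and builds a fresh index for the merged segment. It is not taken from bitcask.py: the Record type, the list-based segments, and this compact() signature are assumptions made for illustration. A real store would write the merged records to a new segment file first and only then replace its index reference in one assignment.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory record: (key, value); value None stands for a tombstone.
Record = Tuple[bytes, Optional[bytes]]


def compact(segments: List[List[Record]]) -> Tuple[List[Record], Dict[bytes, int]]:
    """Merge segments (oldest first) into one segment of live entries only."""
    latest: Dict[bytes, Optional[bytes]] = {}
    for segment in segments:
        for key, value in segment:
            latest[key] = value  # a later record shadows every earlier one

    # Drop keys whose newest record is a tombstone; this discards both the
    # original entry and the tombstone itself.
    merged = [(key, value) for key, value in latest.items() if value is not None]

    # Build the replacement index before anything is swapped in, so a reader
    # sees either the old index or the complete new one.
    new_index = {key: position for position, (key, _) in enumerate(merged)}
    return merged, new_index
```

In this sketch the caller would install the result with a single assignment of the returned index once the merged segment is complete.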

bitcask-tombstone-semantics — BitcaskStore.delete() writes a tombstone record; get() returns None for tombstoned keys, and compact() removes both the original entry and the tombstone
bitcask-crc-per-record — Every record has a CRC32 in its header (HEADER_FMT includes the checksum field), and get() raises CorruptionError on mismatch rather than returning corrupt data
bitcask-partial-write-safe — On startup, the store skips an incomplete record at the tail of a segment (header present but payload truncated) without raising an error or losing previously committed data
bitcask-auto-compact-threshold — When the number of segments exceeds auto_compact_threshold, compaction is triggered automatically during put() operations, reducing the segment count
bitcask-index-is-memory-only — The in-memory index dict maps keys to (segmentpath, offset) tuples and is rebuilt from disk on startup; hint files accelerate this rebuild by avoiding full segment scans
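
The behaviors listed above can be sketched in a few functions: CRC-framed records, tombstones, an index rebuilt by scanning, and a scan that stops quietly at a truncated tail. This is a minimal sketch under stated assumptions, not the code in bitcask.py. HEADER_FMT and CorruptionError are names from the notes, but the header layout chosen here (CRC32, key length, value length, with -1 marking a tombstone) and the helper names encode_record, scan_segment, and read_value are hypothetical.

```python
from __future__ import annotations

import struct
import zlib

# Assumed layout, not the real one: crc32, key length, value length.
HEADER_FMT = ">IIi"
HEADER_SIZE = struct.calcsize(HEADER_FMT)
TOMBSTONE = -1  # assumed marker: a value length of -1 means "deleted"


class CorruptionError(Exception):
    """Raised when a stored checksum does not match the record contents."""


def encode_record(key: bytes, value: bytes | None) -> bytes:
    """Frame one record; value=None encodes a tombstone."""
    vlen = TOMBSTONE if value is None else len(value)
    lengths = struct.pack(">Ii", len(key), vlen)
    body = key + (value or b"")
    crc = zlib.crc32(lengths + body)  # checksum covers lengths, key, and value
    return struct.pack(">I", crc) + lengths + body


def scan_segment(data: bytes) -> dict[bytes, int]:
    """Rebuild a key -> offset index, ignoring an incomplete record at the tail."""
    index: dict[bytes, int] = {}
    offset = 0
    while offset + HEADER_SIZE <= len(data):
        _crc, klen, vlen = struct.unpack_from(HEADER_FMT, data, offset)
        end = offset + HEADER_SIZE + klen + max(vlen, 0)
        if end > len(data):
            break  # header present but payload truncated: stop without error
        key = data[offset + HEADER_SIZE : offset + HEADER_SIZE + klen]
        if vlen == TOMBSTONE:
            index.pop(key, None)  # a tombstone hides any earlier entry
        else:
            index[key] = offset
        offset = end
    return index


def read_value(data: bytes, offset: int) -> bytes:
    """Return the value stored at offset, verifying its CRC first."""
    crc, klen, vlen = struct.unpack_from(HEADER_FMT, data, offset)
    end = offset + HEADER_SIZE + klen + max(vlen, 0)
    if zlib.crc32(data[offset + 4 : end]) != crc:
        raise CorruptionError(f"checksum mismatch at offset {offset}")
    return data[offset + HEADER_SIZE + klen : end]
```

A lookup in this sketch is index.get(key) followed by read_value(); a key that was tombstoned is simply absent from the index, which corresponds to get() returning None.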