Date: 2026-05-29
Time: 13:09
I can't read the target repo directly, so I'll work from the observation data. The compaction methods are cut off at line 200 in both files, which is a significant gap. Here's what the visible code reveals, plus where the observations fall short.
Follow-up leads:
- hash-index-storage/bitcask.py:compact
  The compaction body (line 200+) would reveal exactly how old segments are deleted and whether file handles are cleaned up from the file_handles dict.
- log-structured-hash-table/bitcask.py:compact
  Compare the second implementation's compaction strategy, especially whether it writes hint files and how it updates _index entries pointing to merged segments.
- erlang-bitcask-refcount-model
  Study the Erlang Bitcask source (bitcask_fileops.erl) to see how #filestate.refcount manages deferred deletion, and compare it with the Python simplification.
- log-structured-hash-table/bitcask.py:get
  Uses with open(...) per read instead of cached handles, an implicit concurrency-safety pattern worth contrasting with hash-index's shared-handle approach.
- segment-generation-and-keydir-swap
  How production Bitcask does atomic keydir replacement during merge (ETS table swap) versus the in-place dict mutation in both Python versions.

Observations:
- hash-index-bitcask-shared-read-handles
  hash-index-storage/bitcask.py uses a single cached file handle per segment for all reads via getreader(), which makes concurrent reads of the same segment unsafe because they share one seek position.
- log-structured-bitcask-fresh-handle-per-get
  log-structured-hash-table/bitcask.py:get() opens a fresh file handle with with open(...) on every read, bypassing the filehandles cache entirely.
- python-bitcask-no-refcount-or-locking
  Neither Python Bitcask implementation has reference counting, reader registration, or locking, so compaction can delete segment files while a concurrent reader holds a stale index entry pointing to them.
- python-bitcask-single-threaded-scope
  Both Python Bitcask implementations are designed as single-threaded teaching engines; the absence of Erlang Bitcask's concurrency machinery (refcounted FDs, atomic keydir swap) is a deliberate scope boundary, not a bug.
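
The shared-seek hazard behind the shared-read-handles observation can be reproduced in a few lines. This is a minimal sketch, not code from either repo: write_segment, read_fresh, and interleaved_shared_reads are hypothetical helpers, a segment is assumed to be raw values stored back to back, and the thread interleaving is simulated deterministically (reader B's seek lands between reader A's seek and read) rather than produced by real threads.

```python
import os
import tempfile


def write_segment(path, items):
    """Write values back to back; return a keydir of key -> (offset, size)."""
    keydir = {}
    with open(path, "wb") as f:
        for key, value in items:
            keydir[key] = (f.tell(), len(value))
            f.write(value)
    return keydir


def read_fresh(path, keydir, key):
    """Fresh-handle-per-get: each call has its own private file position."""
    offset, size = keydir[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


def interleaved_shared_reads(shared, keydir, key_a, key_b):
    """Two logical readers on ONE shared handle. Reader B's seek is placed
    between reader A's seek and read, as a thread switch could do."""
    off_a, size_a = keydir[key_a]
    off_b, size_b = keydir[key_b]
    shared.seek(off_a)             # A positions the shared handle
    shared.seek(off_b)             # B repositions it before A reads
    value_a = shared.read(size_a)  # A reads from B's offset
    value_b = shared.read(size_b)  # B reads from wherever A's read left off
    return value_a, value_b


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        seg = os.path.join(d, "seg-0001.data")
        keydir = write_segment(seg, [("a", b"apple"), ("b", b"banana")])
        print(read_fresh(seg, keydir, "a"))  # b'apple'
        with open(seg, "rb") as shared:
            # both readers get the wrong bytes: (b'banan', b'a')
            print(interleaved_shared_reads(shared, keydir, "a", "b"))
```

read_fresh mirrors the fresh-handle-per-get style: every call gets its own file position, at the cost of one open() per read. On POSIX systems, os.pread(fd, n, offset) is another way around the hazard, since it reads at an explicit offset without moving the shared file position.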
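
The contrast between in-place keydir mutation and a keydir swap during merge can be sketched with an in-memory toy. Everything here is hypothetical (ToyStore, Entry, and segments held as bytearrays instead of files), and it illustrates the general build-then-publish idea rather than how Erlang Bitcask implements its keydir. The merge builds a complete new dict on the side and publishes it with a single attribute assignment, so a reader that captured the old dict keeps a consistent view instead of seeing a half-updated mix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical keydir entry: location of the newest value for a key."""
    segment: int
    offset: int
    size: int


class ToyStore:
    """In-memory stand-in for a Bitcask-style store (segments are bytearrays)."""

    def __init__(self):
        self.segments = {0: bytearray()}  # segment id -> appended values
        self.active = 0
        self.keydir = {}                  # key -> Entry

    def put(self, key, value):
        seg = self.segments[self.active]
        self.keydir[key] = Entry(self.active, len(seg), len(value))
        seg.extend(value)

    def get(self, key, keydir=None):
        # Readers may pass a keydir snapshot they captured earlier.
        e = (keydir if keydir is not None else self.keydir)[key]
        return bytes(self.segments[e.segment][e.offset:e.offset + e.size])

    def merge(self):
        """Copy live values into a new segment while building a NEW keydir,
        then publish it with one reference assignment."""
        new_id = max(self.segments) + 1
        merged = bytearray()
        new_keydir = {}
        for key, e in self.keydir.items():
            value = self.segments[e.segment][e.offset:e.offset + e.size]
            new_keydir[key] = Entry(new_id, len(merged), len(value))
            merged.extend(value)
        self.segments[new_id] = merged
        old_ids = [s for s in self.segments if s != new_id]
        self.keydir = new_keydir  # the swap: old snapshots are left untouched
        self.active = new_id
        return old_ids            # old segments are NOT deleted here
```

merge deliberately returns the old segment ids instead of deleting them: a reader still holding the old snapshot needs those segments, which is the stale-entry problem noted for both Python versions.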
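
Deferred deletion, the piece both Python versions lack, can be sketched with a per-segment reference count. This is an assumption-laden illustration in the spirit of the refcount model, not a port of bitcask_fileops.erl: SegmentFile, acquire, release, and retire are hypothetical names. Compaction calls retire() instead of deleting the file, new readers are refused once a segment is retired, and the file is unlinked only when it is retired and the last reader has released it. Reads open a fresh handle each time, as log-structured-hash-table's get() does.

```python
import os
import tempfile
import threading


class SegmentFile:
    """Hypothetical refcounted segment file. The file is unlinked only after
    compaction has retired it AND the last reader has released it."""

    def __init__(self, path):
        self.path = path
        self._refs = 0
        self._retired = False
        self._lock = threading.Lock()

    def acquire(self):
        """Register a reader; refused once the segment has been retired."""
        with self._lock:
            if self._retired:
                raise LookupError("segment retired; re-resolve the key via the keydir")
            self._refs += 1
        return self

    def read(self, offset, size):
        """Read with a fresh handle; the caller must hold a reference."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(size)

    def release(self):
        with self._lock:
            self._refs -= 1
            delete_now = self._retired and self._refs == 0
        if delete_now:
            os.remove(self.path)  # last reader out deletes the retired file

    def retire(self):
        """Called by compaction instead of os.remove()."""
        with self._lock:
            if self._retired:
                return
            self._retired = True
            delete_now = self._refs == 0
        if delete_now:
            os.remove(self.path)  # nobody is reading: delete immediately


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "seg-0001.data")
        with open(path, "wb") as f:
            f.write(b"hello")
        seg = SegmentFile(path)
        reader = seg.acquire()        # a reader resolved a key into this segment
        seg.retire()                  # compaction finishes mid-read
        print(reader.read(0, 5))      # b'hello': the file still exists
        reader.release()              # last reference gone, file unlinked now
        print(os.path.exists(path))   # False
```

The lock only guards the counter and the retired flag; the decision to delete is made under the lock, so exactly one caller (retire or the final release) removes the file.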