Topic: Read the original Bitcask paper (Sheehy & Smith, 2010) to compare the Erlang-specific details (keydir sharing via ETS tables, lock files, merge triggers) against these Python simplifications

Date: 2026-05-29

Time: 08:35

Bitcask Paper vs. Python Implementations: Erlang-Specific Details

The original Bitcask paper (Sheehy & Smith, 2010) was designed for Riak, an Erlang/OTP system. Several of its core mechanisms exploit Erlang's concurrency primitives in ways that these Python implementations deliberately simplify away. Here's where the two diverge — and what's lost.

2. Lock Files for Writer Exclusion

Paper: Bitcask enforces a single-writer invariant using OS-level lock files. Only one Erlang process can hold the write lock on a Bitcask directory at a time. Other processes can open the same directory for reads (sharing the keydir via ETS), but writes are serialized through the lock holder. The lock file also records the PID of the writer for crash detection — if the process dies, the stale lock can be broken.

Python implementations: Neither implementation has any locking mechanism. The grep for lock|Lock|flock|fcntl shows zero matches in either Bitcask file — all results are from the unrelated fencing-tokens/ module.

What's lost: Without locking, nothing prevents two BitcaskStore instances from simultaneously appending to the same data directory, corrupting each other's data files and producing an inconsistent keydir. The Python implementations are strictly single-process, single-instance tools. This is fine for educational use and matches the DDIA framing, but it means you can't safely embed them in a multi-process server.

---

3. Merge Triggers and Background Merge

Paper: Bitcask's merge (compaction) runs as a separate Erlang process, triggered automatically by configurable policies: fragmentation ratio, dead bytes threshold, file count, or time-based schedules. The merge process reads immutable (frozen) segment files, writes compacted output files with hint files, and atomically swaps keydir entries via ETS — all while the writer continues appending to the active file and readers continue serving from the old segments until the swap completes.

Python implementations: Both have manual compaction, but with very different trigger models:

`python

frozen = self.frozensegment_paths()

if len(frozen) >= self.autocompact_threshold:

self.compact()

`

This is a simplified version of the paper's trigger, using only segment count — not fragmentation ratio or dead bytes.

Critical difference: In both Python implementations, compaction is synchronous and blocking. While compact() runs, no reads or writes can proceed. The paper's Erlang implementation runs merge as a background process with concurrent read access to old files and atomic keydir updates. The Python versions must block the caller during the entire merge operation — copying live records, writing hint files, deleting old segments, and updating the keydir all happen in the same thread of execution.

---

4. Other Erlang-Specific Details Simplified Away

| Paper Feature | hash-index-storage | log-structured-hash-table |

|---|---|---|

| CRC integrity | No checksums | CRC32 on every record (line 143) |

| Tombstone encoding | Empty string "" (line 186) | Sentinel bytes b"_BITCASKTOMBSTONE__" (line 7) |

| Value types | str only | bytes (closer to paper) |

| Hint files | Written during compact (line 161) | Separate createhintfiles() call |

| Iterator protocol | keys() returns list (line 192) | Full _iter_ with Iterator support |

| fsync policy | Configurable sync_writes (line 86) | Always flush(), no fsync |

The CRC difference is notable: the paper mandates CRC checks on every read to detect bitrot and partial writes. hash-index-storage/bitcask.py omits this entirely, while log-structured-hash-table/bitcask.py implements it faithfully with zlib.crc32 and raises CorruptionError on mismatch (tested at test_bitcask.py:121-131).

---

Summary

These implementations capture Bitcask's core insight — an append-only log with an in-memory hash index gives you O(1) reads with high write throughput — while deliberately dropping the production machinery that makes it work in a concurrent, multi-process database server. The ETS keydir, lock files, and background merge are not incidental details; they're what make Bitcask usable as a storage engine inside Riak rather than a single-user CLI tool. Understanding what was removed is as instructive as understanding what was kept.

---

Topics to Explore

Beliefs