TOMBSTONE = b"_BITCASKTOMBSTONE__" as a sentinel value within the same record format, and the implications for key-size/value-size calculationsDate: 2026-05-29
Time: 12:29
The two Bitcask implementations in this repo take fundamentally different approaches to representing deletes, and the difference has real implications for correctness.
log-structured-hash-table/bitcask.py)This implementation defines an explicit sentinel constant at line 9:
TOMBSTONE = b"__BITCASK_TOMBSTONE__"
When delete() is called (around line 193), it writes a normal record where the value bytes happen to be the sentinel:
def delete(self, key: str) -> bool:
...
self._write_record(key, TOMBSTONE)
self._index.pop(key, None)
return existed
The critical thing to understand is what writerecord (line 152) does with this:
header = struct.pack(HEADER_FMT, crc, len(key_bytes), len(value))
The valuesize field in the header is set to len(TOMBSTONE) = 20 bytes. The record on disk is structurally identical to a put("mykey", b"BITCASKTOMBSTONE__"). There is no flag, no special header bit — the tombstone is indistinguishable from a real value at the binary format level.
During recovery in scansegment (around line 93), tombstone detection happens by comparing the value bytes:
value = payload[key_size:]
if value == TOMBSTONE:
self._index.pop(key, None)
hash-index-storage/bitcask.py)This implementation takes a simpler path. delete() (around line 189) writes an empty value:
def delete(self, key: str) -> None:
self._write_record(key, "")
self.keydir.pop(key, None)
Here, valuesize in the header is 0. During recovery in scandatafile (around line 123), tombstone detection checks the size field directly:
if val_size == 0:
self.keydir.pop(key, None)
And get() has a corresponding guard:
if value == "":
return None
| Aspect | log-structured-hash-table | hash-index-storage |
|---|---|---|
| value_size in header | 20 (sentinel length) | 0 |
| Total record size | HEADERSIZE + keylen + 20 | HEADERSIZE + keylen + 0 |
| Detection method | Byte comparison against sentinel | Integer comparison (val_size == 0) |
| CRC coverage | Covers key + tombstone bytes | Covers key only (no value bytes) |
| Forbidden values | b"_BITCASKTOMBSTONE__" cannot be stored | Empty string "" cannot be stored |
Three concrete consequences:
1. Disk overhead: Every tombstone in log-structured-hash-table costs 20 extra bytes on disk. In a workload with heavy deletes, this adds up before compaction runs.
2. Collision risk: If a caller does store.put("x", b"_BITCASKTOMBSTONE__"), the next recovery will treat that key as deleted. The empty-string approach has an analogous but smaller risk — empty values are arguably a more common legitimate value than a 20-byte magic string.
3. Hint file blind spot: loadhint_file in log-structured-hash-table (line 107) only stores (key, offset) pairs — it does not record whether a record is a tombstone. So hint files cannot represent deletes; they'll re-add tombstoned keys to the index. The hash-index-storage version has the same limitation.
lsht-tombstone-is-real-value — In log-structured-hash-table/bitcask.py, tombstone records have value_size=20 in the header and are structurally identical to normal records; detection requires reading and comparing the full value byteshash-index-tombstone-is-zero-length — In hash-index-storage/bitcask.py, tombstone records have value_size=0 and are detected by checking the size field alone, without reading value byteshint-files-cannot-represent-deletes — Neither implementation's hint file format stores tombstone status, so recovery via hint files alone will incorrectly restore deleted keys to the indextombstone-sentinel-is-a-forbidden-value — Storing b"_BITCASKTOMBSTONE_" as a legitimate value in log-structured-hash-table will cause silent data loss on next recovery, as scan_segment will interpret it as a delete