Topic: How deletes use TOMBSTONE = b"_BITCASKTOMBSTONE__" as a sentinel value within the same record format, and the implications for key-size/value-size calculations

Date: 2026-05-29

Time: 12:29

How Deletes Work via Tombstone Sentinel Values

The two Bitcask implementations in this repo take fundamentally different approaches to representing deletes, and the difference has real implications for correctness.

The Sentinel Approach (log-structured-hash-table/bitcask.py)

This implementation defines an explicit sentinel constant at line 9:


TOMBSTONE = b"__BITCASK_TOMBSTONE__"

When delete() is called (around line 193), it writes a normal record where the value bytes happen to be the sentinel:


def delete(self, key: str) -> bool:
    ...
    self._write_record(key, TOMBSTONE)
    self._index.pop(key, None)
    return existed

The critical thing to understand is what writerecord (line 152) does with this:


header = struct.pack(HEADER_FMT, crc, len(key_bytes), len(value))

The valuesize field in the header is set to len(TOMBSTONE) = 20 bytes. The record on disk is structurally identical to a put("mykey", b"BITCASKTOMBSTONE__"). There is no flag, no special header bit — the tombstone is indistinguishable from a real value at the binary format level.

During recovery in scansegment (around line 93), tombstone detection happens by comparing the value bytes:


value = payload[key_size:]
if value == TOMBSTONE:
    self._index.pop(key, None)

The Empty-String Approach (hash-index-storage/bitcask.py)

This implementation takes a simpler path. delete() (around line 189) writes an empty value:


def delete(self, key: str) -> None:
    self._write_record(key, "")
    self.keydir.pop(key, None)

Here, valuesize in the header is 0. During recovery in scandatafile (around line 123), tombstone detection checks the size field directly:


if val_size == 0:
    self.keydir.pop(key, None)

And get() has a corresponding guard:


if value == "":
    return None

Implications for Key-Size / Value-Size Calculations

| Aspect | log-structured-hash-table | hash-index-storage |

|---|---|---|

| value_size in header | 20 (sentinel length) | 0 |

| Total record size | HEADERSIZE + keylen + 20 | HEADERSIZE + keylen + 0 |

| Detection method | Byte comparison against sentinel | Integer comparison (val_size == 0) |

| CRC coverage | Covers key + tombstone bytes | Covers key only (no value bytes) |

| Forbidden values | b"_BITCASKTOMBSTONE__" cannot be stored | Empty string "" cannot be stored |

Three concrete consequences:

1. Disk overhead: Every tombstone in log-structured-hash-table costs 20 extra bytes on disk. In a workload with heavy deletes, this adds up before compaction runs.

2. Collision risk: If a caller does store.put("x", b"_BITCASKTOMBSTONE__"), the next recovery will treat that key as deleted. The empty-string approach has an analogous but smaller risk — empty values are arguably a more common legitimate value than a 20-byte magic string.

3. Hint file blind spot: loadhint_file in log-structured-hash-table (line 107) only stores (key, offset) pairs — it does not record whether a record is a tombstone. So hint files cannot represent deletes; they'll re-add tombstoned keys to the index. The hash-index-storage version has the same limitation.

Beliefs