Topic: The empty-string tombstone in hash-index-storage means you literally cannot store an empty value; compare with the dedicated sentinel in log-structured-hash-table and consider what the paper recommends (a flag in the record header)

Date: 2026-05-29

Time: 08:54

The Tombstone Design Flaw in hash-index-storage

How hash-index-storage encodes deletion

In hash-index-storage/bitcask.py, deletion works by writing a record with an empty string as the value:


# bitcask.py:181
def delete(self, key: str) -> None:
    """Delete a key by appending a tombstone record."""
    self._maybe_rotate()
    self._write_record(key, "")
    self.keydir.pop(key, None)

The get method then treats any empty value as "deleted":


# bitcask.py:173-177
def get(self, key: str) -> Optional[str]:
    ...
    if value == "":
        return None
    return value

And during index rebuild, the same convention is used — val_size == 0 means tombstone:


# bitcask.py:137-139
if val_size == 0:
    # Tombstone - remove from keydir
    self.keydir.pop(key, None)

This creates a semantic collision: the empty string "" serves double duty as both "this key was deleted" and "the value is an empty string." A caller who does store.put("config", "") followed by store.get("config") gets None back — the store silently treats their valid write as a deletion. There's no way to distinguish intent.

How log-structured-hash-table avoids this

The second implementation (log-structured-hash-table/bitcask.py) uses a dedicated byte sentinel:


# bitcask.py:9
TOMBSTONE = b"__BITCASK_TOMBSTONE__"

Deletion writes this specific byte sequence as the value:


# bitcask.py:188-194
def delete(self, key: str) -> bool:
    """Delete a key by writing a tombstone. Returns True if key existed."""
    ...
    self._write_record(key, TOMBSTONE)
    self._index.pop(key, None)
    return existed

And during recovery, only this exact sentinel is treated as a tombstone:


# bitcask.py:102-105
if value == TOMBSTONE:
    self._index.pop(key, None)
else:
    self._index[key] = (seg_path, offset)

This is better — empty values work fine. But it shifts the problem rather than eliminating it: now a user who stores the literal bytes b"_BITCASKTOMBSTONE__" as a value will have that key silently deleted on recovery. It's unlikely, but the failure mode is silent data loss, which is the worst kind.

What the paper recommends

DDIA (Chapter 3) describes the Bitcask approach and notes that deletions should use a special deletion record — a tombstone. The clean way to implement this is a flag in the record header. Instead of overloading the value field, you add a single byte (or bit) to the header format that says "this record is a delete." The header already carries metadata (timestamp, keysize, valuesize); a flags field fits naturally:


HEADER_FORMAT = "<dIIB"  # timestamp, key_size, value_size, flags
# flags & 0x01 → tombstone

With a header flag:

The hash-index-storage header format (<dII — timestamp, keysize, valuesize, 16 bytes total at line 10) would only need one additional byte to support this properly.

Summary of the three approaches

| Approach | Implementation | Flaw |

|----------|---------------|------|

| Empty string as tombstone | hash-index-storage/bitcask.py | Cannot store "" as a value |

| Magic byte sentinel | log-structured-hash-table/bitcask.py | Cannot store the sentinel bytes as a value |

| Header flag | (DDIA recommendation) | No value-space collision; cleanest separation of data from metadata |

Topics to Explore

Beliefs