Function: writehint_file in hash-index-storage/bitcask.py

Date: 2026-05-28

Time: 18:58

writehint_file

Purpose

writehint_file serializes a set of index entries to a companion .hint file alongside a .data file. Hint files exist as a startup optimization: instead of scanning every record in a data file to rebuild the in-memory key directory (keydir), the engine can read the hint file, which contains only the index metadata (no values), making recovery dramatically faster — O(keys) small reads instead of O(data-size) full scans.

This is a core Bitcask design concept from the original Riak paper. Hint files are only written during compaction, never during normal writes.

Contract

Parameters

| Parameter | Type | Description |

|-----------|------|-------------|

| fileid | int | Numeric identifier for the data file this hint corresponds to. Used to derive the path {datadir}/{file_id}.hint. |

| entries | list[tuple[str, KeyEntry]] | List of (keystring, indexentry) pairs. Each pair maps a key to its location in the data file. Order doesn't matter semantically but is preserved in the file. An empty list produces an empty hint file. |

Edge cases: If entries is empty, the method creates a valid but zero-length hint file. Keys containing non-ASCII characters will be encoded as UTF-8, and the key length field reflects the byte length, not the character count.

Return Value

None. The method operates purely through side effects.

Algorithm

For each (key, entry) pair, three writes are appended sequentially:

1. Index header (24 bytes) — struct.pack(HINT_FORMAT, ...) writes four fields in little-endian format:

2. Key length (4 bytes) — struct.pack("<I", len(key_bytes)) writes the UTF-8 byte length of the key as a little-endian uint32.

3. Key data (variable) — the raw UTF-8 bytes of the key.

The resulting per-entry layout is:


[file_id:4][offset:8][size:4][timestamp:8] [key_len:4] [key_bytes:variable]
 ──────────── 24 bytes ────────────────── ─── 4 ───── ─── key_len ────────

Side Effects

Error Handling

No explicit error handling. The method will propagate:

The with block ensures the file handle is closed even on exception, but a partial write will leave a corrupted hint file on disk. Since loadhint_file doesn't validate checksums, a truncated hint file would cause an struct.error or IndexError on the next startup.

Usage Patterns

Called exclusively from compact(), in two places:

1. Mid-compaction rotation — when a merged data file reaches maxfilesize, the hint file for the just-completed segment is written before starting a new segment.

2. End of compaction — after the final merged data file is closed.

Callers are responsible for collecting the (key, KeyEntry) pairs as records are written to the merged file. The entries must reflect the *new* file's offsets, not the original file's.

Dependencies

The format must stay in sync with loadhintfile, which reads the same binary layout. Any change to HINTFORMAT or the key-length encoding breaks backward compatibility.