Date: 2026-05-28
Time: 18:58
writehint_filewritehint_file serializes a set of index entries to a companion .hint file alongside a .data file. Hint files exist as a startup optimization: instead of scanning every record in a data file to rebuild the in-memory key directory (keydir), the engine can read the hint file, which contains only the index metadata (no values), making recovery dramatically faster — O(keys) small reads instead of O(data-size) full scans.
This is a core Bitcask design concept from the original Riak paper. Hint files are only written during compaction, never during normal writes.
entries contains the final, deduplicated set of live key→location mappings for file_id. The corresponding .data file must already be fully written and flushed — the offsets and sizes in each KeyEntry must be valid positions in that data file.{fileid}.hint exists on disk containing a serialized representation of every entry, readable by loadhintfile.keydir entries without reading the data file.| Parameter | Type | Description |
|-----------|------|-------------|
| fileid | int | Numeric identifier for the data file this hint corresponds to. Used to derive the path {datadir}/{file_id}.hint. |
| entries | list[tuple[str, KeyEntry]] | List of (keystring, indexentry) pairs. Each pair maps a key to its location in the data file. Order doesn't matter semantically but is preserved in the file. An empty list produces an empty hint file. |
Edge cases: If entries is empty, the method creates a valid but zero-length hint file. Keys containing non-ASCII characters will be encoded as UTF-8, and the key length field reflects the byte length, not the character count.
None. The method operates purely through side effects.
For each (key, entry) pair, three writes are appended sequentially:
1. Index header (24 bytes) — struct.pack(HINT_FORMAT, ...) writes four fields in little-endian format:
file_id as uint32 (I) — which data file holds this recordoffset as uint64 (Q) — byte offset within that data filesize as uint32 (I) — total record size (header + key + value)timestamp as float64 (d) — original write timestamp2. Key length (4 bytes) — struct.pack("<I", len(key_bytes)) writes the UTF-8 byte length of the key as a little-endian uint32.
3. Key data (variable) — the raw UTF-8 bytes of the key.
The resulting per-entry layout is:
[file_id:4][offset:8][size:4][timestamp:8] [key_len:4] [key_bytes:variable]
──────────── 24 bytes ────────────────── ─── 4 ───── ─── key_len ────────
{datadir}/{fileid}.hint in binary mode ("wb"). The file is written without explicit fsync — contrast with writerecord, which does fsync when sync_writes is enabled. This is acceptable because hint files are a rebuild optimization, not a durability mechanism; if lost, the engine falls back to scanning the data file.keydir, file_handles, or any other instance state.No explicit error handling. The method will propagate:
OSError / PermissionError if the hint file can't be created or writtenstruct.error if any KeyEntry field overflows its format bounds (e.g., offset exceeding uint64 range, though practically impossible)UnicodeEncodeError if a key contains characters that can't be UTF-8 encoded (impossible for valid Python str)The with block ensures the file handle is closed even on exception, but a partial write will leave a corrupted hint file on disk. Since loadhint_file doesn't validate checksums, a truncated hint file would cause an struct.error or IndexError on the next startup.
Called exclusively from compact(), in two places:
1. Mid-compaction rotation — when a merged data file reaches maxfilesize, the hint file for the just-completed segment is written before starting a new segment.
2. End of compaction — after the final merged data file is closed.
Callers are responsible for collecting the (key, KeyEntry) pairs as records are written to the merged file. The entries must reflect the *new* file's offsets, not the original file's.
struct — binary packing via format string HINT_FORMAT = "<IQId" (24 bytes per entry header)self.hintpath(fileid) — resolves to {datadir}/{file_id}.hintKeyEntry dataclass — provides file_id, offset, size, timestamp fieldsThe format must stay in sync with loadhintfile, which reads the same binary layout. Any change to HINTFORMAT or the key-length encoding breaks backward compatibility.