Date: 2026-05-29
Time: 06:49
`_load_hint_file`: Fast Index Recovery from Hint Files

`_load_hint_file` reconstructs the in-memory key index (`self.keydir`) from a precomputed hint file instead of scanning the full data file record by record. Hint files are a Bitcask optimization: during compaction, a compact summary of each record's location metadata is written alongside the merged data file. On startup, loading a hint file costs O(n) in the number of index entries with minimal I/O. By contrast, `_scan_data_file` must read and skip over every value payload (potentially gigabytes of data) just to extract the same key-to-location mappings.
Before the call, `{data_dir}/{file_id}.hint` must exist and be well-formed. The caller (`_rebuild_index`) checks `os.path.exists` before calling.
After the call, every key in the hint file has a `KeyEntry` in `self.keydir`. If the same key appears multiple times in the hint file, the last entry wins (simple overwrite).
Each `self.keydir[key]` points to a valid `(file_id, offset, size)` location that `read_record` can use to retrieve the full record from the data file.

| Parameter | Type | Description |
|-----------|------|-------------|
| self | BitcaskStore | The store instance whose keydir will be populated |
| file_id | int | Numeric ID of the data file whose hint file to load. Maps to {file_id}.hint on disk. |

Returns `None`. The method mutates `self.keydir` as a side effect.
1. Bulk read: Opens the hint file and reads the entire contents into memory as a single bytes object. This is an intentional choice — hint files are small (they contain no value data), so buffering the whole thing avoids repeated syscalls.
2. Iterate fixed-header records: Walks through the byte buffer using a manual position cursor pos:
a. Unpack the fixed header (24 bytes, HINT_FORMAT = "<IQId"):
   - `fid` (uint32): the data file ID containing this record
   - `offset` (uint64): byte offset within that data file
   - `size` (uint32): total record size in the data file (header + key + value)
   - `ts` (double): timestamp when the record was written

b. Read the variable-length key: unpack a 4-byte little-endian uint32 for `key_size`, then read that many bytes and decode them as UTF-8.
c. Insert into keydir: create a `KeyEntry(fid, offset, size, ts)` and store it under the decoded key. If the key already exists (loaded from an earlier file's hint or data scan), it is silently overwritten. This is correct because `_rebuild_index` processes files in ascending ID order, so higher-numbered files contain newer data.
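The loop in steps 2a to 2c can be sketched as follows. This is a minimal, self-contained sketch and not the store's actual code. It parses an in-memory buffer, which is what the bulk read in step 1 produces, into a plain dict. The `KeyEntry` field names, `KEY_SIZE_FORMAT`, and `load_hint_bytes` are assumptions made for illustration.

```python
import struct
from dataclasses import dataclass

# Assumed to mirror the module-level constants described above.
HINT_FORMAT = "<IQId"                            # fid, offset, size, ts
HINT_HEADER_SIZE = struct.calcsize(HINT_FORMAT)  # 24 bytes
KEY_SIZE_FORMAT = "<I"                           # hypothetical name for the key_size field
KEY_SIZE_LEN = struct.calcsize(KEY_SIZE_FORMAT)  # 4 bytes


@dataclass
class KeyEntry:
    fid: int      # data file that holds the record
    offset: int   # byte offset of the record in that file
    size: int     # total record size (header + key + value)
    ts: float     # write timestamp


def load_hint_bytes(data: bytes, keydir: dict) -> None:
    """Parse an in-memory hint buffer and upsert every entry into keydir."""
    pos = 0
    while pos < len(data):
        # a. Fixed 24-byte header.
        fid, offset, size, ts = struct.unpack_from(HINT_FORMAT, data, pos)
        pos += HINT_HEADER_SIZE
        # b. Variable-length key: 4-byte little-endian length, then UTF-8 bytes.
        (key_size,) = struct.unpack_from(KEY_SIZE_FORMAT, data, pos)
        pos += KEY_SIZE_LEN
        key = data[pos:pos + key_size].decode("utf-8")
        pos += key_size
        # c. Unconditional overwrite: a later entry for the same key wins.
        keydir[key] = KeyEntry(fid, offset, size, ts)
```

In the store itself, `data` would be the bytes read from `self._hint_path(file_id)` in step 1, and `keydir` would be `self.keydir`.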
`self.keydir`: adds or overwrites entries for every key found in the hint file.
File handles: the hint file is opened and closed within a single `with` block. Unlike `_scan_data_file`, which uses `_get_reader`, this method does not register a file handle in `self._file_handles`. Hint files are read once at startup and never accessed again.
There is no explicit error handling. The method will raise:
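- `FileNotFoundError` if the hint file doesn't exist (the caller is responsible for checking).
- `struct.error` if the file is truncated mid-record. Corrupted header bytes do not necessarily raise: any 24 bytes unpack into some values, so corruption can silently produce wrong entries.
- `UnicodeDecodeError` if a key contains invalid UTF-8 bytes.
- If `key_size` points past the end of the buffer, the result depends on how the key is read. `struct.unpack_from` raises `struct.error`, but a plain bytes slice never raises `IndexError`. A slice returns fewer bytes than requested, so a truncated final key can be decoded silently, or it can raise `UnicodeDecodeError` if the cut splits a multi-byte character.

No corruption detection (checksums, magic bytes) is performed. The method trusts that the hint file is well-formed.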
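The failure modes above can be reproduced with a few lines of `struct`. The helpers `unpack_header` and `read_key_checked` below are hypothetical and not part of the store. The sketch shows two things:

- A short header raises `struct.error`.
- A plain bytes slice past the end of the buffer silently returns fewer bytes.

An explicit bounds check like the one in `read_key_checked` is one way a stricter loader could turn the silent case into an error.

```python
import struct

HINT_FORMAT = "<IQId"  # fixed 24-byte header: fid, offset, size, ts


def unpack_header(data: bytes, pos: int) -> tuple:
    """Unpack one fixed header; raises struct.error if fewer than 24 bytes remain."""
    return struct.unpack_from(HINT_FORMAT, data, pos)


def read_key_checked(data: bytes, pos: int, key_size: int) -> str:
    """Hypothetical stricter key read that refuses to decode a short key."""
    end = pos + key_size
    if end > len(data):
        raise ValueError(f"key needs {key_size} bytes, only {len(data) - pos} remain")
    # Invalid UTF-8 still raises UnicodeDecodeError, as described above.
    return data[pos:end].decode("utf-8")


if __name__ == "__main__":
    header = struct.pack(HINT_FORMAT, 1, 0, 10, 0.0)
    try:
        unpack_header(header[:20], 0)  # 4 bytes short of a full header
    except struct.error as exc:
        print("truncated header:", exc)
    # A plain slice past the end does not raise; it returns what is there.
    print(b"abc"[0:10])  # b'abc'
```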
It is called exclusively by `_rebuild_index` during store initialization:
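```python
def _rebuild_index(self, file_ids):
    # Ascending file IDs: newer files overwrite older keydir entries.
    for fid in sorted(file_ids):
        if os.path.exists(self._hint_path(fid)):
            self._load_hint_file(fid)  # fast path: compacted file with a hint
        else:
            self._scan_data_file(fid)  # slow path: full record-by-record scan
```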
The caller obligation is simple: only call this method when a `.hint` file is known to exist. The symmetric writer is `_write_hint_file`, which runs during compaction and produces the hint files that this method later consumes.
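For symmetry, here is a minimal sketch of what a writer producing this layout might look like. The real `_write_hint_file` is not shown in this document, so `encode_hint_entries`, `write_hint_file`, and the `HintItem` tuple shape are hypothetical. Only the on-disk layout comes from the source: the 24-byte `"<IQId"` header, a 4-byte little-endian `key_size`, then the UTF-8 key.

```python
import struct
from typing import Iterable, Tuple

HINT_FORMAT = "<IQId"  # fid, offset, size, ts (24 bytes)

# (key, fid, offset, size, ts): hypothetical shape for one live record.
HintItem = Tuple[str, int, int, int, float]


def encode_hint_entries(items: Iterable[HintItem]) -> bytes:
    """Serialize live records in the layout the hint loader expects."""
    out = bytearray()
    for key, fid, offset, size, ts in items:
        key_bytes = key.encode("utf-8")
        out += struct.pack(HINT_FORMAT, fid, offset, size, ts)  # fixed header
        out += struct.pack("<I", len(key_bytes))                 # key_size
        out += key_bytes                                         # key
    return bytes(out)


def write_hint_file(path: str, items: Iterable[HintItem]) -> None:
    """Write one hint file; compaction would pass only live (non-tombstone) records."""
    with open(path, "wb") as f:
        f.write(encode_hint_entries(items))
```

Because the writer and reader share one layout, any change to one (for example, adding a checksum) must be mirrored in the other.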
- `struct`: binary packing/unpacking with little-endian format strings
- `HINT_FORMAT` / `HINT_HEADER_SIZE`: module-level constants defining the fixed portion of each hint record (`"<IQId"`, 24 bytes)
- `KeyEntry` dataclass: the value type stored in `keydir`
- `self._hint_path(file_id)`: resolves to `{data_dir}/{file_id}.hint`

1. The hint file format includes a key-size field not covered by `HINT_FORMAT`: the 4-byte `key_size` is packed separately from the fixed header. Each hint record is therefore `HINT_HEADER_SIZE + 4 + len(key_bytes)` bytes, a variable-length format with a two-part header. This asymmetry between the format constant and the actual layout is easy to miss.
2. No tombstone handling: `_scan_data_file` checks `val_size == 0` to identify tombstones and removes their keys from `keydir`, but this method never deletes anything. Hint files are only produced during compaction, and compaction already filters out tombstones, so hint files are assumed to contain only live records.
3. fid in the hint record may differ from the file_id parameter: The hint record stores its own fid field (the data file ID where the record lives). In practice after compaction these should match, but the code trusts the embedded fid over the parameter — which is the correct behavior if a hint file ever mapped across data files.
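4. UTF-8 keys only: Keys are decoded with `.decode("utf-8")` with no error handling. Keys that are not valid UTF-8 will generally raise `UnicodeDecodeError`. Examples include arbitrary binary keys and non-ASCII text stored in another encoding such as Latin-1.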
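Gotcha 1 is easiest to see with concrete numbers. This sketch assumes the layout described above: a 24-byte header, a 4-byte `key_size`, then the key encoded as UTF-8. It computes the on-disk size of a single hint record. `hint_record_size` and `KEY_SIZE_FIELD` are illustrative names. The key contributes its UTF-8 byte length, not its character count, which also ties into gotcha 4.

```python
import struct

HINT_FORMAT = "<IQId"
HINT_HEADER_SIZE = struct.calcsize(HINT_FORMAT)  # 4 + 8 + 4 + 8 = 24
KEY_SIZE_FIELD = 4                               # separate "<I" key_size field


def hint_record_size(key: str) -> int:
    """On-disk size of one hint record: fixed header + key_size field + key bytes."""
    return HINT_HEADER_SIZE + KEY_SIZE_FIELD + len(key.encode("utf-8"))


if __name__ == "__main__":
    print(HINT_HEADER_SIZE)             # 24
    print(hint_record_size("user:42"))  # 35
    print(hint_record_size("café"))     # 33 ("é" is 2 bytes in UTF-8)
```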
- `hint-files-only-from-compaction`: hint files are only written during `compact()`. The normal write path (`put`/`delete`) never produces them, so un-compacted data files always use the slow `_scan_data_file` recovery path.
- `hint-no-tombstones`: hint files contain no tombstone records because compaction filters them out before writing. `_load_hint_file` therefore never removes keys from `keydir`.
- `hint-format-variable-length`: each hint record is `HINT_HEADER_SIZE + 4 + key_length` bytes, where `key_length` is the key's UTF-8 byte length. The `HINT_FORMAT` constant covers only the first 24 bytes, not the full record.
- `hint-no-integrity-check`: hint files have no checksums, magic numbers, or version fields. A truncated hint file typically surfaces as a struct unpacking error, and a corrupted one may load silently with wrong locations. Neither case is detected or recovered from gracefully.
- `rebuild-order-matters`: `_rebuild_index` processes file IDs in sorted order so that newer records overwrite older ones in `keydir`. `_load_hint_file` relies on this by unconditionally overwriting existing entries.
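The `rebuild-order-matters` invariant can be checked with a toy model. This sketch uses a hypothetical `rebuild` function and models files as lists of `(key, location)` pairs rather than real hint or data files. It applies each file's entries in ascending file-ID order with unconditional overwrites, which is the behavior `_rebuild_index` and `_load_hint_file` rely on.

```python
from typing import Dict, List, Tuple

# Toy model: file ID -> list of (key, location) pairs in that file.
Files = Dict[int, List[Tuple[str, str]]]


def rebuild(files: Files) -> Dict[str, str]:
    keydir: Dict[str, str] = {}
    for fid in sorted(files):          # ascending ID: oldest file first
        for key, location in files[fid]:
            keydir[key] = location     # unconditional overwrite, newer wins
    return keydir


if __name__ == "__main__":
    files = {
        7: [("a", "file7@0")],                     # newer write of "a"
        3: [("a", "file3@0"), ("b", "file3@40")],  # older file
    }
    print(rebuild(files))  # {'a': 'file7@0', 'b': 'file3@40'}
```

Iterating `files` in insertion order (7, then 3) instead would leave `"a"` pointing at the stale `file3@0` entry.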