Date: 2026-05-29
Time: 06:47
createhintfiles: Bitcask Hint File Generator
Hint files are a startup optimization. Normally, recovering the in-memory index requires scanning every record in every segment file: reading headers, keys, values, and CRCs. A hint file is a compact summary that stores only (keysize, offset, key) tuples, skipping the values entirely. On recovery, loadhintfile can rebuild the index from hint files in a fraction of the time, since it never touches the (potentially large) values.
createhintfiles generates these hint files for all frozen (non-active) segments. It's a batch operation — typically called after compaction or as a maintenance step.
Preconditions:
- self._index must be current and accurate. The method uses the index to determine which records are still live, so a stale index produces incorrect hint files.
- The active segment path (self.active_path) must be set.
Postcondition: each frozen segment has a .hint file alongside its .dat file, containing entries only for keys whose canonical location (per the index) is that specific segment at that specific offset.
Parameters: none. The method operates entirely on instance state (self._index, self.active_path, and the segment files on disk).
Returns: None. This is a side-effect-only method.
1. Enumerate segments via findexistingsegments(), which lists all segmentNNNNNN.dat files sorted by ID.
2. Skip the active segment — it's still being written to, so a hint file would be immediately stale.
3. For each frozen segment, open a new hint file (overwriting any existing one) and scan the segment record by record:
   - Read the header (crc32, keysize, valuesize).
   - Read the payload (keybytes + valuebytes).
   - Write a hint entry only if self._index[key] points to *this exact segment and offset*. This is the critical filter: if a key was overwritten in a later segment, the old entry is stale and excluded.
4. Each hint entry is written as: struct.pack("!II", key_size, offset) followed by the raw key bytes.
The index check at step 3 is what makes this correct after compaction or repeated writes. Without it, the hint file would contain stale entries that point to values that are no longer the canonical version.
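The scan-and-filter loop described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the actual method: the function name `write_hints`, the header layout `"!III"` (crc32, key size, value size), and the index shape (key mapped to a `(segment_id, offset)` tuple) are all assumptions made for the example. Only the `"!II"` hint entry layout comes from the description above.

```python
import struct

# Assumed header layout: crc32, key_size, value_size (hypothetical; the real
# module-level constant may differ).
HEADER_FMT = "!III"
HEADER_SIZE = struct.calcsize(HEADER_FMT)
# Hint entry layout as described in the text: key_size, offset.
HINT_ENTRY_FMT = "!II"


def write_hints(segment_path, hint_path, segment_id, index):
    """Scan one frozen segment and write hint entries for live records only.

    `index` is assumed to map str keys to (segment_id, offset) tuples.
    Returns the number of hint entries written.
    """
    written = 0
    with open(segment_path, "rb") as seg, open(hint_path, "wb") as hint:
        while True:
            offset = seg.tell()
            header = seg.read(HEADER_SIZE)
            if len(header) < HEADER_SIZE:
                break  # end of file, or a header truncated by a crash
            _crc, key_size, value_size = struct.unpack(HEADER_FMT, header)
            payload = seg.read(key_size + value_size)
            if len(payload) < key_size + value_size:
                break  # payload truncated by a crash
            key_bytes = payload[:key_size]
            key = key_bytes.decode("utf-8")
            # The critical filter: emit an entry only if the index says this
            # exact segment and offset is the key's canonical location.
            if index.get(key) == (segment_id, offset):
                hint.write(struct.pack(HINT_ENTRY_FMT, key_size, offset))
                hint.write(key_bytes)
                written += 1
    return written
```

Under these assumptions, a key that was overwritten later or deleted (and therefore absent from the index) never matches the filter, so its older records produce no hint entry.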
The method creates or overwrites .hint files for every frozen segment. The hint file path is derived by replacing .dat with .hint in the segment filename.
Unlike scansegment, this method does not verify the CRC of records it reads. It trusts that the segment data is intact.
There is essentially no error handling. The method will raise:
- FileNotFoundError if a segment file has been deleted between enumeration and opening.
- struct.error if a file is corrupted in a way that produces a valid-length but unparseable header.
- UnicodeDecodeError if a key contains invalid UTF-8.
Partial reads (short header or payload) are handled gracefully by breaking out of the scan loop, which covers truncated segments from crashes.
Typically called:
- After compact(), to generate hint files for the newly compacted segment.
Note that compact() itself does not call createhintfiles, so the caller must do it explicitly. During recovery, _recover checks for hint files and uses them when present, falling back to full segment scans otherwise.
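To make the recovery side concrete, here is a rough sketch of how a hint file in the `"!II"` plus raw key layout could be read back into an index. The function name `load_hints` and the `(segment_id, offset)` index values are assumptions for illustration. The sketch stops at the first short read, so a partially written hint file yields only its complete leading entries.

```python
import struct

HINT_ENTRY_FMT = "!II"  # key_size, offset, as described in the text
HINT_ENTRY_SIZE = struct.calcsize(HINT_ENTRY_FMT)


def load_hints(hint_path, segment_id, index):
    """Populate `index` from one hint file; return the number of entries read.

    `index` is assumed to map str keys to (segment_id, offset) tuples.
    """
    loaded = 0
    with open(hint_path, "rb") as hint:
        while True:
            head = hint.read(HINT_ENTRY_SIZE)
            if len(head) < HINT_ENTRY_SIZE:
                break  # end of file, or a truncated entry header
            key_size, offset = struct.unpack(HINT_ENTRY_FMT, head)
            key_bytes = hint.read(key_size)
            if len(key_bytes) < key_size:
                break  # truncated key: stop without indexing it
            index[key_bytes.decode("utf-8")] = (segment_id, offset)
            loaded += 1
    return loaded
```

No value bytes are read at any point, which is where the startup saving comes from.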
Dependencies:
- struct for binary encoding/decoding of headers and hint entries.
- findexisting_segments() for segment enumeration.
- hintpath() for deriving the hint file path (.dat → .hint replacement).
- HEADERFMT, HEADERSIZE, HINTENTRYFMT module-level constants.
- TOMBSTONE sentinel value for identifying deleted records.
… get().
A crash during generation can leave a partial hint file, which loadhint_file handles by stopping at the first short read. This means some valid entries may be lost, requiring a full segment scan on next recovery for segments without complete hint files.
- hint-files-skip-tombstones: Hint files never contain entries for tombstoned (deleted) keys; tombstones are filtered during generation and absent from the hint format.
- hint-files-only-canonical-entries: A hint entry is written only when the in-memory index confirms that the key's canonical location is the current segment and offset, preventing stale entries.
- hint-generation-skips-active-segment: createhintfiles never generates a hint file for the active segment, only for frozen (immutable) segments.
- hint-generation-no-crc-validation: Unlike scansegment, createhintfiles does not verify CRC checksums on the records it reads, meaning corrupted data can be indexed via hint files.
- hint-file-write-not-atomic: Hint files are written directly (not via temp-file-and-rename), so a crash during generation can leave a partial hint file that causes incomplete index recovery.
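For contrast with the non-atomic write noted above, the temp-file-and-rename pattern could look roughly like this. This is not what the described method does; it is a sketch of the alternative. The function name `write_hint_file_atomically` and the `(key_bytes, offset)` entry shape are hypothetical.

```python
import os
import struct
import tempfile


def write_hint_file_atomically(hint_path, entries):
    """Write hint entries to a temp file, then rename it over `hint_path`.

    `entries` is an iterable of (key_bytes, offset) pairs. Readers see either
    the old file or the complete new one, never a partial write.
    """
    directory = os.path.dirname(os.path.abspath(hint_path))
    # Create the temp file in the same directory so the final rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".hint.tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for key_bytes, offset in entries:
                tmp.write(struct.pack("!II", len(key_bytes), offset))
                tmp.write(key_bytes)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, hint_path)  # atomic replacement of the target
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this approach a crash during generation leaves at most a stray temp file, and recovery would fall back to a full scan for that segment rather than load a truncated hint file.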