Date: 2026-05-29
Time: 08:37
BitcaskStore.get — Point lookup by keyget is the read path of the Bitcask storage engine. It retrieves a value for a given key by consulting the in-memory hash index to find the record's physical location (segment file + byte offset), then performing a single disk seek to read and validate the record. This is the core design principle of Bitcask: O(1) index lookup in memory, one disk read per query.
key must be a string. The store must not be closed (no guard enforced — see assumptions below).None if the key was never written or was deleted. Raises CorruptionError if the record on disk is truncated or fails its CRC check.self._index or writes to disk.| Parameter | Type | Description |
|-----------|------|-------------|
| key | str | The lookup key. Must match exactly (no prefix/range support). No length validation — any string the caller passes will be checked against the index. |
Optional[bytes] — the raw value bytes on success, None on miss.None (key absent) and CorruptionError (data damaged). There is no distinction between "never written" and "deleted" — both return None, because delete removes the key from the index.1. Index lookup — Check self._index for key. If absent, short-circuit with None. This is a dict lookup: O(1) average case.
2. Locate on disk — Unpack (segpath, offset) from the index entry. segpath is the absolute path to the segment file; offset is the byte position where this record's header begins.
3. Open a fresh file handle — Opens segpath in binary-read mode via a with block. This is deliberate: the class also maintains a file_handles cache elsewhere, but get avoids it. The comment says "to avoid position conflicts" — because a cached handle's file position would be shared across calls, causing reads to land at wrong offsets under interleaved access.
4. Read the header — Seeks to offset, reads HEADERSIZE (12 bytes: three unsigned 32-bit ints in network byte order). Unpacks into (crc, keysize, value_size).
5. Read the payload — Reads keysize + valuesize bytes immediately following the header. The payload is keybytes || valuebytes concatenated.
6. CRC validation — Computes zlib.crc32 over the payload and masks to 32 bits. Compares against the stored crc. This detects bit-rot, partial writes, and file corruption.
7. Extract value — Returns payload[key_size:], slicing off the key prefix to yield only the value bytes.
Minimal. Opens and closes a file descriptor on each call (the with block). No mutations to self._index, no writes to disk, no network I/O. The file handle is not cached — each get pays the open/close cost but avoids stale-position bugs.
| Condition | Behavior |
|-----------|----------|
| Key not in index | Returns None (normal miss) |
| Header read returns fewer than 12 bytes | Raises CorruptionError |
| Payload read returns fewer than expected bytes | Raises CorruptionError |
| CRC mismatch | Raises CorruptionError with expected vs actual CRC |
| Segment file missing (deleted or moved) | Unhandled FileNotFoundError propagates to caller |
| Key was deleted | Returns None — delete() removes the index entry, so it never reaches disk |
Note that get does not check whether the value is a tombstone (TOMBSTONE). It doesn't need to: delete() removes the key from the index, so a tombstone record is never reachable through get. The tombstone sentinel only matters during recovery (scansegment) and compaction.
store = BitcaskStore("/tmp/mydb")
store.put("user:1", b'{"name": "Alice"}')
value = store.get("user:1") # b'{"name": "Alice"}'
value = store.get("missing") # None
Callers should:
None for missing keysCorruptionError if they want graceful degradation instead of a crashget concurrently is safe (fresh handles), but concurrent get + put with the same key could read stale data if the index update and file write interleavestruct — binary (de)serialization of the fixed-size headerzlib.crc32 — checksum computation for integrity verificationself.index — the in-memory hash map, populated by recover() at startup and maintained by put()/delete()get trusts that the file at seg_path still contains the record written at offset1. The segment file still exists at seg_path — compaction deletes old segments and updates the index, but if the process crashes between deletion and index update, get will raise FileNotFoundError.
2. The store hasn't been close()d — no guard prevents calling get after close().
3. offset is a valid position — the index is trusted implicitly; a corrupted index would cause reads at arbitrary offsets.
4. Keys are valid UTF-8 — get never decodes the key (it only slices past it), but the write path encodes with UTF-8, so the stored key length in bytes may differ from len(key) in characters.
5. Values are never equal to TOMBSTONE — if a caller stores the literal bytes b"_BITCASKTOMBSTONE__" as a value, recovery will treat it as a deletion.