Date: 2026-05-29
Time: 06:45
BitcaskStore.put

`put` is the primary write path for this Bitcask-style key-value store. It appends a key-value record to the active segment file and updates the in-memory hash index so that subsequent reads can locate the value by seeking directly to its byte offset. This is the core operation that makes Bitcask an *append-only, log-structured* store: writes never modify existing data on disk; they only append.
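The append-and-index idea can be sketched in memory. This is a minimal illustration, not the store's implementation: `ToyLog` is hypothetical, a `bytearray` stands in for the active segment file, and the record layout (a `!III` header of crc, key_len, value_len followed by the key and value bytes) is an assumption based on the header format this page mentions.

```python
import struct
import zlib
from typing import Dict

# Assumed record layout: header (crc, key_len, value_len) + key bytes + value bytes.
HEADER = struct.Struct("!III")


class ToyLog:
    """Hypothetical in-memory stand-in for one segment file plus its hash index."""

    def __init__(self) -> None:
        self.data = bytearray()           # append-only "segment file"
        self.index: Dict[str, int] = {}   # key -> offset of the record header

    def put(self, key: str, value: bytes) -> int:
        key_bytes = key.encode("utf-8")
        crc = zlib.crc32(key_bytes + value)
        offset = len(self.data)           # current end of the log, like tell()
        self.data += HEADER.pack(crc, len(key_bytes), len(value)) + key_bytes + value
        self.index[key] = offset          # the newest record wins; old bytes stay
        return offset

    def get(self, key: str) -> bytes:
        offset = self.index[key]
        _crc, key_len, value_len = HEADER.unpack_from(self.data, offset)
        start = offset + HEADER.size + key_len  # skip the header and the key
        return bytes(self.data[start:start + value_len])


log = ToyLog()
log.put("user:1", b"Alice")
log.put("user:1", b"Alice Updated")   # appends a second record
print(log.get("user:1"))              # b'Alice Updated'
```

The second `put` leaves the first record's bytes untouched; only the index entry moves to the newer offset.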
Preconditions:
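- The store is open (`_active_file` is open for appending and `_active_path` is set).
- `key` must be a `str` that can be encoded to UTF-8 (it is passed through `.encode("utf-8")` inside `_write_record`).
- `value` must not be `TOMBSTONE` (`b"__BITCASK_TOMBSTONE__"`); that sentinel is reserved for `delete()`. `put` does not enforce this.

Postconditions:

- The record has been appended to the active segment file (and pushed to the OS with `flush()`).
- `self.index[key]` points to `(active_segment_path, byte_offset)` of the new record.
- If the active segment had reached `max_segment_size`, a new segment was opened *before* the write.
- If the number of frozen segments reached `auto_compact_threshold`, compaction ran.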
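Because `put` does not reject `TOMBSTONE`, a caller that wants that precondition enforced can check it before calling `put`. This is a hypothetical helper, not part of the store, and the sentinel value in it is an assumption; importing the store's own `TOMBSTONE` constant is the safer choice.

```python
# Assumed sentinel value; prefer importing the store's own TOMBSTONE constant.
TOMBSTONE = b"__BITCASK_TOMBSTONE__"


def check_put_args(key: str, value: bytes) -> None:
    """Hypothetical guard for the preconditions that put does not enforce itself."""
    if not isinstance(key, str):
        raise TypeError("key must be str")
    if not isinstance(value, (bytes, bytearray)):
        raise TypeError("value must be bytes")
    if bytes(value) == TOMBSTONE:
        raise ValueError("TOMBSTONE is reserved for delete()")


# Usage, assuming a BitcaskStore instance named store:
# check_put_args("user:1", payload)
# store.put("user:1", payload)
```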
| Parameter | Type | Description |
|-----------|------|-------------|
| key | str | The lookup key, encoded to UTF-8 for on-disk storage. No length limit is enforced, but very large keys inflate every record written for that key (each record stores the full key) as well as the in-memory index. |
| value | bytes | The raw value payload; arbitrary bytes. Callers must not pass `TOMBSTONE`: doing so corrupts the logical state. The index would map the key to a tombstone record that `get` returns as data, and after a restart the recovery pass would treat the record as a delete. |
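The key-size note can be made concrete. Assuming each record is a `!III` header (12 bytes, since `!` disables padding and each `I` is 4 bytes) followed by the UTF-8 key and the value, a record's on-disk size is the sum of the three parts. `record_size` is a hypothetical helper for illustration.

```python
import struct

HEADER_SIZE = struct.calcsize("!III")  # 12: crc, key_len, value_len at 4 bytes each


def record_size(key: str, value: bytes) -> int:
    """On-disk size of one record under the assumed header + key + value layout."""
    return HEADER_SIZE + len(key.encode("utf-8")) + len(value)


print(record_size("user:1", b'{"name": "Alice"}'))   # 35 = 12 + 6 + 17
print(record_size("x" * 1000, b'{"name": "Alice"}'))  # 1029: the key dominates
print(record_size("é", b""))                          # 14: "é" is 2 bytes in UTF-8
```

Every overwrite of a key pays the full key length again, because each record carries its own copy of the key.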
Returns int — the byte offset within the active segment where the record header begins. This is the same offset stored in the index. Callers don't typically need this; it's useful for diagnostics or for building external secondary indexes.
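The returned offset is enough to read a record straight from its segment file, which is handy for diagnostics. A minimal sketch, assuming the header + key + value layout and a CRC32 over the key and value bytes; `append_record` and `read_record_at` are hypothetical helpers, not methods of the store.

```python
import os
import struct
import tempfile
import zlib
from typing import BinaryIO, Tuple

HEADER = struct.Struct("!III")  # assumed: crc, key_len, value_len


def append_record(f: BinaryIO, key: str, value: bytes) -> int:
    """Append one record and return the offset where its header starts."""
    key_bytes = key.encode("utf-8")
    offset = f.tell()
    f.write(HEADER.pack(zlib.crc32(key_bytes + value), len(key_bytes), len(value)))
    f.write(key_bytes + value)
    return offset


def read_record_at(path: str, offset: int) -> Tuple[str, bytes]:
    """Read the record whose header starts at offset and verify its checksum."""
    with open(path, "rb") as f:
        f.seek(offset)
        crc, key_len, value_len = HEADER.unpack(f.read(HEADER.size))
        key_bytes = f.read(key_len)
        value = f.read(value_len)
    if zlib.crc32(key_bytes + value) != crc:
        raise ValueError(f"CRC mismatch at offset {offset}")
    return key_bytes.decode("utf-8"), value


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "segment.data")
    with open(path, "ab") as f:
        append_record(f, "user:1", b"Alice")
        offset = append_record(f, "user:2", b"Bob")
    print(offset, read_record_at(path, offset))  # 23 ('user:2', b'Bob')
```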
1. Rotation check: query the active file's current write position (`tell()`). If it is at or past `max_segment_size`, close the current segment and open a new one via `_rotate_segment()`. Because the check runs before the write, a segment can exceed `max_segment_size` by up to one record, but no single segment grows without bound.
2. Write the record: call `_write_record(key, value)`, which encodes the key to UTF-8, computes a CRC32 over the key and value bytes, and writes a header packed with `struct` (`!III`: crc, key_len, value_len) followed by the key and value bytes.
3. Invalidate the stale read handle: remove any cached read file handle for the active segment path from `_file_handles`. After an append, a previously opened read handle would have a stale file position. (Note: `get()` actually opens a fresh handle each time via `with open(...)`, so this is defensive cleanup; it prevents `_get_read_handle` from returning a handle whose cursor is in the wrong place.)
4. Update the index: set `self.index[key] = (_active_path, offset)`. If the key already existed, this silently overwrites the old entry. The old record remains on disk as dead data until compaction reclaims it.
5. Auto-compact check: count the frozen (non-active) segments. If the count meets or exceeds `auto_compact_threshold`, trigger `compact()` to merge all frozen segments, removing stale records and tombstones.
6. Return the offset.
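The six steps can be sketched as a small, self-contained class. Everything here is hypothetical: `SketchStore`, its attribute names, the segment file naming, and the record layout are assumptions chosen to mirror the steps, and `compact()` is a placeholder that only counts calls instead of merging segments.

```python
import os
import struct
import tempfile
import zlib
from typing import BinaryIO, Dict, List, Tuple

HEADER = struct.Struct("!III")  # assumed header layout: crc, key_len, value_len


class SketchStore:
    """Hypothetical, simplified store that mirrors the six steps of put."""

    def __init__(self, directory: str, max_segment_size: int = 4096,
                 auto_compact_threshold: int = 4) -> None:
        os.makedirs(directory, exist_ok=True)
        self.directory = directory
        self.max_segment_size = max_segment_size
        self.auto_compact_threshold = auto_compact_threshold
        self.index: Dict[str, Tuple[str, int]] = {}
        self.file_handles: Dict[str, BinaryIO] = {}  # cached read handles
        self.segment_paths: List[str] = []
        self.compactions = 0  # placeholder bookkeeping for compact()
        self._rotate_segment()

    def _rotate_segment(self) -> None:
        """Close the active segment (if any) and open a fresh one for appending."""
        if self.segment_paths:
            self.active_file.close()
        name = f"{len(self.segment_paths):06d}.data"
        self.active_path = os.path.join(self.directory, name)
        self.segment_paths.append(self.active_path)
        self.active_file = open(self.active_path, "ab")

    def _write_record(self, key: str, value: bytes) -> int:
        """Append header + key + value and return the header's offset."""
        key_bytes = key.encode("utf-8")
        offset = self.active_file.tell()
        crc = zlib.crc32(key_bytes + value)
        self.active_file.write(HEADER.pack(crc, len(key_bytes), len(value)))
        self.active_file.write(key_bytes + value)
        self.active_file.flush()
        return offset

    def compact(self) -> None:
        """Placeholder: a real compact() would merge the frozen segments."""
        self.compactions += 1

    def put(self, key: str, value: bytes) -> int:
        # 1. Rotation check, made before the write.
        if self.active_file.tell() >= self.max_segment_size:
            self._rotate_segment()
        # 2. Append the record.
        offset = self._write_record(key, value)
        # 3. Drop (and close) any cached read handle for the active segment.
        stale = self.file_handles.pop(self.active_path, None)
        if stale is not None:
            stale.close()
        # 4. Point the index at the newest record.
        self.index[key] = (self.active_path, offset)
        # 5. Auto-compact once enough frozen segments have accumulated.
        frozen = [p for p in self.segment_paths if p != self.active_path]
        if len(frozen) >= self.auto_compact_threshold:
            self.compact()
        # 6. Return the header offset.
        return offset

    def close(self) -> None:
        self.active_file.close()


with tempfile.TemporaryDirectory() as tmp:
    store = SketchStore(tmp, max_segment_size=30, auto_compact_threshold=2)
    for _ in range(3):
        store.put("a", b"x" * 20)  # each 33-byte record fills a segment
    print(len(store.segment_paths), store.compactions)  # 3 1
    store.close()
```

Each segment in the usage ends at 33 bytes, over the 30-byte limit, because the size check runs before the write. With the placeholder `compact()`, frozen segments are never removed, so every later `put` would call it again; the documented `compact()` merges the frozen segments instead.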
A `put` call can have significant latency if it triggers compaction.

No exceptions are explicitly caught. Possible failures:

- `OSError` (`IOError` is an alias of it in Python 3): raised if the filesystem is full, permissions are wrong, or the file was closed externally. These errors propagate uncaught and can leave the store inconsistent. If the failure happens inside `_write_record`, the record may be partially written and the index is not updated; if it happens after step 4 (for example, during compaction), the index has already been updated.
- `UnicodeEncodeError`: raised if `key` contains characters that cannot be encoded to UTF-8. This is unlikely with an ordinary Python `str`, but possible with lone surrogates.
- A crash between `_write_record` and the index update leaves the record on disk but not indexed; it is recovered on the next startup via `_recover`.
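Two of these failure modes are easy to check from plain Python: `IOError` is the same class as `OSError`, so one `except OSError` clause covers both, and a lone surrogate is a valid `str` that UTF-8 cannot encode. `utf8_error` is a hypothetical helper for illustration.

```python
from typing import Optional

# In Python 3, IOError is an alias of OSError.
print(IOError is OSError)  # True


def utf8_error(key: str) -> Optional[str]:
    """Return the encoder's reason if key cannot be UTF-8 encoded, else None."""
    try:
        key.encode("utf-8")  # the same encoding put applies to the key
    except UnicodeEncodeError as exc:
        return exc.reason
    return None


print(utf8_error("user:1"))                     # None
print(utf8_error("bad\ud800key") is not None)  # True: lone surrogate
```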
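```python
# Open (or create) a store whose segments rotate after 4096 bytes.
store = BitcaskStore("/tmp/mydb", max_segment_size=4096)
store.put("user:1", b'{"name": "Alice"}')
store.put("user:2", b'{"name": "Bob"}')
# Overwrites the previous value: a new record is appended and the index is
# repointed; the old record stays on disk until compaction reclaims it.
store.put("user:1", b'{"name": "Alice Updated"}')
store.close()
```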
Callers must ensure the store is open (`close()` has not been called). There is no locking: concurrent `put` calls from multiple threads can interleave writes and corrupt both the file and the index. The caller is responsible for external synchronization if needed.
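One way to provide that external synchronization is to funnel every call through a single lock. A minimal sketch, assuming the wrapped object exposes `put` and `get`; `SynchronizedStore` is hypothetical. Holding the lock across `put` also means a `put` that triggers compaction blocks every other caller until it finishes.

```python
import threading
from typing import Any


class SynchronizedStore:
    """Hypothetical wrapper that serializes access to a non-thread-safe store."""

    def __init__(self, store: Any) -> None:
        self._store = store
        self._lock = threading.Lock()

    def put(self, key: str, value: bytes) -> int:
        with self._lock:  # one writer at a time: no interleaved appends
            return self._store.put(key, value)

    def get(self, key: str) -> Any:
        with self._lock:  # readers wait out writes, including compaction
            return self._store.get(key)


# Usage, assuming the store from the example above:
# safe = SynchronizedStore(BitcaskStore("/tmp/mydb"))
# safe.put("user:1", b'{"name": "Alice"}')
```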
Dependencies:

- `struct`: binary packing for the record header format (`!III`).
- `zlib.crc32`: integrity checksum over the key+value payload.
- `os`: filesystem operations (directory creation, file size, deletion during compaction).
- `_write_record`, `_rotate_segment`, `_frozen_segment_paths`, `compact`: the write path is distributed across these helpers.

Invariants and caveats:

- `put-append-only`: `put` never modifies existing on-disk data; overwrites create a new record and update only the in-memory index, leaving the old record as reclaimable garbage.
- `put-triggers-compaction`: a `put` call can trigger synchronous compaction (file deletion, rewriting, renaming) if the frozen segment count reaches `auto_compact_threshold`, making write latency non-uniform.
- `put-no-tombstone-guard`: `put` does not reject `TOMBSTONE` as a value; passing it creates a record indistinguishable from a delete, which silently corrupts the key's state on recovery.
- `put-not-thread-safe`: there is no locking or synchronization in the write path; concurrent `put` calls can interleave writes to the active file and corrupt the index.
- `put-crash-recovery-gap`: if a crash occurs after `_write_record` but before the index update, the record is orphaned on disk until `_recover` replays the segment on the next startup.
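The crash-recovery gap and the missing tombstone guard both come down to how a segment is replayed at startup. The sketch below is a hypothetical `replay_segment`, not the store's `_recover`: it assumes a `!III` header (crc, key_len, value_len) followed by the key and value bytes, an assumed sentinel value, and that replay stops at the first torn or corrupt record. It shows that a record written but never indexed is picked up again, and that any record whose value equals the sentinel acts as a delete.

```python
import os
import struct
import tempfile
import zlib
from typing import Dict, Tuple

HEADER = struct.Struct("!III")        # assumed: crc, key_len, value_len
TOMBSTONE = b"__BITCASK_TOMBSTONE__"  # assumed sentinel value


def replay_segment(path: str, index: Dict[str, Tuple[str, int]]) -> None:
    """Apply one segment's records to index, oldest first (hypothetical)."""
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # end of file, or a header torn by a crash
            crc, key_len, value_len = HEADER.unpack(header)
            key_bytes = f.read(key_len)
            value = f.read(value_len)
            if len(key_bytes) < key_len or len(value) < value_len:
                break  # record torn by a crash mid-write
            if zlib.crc32(key_bytes + value) != crc:
                break  # corrupt record: stop trusting the rest of the segment
            key = key_bytes.decode("utf-8")
            if value == TOMBSTONE:
                index.pop(key, None)  # a sentinel value is treated as a delete
            else:
                index[key] = (path, offset)


def encode(key: str, value: bytes) -> bytes:
    """Serialize one record in the assumed layout."""
    k = key.encode("utf-8")
    return HEADER.pack(zlib.crc32(k + value), len(k), len(value)) + k + value


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "000000.data")
    with open(path, "wb") as f:
        f.write(encode("a", b"1"))
        f.write(encode("b", b"2"))        # on disk; suppose the crash hit before indexing
        f.write(encode("a", TOMBSTONE))   # sentinel value: replay drops "a"
        f.write(encode("c", b"3")[:5])    # torn tail from a crash mid-write
    index: Dict[str, Tuple[str, int]] = {}
    replay_segment(path, index)
    print(sorted(index))  # ['b']
```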