Function: put in log-structured-hash-table/bitcask.py

Date: 2026-05-29

Time: 06:45

BitcaskStore.put

Purpose

put is the primary write path for this Bitcask-style key-value store. It appends a key-value record to the active segment file and updates the in-memory hash index so subsequent reads can locate the value by seeking directly to its byte offset. This is the core operation that makes Bitcask an *append-only, log-structured* store — writes never modify existing data on disk, they only append.

Contract

Preconditions:

Postconditions:

Invariants maintained:

Parameters

| Parameter | Type | Description |

|-----------|------|-------------|

| key | str | The lookup key. Encoded to UTF-8 for on-disk storage. No length limit is enforced, but very large keys inflate every record's header payload. |

| value | bytes | The raw value payload. Arbitrary bytes. Callers must not pass TOMBSTONE — doing so would corrupt the logical state (the index would map the key to a tombstone record that get would return as data). |

Return Value

Returns int — the byte offset within the active segment where the record header begins. This is the same offset stored in the index. Callers don't typically need this; it's useful for diagnostics or for building external secondary indexes.

Algorithm

1. Rotation check — Query the active file's current write position (tell()). If it's at or past maxsegmentsize, close the current segment and open a new one via rotatesegment(). This ensures no single segment grows unboundedly.

2. Write the record — Call writerecord(key, value), which:

3. Invalidate stale read handle — Removes any cached read file handle for the active segment path from filehandles. After appending, a previously-opened read handle would have a stale file position. (Note: get() actually opens a fresh handle each time via with open(...), so this is defensive cleanup — it prevents getread_handle from returning a handle whose cursor is in the wrong place.)

4. Update the index — Sets self.index[key] = (activepath, offset). If the key already existed, this silently overwrites the old entry. The old record remains on disk as dead data until compaction reclaims it.

5. Auto-compact check — Counts frozen (non-active) segments. If the count meets or exceeds autocompactthreshold, triggers compact() to merge all frozen segments, removing stale records and tombstones.

6. Return the offset.

Side Effects

Error Handling

No exceptions are explicitly caught. Possible failures:

Usage Patterns


store = BitcaskStore("/tmp/mydb", max_segment_size=4096)
store.put("user:1", b'{"name": "Alice"}')
store.put("user:2", b'{"name": "Bob"}')
store.put("user:1", b'{"name": "Alice Updated"}')  # overwrites previous

Callers must ensure the store is open (not close()'d). There is no locking — concurrent put calls from multiple threads will corrupt both the file and the index. The caller is responsible for external synchronization if needed.

Dependencies

Beliefs