Date: 2026-05-29
Time: 07:27
writerecord: Append a single key-value record to the active segment

writerecord is the low-level write primitive for the Bitcask store. Every put and delete ultimately serializes its data to disk through this method. It encodes a key-value pair into the on-disk binary format (header + payload), appends it to the current active segment file, and returns the byte offset where the record begins. That offset is what the in-memory hash index stores to locate the record later.
Preconditions:
- self.activefile is open in append-binary mode ("ab") and is not None.
- key is a str that can be encoded as UTF-8. No length validation is performed; the only limit is the 32-bit size field in the header.
- value is raw bytes. It can be arbitrary content, including the sentinel TOMBSTONE value (callers decide the semantics).

Postconditions:
- HEADERSIZE + len(key_bytes) + len(value) bytes have been appended to the active segment file.
- The write buffer has been flushed to the operating system (but not fsync'd to disk).

Invariant: The on-disk record is self-describing. The header contains enough information (key size, value size) to read the record without external metadata, and the CRC covers the full payload for integrity verification on read.
| Parameter | Type | Description |
|-----------|------|-------------|
| key | str | The logical key. Encoded to UTF-8 bytes internally. No maximum length enforced. |
| value | bytes | The raw value to store. May be actual user data or the TOMBSTONE sentinel. |
Edge cases: An empty string key ("") encodes to zero key bytes and is still valid. An empty value (b"") is also valid; the corresponding size field is then 0 and the CRC is computed over whatever payload bytes remain.
Returns int — the byte offset within the active segment file where this record's header starts. The caller uses this offset (paired with the file path) to build the index entry for O(1) lookups. The caller is responsible for updating self._index; this method does not touch the index.
1. Encode the key string to UTF-8 bytes.
2. Concatenate key_bytes + value to form the payload.
3. Compute CRC-32 over the payload, masked to 32 bits unsigned.
4. Pack a 12-byte header: [crc32 | key_size | value_size] in network byte order (!III).
5. Capture the current file position (this is the record's offset).
6. Write header + payload as a single contiguous write.
7. Flush the write buffer.
8. Return the captured offset.
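
The eight steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the store's actual code: the function name `write_record_sketch`, the constants `HEADER_FORMAT` and `HEADER_SIZE`, and the use of a plain file-like argument instead of `self` are all hypothetical.

```python
import io
import struct
import zlib

HEADER_FORMAT = "!III"  # crc32, key_size, value_size in network byte order
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 12 bytes


def write_record_sketch(active_file, key: str, value: bytes) -> int:
    """Append one record to an open binary file and return its start offset."""
    key_bytes = key.encode("utf-8")                 # step 1
    payload = key_bytes + value                     # step 2
    crc = zlib.crc32(payload) & 0xFFFFFFFF          # step 3
    header = struct.pack(HEADER_FORMAT, crc, len(key_bytes), len(value))  # step 4
    offset = active_file.tell()                     # step 5
    active_file.write(header + payload)             # step 6: one contiguous write
    active_file.flush()                             # step 7: no fsync here
    return offset                                   # step 8


# Usage with an in-memory buffer standing in for the segment file.
buf = io.BytesIO()
first = write_record_sketch(buf, "k", b"v1")
second = write_record_sketch(buf, "key2", b"")
print(first, second, len(buf.getvalue()))
```

The second offset equals the size of the first record, which is why the returned value can be stored directly in an index entry.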
The CRC covers only the payload (key bytes + value), not the header itself. A corrupted size field is therefore not checked directly. It is detected indirectly: the reader extracts the wrong payload slice, and that slice will almost certainly fail the CRC comparison.
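
To make the indirect detection concrete, here is a small sketch of a verifier. The helper names `encode` and `verify` are hypothetical and are not part of the store; the record format is the one described in this document.

```python
import struct
import zlib

HEADER_FORMAT = "!III"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def encode(key_bytes: bytes, value: bytes) -> bytes:
    """Build one record: 12-byte header followed by key bytes and value."""
    payload = key_bytes + value
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack(HEADER_FORMAT, crc, len(key_bytes), len(value)) + payload


def verify(record: bytes) -> bool:
    """Return True if the slice described by the header matches the stored CRC."""
    crc, key_size, value_size = struct.unpack(HEADER_FORMAT, record[:HEADER_SIZE])
    payload = record[HEADER_SIZE:HEADER_SIZE + key_size + value_size]
    if len(payload) != key_size + value_size:
        return False  # header asks for more bytes than are available
    return (zlib.crc32(payload) & 0xFFFFFFFF) == crc


good = encode(b"user:1", b"alice")
# Corrupt only the value_size field (header bytes 8..11): 5 becomes 4.
bad_header = good[:8] + struct.pack("!I", 4) + good[12:]
# Flip the last payload byte and leave the header intact.
bad_payload = good[:-1] + bytes([good[-1] ^ 0xFF])

print(verify(good), verify(bad_header), verify(bad_payload))
```

In the `bad_header` case the CRC field is untouched, yet verification fails because the CRC is recomputed over a shorter slice than the one it was originally computed from.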
Side effects:
- Writes to self.activefile and flushes. put and delete write data records to disk only through this method.
- self.activefile.tell() has advanced by the record size. This is how put detects when the segment exceeds maxsegment_size and needs rotation.
- Does not modify self._index. The caller (put or delete) decides whether and how to update the index.
- No fsync: flush() pushes data from Python's buffer to the OS kernel buffer, but a crash before the OS writes to disk could lose the record. This trades durability for write throughput.

This method does not catch any exceptions. Possible failures include:
- UnicodeEncodeError if key contains characters that cannot be encoded as UTF-8. This is rare for ordinary Python str values but does occur for strings containing lone surrogates.
- OSError if the underlying file write or flush fails (disk full, permission error); IOError is an alias of OSError in Python 3. Writing to a file that has already been closed raises ValueError instead.
- struct.error if the sizes overflow the !III format (each field is an unsigned 32-bit int, so a key or value of 4 GiB or more would fail).

All of these propagate to the caller unhandled.
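
Two of these failures are easy to reproduce without touching the disk. The snippet below is only an illustration; `pack_header` and `MAX_FIELD` are hypothetical names, and no 4 GiB buffer is allocated because only the size integers are passed to `struct.pack`.

```python
import struct

HEADER_FORMAT = "!III"
MAX_FIELD = 0xFFFFFFFF  # largest value an unsigned 32-bit field can hold


def pack_header(crc: int, key_size: int, value_size: int) -> bytes:
    """Pack the three header fields; raises struct.error on overflow."""
    return struct.pack(HEADER_FORMAT, crc, key_size, value_size)


# The largest representable size still packs into 12 bytes.
ok_header = pack_header(0, 3, MAX_FIELD)

# One past the limit no longer fits in the field.
try:
    pack_header(0, 3, MAX_FIELD + 1)
    overflowed = False
except struct.error:
    overflowed = True

# A str holding a lone surrogate cannot be encoded as UTF-8.
try:
    "\ud800".encode("utf-8")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

print(len(ok_header), overflowed, encode_failed)
```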
Called from two sites:
1. put(key, value): writes user data, then updates the index with the returned offset.
2. delete(key): writes the key with TOMBSTONE as value, then removes the key from the index.

compact() also rewrites live records to a new segment, but it bypasses writerecord and re-serializes records inline in the compaction loop.
The leading underscore signals this is an internal method. Callers must handle segment rotation *before* calling writerecord — the method itself has no size-checking logic.
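
The division of labor between the write primitive and its callers might look like the sketch below. Everything here is an assumption made for illustration: the class `TinyStore`, the helper names `_append_record` and `_rotate_if_full`, the placeholder `TOMBSTONE` bytes, and the use of in-memory buffers in place of segment files.

```python
import io
import struct
import zlib

TOMBSTONE = b"__tombstone__"  # placeholder; the real sentinel value is not given here


class TinyStore:
    def __init__(self, max_segment_size: int = 64):
        self.max_segment_size = max_segment_size
        self.segments = [io.BytesIO()]  # stand-ins for segment files
        self.index = {}                 # key -> (segment number, offset)

    def _append_record(self, key: str, value: bytes) -> int:
        """Write primitive: no size check and no index update."""
        active = self.segments[-1]
        key_bytes = key.encode("utf-8")
        payload = key_bytes + value
        crc = zlib.crc32(payload) & 0xFFFFFFFF
        offset = active.tell()
        active.write(struct.pack("!III", crc, len(key_bytes), len(value)) + payload)
        active.flush()
        return offset

    def _rotate_if_full(self) -> None:
        """Caller-side rotation: start a new segment once the active one is full."""
        if self.segments[-1].tell() >= self.max_segment_size:
            self.segments.append(io.BytesIO())

    def put(self, key: str, value: bytes) -> None:
        self._rotate_if_full()                  # rotation happens before the write
        offset = self._append_record(key, value)
        self.index[key] = (len(self.segments) - 1, offset)

    def delete(self, key: str) -> None:
        self._rotate_if_full()
        self._append_record(key, TOMBSTONE)     # tombstone goes to disk
        self.index.pop(key, None)               # key leaves the index


store = TinyStore(max_segment_size=20)
store.put("a", b"1")   # 14-byte record in segment 0
store.put("b", b"2")   # segment 0 grows to 28 bytes
store.put("c", b"3")   # segment 0 is full, so this lands in segment 1
store.delete("a")
print(store.index)
```

Keeping rotation and index maintenance in the callers leaves the primitive with a single responsibility, which matches the description above.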
Dependencies:
- struct: binary packing with format "!III" (network-order, three unsigned 32-bit ints = 12 bytes).
- zlib.crc32: CRC-32 checksum for integrity. The & 0xFFFFFFFF mask ensures a positive unsigned value on all Python versions (Python 2's crc32 could return signed values; this is defensive).
- self.activefile: an open file handle in "ab" mode, managed by opennewsegment and rotate_segment.
┌──────────────────── HEADER (12 bytes) ────────────────────┐
│ CRC32 (4B)        │ key_size (4B)     │ value_size (4B)   │
├─────────────────── PAYLOAD (variable) ────────────────────┤
│ key_bytes (key_size B)      │ value_bytes (value_size B)  │
└───────────────────────────────────────────────────────────┘
This is the exact format that scansegment and get expect when reading back.
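
For illustration, a reader that follows this layout could look like the sketch below. It is not the store's scansegment or get implementation; `read_record_at` and `make_record` are hypothetical helpers, and an in-memory buffer stands in for a segment file.

```python
import io
import struct
import zlib

HEADER_FORMAT = "!III"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def make_record(key: str, value: bytes) -> bytes:
    """Serialize one record in the layout shown above."""
    key_bytes = key.encode("utf-8")
    payload = key_bytes + value
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack(HEADER_FORMAT, crc, len(key_bytes), len(value)) + payload


def read_record_at(f, offset: int):
    """Read the record starting at `offset`; return (key, value) or raise ValueError."""
    f.seek(offset)
    header = f.read(HEADER_SIZE)
    if len(header) != HEADER_SIZE:
        raise ValueError("truncated header")
    crc, key_size, value_size = struct.unpack(HEADER_FORMAT, header)
    payload = f.read(key_size + value_size)
    if len(payload) != key_size + value_size:
        raise ValueError("truncated payload")
    if (zlib.crc32(payload) & 0xFFFFFFFF) != crc:
        raise ValueError("CRC mismatch")
    # The header sizes tell us where the key ends and the value begins.
    return payload[:key_size].decode("utf-8"), payload[key_size:]


first_record = make_record("a", b"1")
segment = io.BytesIO(first_record + make_record("b", b"2"))
second_offset = len(first_record)
print(read_record_at(segment, 0), read_record_at(segment, second_offset))
```

Because each header carries both sizes, the reader needs only the offset to recover the key and the value.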