Date: 2026-05-29
Time: 08:15
writerecord: Append a key-value record to the active data file

writerecord is the single write path for all mutations in this Bitcask store. Every put and delete flows through it. It serializes a key-value pair into a binary record format (header + key + value), appends it to the currently active data file, and returns the metadata the caller needs to update the in-memory index (keydir).
This method only handles the physical write. It does not update the keydir — that's the caller's responsibility, which is a deliberate separation of concerns. The method also doesn't handle file rotation; mayberotate is called by the caller before invoking this.
Preconditions:
- self.active_file is open in append-binary mode ("ab") and positioned at the end of the file.
- key and value are valid Python strings (UTF-8 encodable). There is no validation; non-encodable strings will raise.
- The caller has already called mayberotate() if size limits matter.

Postconditions:
- The active file has grown by exactly len(record) bytes.
- The record has been flushed to the OS (flush()), and optionally fsynced to disk.
- The returned offset points to the start of the newly written record in the active file.

Invariants:
- The record layout is always [timestamp: float64][keysize: uint32][valsize: uint32][keybytes][valbytes].

| Parameter | Type | Description |
|-----------|------|-------------|
| key | str | The key to store. Encoded to UTF-8 for serialization. No length limit enforced — a key larger than maxfilesize would silently succeed. |
| value | str | The value to store. An empty string "" is the tombstone convention used by delete(). |
Returns tuple[int, int, float]:
| Index | Name | Meaning |
|-------|------|---------|
| 0 | offset | Byte position where this record starts in the active file. Used as the disk pointer in keydir. |
| 1 | size | Total byte length of the record (header + key + value). Used to read the record back in readrecord. |
| 2 | ts | The time.time() timestamp captured at the start of the write. Used for conflict resolution during compaction (latest timestamp wins). |
The caller must use these to construct a KeyEntry and update keydir, or (for deletes) to pop the key from keydir.
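As a rough illustration of why the caller needs both offset and size, the sketch below reads one record back from a data file with a single seek and a single read. The helper names encode_record and read_at are hypothetical and are not taken from the store's code; only the "<dII" header layout comes from the description here.

```python
import os
import struct
import tempfile

HEADER_FORMAT = "<dII"  # timestamp: float64, key size: uint32, value size: uint32
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes


def encode_record(ts, key, value):
    """Hypothetical helper: build header + key + value for the demo below."""
    key_bytes = key.encode("utf-8")
    val_bytes = value.encode("utf-8")
    header = struct.pack(HEADER_FORMAT, ts, len(key_bytes), len(val_bytes))
    return header + key_bytes + val_bytes


def read_at(path, offset, size):
    """Hypothetical helper: fetch one record using the (offset, size) pair."""
    with open(path, "rb") as f:
        f.seek(offset)
        record = f.read(size)
    ts, key_size, val_size = struct.unpack(HEADER_FORMAT, record[:HEADER_SIZE])
    key_end = HEADER_SIZE + key_size
    key = record[HEADER_SIZE:key_end].decode("utf-8")
    value = record[key_end:key_end + val_size].decode("utf-8")
    return ts, key, value


# Demo: write two records, then read the second one back by offset and size.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "active.data")
    first = encode_record(1.5, "user", "ada")
    second = encode_record(2.5, "city", "london")
    with open(path, "ab") as f:
        f.write(first)
        offset = f.tell()  # where the second record starts
        f.write(second)
    result = read_at(path, offset, len(second))
```

Because size covers the whole record, the reader never has to parse the header first to learn how many bytes to fetch.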
1. Capture timestamp — time.time() gives wall-clock seconds as a float. This happens first, before any I/O, so the timestamp reflects intent time, not write-completion time.
2. Encode key and value — Both are UTF-8 encoded to raw bytes. The byte lengths are needed for the header.
3. Pack the header — Uses struct.pack with format <dII (little-endian: 8-byte double for timestamp, two 4-byte unsigned ints for key/value sizes). This produces exactly 16 bytes (HEADER_SIZE).
4. Assemble the record — Simple concatenation: header + keybytes + valbytes. No CRC or checksum — the implementation trusts the filesystem.
5. Capture current offset — self.active_file.tell() gives the byte position where the record will land. Because the file is opened in append mode, this is always the end of the file.
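6. Write the entire record in one write() call. This is not an atomicity guarantee: the PIPE_BUF rule applies only to pipes and FIFOs, not to regular files, and a buffered Python file object may issue more than one system call for a large record. A crash mid-write can therefore leave a partial record at the tail of the file.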
7. Flush — flush() pushes Python's userspace buffer to the OS. This is always done.
8. Optional fsync — If self.sync_writes is True (the default), os.fsync() forces the OS to write through to the physical disk. This is the durability guarantee — without it, a crash could lose recently written records that were still in the OS page cache.
9. Return metadata — The offset, total record size, and timestamp are returned for the caller to index.
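The nine steps above can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the store's actual code: write_record here is a free function that takes the file handle and the sync flag as parameters, whereas the text describes a method using self.active_file and self.sync_writes.

```python
import os
import struct
import tempfile
import time

HEADER_FORMAT = "<dII"  # timestamp: float64, key size: uint32, value size: uint32
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes


def write_record(active_file, key, value, sync_writes=True):
    """Append one record and return (offset, size, timestamp)."""
    ts = time.time()                                   # 1. capture timestamp
    key_bytes = key.encode("utf-8")                    # 2. encode key and value
    val_bytes = value.encode("utf-8")
    header = struct.pack(HEADER_FORMAT, ts,            # 3. pack the header
                         len(key_bytes), len(val_bytes))
    record = header + key_bytes + val_bytes            # 4. assemble the record
    offset = active_file.tell()                        # 5. capture current offset
    active_file.write(record)                          # 6. write
    active_file.flush()                                # 7. flush to the OS
    if sync_writes:
        os.fsync(active_file.fileno())                 # 8. optional fsync
    return offset, len(record), ts                     # 9. return metadata


# Demo: append a normal record and a tombstone (empty value) to a fresh file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "active.data")
    with open(path, "ab") as f:
        first = write_record(f, "k", "v")
        second = write_record(f, "key2", "")
    file_size = os.path.getsize(path)
```

In the demo the second record starts exactly where the first one ends, which is the property the keydir relies on when it stores offset as a disk pointer.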
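Side effects:
- Appends bytes to the active .data file. This is the primary mutation.
- Flushes the write to the OS and, when sync_writes is True, fsyncs it to disk. sync_writes=False trades durability for throughput.
- Advances the file position, so the next tell() will return a higher offset.
- keydir is untouched; this is purely a log-append operation.

There is no explicit error handling. The following can propagate to the caller: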
- UnicodeEncodeError: if key or value contains characters that can't be UTF-8 encoded (unlikely for Python str, but possible with lone surrogates).
- OSError (IOError is an alias of OSError in Python 3): if the write, flush, or fsync fails (disk full, I/O error). Writing to an already closed file object raises ValueError instead.
- struct.error: if the key or value byte length exceeds 2^32 - 1 (the uint32 maximum), struct.pack will raise.

None of these are caught; they bubble up through put() or delete() to the application. A partial write (crash mid-write) would leave a truncated record at the end of the file, which scandatafile would hit as a short read and silently stop scanning (the if len(headerdata) < HEADER_SIZE: break guard).
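To make the truncated-tail behaviour concrete, here is a hedged sketch of a scanner that stops at the first incomplete record. scan_records and make_record are hypothetical names, and the scanner works on an in-memory bytes buffer for simplicity. The short-header guard mirrors the one quoted above; the additional short-body guard is an assumption, since the text only describes the header check.

```python
import struct

HEADER_FORMAT = "<dII"  # timestamp: float64, key size: uint32, value size: uint32
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes


def make_record(ts, key, value):
    """Hypothetical helper: build one record for the demo below."""
    key_bytes = key.encode("utf-8")
    val_bytes = value.encode("utf-8")
    header = struct.pack(HEADER_FORMAT, ts, len(key_bytes), len(val_bytes))
    return header + key_bytes + val_bytes


def scan_records(data):
    """Yield (offset, timestamp, key, value) for each complete record."""
    pos = 0
    while True:
        header_data = data[pos:pos + HEADER_SIZE]
        if len(header_data) < HEADER_SIZE:
            break  # short header: end of file or truncated tail
        ts, key_size, val_size = struct.unpack(HEADER_FORMAT, header_data)
        body_start = pos + HEADER_SIZE
        body_end = body_start + key_size + val_size
        if body_end > len(data):
            break  # short body (assumed guard, not described in the text)
        key = data[body_start:body_start + key_size].decode("utf-8")
        value = data[body_start + key_size:body_end].decode("utf-8")
        yield pos, ts, key, value
        pos = body_end


# Demo: a clean log versus one that lost its final byte in a crash.
log = make_record(1.0, "a", "1") + make_record(2.0, "b", "2")
complete = list(scan_records(log))
recovered = list(scan_records(log[:-1]))
```

With the last byte missing, only the first record is recovered; the damaged tail is skipped silently, which matches the behaviour described for scandatafile.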
Called in exactly two places:
- put(key, value): writes the record, then updates keydir with a new KeyEntry pointing to it.
- delete(key): writes a tombstone (empty value ""), then removes the key from keydir.

The caller always calls mayberotate() first to ensure the active file hasn't exceeded maxfilesize. This ordering matters: if rotation happened only after the write, the record would land in an oversized file, although it would still be readable.
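The division of labour between the two callers and the write path might look like the following sketch. Everything here is illustrative: MiniStore, the KeyEntry fields, and the in-memory io.BytesIO log are assumptions made so the example runs on its own, and maybe_rotate is a stub that only counts calls.

```python
import io
import struct
import time
from typing import NamedTuple


class KeyEntry(NamedTuple):
    """Hypothetical keydir entry; the real KeyEntry fields are not shown in the text."""
    offset: int
    size: int
    timestamp: float


class MiniStore:
    def __init__(self):
        self.active_file = io.BytesIO()  # stand-in for the "ab" file handle
        self.keydir = {}
        self.rotation_checks = 0

    def maybe_rotate(self):
        self.rotation_checks += 1  # stub: a real store would check file size here

    def write_record(self, key, value):
        ts = time.time()
        key_bytes = key.encode("utf-8")
        val_bytes = value.encode("utf-8")
        record = struct.pack("<dII", ts, len(key_bytes), len(val_bytes)) + key_bytes + val_bytes
        offset = self.active_file.seek(0, io.SEEK_END)  # emulate append mode
        self.active_file.write(record)
        return offset, len(record), ts  # keydir is deliberately not touched here

    def put(self, key, value):
        self.maybe_rotate()  # rotation check comes before the write
        offset, size, ts = self.write_record(key, value)
        self.keydir[key] = KeyEntry(offset, size, ts)

    def delete(self, key):
        self.maybe_rotate()
        self.write_record(key, "")  # tombstone: empty value
        self.keydir.pop(key, None)


store = MiniStore()
store.put("a", "1")
store.put("b", "22")
store.delete("a")
```

After these three calls the log holds three records, but keydir only knows about "b"; the tombstone for "a" exists solely on disk.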
| Dependency | Usage |
|------------|-------|
| time.time() | Wall-clock timestamps. Not monotonic: clock adjustments can produce out-of-order timestamps, which would confuse compaction's "latest wins" logic. |
| struct | Binary packing with HEADER_FORMAT = "<dII" (16 bytes). |
| os.fsync | Durability guarantee when sync_writes is enabled. |
| self.active_file | Must be an open file handle in "ab" mode. Managed by openactivefile(). |
- A crash during write() can leave a partial record. Recovery relies on scandatafile stopping at the truncated tail.
- time.time(): Wall-clock timestamps are used for "latest wins" during compaction, but time.time() is not monotonic; NTP adjustments or manual clock changes could cause a newer write to have an older timestamp.

- write-record-is-append-only: writerecord only appends to the active file; it never overwrites or seeks backward, preserving the append-only log invariant.
- write-record-no-index-mutation: writerecord does not modify keydir; the caller is responsible for updating the in-memory index after the write returns.
- tombstone-convention-empty-value: Deletions are encoded as records with an empty value string (""); there is no separate tombstone marker byte or flag.
- no-crc-on-records: Records have no checksum or CRC field, so on-disk corruption cannot be detected during reads or compaction.
- fsync-controlled-by-sync-writes: Durability depends on the sync_writes flag: when True, every write is fsynced to disk; when False, data may be lost on crash.