Function: append_batch in write-ahead-log/wal.py

Date: 2026-05-28

Time: 18:45

append_batch — Atomic Batch Write with Commit Marker

Purpose

append_batch writes a group of operations (PUTs and DELETEs) to the write-ahead log as a single atomic unit, terminated by a COMMIT record. This is the transactional write path — either all operations in the batch are durable and committed, or none of them are (if a crash occurs before the fsync completes). It exists to support multi-key transactions where partial application would leave the database in an inconsistent state.

Contract

Preconditions:

Postconditions:

Invariants:

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| operations | List[Tuple[str, str, str]] | List of (op_type, key, value) tuples. An empty list is technically valid: it would write a lone COMMIT record. The code does not guard against this. |
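Since the method itself does not reject an empty batch, a caller that wants to avoid lone COMMIT records could validate its input first. The sketch below is only an illustration: `validate_operations` and `ALLOWED_OPS` are hypothetical names that do not exist in wal.py, and the set of accepted op names is assumed from the Purpose section (PUTs and DELETEs).

```python
from typing import List, Tuple

# Assumption: the only op names append_batch accepts are these two.
ALLOWED_OPS = {"PUT", "DELETE"}


def validate_operations(operations: List[Tuple[str, str, str]]) -> None:
    """Raise ValueError for an empty batch or an unknown op_type."""
    if not operations:
        raise ValueError("empty batch would write a lone COMMIT record")
    for op_type, _key, _value in operations:
        if op_type not in ALLOWED_OPS:
            raise ValueError(f"unknown op_type: {op_type!r}")
```

Calling this before append_batch keeps the check on the caller's side, so the logging path itself stays unchanged.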

Return Value

Returns int — the sequence number assigned to the COMMIT record. This is the highest sequence number in the batch and serves as the batch's identity. Callers can pass this to truncate(uptoseq) to garbage-collect this batch after it has been applied to the main data store.

Algorithm

1. Acquire the lock — prevents concurrent writes from interleaving records.

2. Build a buffer — allocates a bytearray to accumulate all encoded records before any I/O.

3. Encode each operation — for each (op_type, key, value):

4. Append a COMMIT sentinel — increment seqnum once more, encode a COMMIT record with empty key/value, append to buffer.

5. Single write: self._fd.write(bytes(buf)) sends the entire batch to the OS in one write() call. This doesn't guarantee atomicity at the filesystem level, but it minimizes the window for partial writes.

6. Force fsync: dosync(force=True) unconditionally flushes and fsyncs, regardless of the configured sync mode. Batches always force durability.

7. Maybe rotate — if the file has grown past maxfile_size, close it and open a new numbered WAL file.

8. Return the COMMIT sequence number.
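The steps above can be sketched in a few lines. This is a simplified stand-in, not the code in wal.py: the class name `SketchWAL`, the record layout in `encode_record`, and the PUT/DELETE op codes are assumptions made for illustration (only the COMMIT code, 3, comes from this document). The per-operation work in step 3 is assumed to be "increment the sequence number, then encode", and rotation (step 7) is left out.

```python
import os
import struct
import threading
import zlib
from typing import List, Tuple

OP_BYTES = {"PUT": 1, "DELETE": 2}  # assumed codes, not taken from wal.py
OP_COMMIT = 3                       # value given in the Dependencies table


def encode_record(seq: int, op: int, key: bytes, value: bytes) -> bytes:
    """Assumed layout: crc32 | seq | op | key_len | val_len | key | value."""
    body = struct.pack(">QBII", seq, op, len(key), len(value)) + key + value
    return struct.pack(">I", zlib.crc32(body)) + body


class SketchWAL:
    def __init__(self, path: str) -> None:
        self._fd = open(path, "ab")
        self._lock = threading.Lock()
        self._seq = 0

    def append_batch(self, operations: List[Tuple[str, str, str]]) -> int:
        with self._lock:                              # step 1: acquire the lock
            buf = bytearray()                         # step 2: build a buffer
            for op_type, key, value in operations:    # step 3: encode each op
                self._seq += 1
                buf += encode_record(
                    self._seq, OP_BYTES[op_type], key.encode(), value.encode()
                )
            self._seq += 1                            # step 4: COMMIT sentinel
            buf += encode_record(self._seq, OP_COMMIT, b"", b"")
            self._fd.write(bytes(buf))                # step 5: single write
            self._fd.flush()                          # step 6: flush, then fsync
            os.fsync(self._fd.fileno())
            return self._seq                          # step 8: COMMIT sequence

    def close(self) -> None:
        self._fd.close()
```

Building the whole batch in memory before touching the file is what keeps the I/O down to one write() followed by one fsync().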

Side Effects

Error Handling

The method has no try/except blocks. Failures propagate directly to the caller:

Critical concern: if write() succeeds but fsync() fails, the sequence numbers have already been incremented and the batch may exist only in the OS page cache or only partially on disk. There is no rollback. On recovery, readrecord would hit the partial or corrupt data and recoverseqnum would stop scanning at that point (its while True loop breaks on the ValueError raised by a CRC mismatch), but the gap in sequence numbers is permanent.
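To make the recovery behavior concrete, here is one way a reader could honor the all-or-nothing contract: collect records until a COMMIT appears, and stop at the first truncated or corrupt record. This is a hedged sketch, not the recovery code in wal.py. The function names `encode`, `read_records`, and `committed_batches` are hypothetical, and the record layout is an assumption.

```python
import struct
import zlib
from typing import Iterator, List, Tuple

# Assumed layout: crc32 | seq | op | key_len | val_len, then key and value.
HEADER = struct.Struct(">IQBII")
OP_COMMIT = 3  # value given in the Dependencies table

Record = Tuple[int, int, bytes, bytes]


def encode(seq: int, op: int, key: bytes, value: bytes) -> bytes:
    body = struct.pack(">QBII", seq, op, len(key), len(value)) + key + value
    return struct.pack(">I", zlib.crc32(body)) + body


def read_records(data: bytes) -> Iterator[Record]:
    """Yield (seq, op, key, value); raise ValueError on truncation or bad CRC."""
    pos = 0
    while pos < len(data):
        if pos + HEADER.size > len(data):
            raise ValueError("truncated header")
        crc, seq, op, klen, vlen = HEADER.unpack_from(data, pos)
        end = pos + HEADER.size + klen + vlen
        if end > len(data) or zlib.crc32(data[pos + 4:end]) != crc:
            raise ValueError("truncated or corrupt record")
        start = pos + HEADER.size
        yield seq, op, data[start:start + klen], data[start + klen:end]
        pos = end


def committed_batches(data: bytes) -> List[List[Record]]:
    """Return only the batches whose COMMIT record was read intact."""
    batches: List[List[Record]] = []
    pending: List[Record] = []
    try:
        for rec in read_records(data):
            if rec[1] == OP_COMMIT:
                batches.append(pending)
                pending = []
            else:
                pending.append(rec)
    except ValueError:
        pass  # stop at the first damaged record
    # Records still in `pending` never reached a COMMIT and are dropped.
    return batches
```

Under these assumptions, a batch torn by a crash never appears in the result, which matches the "either all or none" behavior described in Purpose.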

Usage Patterns


```python
wal = WriteAheadLog("/tmp/wal_dir")

# Typical transactional write
commit_seq = wal.append_batch([
    ("PUT", "account:alice", "balance:900"),
    ("PUT", "account:bob", "balance:1100"),
])

# After applying to the main store, truncate
wal.truncate(commit_seq)
```

Callers are responsible for:

1. Not calling after close().

2. Using the returned sequence number to track what has been applied.

3. Handling I/O exceptions if disk failures are possible.
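These responsibilities can be combined in a small caller-side wrapper. The sketch below is illustrative only: `commit_and_apply` is a hypothetical helper, `store` is a plain dict standing in for the main data store, and `wal` is any object that exposes append_batch() and truncate() as described in this document.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, str, str]


def commit_and_apply(wal, store: Dict[str, str], operations: List[Op]) -> int:
    """Log the batch, apply it to `store`, then truncate the log."""
    try:
        commit_seq = wal.append_batch(operations)
    except OSError as exc:
        # The batch is not confirmed durable, so the store is left untouched.
        raise RuntimeError("batch not confirmed durable") from exc

    for op_type, key, value in operations:
        if op_type == "PUT":
            store[key] = value
        elif op_type == "DELETE":
            store.pop(key, None)

    # The returned sequence number marks what has been applied.
    wal.truncate(commit_seq)
    return commit_seq
```

Applying the operations only after append_batch returns keeps the in-memory state from running ahead of what the log has made durable.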

Dependencies

| Dependency | Usage |
|------------|-------|
| encoderecord (module-level) | Serializes each record to the binary wire format with CRC32 |
| OP_BYTES (module-level dict) | Maps string op names → integer op codes |
| OP_COMMIT (module-level constant, value 3) | The commit sentinel op code |
| threading.Lock | Mutual exclusion across threads |
| os.fsync | Durability guarantee |
| dosync, mayberotate | Internal helpers for sync policy and file management |

Beliefs