Date: 2026-05-29
Time: 11:28
The observations include enough to answer this. Let me work with what's there — the key comment is visible at lines 220-222.
Partial batches (PUT records without a trailing COMMIT) are silently included during replay. This is a bug — or at minimum, a design choice that undermines the entire purpose of the COMMIT marker.
append_batch writesIn wal.py:154-166, append_batch builds a buffer containing all the batch's PUT/DELETE records followed by a single COMMIT record, then writes the entire buffer in one write() call with a forced fsync:
def append_batch(self, operations):
buf = bytearray()
for op_type, key, value in operations:
self._seq_num += 1
buf.extend(_encode_record(...))
self._seq_num += 1
commit_seq = self._seq_num
buf.extend(_encode_record(commit_seq, OP_COMMIT, b"", b""))
self._fd.write(bytes(buf))
self._do_sync(force=True)
The COMMIT record is the atomicity boundary. If the process crashes mid-write, the OS may have flushed some PUT records to disk but not the final COMMIT. This is the scenario the COMMIT marker exists to handle.
replay processes recordsThe comment at lines 220-222 reveals the replay logic:
# Individual writes (no COMMIT needed) and batch writes (have COMMIT)
# ...
# marker. All PUT/DELETE records are included; COMMIT/CHECKPOINT are
The comment is truncated in the observations, but the intent is clear: replay returns all PUT/DELETE records regardless of whether a COMMIT follows them. COMMIT and CHECKPOINT records are filtered out (they're control records, not data).
This means replay treats batch records identically to standalone records — it does not group PUTs by their associated COMMIT and does not discard orphaned PUTs.
Consider a batch that transfers money between accounts:
1. PUT account:A balance:900 — written to disk before crash
2. PUT account:B balance:1100 — written to disk before crash
3. COMMIT — never written (crash happened here)
On replay, the current implementation would apply record 1 (debiting account A) but might not have record 2 (crediting account B). The COMMIT marker was supposed to prevent exactly this — you should only apply records 1 and 2 if record 3 exists.
A correct implementation would:
1. Scan forward through records, accumulating uncommitted PUTs/DELETEs in a pending buffer
2. When a COMMIT is encountered, flush the pending buffer to the result set
3. When EOF is reached, discard anything still in the pending buffer
4. Standalone PUTs (those not part of any batch) need a different strategy — either they get their own implicit commit, or the protocol needs a BEGIN marker to distinguish batch-start from standalone
The test at testwal.py:31-39 (testcrash_recovery) only tests that individual PUT records survive a simulated crash (closing and reopening the WAL). It never tests the partial-batch scenario:
def test_crash_recovery():
wal = WriteAheadLog(tmpdir, sync_mode="sync")
wal.append("PUT", "k1", "v1") # standalone, no batch
wal.append("PUT", "k2", "v2") # standalone, no batch
wal2 = WriteAheadLog(tmpdir, sync_mode="sync")
records = wal2.replay()
assert len(records) == 2 # of course these survive
A proper test would write PUT records to the WAL file without a trailing COMMIT (simulating a crash mid-batch), then verify that replay discards them.
The protocol has no BEGIN marker. When replay encounters a PUT, it cannot tell whether this is a standalone write (which should be applied immediately) or the first record in a batch (which should wait for COMMIT). The comment at line 220 acknowledges this — "Individual writes (no COMMIT needed)" — but this means the COMMIT marker is effectively decorative. Replay can't use it for atomicity because it can't identify which records belong to batches.