auto_commit in poll() creates at-least-once semantics with potential duplicate processing on crashDate: 2026-05-29
Time: 13:43
The observations include the full Consumer._init_ and the key grep hits but don't include the body of poll() (lines 237–260). Let me work with what we have and note the gap.
partitioned-log/partitionedlog.py:poll — Read the full poll() body (lines 237–260) to confirm whether autocommit commits current or previous batch offsets, and whether commit happens before or after the returnpartitioned-log/partitioned_log.py:commit — Understand what state gets persisted to the broker and how a restarting consumer loads itstream-join-processor/streamjoinprocessor.py — The stream join processor processes events with watermarks but has no offset commit mechanism at all — compare its failure semanticsidempotent-consumer-pattern — Several modules (consistent-hashing, multi-leader-replication, PBFT) implement idempotency checks; study how they deduplicate to understand the exactly-once alternativepartitioned-log/partitioned_log.py:seek — The seek API (tested at line 170) lets consumers manually rewind, which is the escape hatch when at-least-once reprocessing produces incorrect resultsauto-commit-not-default — Consumer defaults to auto_commit=False (line 182), requiring explicit commit() calls; this makes at-least-once the opt-in behavior rather than the defaultoffset-tracking-per-partition — Consumer tracks offsets as a dict[tuple[str, int], int] mapping (topic, partition) to offset (line 190), meaning commit granularity is per-partition, not per-messageno-transactional-commit — The implementation provides no mechanism to atomically commit offsets together with processing side effects, ruling out exactly-once delivery without external idempotencyconsumer-group-offset-resume — A new consumer joining a group loads committed offsets from the broker (tested at line 152), which is the mechanism that enables both offset-based resume and duplicate redelivery on crash