Date: 2026-05-29
Time: 12:44
_resolve_record: Avro Schema Evolution for Records

_resolve_record implements schema resolution between two record schemas: a writer schema (used when the data was serialized) and a reader schema (what the current application expects). This is the core mechanism behind Avro's schema evolution: it lets you read data written with an older (or newer) schema using a different schema, as long as the two are compatible.
This exists because in real systems, schemas change over time. A producer might add a field, remove one, or reorder them. Without resolution, every consumer would need to use the exact schema the data was written with. _resolve_record bridges that gap.
Preconditions:
- buf is a readable io.BytesIO positioned at the start of a serialized record
- writer and reader are both Schema objects with typename == "record" and matching name properties (the caller in _decode enforces the name check before calling)
- The bytes in buf are laid out in writer field order; this is how Avro binary encoding works (no field tags, just sequential values)

Postconditions:
- buf is advanced past all writer fields (consumed or skipped)
- The return value is a dict with exactly the reader's fields, in reader field order

Invariants:
| Parameter | Type | Description |
|-----------|------|-------------|
| buf | io.BytesIO | Binary stream positioned at the start of the record's serialized data. Must contain enough bytes for all writer fields. |
| writer | Schema | The schema used when the data was encoded. Determines what's on the wire and in what order. |
| reader | Schema | The schema the caller wants the data shaped as. Determines which fields appear in the result and their order. |
Edge cases: If writer and reader have identical fields, this degenerates to a simple sequential decode. If the writer has zero fields, the result is built entirely from reader defaults.
A dict mapping field names (strings) to decoded values. The keys are exactly the reader's field names. The caller (_decode) returns this directly — no further transformation is applied.
Caller must handle: The returned dict always has all reader fields. No None sentinels for missing fields — if a field can't be resolved, the method raises rather than returning a partial result.
The method works in two passes. The first walks the writer's fields in serialized order:

    for each field in writer.fields (in serialized order):
        if field exists in reader:
            decode it using both writer and reader field types (enables type promotion)
            store in writer_values dict
        else:
            skip the bytes (read and discard); the reader doesn't want this field

This pass must iterate in writer field order because Avro binary format has no field delimiters or tags. Each field's size is determined by its schema, so you must read (or skip) every field sequentially to know where the next one starts.
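To illustrate why every writer field has to be read or skipped in order, here is a minimal sketch that hand-encodes a two-field record and skips the first field to reach the second. The helper names read_zigzag_long and skip_string are hypothetical stand-ins, not functions from this library; the byte layout follows the Avro binary encoding (zigzag varints for int/long, length-prefixed UTF-8 for strings).

```python
import io


def read_zigzag_long(buf: io.BytesIO) -> int:
    """Read one Avro int/long: a base-128 varint holding a zigzag-coded value."""
    result = 0
    shift = 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise ValueError("buffer ended inside a varint")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # a clear high bit marks the last byte
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)  # undo the zigzag mapping


def skip_string(buf: io.BytesIO) -> None:
    """Skip an Avro string: read its byte length, then seek past the payload."""
    length = read_zigzag_long(buf)
    buf.seek(length, io.SEEK_CUR)


# Writer record: name="Al" (string), age=30 (int).
# "Al": length 2 zigzags to 4 (0x04), followed by the UTF-8 bytes.
# 30 zigzags to 60 (0x3C).
wire = bytes([0x04]) + b"Al" + bytes([0x3C])

buf = io.BytesIO(wire)
skip_string(buf)             # the only way to learn where "age" starts
age = read_zigzag_long(buf)
print(age, buf.tell())       # 30 4
```

Nothing in the byte stream marks where "age" begins; its offset is known only after the length prefix of "name" has been decoded.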
The second pass builds the result in reader field order:

    for each field in reader.fields:
        if field was decoded from writer → use it
        elif field has a default → use the default
        else → raise SchemaCompatibilityError

The two-pass design decouples wire order (writer) from output order (reader). Note that writer_field_map is constructed but never read; it is dead code, likely left over from a refactor.
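The two passes can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the method's actual source: fields are plain dicts rather than Schema objects, only int, long, and string are supported, type promotion is omitted, and unwanted fields are skipped by decoding and discarding them. The function names and the local SchemaCompatibilityError class are stand-ins.

```python
import io


class SchemaCompatibilityError(Exception):
    """Local stand-in for the library's exception of the same name."""


def read_long(buf):
    """Decode one zigzag varint (Avro int/long)."""
    result, shift = 0, 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise ValueError("truncated varint")
        result |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            return (result >> 1) ^ -(result & 1)
        shift += 7


def read_value(buf, type_name):
    """Decode a primitive; only int, long, and string are handled here."""
    if type_name in ("int", "long"):
        return read_long(buf)
    if type_name == "string":
        length = read_long(buf)
        payload = buf.read(length)
        if len(payload) != length:
            raise ValueError("truncated string")
        return payload.decode("utf-8")
    raise NotImplementedError(type_name)


def resolve_record(buf, writer_fields, reader_fields):
    reader_names = {f["name"] for f in reader_fields}

    # Pass 1: walk the wire in writer order, keeping only wanted fields.
    writer_values = {}
    for field in writer_fields:
        value = read_value(buf, field["type"])  # always consumes the bytes
        if field["name"] in reader_names:
            writer_values[field["name"]] = value

    # Pass 2: assemble the result in reader order, falling back to defaults.
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_values:
            result[name] = writer_values[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise SchemaCompatibilityError(
                f"Reader field '{name}' has no default and is missing from writer"
            )
    return result


writer_fields = [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "score", "type": "int"},  # not wanted by the reader
]
reader_fields = [
    {"name": "age", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string", "default": ""},
]
wire = bytes([0x04]) + b"Al" + bytes([0x3C, 0x0E])  # "Al", 30, 7
buf = io.BytesIO(wire)
record = resolve_record(buf, writer_fields, reader_fields)
print(record)  # {'age': 30, 'name': 'Al', 'email': ''}
```

The result follows reader order (age before name), the writer-only score field is consumed but dropped, and email comes from its default.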
- Advances buf past all writer fields. This is the primary side effect and is essential: the buffer must be correctly positioned for whatever comes next in the stream.
- Calls self._decode() for field values, which may itself recurse into nested records, arrays, maps, etc.
- Calls self._skip() for discarded fields, which also advances buf.

No I/O other than reading from buf, no state mutations on self, no logging.
| Condition | Exception | Message |
|-----------|-----------|---------|
| Reader field missing from writer, no default | SchemaCompatibilityError | "Reader field '{name}' has no default and is missing from writer" |
| Type incompatibility in a shared field | SchemaCompatibilityError | Raised by recursive _decode (e.g., can't resolve int to string) |
| Truncated buffer | ValueError | Raised by read_varint or struct.unpack deeper in the call stack |
No errors are swallowed. Every failure path either raises or propagates an exception from a callee.
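One detail worth checking against the actual implementation: in the Python standard library, struct.unpack signals a short read with struct.error, which is not a subclass of ValueError. Unless the decoder validates lengths first or wraps that exception (neither is shown in this section), callers that handle truncated input may need to catch both. A minimal sketch, where read_float is a hypothetical helper and not part of this library:

```python
import io
import struct


def read_float(buf: io.BytesIO) -> float:
    """Hypothetical helper: decode a 4-byte little-endian Avro float."""
    return struct.unpack("<f", buf.read(4))[0]


truncated = io.BytesIO(b"\x00\x00")  # only 2 of the 4 required bytes

caught = None
try:
    read_float(truncated)
except (ValueError, struct.error) as exc:
    caught = exc

# Shows which exception type actually surfaced and whether it is a ValueError.
print(type(caught).__name__, isinstance(caught, ValueError))
```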
Called exclusively from _decode when both writer and reader are "record" type with matching names:

    # In _decode:
    if wt == "record" and rt == "record":
        if writer.name != reader.name:
            raise SchemaCompatibilityError(...)
        return self._resolve_record(buf, writer, reader)

Typical end-user scenario: decoding data written with schema v1 using schema v2.

    v1 = Schema({"type": "record", "name": "User", "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"}
    ]})
    v2 = Schema({"type": "record", "name": "User", "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "email", "type": "string", "default": ""}
    ]})
    decoder = AvroDecoder(writer_schema=v1, reader_schema=v2)
    # data holds a record serialized with v1, so it carries no email field
    result = decoder.decode(data)  # result["email"] == ""

- self._decode(): recursive descent for field values; handles type promotion via PROMOTIONS
- self._skip(): consumes bytes for unwanted fields without materializing values
- Schema.fields: provides field lists with name, type, and optional default
- struct and io from stdlib

1. writer_field_map is constructed but unused; a minor dead-code issue, not a bug.
2. Defaults are used as-is (no deep copy) — if a default is a mutable value (list/dict), multiple records could share the same default object. Mutating one would affect others.
3. No cycle detection for nested records — a record containing itself would recurse infinitely. The Schema parser doesn't support recursive types, so this can't happen in practice, but it's not enforced at this level.
4. Field order in the result dict relies on Python 3.7+ insertion-order guarantee — the code iterates reader fields sequentially and inserts into a plain dict.
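The shared-default issue in item 2 can be reproduced without the decoder. The sketch below uses a plain dict as a stand-in for a reader field and shows copy.deepcopy as one possible mitigation; this is an assumption about how a fix might look, not what the code currently does.

```python
import copy

# Stand-in for a reader field whose default is a mutable list.
field = {
    "name": "tags",
    "type": {"type": "array", "items": "string"},
    "default": [],
}

# Behaviour as described: every record receives the same default object.
first = {"tags": field["default"]}
second = {"tags": field["default"]}
first["tags"].append("admin")
shared = second["tags"]    # the mutation is visible here too

# Possible mitigation: copy the default for each record.
field["default"] = []      # fresh default for the second demonstration
third = {"tags": copy.deepcopy(field["default"])}
fourth = {"tags": copy.deepcopy(field["default"])}
third["tags"].append("admin")
isolated = fourth["tags"]  # unaffected by the mutation

print(shared, isolated)    # ['admin'] []
```

Deep-copying costs a little on every defaulted field; copying only when the default is a list or dict would be a cheaper variant of the same idea.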