Function: _resolve_record in avro-serializer/avro_serializer.py

Date: 2026-05-29

Time: 12:44

_resolve_record: Avro Schema Evolution for Records

Purpose

_resolve_record implements schema resolution between two record schemas: a writer schema (the one used when the data was serialized) and a reader schema (the one the current application expects). This is the core mechanism behind Avro's schema evolution: data written with an older or newer schema can be read with a different one, as long as the two are compatible.

This exists because schemas change over time in real systems. A producer might add a field, remove one, or reorder them. Without resolution, every consumer would need the exact schema the data was written with. _resolve_record bridges that gap.

Contract

Preconditions:

Postconditions:

Invariants:

Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| buf | io.BytesIO | Binary stream positioned at the start of the record's serialized data. Must contain enough bytes for all writer fields. |
| writer | Schema | The schema used when the data was encoded. Determines what's on the wire and in what order. |
| reader | Schema | The schema the caller wants the data shaped as. Determines which fields appear in the result and their order. |

Edge cases: If writer and reader have identical fields, this degenerates to a simple sequential decode. If the writer has zero fields, the result is built entirely from reader defaults.

Return Value

A dict mapping field names (strings) to decoded values. The keys are exactly the reader's field names. The caller (_decode) returns this directly — no further transformation is applied.

Caller must handle: Nothing beyond exceptions. The returned dict always contains every reader field, so no missing-key checks are needed. There are no None sentinels for missing fields: if a field can't be resolved, the method raises rather than returning a partial result.

Algorithm

The method works in two passes:

Pass 1 — Consume the wire data (writer field order)


for each field in writer.fields (in serialized order):
    if field exists in reader:
        decode it using both writer and reader field types (enables type promotion)
        store in writer_values dict
    else:
        skip the bytes (read and discard) — the reader doesn't want this field

This pass must iterate in writer field order because Avro binary format has no field delimiters or tags. Each field's size is determined by its schema, so you must read (or skip) every field sequentially to know where the next one starts.
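To illustrate why every writer field has to be read or skipped in order, here is a minimal sketch that walks a hand-built buffer. It assumes the standard Avro binary encoding (zigzag varints for int/long, length-prefixed UTF-8 for string). The helper names read_long, read_string, and skip_string, and the legacy_id field, are hypothetical and are not taken from avro_serializer.py.

```python
import io


def read_long(buf: io.BytesIO) -> int:
    """Decode one zigzag varint, the Avro wire form of int and long."""
    raw, shift = 0, 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise ValueError("buffer ended inside a varint")
        raw |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:  # high bit clear marks the last byte
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo the zigzag mapping


def read_string(buf: io.BytesIO) -> str:
    """Decode a string: a long byte count followed by UTF-8 bytes."""
    length = read_long(buf)
    data = buf.read(length)
    if len(data) != length:
        raise ValueError("buffer ended inside a string")
    return data.decode("utf-8")


def skip_string(buf: io.BytesIO) -> None:
    """Advance past a string without keeping its payload."""
    length = read_long(buf)
    buf.seek(length, io.SEEK_CUR)


# Writer order (hypothetical): name (string), legacy_id (string), age (int).
# The reader only wants name and age.
wire = b"\x04Al" + b"\x06xyz" + b"\x3c"
buf = io.BytesIO(wire)

name = read_string(buf)  # length prefix plus two payload bytes
skip_string(buf)         # legacy_id is unwanted but must still be stepped over
age = read_long(buf)     # only now is the cursor positioned at age
```

If the skip_string call were omitted, read_long would start at the length prefix of legacy_id and return 3 instead of 30, and nothing on the wire would signal the mistake.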

Pass 2 — Build the result (reader field order)


for each field in reader.fields:
    if field was decoded from writer → use it
    elif field has a default → use the default
    else → raise SchemaCompatibilityError

The two-pass design decouples wire order (writer) from output order (reader). Note that writerfieldmap is constructed but never read — it's dead code, likely left from a refactor.
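The two passes described above can be sketched end to end as follows. This is a simplified illustration, not the module's code: schemas are plain lists of field dicts rather than Schema objects, only int, long, and string are handled, unwanted fields are skipped by decoding and discarding, and type promotion is left out. The names resolve_record_sketch and DECODERS are hypothetical, and SchemaCompatibilityError is redefined locally as a stand-in.

```python
import io


class SchemaCompatibilityError(Exception):
    """Local stand-in for the exception named in this document."""


def read_long(buf):
    """Decode one zigzag varint (Avro int/long)."""
    raw, shift = 0, 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise ValueError("buffer ended inside a varint")
        raw |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)


def read_string(buf):
    """Decode a length-prefixed UTF-8 string."""
    length = read_long(buf)
    data = buf.read(length)
    if len(data) != length:
        raise ValueError("buffer ended inside a string")
    return data.decode("utf-8")


# Hypothetical decoder table covering only the types used in this sketch.
DECODERS = {"int": read_long, "long": read_long, "string": read_string}


def resolve_record_sketch(buf, writer_fields, reader_fields):
    reader_names = {f["name"] for f in reader_fields}

    # Pass 1: walk the wire in writer order. Every field is consumed;
    # only the ones the reader asked for are kept.
    writer_values = {}
    for field in writer_fields:
        value = DECODERS[field["type"]](buf)
        if field["name"] in reader_names:
            writer_values[field["name"]] = value

    # Pass 2: build the result in reader order, falling back to defaults.
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_values:
            result[name] = writer_values[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise SchemaCompatibilityError(
                f"Reader field '{name}' has no default and is missing from writer"
            )
    return result


writer_fields = [
    {"name": "name", "type": "string"},
    {"name": "legacy_id", "type": "string"},
    {"name": "age", "type": "int"},
]
reader_fields = [
    {"name": "age", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string", "default": ""},
]
wire = b"\x04Al\x06xyz\x3c"  # name="Al", legacy_id="xyz", age=30
result = resolve_record_sketch(io.BytesIO(wire), writer_fields, reader_fields)
```

In this run the result keys come out in reader order (age, name, email) even though the wire order was name, legacy_id, age, which is the decoupling the two-pass design provides.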

Side Effects

No I/O, no state mutations on self, no logging.

Error Handling

| Condition | Exception | Message |
|-----------|-----------|---------|
| Reader field missing from writer, no default | SchemaCompatibilityError | "Reader field '{name}' has no default and is missing from writer" |
| Type incompatibility in a shared field | SchemaCompatibilityError | Raised by recursive _decode (e.g., can't resolve int to string) |
| Truncated buffer | ValueError | Raised by read_varint or struct.unpack deeper in the call stack |

No errors are swallowed. Every failure path either raises or propagates an exception from a callee.
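The truncated-buffer case can be illustrated with a small sketch. It assumes the low-level readers detect short reads themselves and raise ValueError. Python's struct.unpack raises struct.error (which is not a ValueError subclass) when given too few bytes, so the sketch checks the length before unpacking; whether the real module does the same is an assumption. The functions below are hypothetical stand-ins, not the module's actual read_varint.

```python
import io
import struct


def read_varint(buf: io.BytesIO) -> int:
    """Read an unsigned base-128 varint, raising ValueError on a short buffer."""
    result, shift = 0, 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise ValueError("Unexpected end of buffer while reading varint")
        result |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            return result
        shift += 7


def read_double(buf: io.BytesIO) -> float:
    """Read an 8-byte little-endian double, checking the length first."""
    data = buf.read(8)
    if len(data) != 8:
        # Without this check, struct.unpack would raise struct.error instead.
        raise ValueError(f"Expected 8 bytes for double, got {len(data)}")
    return struct.unpack("<d", data)[0]


def try_read(reader, payload: bytes) -> str:
    """Return 'ok', or the name of the ValueError the reader raised."""
    try:
        reader(io.BytesIO(payload))
        return "ok"
    except ValueError as exc:
        return type(exc).__name__


# Continuation bit set, but no following byte.
varint_outcome = try_read(read_varint, b"\x80")
# Only 4 of the 8 bytes a double needs.
double_outcome = try_read(read_double, b"\x00" * 4)
# A complete double decodes without error.
complete_outcome = try_read(read_double, struct.pack("<d", 1.5))
```

Because nothing in the resolution path catches these exceptions, a caller that wants to distinguish corrupt input from schema mismatches would need to handle ValueError and SchemaCompatibilityError separately.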

Usage Patterns

Called exclusively from _decode when both writer and reader are "record" type with matching names:


# In _decode:
if wt == "record" and rt == "record":
    if writer.name != reader.name:
        raise SchemaCompatibilityError(...)
    return self._resolve_record(buf, writer, reader)

Typical end-user scenario — decoding data written with schema v1 using schema v2:


v1 = Schema({"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
]})
v2 = Schema({"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": "string", "default": ""}
]})
decoder = AvroDecoder(writer_schema=v1, reader_schema=v2)
result = decoder.decode(data)  # result["email"] == ""

Dependencies

Unenforced Assumptions

1. writerfieldmap is constructed but unused — a minor dead-code issue, not a bug.

2. Defaults are used as-is (no deep copy) — if a default is a mutable value (list/dict), multiple records could share the same default object. Mutating one would affect others.

3. No cycle detection for nested records — a record containing itself would recurse infinitely. The Schema parser doesn't support recursive types, so this can't happen in practice, but it's not enforced at this level.

4. Field order in the result dict relies on Python 3.7+ insertion-order guarantee — the code iterates reader fields sequentially and inserts into a plain dict.
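The shared-default hazard in item 2 can be reproduced in a few lines. The sketch below uses a hypothetical tags field and two hypothetical fill helpers. It contrasts using the default as-is with copying it per record via copy.deepcopy, which is one possible mitigation and not something the current code does.

```python
import copy


def make_field():
    """Build a hypothetical reader field whose default is a mutable list."""
    return {
        "name": "tags",
        "type": {"type": "array", "items": "string"},
        "default": [],
    }


def fill_shared(field):
    """Use the default object as-is, as the documented behavior does."""
    return {field["name"]: field["default"]}


def fill_copied(field):
    """Give each record its own copy of the default."""
    return {field["name"]: copy.deepcopy(field["default"])}


# Shared default: mutating one record leaks into the other and the schema.
shared_field = make_field()
rec_a = fill_shared(shared_field)
rec_b = fill_shared(shared_field)
rec_a["tags"].append("admin")

# Copied default: records and schema stay independent.
copied_field = make_field()
rec_c = fill_copied(copied_field)
rec_d = fill_copied(copied_field)
rec_c["tags"].append("admin")
```

After this runs, rec_b["tags"] and shared_field["default"] both contain "admin", while rec_d["tags"] and copied_field["default"] remain empty. The copy costs extra work per defaulted field, so it is a trade-off rather than an obvious fix.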