Date: 2026-05-29
Time: 12:46
test_avro.py is the primary integration test suite for the Avro serializer module. It validates that the avro_serializer.py implementation correctly handles the core Avro specification surface: primitive types, complex types (records, arrays, maps, unions, enums), schema evolution with forward/backward compatibility, type promotion, a schema registry, and streaming decode. It is a standalone script (not pytest-based), run via python test_avro.py, and doubles as a specification-by-example document for how the serializer API works.
Primitive round-trip (test_primitives): Exercises every Avro primitive type (null, boolean, int, long, float, double, string, bytes) through encode/decode, including boundary values (2147483647, -2147483648, 2**40). Float comparison uses an epsilon tolerance (1e-6) because an Avro float is a 32-bit IEEE 754 value, so a 64-bit Python float can lose precision in the round-trip.
Zigzag encoding (test_zigzag): Validates the varint zigzag encoding (mapping signed integers to unsigned for compact representation). Tests the critical values: 0, ±1, ±2, and the int32 boundaries.
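The mapping that test_zigzag checks can be sketched as follows. This is a minimal illustration of zigzag plus base-128 varint encoding as described in the Avro specification, not the code in avro_serializer.py; the names zigzag_encode_sketch, zigzag_decode_sketch, and write_varint are hypothetical.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF


def zigzag_encode_sketch(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & MASK_64


def zigzag_decode_sketch(z: int) -> int:
    """Invert zigzag_encode_sketch."""
    return (z >> 1) ^ -(z & 1)


def write_varint(z: int) -> bytes:
    """Encode an unsigned integer as a base-128 varint, low 7 bits first."""
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


# The critical values named above: 0, +/-1, +/-2, and the int32 boundaries.
for value in (0, 1, -1, 2, -2, 2147483647, -2147483648):
    assert zigzag_decode_sketch(zigzag_encode_sketch(value)) == value
```

Because small magnitudes map to small unsigned values, -64 through 63 each fit in a single varint byte, which is why small numbers are cheap on the wire.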
Complex types (test_record, test_array, test_map, test_union, test_enum): Each validates encode/decode round-trips. test_record also asserts that field names are absent from the binary output (b"email" not in data), verifying that Avro's wire format carries no schema information.
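To see why the field-name assertion holds, here is a small sketch of Avro-style record encoding in which fields are written in schema order with no names or tags. The helper names (encode_long, encode_string, encode_record) and the (name, type) field list are assumptions made for illustration, not the AvroEncoder API.

```python
def encode_long(n: int) -> bytes:
    """Zigzag the value, then write it as a base-128 varint."""
    z = ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length (as a long) followed by the bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


ENCODERS = {"long": encode_long, "string": encode_string}


def encode_record(fields, value: dict) -> bytes:
    """Concatenate field encodings in schema order; names never reach the output."""
    return b"".join(ENCODERS[ftype](value[name]) for name, ftype in fields)


user_fields = [("id", "long"), ("email", "string")]
data = encode_record(user_fields, {"id": 7, "email": "a@b.co"})

assert b"email" not in data  # only the values are on the wire
```

The decoder can only make sense of these bytes if it knows the writer's field order, which is what makes the writer schema mandatory at decode time.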
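Schema evolution (test_schema_evolution_add_field, test_type_promotion, test_enum_evolution): The core of what makes Avro interesting. Tests that:
- Fields added or removed between schema versions resolve at decode time: a field missing from the written data is filled from its default, and a field absent from the reader schema is dropped (the email field disappears).
- Type promotion works for int→long, int→float, int→double, long→double, float→double.
- Enum evolution falls back to the reader's default symbol when the writer sends an unknown symbol.
Incompatibility (test_incompatibility): Negative test. Adding a required field (no default) to the reader schema must raise SchemaCompatibilityError.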
Compatibility check (test_compatibility_check): Tests the standalone check_compatibility() function that reports backward/forward/full compatibility between two schemas without actually encoding data.
Schema registry (test_schema_registry): Tests SchemaRegistry: register a schema, encode with a schema ID prefix, decode with a different reader schema. This mirrors the Confluent Schema Registry pattern, where each message is framed as [magic_byte][schema_id][avro_payload].
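A sketch of the schema-ID framing, assuming the Confluent layout of a zero magic byte followed by a 4-byte big-endian schema ID. The functions frame and unframe are hypothetical; this section does not show how SchemaRegistry lays out its prefix, so treat the byte layout as an assumption.

```python
import struct

MAGIC_BYTE = 0  # Confluent framing starts with a single zero byte
HEADER = struct.Struct(">bI")  # 1-byte magic + 4-byte big-endian schema ID


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an Avro payload with the magic byte and schema ID."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes):
    """Split a framed message into (schema_id, payload)."""
    if len(message) < HEADER.size:
        raise ValueError("message shorter than the 5-byte header")
    magic, schema_id = HEADER.unpack(message[:HEADER.size])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER.size:]


message = frame(42, b"\x0e\x0ca@b.co")
schema_id, payload = unframe(message)
```

The reader uses the recovered ID to look up the writer schema, then decodes the payload against its own reader schema.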
Nested structures (test_nested): Records containing arrays and maps, and records nested inside records.
Streaming (test_streaming): Concatenates two encoded integers into a single byte buffer, then decodes them sequentially from a BytesIO stream, verifying exact byte consumption (no leftover bytes).
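The streaming check can be illustrated with a self-contained varint reader over BytesIO. The function read_long is a hypothetical stand-in for the decoder's integer path; the point is that each read stops at the first byte without the continuation bit, so consecutive values can share one buffer.

```python
from io import BytesIO


def read_long(stream) -> int:
    """Read one zigzag varint from the stream, consuming only its own bytes."""
    shift = 0
    z = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        byte = chunk[0]
        z |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # continuation bit clear: last byte of this value
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)  # undo zigzag


# Two encoded integers back to back: 42 (one byte) then 64 (two bytes).
buffer = BytesIO(b"\x54" + b"\x80\x01")
first = read_long(buffer)
second = read_long(buffer)
leftover = buffer.read()

assert (first, second) == (42, 64)
assert leftover == b""  # exact byte consumption
```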
Performance (test_performance): Benchmarks 10,000 record encode/decode cycles. No assertion on timing — purely informational output.
Manual test harness: Uses print() + assert instead of pytest. Each test function prints its category name and sub-results, and a successful run ends with ALL TESTS PASSED. This is common in DDIA reference implementations: self-contained, with no external test framework dependency.
Round-trip testing idiom: Almost every test follows encode(value) → decode(bytes) → assert == value. This validates the encode/decode pair as inverses.
Writer/reader schema separation: AvroDecoder takes an optional second schema argument: AvroDecoder(writer_schema, reader_schema). When omitted, writer and reader are the same. This two-schema design is the key Avro abstraction for schema evolution.
Negative testing via try/except: test_incompatibility uses a try/except/assert-False pattern instead of pytest.raises. Idiomatic for framework-free test files.
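The pattern looks roughly like this. Everything here is a stand-in written for illustration: SchemaCompatibilityErrorSketch and require_defaults are hypothetical and only mimic the behavior described for the real exception and decoder.

```python
class SchemaCompatibilityErrorSketch(Exception):
    """Stand-in for the module's SchemaCompatibilityError."""


def require_defaults(new_reader_fields: dict) -> None:
    """Reject any newly added reader field that declares no default."""
    for name, spec in new_reader_fields.items():
        if "default" not in spec:
            raise SchemaCompatibilityErrorSketch(f"field {name!r} has no default")


def test_incompatibility_sketch() -> bool:
    print("Incompatibility:")
    try:
        require_defaults({"age": {"type": "int"}})
        # Reaching this line means no exception was raised, so the test fails.
        assert False, "expected SchemaCompatibilityErrorSketch"
    except SchemaCompatibilityErrorSketch as exc:
        print(f"  rejected as expected: {exc}")
        return True


result = test_incompatibility_sketch()
```

Catching only the specific exception type matters: the AssertionError from assert False is not caught by the except clause, so a missing error still fails loudly.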
Imports from avro_serializer:
- Schema: schema definition object
- AvroEncoder / AvroDecoder: encode/decode engines
- SchemaRegistry: schema ID management
- SchemaError, SchemaCompatibilityError: error types
- check_compatibility: standalone compatibility checker
- zigzag_encode, zigzag_decode: low-level varint primitives
Standard library: io.BytesIO (streaming test), time (performance benchmark).
Nothing imports this file — it's a leaf test module.
Execution is linear when run as __main__: zigzag → primitives → record → array → map → union → enum → evolution → promotion → incompatibility → compatibility check → registry → nested → compact → streaming → enum evolution → performance. Each test is independent, with no shared state between tests.
The AvroDecoder call with two schemas (AvroDecoder(v1, v2).decode(data)) is the most important flow to understand: it reads bytes according to v1's field layout, then resolves fields against v2's expectations — filling defaults for missing fields, dropping fields not in v2, and promoting types where allowed.
1. Round-trip fidelity: For same-schema encode/decode, output must equal input (exact for all types except float, which uses epsilon).
2. No field names on the wire: Record binary output must not contain field name strings — Avro encodes by field order, not by name.
3. Backward compatibility requires defaults: A reader schema can add fields only if they have defaults. Without a default, SchemaCompatibilityError is raised.
4. Exact byte consumption: Streaming decode must consume exactly the bytes for each value, leaving no residual bytes in the stream.
5. Type promotion is directional: Only widening promotions are valid (int→long, never long→int).
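The directional rule in item 5 amounts to a lookup over (writer type, reader type) pairs. The sketch below encodes only the five promotions this test file lists; the Avro specification also allows long→float and conversion between string and bytes, which are left out here. PROMOTIONS and can_promote are hypothetical names.

```python
# (writer_type, reader_type) pairs the tests exercise; all are widening.
PROMOTIONS = {
    ("int", "long"),
    ("int", "float"),
    ("int", "double"),
    ("long", "double"),
    ("float", "double"),
}


def can_promote(writer_type: str, reader_type: str) -> bool:
    """True if data written as writer_type may be read as reader_type."""
    return writer_type == reader_type or (writer_type, reader_type) in PROMOTIONS


assert can_promote("int", "long")       # widening is allowed
assert not can_promote("long", "int")   # narrowing is rejected
```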
Only one error path is explicitly tested: SchemaCompatibilityError in test_incompatibility. The test uses a try/except pattern with assert False as the failure branch. SchemaError is imported but not directly tested in this file (likely tested in test_avro_serializer.py). No errors are swallowed; all assertions fail loudly.
- avro-serializer/avro_serializer.py: The implementation behind all these tests; see how decode handles writer/reader schema resolution and type promotion
- avro-serializer/avro_serializer.py:zigzag_encode: The varint encoding that makes Avro integers compact; key to understanding why small numbers take fewer bytes
- avro-serializer/test_avro_serializer.py: The companion test file; likely covers edge cases, error paths, and SchemaError scenarios not tested here
- avro-schema-resolution: Avro spec's schema resolution rules (Chapter 4 of DDIA), covering how reader/writer schema mismatches are reconciled at decode time
- confluent-schema-registry-wire-format: The [magic_byte][schema_id][payload] framing that SchemaRegistry.encode_with_id implements
- avro-no-field-names-on-wire: Avro record binary encoding contains no field names; fields are identified purely by schema-defined order, verified by test_record and test_compact_encoding
- avro-backward-compat-requires-defaults: A reader schema that adds a field without a default value causes SchemaCompatibilityError during decode; backward compatibility demands defaults on all new fields
- avro-decoder-two-schema-resolution: AvroDecoder accepts (writer_schema, reader_schema) and resolves fields between them at decode time, filling defaults for added fields and silently dropping removed fields
- avro-type-promotion-widening-only: Type promotion supports int→long, int→float, int→double, long→double, float→double; strictly widening conversions, no narrowing
- avro-enum-default-fallback: When a writer sends an enum symbol not in the reader's symbol list, the reader falls back to the default symbol declared in the reader schema