File: avro-serializer/test_avro.py

Date: 2026-05-29

Time: 12:46

Purpose

test_avro.py is the primary integration test suite for the Avro serializer module. It validates that the avro_serializer.py implementation correctly handles the full Avro specification surface: primitive types, complex types (records, arrays, maps, unions, enums), schema evolution with forward/backward compatibility, type promotion, a schema registry, and streaming decode. It's a standalone script (not pytest-based), run via python test_avro.py, and doubles as a specification-by-example document for how the serializer API works.

Key Components

Primitive round-trip (test_primitives): Exercises every Avro primitive type (null, boolean, int, long, float, double, string, bytes) through encode/decode, including boundary values (2147483647, -2147483648, 2**40). Float comparison uses an epsilon tolerance (1e-6) because Avro's float is a 32-bit IEEE 754 value, so a Python float (a 64-bit double) generally does not survive the round-trip bit-for-bit.
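The need for the tolerance can be reproduced without the serializer. The sketch below is not taken from test_avro.py; it only assumes that an Avro float is a little-endian 4-byte IEEE 754 value, and roundtrip_float32 is a name made up for illustration.

```python
import struct


def roundtrip_float32(value: float) -> float:
    """Pack as little-endian single precision, then unpack again."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


original = 3.14159
decoded = roundtrip_float32(original)

# The decoded value is the nearest float32, so strict equality fails here.
exact_match = decoded == original
# The rounding error is far below the 1e-6 tolerance for a value of this size.
close_enough = abs(decoded - original) < 1e-6

# A 64-bit double survives the same trip unchanged.
double_exact = struct.unpack("<d", struct.pack("<d", original))[0] == original
```

The tolerance is absolute, so it suits values of modest magnitude; large floats would need a relative comparison instead.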

Zigzag encoding (test_zigzag): Validates the varint zigzag encoding (mapping signed integers to unsigned for compact representation). Tests the critical values: 0, ±1, ±2, and the int32 boundaries.
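As a rough illustration of what is being checked, here is a minimal sketch of zigzag plus varint encoding for 64-bit values. The function names are invented for this example and are not the serializer's API.

```python
def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) ^ (n >> 63)


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def write_varint(z: int) -> bytes:
    """Emit 7 bits per byte, low bits first, high bit set on all but the last byte."""
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


# The values named in the test description, including the int32 boundaries.
critical = [0, 1, -1, 2, -2, 2147483647, -2147483648]
encoded = [zigzag_encode(n) for n in critical]
decoded = [zigzag_decode(z) for z in encoded]
```

Small magnitudes of either sign map to small unsigned numbers, which is why -1 takes one byte on the wire rather than the maximum length.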

Complex types (test_record, test_array, test_map, test_union, test_enum): Each validates encode/decode round-trips. test_record also asserts that field names are absent from the binary output (b"email" not in data), verifying Avro's schemaless wire format.
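The field-name assertion follows from how a record is laid out: the values are concatenated in schema order and nothing else is written. The sketch below assumes a two-field record with a long and a string; encode_record and the field-list shape are hypothetical, not the serializer's own.

```python
def encode_long(n: int) -> bytes:
    """Zigzag plus varint encoding of a signed 64-bit integer."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length (as a long) followed by the bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


def encode_record(schema_fields, record) -> bytes:
    """Write each field value in schema order; names never reach the output."""
    encoders = {"long": encode_long, "string": encode_string}
    return b"".join(encoders[ftype](record[name]) for name, ftype in schema_fields)


fields = [("id", "long"), ("email", "string")]
data = encode_record(fields, {"id": 7, "email": "a@b.io"})
```

The check only works when the value itself does not contain the field name, so a sample address such as user@email.com would defeat it.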

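Schema evolution (test_schema_evolution_add_field, test_type_promotion, test_enum_evolution): The core of what makes Avro interesting. These tests decode data written under one schema with a different reader schema, covering an added field, type promotion, and enum evolution.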

Incompatibility (test_incompatibility): Negative test — adding a required field (no default) to the reader schema must raise SchemaCompatibilityError.

Compatibility check (test_compatibility_check): Tests the standalone check_compatibility() function that reports backward/forward/full compatibility between two schemas without actually encoding data.
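One way to picture such a check is sketched below. It is a simplification under stated assumptions: schemas are reduced to lists of field dicts, only field presence and defaults are considered (types are ignored), and the names fields_resolvable and compatibility_report are hypothetical. The return shape of the real check_compatibility() is not documented here.

```python
def fields_resolvable(writer_fields, reader_fields) -> bool:
    """True if every reader field exists in the writer schema or has a default."""
    writer_names = {f["name"] for f in writer_fields}
    return all(f["name"] in writer_names or "default" in f for f in reader_fields)


def compatibility_report(old_fields, new_fields) -> dict:
    """Backward: new schema reads old data. Forward: old schema reads new data."""
    backward = fields_resolvable(old_fields, new_fields)
    forward = fields_resolvable(new_fields, old_fields)
    return {"backward": backward, "forward": forward, "full": backward and forward}


v1 = [{"name": "id", "type": "long"}]
v2_with_default = v1 + [{"name": "email", "type": "string", "default": ""}]
v2_required = v1 + [{"name": "email", "type": "string"}]

report_ok = compatibility_report(v1, v2_with_default)
report_bad = compatibility_report(v1, v2_required)
```

In the second case the old reader can still ignore the extra field, so the change is forward compatible but not backward compatible.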

Schema registry (test_schema_registry): Tests SchemaRegistry: register a schema, encode with a schema ID prefix, decode with a different reader schema. This mirrors the Confluent Schema Registry pattern where the wire format is [schema_id][avro_payload].
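The framing idea can be sketched as follows. The 4-byte big-endian ID width and the TinyRegistry, frame, and unframe names are assumptions made for illustration; the actual prefix layout used by SchemaRegistry is not stated here. Confluent's own wire format additionally puts a single zero "magic" byte ahead of the 4-byte ID.

```python
import struct


class TinyRegistry:
    """Hypothetical in-memory registry that hands out sequential schema IDs."""

    def __init__(self):
        self._schemas = {}
        self._next_id = 1

    def register(self, schema) -> int:
        schema_id = self._next_id
        self._schemas[schema_id] = schema
        self._next_id += 1
        return schema_id

    def get(self, schema_id: int):
        return self._schemas[schema_id]


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix the payload with the writer schema's ID (assumed 4 bytes, big-endian)."""
    return struct.pack(">I", schema_id) + payload


def unframe(message: bytes):
    """Split a framed message back into (schema_id, payload)."""
    (schema_id,) = struct.unpack(">I", message[:4])
    return schema_id, message[4:]


registry = TinyRegistry()
writer_schema = {"type": "record", "name": "User", "fields": []}
sid = registry.register(writer_schema)
message = frame(sid, b"\x0e")
found_id, payload = unframe(message)
```

A consumer would look up the writer schema with registry.get(found_id) and pass it to the decoder together with its own reader schema.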

Nested structures (test_nested): Records containing arrays and maps, and records nested inside records.

Streaming (test_streaming): Concatenates two encoded integers into a single byte buffer, then decodes them sequentially from a BytesIO stream, verifying exact byte consumption (no leftover bytes).
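A self-contained sketch of the same idea, using hand-rolled varint helpers rather than the serializer's classes (encode_long and read_long are illustrative names):

```python
import io


def encode_long(n: int) -> bytes:
    """Zigzag plus varint encoding of a signed 64-bit integer."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(stream) -> int:
    """Read one varint from the stream, consuming only the bytes it occupies."""
    z = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        byte = chunk[0]
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the final byte
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


# Two values back to back in one buffer, with no delimiter between them.
buf = io.BytesIO(encode_long(300) + encode_long(-5))
first = read_long(buf)
second = read_long(buf)
leftover = buf.read()
```

Because each value is self-delimiting, the second read starts exactly where the first one stopped, and an empty leftover confirms nothing was over- or under-consumed.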

Performance (test_performance): Benchmarks 10,000 record encode/decode cycles. No assertion on timing — purely informational output.

Patterns

Manual test harness: Uses print() + assert instead of pytest. Each test function prints its category name and sub-results, ending with ALL TESTS PASSED. This is common in DDIA reference implementations — self-contained, no external test framework dependency.

Round-trip testing idiom: Almost every test follows encode(value) → decode(bytes) → assert == value. This validates the encode/decode pair as inverses.

Writer/reader schema separation: AvroDecoder takes an optional second schema argument: AvroDecoder(writer_schema, reader_schema). When omitted, writer and reader are the same. This two-schema design is the key Avro abstraction for schema evolution.

Negative testing via try/except: test_incompatibility uses a try/except/assert-False pattern instead of pytest.raises. Idiomatic for framework-free test files.
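The pattern looks roughly like this. The exception class below is a local stand-in and resolve_field is a hypothetical helper; only the try/except/assert False shape is the point.

```python
class SchemaCompatibilityError(Exception):
    """Stand-in for the serializer's exception of the same name."""


def resolve_field(reader_field: dict, writer_names: set) -> None:
    """Hypothetical check: a reader-only field must carry a default."""
    if reader_field["name"] not in writer_names and "default" not in reader_field:
        raise SchemaCompatibilityError(f"no default for {reader_field['name']!r}")


def check_rejects_required_field() -> bool:
    try:
        resolve_field({"name": "email", "type": "string"}, {"id"})
        # Reached only if no exception was raised, which is the failure case.
        assert False, "expected SchemaCompatibilityError"
    except SchemaCompatibilityError:
        return True


outcome = check_rejects_required_field()
```

Catching the specific exception type matters: a bare except would also swallow the AssertionError and make the test pass vacuously. Running Python with -O strips assert statements, so this style relies on the default interpreter mode.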

Dependencies

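Imports from avro_serializer: the names referenced in this document include AvroDecoder, SchemaRegistry, check_compatibility, SchemaCompatibilityError, and SchemaError.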

Standard library: io.BytesIO (streaming test), time (performance benchmark).

Nothing imports this file — it's a leaf test module.

Flow

Execution is linear when run as __main__: zigzag → primitives → record → array → map → union → enum → evolution → promotion → incompatibility → compatibility check → registry → nested → compact → streaming → enum evolution → performance. Each test is independent, with no shared state between tests.

The AvroDecoder call with two schemas (AvroDecoder(v1, v2).decode(data)) is the most important flow to understand: it reads bytes according to v1's field layout, then resolves fields against v2's expectations — filling defaults for missing fields, dropping fields not in v2, and promoting types where allowed.
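The resolution step can be sketched independently of the binary layer. The code below assumes the bytes have already been read into a list of values in writer field order; the resolve function and the dict-based field format are hypothetical, and promotion is reduced to a single numeric widening case.

```python
class SchemaCompatibilityError(Exception):
    """Stand-in for the serializer's exception of the same name."""


def resolve(writer_fields, decoded_values, reader_fields) -> dict:
    """Map values read in writer order onto the reader schema's fields."""
    by_name = {
        f["name"]: (f["type"], value)
        for f, value in zip(writer_fields, decoded_values)
    }
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in by_name:
            writer_type, value = by_name[name]
            # Simplified promotion: integer types widen to floating point.
            if writer_type in ("int", "long") and field["type"] in ("float", "double"):
                value = float(value)
            result[name] = value
        elif "default" in field:
            result[name] = field["default"]  # field is new in the reader schema
        else:
            raise SchemaCompatibilityError(f"reader field {name!r} has no default")
    return result  # writer-only fields never get copied, so they are dropped


v1 = [
    {"name": "id", "type": "int"},
    {"name": "name", "type": "string"},
    {"name": "legacy", "type": "string"},
]
v2 = [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "email", "type": "string", "default": ""},
]
resolved = resolve(v1, [7, "Ada", "old"], v2)
```

The writer schema is still required to read the bytes at all, since field boundaries are only known from its layout; the reader schema decides what the caller finally sees.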

Invariants

1. Round-trip fidelity: For same-schema encode/decode, output must equal input (exact for all types except float, which uses epsilon).

2. No field names on the wire: Record binary output must not contain field name strings — Avro encodes by field order, not by name.

3. Backward compatibility requires defaults: A reader schema can add fields only if they have defaults. Without a default, SchemaCompatibilityError is raised.

4. Exact byte consumption: Streaming decode must consume exactly the bytes for each value, leaving no residual bytes in the stream.

5. Type promotion is directional: Only widening promotions are valid (int→long, never long→int).
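Invariant 5 can be expressed as a lookup table. The table below lists the primitive promotions permitted by the Avro specification; the can_promote helper is an illustrative name, and whether the serializer under test supports every row is not stated here.

```python
# Writer type -> reader types it may be promoted to (per the Avro specification).
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def can_promote(writer_type: str, reader_type: str) -> bool:
    """True if data written as writer_type may be read as reader_type."""
    if writer_type == reader_type:
        return True
    return reader_type in PROMOTIONS.get(writer_type, set())


widening_ok = can_promote("int", "long")
narrowing_ok = can_promote("long", "int")
```

The lookup is keyed by writer type, which is what makes the rule directional: swapping the arguments asks a different question.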

Error Handling

Only one error path is explicitly tested: SchemaCompatibilityError in test_incompatibility. The test uses a try/except pattern and assert False as the failure branch. SchemaError is imported but not directly tested in this file (likely tested in test_avro_serializer.py). No errors are swallowed; all assertions fail loudly.

Topics to Explore

Beliefs