File: avro-serializer/test_avro_serializer.py

Date: 2026-05-29

Time: 12:42

Purpose

This is the test suite for the avro_serializer module, a from-scratch implementation of Apache Avro's binary serialization format. It verifies that the implementation correctly handles Avro's encoding spec, schema evolution semantics, and compatibility checking. These are the core concepts of DDIA (Designing Data-Intensive Applications) Chapter 4, "Encoding and Evolution", which makes this test file a specification-as-code for how schema-driven serialization should behave.

The tests are organized into 13 numbered groups. Groups 1 to 6 progress from low-level encoding primitives up to high-level schema registry operations, mirroring the layers of the Avro format itself. Groups 7 to 13 cover nested schemas, validation, and edge cases.

Key Components

Test Groups (by number):

| # | Focus | What it proves |
|---|-------|----------------|
| 1 | Primitive round-trips | Every Avro primitive type (null, boolean, int, long, float, double, string, bytes) survives encode→decode |
| 2 | Zigzag encoding | The variable-length integer encoding matches the Avro spec's canonical values |
| 3 | Complex types | Records, arrays, maps, unions, and enums all round-trip correctly |
| 4 | Schema evolution | The central DDIA concept: reading data written with an older or different schema |
| 5 | Compatibility checking | Static analysis of whether two schemas are compatible, without encoding any data |
| 6 | Schema registry | End-to-end: register schema, encode with ID, decode with a different reader schema |
| 7 | Nested schemas | Records containing arrays (composition) |
| 8 | Schema validation | Invalid schema definitions are rejected at construction time |
| 9 | Fixed type | Fixed-size binary blobs with size enforcement |
| 10 | Canonical equality | Schema("int") and Schema({"type": "int"}) are treated as identical |
| 11 | Field reordering | Reader and writer can have the same fields in different order |
| 12 | Union disambiguation | When a union contains both a record and a map (both dicts in Python), the encoder picks the right branch |
| 13 | Int range validation | int enforces 32-bit signed bounds; long accepts larger values |
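To make group 2 concrete, here is a minimal sketch of the zigzag plus variable-length encoding that the Avro spec defines for int and long. The function names (zigzag_encode, encode_long, decode_long) are illustrative and are not taken from avro_serializer; the module's own API may differ.

```python
def zigzag_encode(n: int) -> int:
    """Interleave signed values so small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Zigzag-encode n, then emit it 7 bits at a time, least significant group first."""
    z = zigzag_encode(n) & 0xFFFFFFFFFFFFFFFF  # clamp to an unsigned 64-bit value
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)  # final byte has the high bit clear
    return bytes(out)


def decode_long(data: bytes) -> int:
    """Inverse of encode_long: rebuild the varint, then undo the zigzag mapping."""
    z = 0
    shift = 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


# Canonical values from the Avro specification's encoding table:
#   0 -> 00, -1 -> 01, 1 -> 02, -2 -> 03, 2 -> 04, -64 -> 7f, 64 -> 80 01
for value in (0, -1, 1, -2, 2, -64, 64):
    print(value, encode_long(value).hex())
```

A parametrized test for this group would typically feed each (value, expected bytes) pair through the encoder and compare the result byte for byte.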

Key imports from avro_serializer:

Patterns

Parametrized exhaustive coverage. Tests 1 and 2 use @pytest.mark.parametrize to cover boundary values (0, -1, max int32, min int32, 2^40) and type variants in a single test function. This is the right pattern for codec testing where you want many inputs against the same logic.

Two-schema decoder for evolution. The AvroDecoder(writer_schema, reader_schema) constructor pattern appears in tests 4, 6, and 11. When only one schema is passed, the decoder reads with the same schema that wrote the data. When two are passed, it performs schema resolution, which is the core of Avro's evolution model.
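The record-level part of schema resolution can be sketched at the dict level, independent of any byte handling. This is an assumption-laden illustration: resolve_record and ResolutionError are hypothetical names, not part of avro_serializer, and the real decoder resolves while reading bytes rather than on an already-decoded dict.

```python
class ResolutionError(Exception):
    """Stand-in for the module's compatibility error (hypothetical name)."""


def resolve_record(writer_record: dict, reader_fields: list) -> dict:
    """Project a record decoded with the writer's schema onto the reader's fields."""
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_record:
            # Field exists on both sides: take the written value.
            resolved[name] = writer_record[name]
        elif "default" in field:
            # Reader added the field: fill in its declared default.
            resolved[name] = field["default"]
        else:
            # Reader requires a field the writer never wrote.
            raise ResolutionError(f"no value and no default for field {name!r}")
    # Fields present only in writer_record are dropped silently.
    return resolved


old_data = {"id": 1, "email": "a@example.com", "legacy_flag": True}
new_reader = [
    {"name": "id"},
    {"name": "email"},
    {"name": "nickname", "default": None},
]
print(resolve_record(old_data, new_reader))
```

Checking `"default" in field` rather than `field.get("default")` matters here, because a default of None (common for nullable unions) is a legitimate value.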

Assertion on binary representation. test_record (test 3) asserts that field names (b"id", b"email") are absent from the encoded bytes. This validates a key Avro property: the binary format carries no field metadata, relying entirely on the schema for structure.
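To see why that assertion holds, consider a hand-rolled encoding of a two-field record following the Avro spec: a long is a zigzag varint, a string is a length (as a long) followed by UTF-8 bytes, and a record is just its fields concatenated in schema order. The helper names and the field types (long for id, string for email) are assumptions made for this sketch, not the module's API.

```python
def encode_long(n: int) -> bytes:
    """Zigzag varint, as Avro defines for int and long."""
    z = ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Byte length as a long, followed by the UTF-8 payload."""
    payload = s.encode("utf-8")
    return encode_long(len(payload)) + payload


def encode_user(user: dict) -> bytes:
    """A record is its field values back to back; names and types are not written."""
    return encode_long(user["id"]) + encode_string(user["email"])


encoded = encode_user({"id": 42, "email": "alice@example.com"})
print(encoded)  # b'T"alice@example.com'

# The same kind of check test_record performs: no field names in the payload.
assert b"id" not in encoded
assert b"email" not in encoded
```

The 19 bytes are one byte for the id, one for the string length, and 17 for the address itself. Without the schema there is no way to tell where one field ends and the next begins, which is why the decoder must always know the writer's schema.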

Negative testing with pytest.raises. Tests 4, 8, 9, and 13 verify that specific invalid operations produce the correct exception type, not just any error.

Dependencies

Imports:

Imported by: Nothing — this is a leaf test file. It is run by pytest and exists solely to validate avro_serializer.py.

Flow

Each test follows the same pattern:

1. Construct a Schema from a type string or dict definition

2. Create an AvroEncoder bound to that schema

3. Encode a Python value → bytes

4. Create an AvroDecoder bound to the same or a different schema

5. Decode the bytes → Python value

6. Assert the decoded value matches the original (or the expected evolved form)

For schema evolution tests (4, 6, 11), step 4 uses a *different* reader schema, and step 6 asserts the value matches the reader's expectations (added defaults, reordered fields, promoted types).
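The six steps, including the evolution variant, can be walked through with a toy codec that supports only records of long fields. Everything here (write_long, read_long, encode_record, decode_record, and the list-of-dicts schema shape) is a hypothetical stand-in for Schema, AvroEncoder, and AvroDecoder, chosen so the sketch runs without the module.

```python
def write_long(n: int) -> bytes:
    z = ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_long(buf: bytes, pos: int):
    """Return (value, next_position) for the varint starting at pos."""
    z = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (z >> 1) ^ -(z & 1), pos
        shift += 7


def encode_record(record: dict, fields: list) -> bytes:
    return b"".join(write_long(record[f["name"]]) for f in fields)


def decode_record(buf: bytes, writer_fields: list, reader_fields: list) -> dict:
    # Bytes are always parsed in the writer's field order...
    pos, written = 0, {}
    for f in writer_fields:
        written[f["name"]], pos = read_long(buf, pos)
    # ...then presented in the reader's order, with defaults for new fields.
    return {
        f["name"]: written[f["name"]] if f["name"] in written else f["default"]
        for f in reader_fields
    }


# Step 1: schemas (writer and a reordered, extended reader)
writer_fields = [{"name": "id"}, {"name": "score"}]
reader_fields = [{"name": "score"}, {"name": "id"}, {"name": "level", "default": 1}]
# Steps 2-3: encode with the writer schema
data = encode_record({"id": 7, "score": -3}, writer_fields)
# Steps 4-5: decode with both schemas
decoded = decode_record(data, writer_fields, reader_fields)
# Step 6: assert the evolved form
assert decoded == {"score": -3, "id": 7, "level": 1}
```

Passing writer_fields as both schema arguments gives the plain round-trip case, where the decoded value equals the original.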

Invariants

Error Handling

The test suite validates three error types:

| Exception | When raised | Test coverage |
|-----------|-------------|---------------|
| SchemaError | Invalid schema definition (unknown type, empty union, duplicate union branches, fixed missing name/size) | Tests 8, 9 |
| SchemaCompatibilityError | Reader schema has a required field that the writer didn't write and has no default | Test 4 |
| ValueError | Encoding a value that doesn't fit the schema's constraints (int overflow, wrong fixed size) | Tests 9, 13 |

Errors are tested with pytest.raises context managers — the tests verify the exception type but not the message content, which keeps them resilient to wording changes.
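As an illustration of the ValueError case, the range check behind int versus long might look like the sketch below. check_range is a hypothetical helper, not a function from avro_serializer; only the bounds themselves (32-bit signed for int, 64-bit signed for long) come from the Avro spec.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

# Hypothetical lookup table: Avro type name -> inclusive bounds.
_BOUNDS = {
    "int": (INT32_MIN, INT32_MAX),
    "long": (INT64_MIN, INT64_MAX),
}


def check_range(avro_type: str, value: int) -> int:
    """Return value unchanged if it fits avro_type, otherwise raise ValueError."""
    low, high = _BOUNDS[avro_type]
    if not low <= value <= high:
        raise ValueError(f"{value} is out of range for Avro {avro_type}")
    return value


check_range("int", INT32_MAX)  # fits
check_range("long", 2**40)     # too big for int, fine for long

try:
    check_range("int", INT32_MAX + 1)
except ValueError as exc:
    print("rejected:", exc)
```

In the test suite the same expectation would be written with a pytest.raises(ValueError) block around the encode call, which asserts on the exception type alone.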

Topics to Explore

Beliefs