Topic: The SSTable uses a magic number for file-type validation but no per-entry checksums; explore whether the magic check gives any meaningful corruption protection beyond "wrong file type"

Date: 2026-05-29

Time: 12:06

Magic Number Validation: File-Type Gate, Not Corruption Shield

The SSTable's magic number check lives in SSTableReader.__init__ (line ~108 of sstable.py):


```python
magic, version, self._entry_count = struct.unpack(HEADER_FMT, f.read(HEADER_SIZE))
assert magic == MAGIC, f"Invalid magic: {magic}"
```

The magic is the 4-byte literal b"SSTB" (line 10), written once at file offset 0 by SSTableWriter.__init__ (line 52). That is the only integrity check in the entire read path, and because it is an assert, it disappears entirely when Python runs with -O. Let's be precise about what it catches and what it doesn't.
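For comparison, here is a sketch of the same header check written as an explicit exception rather than an assert, so that running Python with -O cannot strip it. Only MAGIC = b"SSTB" comes from the file; the value of HEADER_FMT ("<4sII": magic, u32 version, u32 entry count) is an assumption, and read_header and CorruptSSTableError are hypothetical names.

```python
import struct

MAGIC = b"SSTB"
# Assumed layout for illustration; the real HEADER_FMT in sstable.py may differ.
HEADER_FMT = "<4sII"  # magic, version (u32), entry count (u32), little-endian
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 12 bytes with this layout


class CorruptSSTableError(Exception):
    """Hypothetical error type for a header that fails validation."""


def read_header(buf: bytes) -> tuple:
    """Parse the header, raising instead of asserting so -O cannot disable it."""
    if len(buf) < HEADER_SIZE:
        raise CorruptSSTableError(f"file too short for header: {len(buf)} bytes")
    magic, version, entry_count = struct.unpack(HEADER_FMT, buf[:HEADER_SIZE])
    if magic != MAGIC:
        raise CorruptSSTableError(f"invalid magic: {magic!r}")
    # Note: version and entry_count are returned unvalidated, exactly as before.
    return version, entry_count
```

read_header(struct.pack(HEADER_FMT, MAGIC, 1, 3)) returns (1, 3), and an all-zero buffer raises CorruptSSTableError. It still inspects only the magic, so it inherits every limitation discussed below.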

What the magic check actually protects against

1. Wrong file type — opening a JPEG, a log file, or another binary format. This is the intended purpose.

2. Completely zeroed file: b"\x00\x00\x00\x00" != b"SSTB".

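3. Gross header corruption, but only in the first 4 bytes: if the magic itself is mangled, you get a clear error instead of silently misinterpreting the version and entry count. Corruption confined to the version or entry-count fields passes the check.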
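The list above can be checked directly. This sketch assumes a header layout of "<4sII" (magic, version, entry count, little-endian), which is a guess since the real HEADER_FMT is not shown; magic_ok is a hypothetical mirror of the reader's check. It builds a valid header, then tries a zeroed file, JPEG/JFIF leading bytes, a one-bit flip in the entry count, and damage in the data section.

```python
import struct

MAGIC = b"SSTB"
HEADER_FMT = "<4sII"  # hypothetical layout: magic, version, entry count
HEADER_SIZE = struct.calcsize(HEADER_FMT)


def magic_ok(buf: bytes) -> bool:
    """Mirror of the reader's check: only the first 4 bytes are compared."""
    magic, _version, _count = struct.unpack(HEADER_FMT, buf[:HEADER_SIZE])
    return magic == MAGIC


good = struct.pack(HEADER_FMT, MAGIC, 1, 100) + b"...data section..."

zeroed = bytes(len(good))                # case 2: all-zero file
jpeg = b"\xff\xd8\xff\xe0" + good[4:]    # case 1: JPEG/JFIF bytes where the magic should be

flip_count = bytearray(good)
flip_count[8] ^= 0x01                    # low byte of entry count: 100 becomes 101
flip_data = bytearray(good)
flip_data[-1] ^= 0xFF                    # damage past the header, in the data section

for name, buf in [("zeroed", zeroed), ("jpeg", jpeg),
                  ("count flip", bytes(flip_count)), ("data flip", bytes(flip_data))]:
    print(f"{name:>10}: magic check {'passes' if magic_ok(buf) else 'fails'}")
```

The zeroed and JPEG buffers fail the check; the entry-count flip and the data-section damage both pass, because the check never looks past byte 3.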

What it does NOT protect against

Everything else. Specifically:

The header entry count is also unchecked

SSTableWriter.finish() (lines 88–95) seeks back to byte 0 and rewrites the header with the final count. SSTableReader reads this count and stores it as self._entry_count, but it never verifies that scanning the data section actually yields that many entries. A corrupted count therefore leads to wrong metadata() results and potentially wrong compaction decisions.
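One way to close this particular gap is to cross-check the header count against a scan of the data section. The entry encoding below (a "<IIQ" prefix of key length, value length, and timestamp, followed by the key and value bytes) is hypothetical, as is the "<4sII" header layout; the real sstable.py format may differ. make_entry, count_entries, and verify_entry_count are illustrative names.

```python
import struct
from typing import Optional

HEADER_FMT = "<4sII"  # hypothetical: magic, version (u32), entry count (u32)
HEADER_SIZE = struct.calcsize(HEADER_FMT)
ENTRY_FMT = "<IIQ"    # hypothetical entry prefix: key len, value len, timestamp
ENTRY_SIZE = struct.calcsize(ENTRY_FMT)


def make_entry(key: bytes, value: bytes, ts: int) -> bytes:
    """Encode one entry in the hypothetical layout above."""
    return struct.pack(ENTRY_FMT, len(key), len(value), ts) + key + value


def count_entries(buf: bytes, data_start: int = HEADER_SIZE,
                  data_end: Optional[int] = None) -> int:
    """Walk the data section and count whole entries; raise on a torn entry."""
    end_of_data = len(buf) if data_end is None else data_end
    pos, n = data_start, 0
    while pos < end_of_data:
        if pos + ENTRY_SIZE > end_of_data:
            raise ValueError(f"truncated entry prefix at offset {pos}")
        klen, vlen, _ts = struct.unpack_from(ENTRY_FMT, buf, pos)
        nxt = pos + ENTRY_SIZE + klen + vlen
        if nxt > end_of_data:
            raise ValueError(f"entry at offset {pos} runs past the data section")
        pos, n = nxt, n + 1
    return n


def verify_entry_count(buf: bytes, data_end: Optional[int] = None) -> int:
    """Cross-check the header's entry count against a full scan of the data."""
    _magic, _version, declared = struct.unpack_from(HEADER_FMT, buf, 0)
    actual = count_entries(buf, HEADER_SIZE, data_end)
    if actual != declared:
        raise ValueError(f"header declares {declared} entries, data holds {actual}")
    return actual
```

A two-entry file whose header declares 3 raises ValueError. This only catches a count that disagrees with the structure; a flipped bit inside a key or value still parses cleanly, which is why checksums are the real fix.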

Bottom line

The magic check is a file-type gate: it prevents opening non-SSTable files. It provides essentially zero corruption protection for the data that matters (keys, values, timestamps, and structural offsets). A production SSTable implementation would checksum that data directly; LevelDB, for example, appends a CRC32C to every block, including the index block, and RocksDB's block-based tables likewise checksum each block. The test suite (test_sstable.py) has no corruption or truncation tests, which suggests corruption detection was never a design goal.
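For contrast, a minimal sketch of per-block checksumming. It uses zlib.crc32 (standard CRC-32) as a stand-in for LevelDB's CRC32C, which Python's standard library does not provide, and a bare 4-byte trailer; LevelDB's real block trailer also carries a compression-type byte and stores a masked CRC. seal_block, open_block, and ChecksumError are hypothetical names.

```python
import struct
import zlib

TRAILER_FMT = "<I"  # hypothetical 4-byte little-endian CRC trailer per block
TRAILER_SIZE = struct.calcsize(TRAILER_FMT)


class ChecksumError(Exception):
    """Hypothetical error raised when a block's stored CRC does not match."""


def seal_block(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload, as a writer would when flushing a block."""
    return payload + struct.pack(TRAILER_FMT, zlib.crc32(payload))


def open_block(block: bytes) -> bytes:
    """Verify and strip the trailer; raise if the payload no longer matches its CRC."""
    if len(block) < TRAILER_SIZE:
        raise ChecksumError("block shorter than its trailer")
    payload, trailer = block[:-TRAILER_SIZE], block[-TRAILER_SIZE:]
    (stored,) = struct.unpack(TRAILER_FMT, trailer)
    computed = zlib.crc32(payload)
    if computed != stored:
        raise ChecksumError(f"crc mismatch: stored {stored:#010x}, computed {computed:#010x}")
    return payload
```

CRC-32 detects every single-bit error, so flipping any one bit of a sealed payload makes open_block raise. Unlike the magic check, this covers every byte of the block the reader will interpret.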
