File: hash-index-storage/test_bitcask.py

Date: 2026-05-28

Time: 18:59

hash-index-storage/test_bitcask.py

Purpose

This is the test suite for a Bitcask storage engine implementation — a log-structured key-value store described in Chapter 3 of *Designing Data-Intensive Applications*. It validates the full lifecycle of a Bitcask store: writing, reading, updating, deleting, file rotation, compaction, hint-file generation, and crash recovery. It serves as both a correctness harness and a living specification of BitcaskStore's behavioral contract.

Key Components

Test Functions

| Function | What it validates |

|---|---|

| testbasiccrud | Put, get, delete, _len, contains_, and missing-key returns None |

| test_overwrite | Multiple puts to the same key — only the latest value survives, length stays 1 |

| testfilerotation | Writing 100 entries with maxfilesize=256 forces multiple .data files |

| test_compaction | After 100 overwrites of one key and a delete of another, compact() preserves only live data |

| testhintfiles | compact() produces .hint files, and a fresh store loads correctly from them |

| teststartuprecovery | Close and reopen — all data survives via normal replay |

| testcrashrecovery | Delete .hint files, then reopen — store rebuilds its in-memory index by scanning .data files |

| testlargedataset | 10k keys, half overwritten, compacted — verifies correctness at scale |

| testedgecases | Empty store operations, delete-then-reinsert, delete of nonexistent key |

BitcaskStore API (inferred contract)


BitcaskStore(directory: str, max_file_size: int = ..., sync_writes: bool = True)
store.put(key: str, value: str) -> None
store.get(key: str) -> Optional[str]
store.delete(key: str) -> None
store.compact() -> None
store.keys() -> list
store.close() -> None
len(store) -> int          # via __len__
key in store -> bool       # via __contains__

Patterns

Temp directory isolation. Every test uses tempfile.TemporaryDirectory() as a context manager, giving each test a clean filesystem with automatic cleanup. No test can leak state to another.

sync_writes=False everywhere. All tests disable fsync to avoid I/O bottlenecks. This is a deliberate tradeoff — tests run fast but don't exercise the durable-write path.

Close-and-reopen idiom. Tests like teststartuprecovery, testhintfiles, and testcrashrecovery create a store, close it, then open a *new* BitcaskStore on the same directory. This validates that the on-disk format is self-describing and the startup index-rebuild logic is correct.

Manual test runner. The if _name == "main_" block runs all tests sequentially with print("PASS: ...") output. No pytest fixtures or decorators — tests are plain functions that assert and fail loudly.

Dependencies

Imports:

Imported by:

Flow

Each test follows the same pattern:

1. Create a temp directory

2. Instantiate BitcaskStore pointing at it

3. Perform writes/reads/deletes

4. Assert expected state

5. (Optionally) close, mutate the filesystem, reopen, and re-assert

6. Close the store; temp directory auto-cleans

The _main_ block runs them in a fixed order, but they have no inter-test dependencies — each is fully isolated.

Invariants

Error Handling

There is none — by design. Tests assert and crash on failure. store.get() returns None for missing keys rather than raising (tested in testbasiccrud and testedgecases). store.delete() on a nonexistent key is silently accepted (tested in testedgecases). No exception-path testing exists — there are no tests for corrupted files, permission errors, or disk-full scenarios.

Topics to Explore

Beliefs