Date: 2026-05-29
Time: 06:56
hash-index-storage/tester_test_bitcask.py

This is the independent verification test suite for the Bitcask hash-index storage engine. The `tester_` prefix distinguishes it from the implementation's own `test_bitcask.py`: a separate "tester" agent wrote this file to validate the implementation against its spec, as a second pair of eyes. It exercises `BitcaskStore` end-to-end: CRUD, file rotation, compaction, hint-file recovery, crash recovery, scale, and edge cases.
The file is designed to run both as a pytest module and standalone via its `__main__` block, printing `PASS: <name>` per test for quick visual confirmation.

| Function | What it validates |
|---|---|
| `test_basic_crud` | Full lifecycle: `put`, `get`, `delete`, `keys`, `len`, `__contains__` on a fresh store |
| `test_overwrite` | Multiple writes to the same key return the latest value; `len` stays 1 |
| `test_file_rotation` | When `max_file_size=256` is exceeded, multiple `.data` files appear on disk; all keys remain readable |
| `test_compaction` | After 100 overwrites plus a delete, `compact()` preserves only live data and removes tombstones |
| `test_hint_files_and_startup` | `compact()` produces `.hint` files; a new `BitcaskStore` instance on the same directory rebuilds correctly from hints |
| `test_startup_recovery_no_hints` | Manually deletes `.hint` files, then reopens, forcing a full data-file scan recovery, and verifies correctness |
| `test_large_dataset` | 10k keys with 5k overwrites; spot-checks both updated and untouched keys |
| `test_edge_cases` | Empty-store operations, delete of a nonexistent key (no crash), put-delete-put cycle |
| `test_example_from_spec` | Runs the exact scenario from the task specification as a golden-path regression |

Every test follows the same structure:
1. Create a tempfile.TemporaryDirectory
2. Instantiate BitcaskStore(dir, sync_writes=False, ...)
3. Exercise operations and assert
4. Explicitly call s.close()
5. Print PASS: <name>
sync_writes=False is used everywhere to avoid fsync overhead in tests — the tests care about logical correctness, not durability guarantees.
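
The five steps above can be sketched as follows. `StandInStore` is a hypothetical in-memory placeholder, not the real `BitcaskStore`; its constructor signature and the use of string values are assumptions made only so the skeleton runs on its own.

```python
import tempfile


class StandInStore:
    """Hypothetical in-memory stand-in for BitcaskStore (not the real class)."""

    def __init__(self, directory, sync_writes=True):
        self.directory = directory
        self.sync_writes = sync_writes
        self._data = {}
        self.closed = False

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        # Deleting a missing key is a no-op rather than an error.
        self._data.pop(key, None)

    def __len__(self):
        return len(self._data)

    def __contains__(self, key):
        return key in self._data

    def close(self):
        self.closed = True


def test_basic_crud_sketch():
    # 1. Fresh, isolated directory; removed when the with block exits.
    with tempfile.TemporaryDirectory() as d:
        # 2. Open the store with fsync disabled.
        s = StandInStore(d, sync_writes=False)
        # 3. Exercise operations and assert.
        s.put("a", "1")
        assert s.get("a") == "1"
        assert "a" in s and len(s) == 1
        s.delete("a")
        assert s.get("a") is None and len(s) == 0
        # 4. Close explicitly.
        s.close()
    # 5. Report success for standalone runs.
    print("PASS: test_basic_crud_sketch")
```

Swapping `StandInStore` for the real store class would leave the five-step shape of the test unchanged.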

- Isolation: each test creates its own `tempfile.TemporaryDirectory`, so tests are fully independent and leave no filesystem residue. The `with` block handles cleanup.
- Dual-mode execution: the file runs under pytest or through its `__main__` block. The `print("PASS: ...")` lines provide human-readable output when running outside pytest.
- Close and reopen: `test_hint_files_and_startup` and `test_startup_recovery_no_hints` both close the store and open a new instance on the same directory. This is the critical pattern for testing persistence and recovery, since Bitcask rebuilds its in-memory index on startup.
- Small file-size limits: tests set `max_file_size` to small values (256, 512, 1024) to force file rotation without writing megabytes of data.

Imports:

- `os`: listing directory contents to verify `.data` and `.hint` file creation, removing hint files
- `tempfile`: isolated test directories
- `bitcask.BitcaskStore`: the system under test

Imported by: Nothing imports this file. It's a leaf test module.

When run standalone (`python tester_test_bitcask.py`), tests execute sequentially in declaration order. Each test is self-contained: create dir → open store → operate → assert → close → print. No shared state between tests.
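
A minimal sketch of how such a standalone runner might look. The two placeholder tests and the `ALL_TESTS` list are hypothetical; the actual file may call its tests directly rather than through a list.

```python
def test_first():
    assert 1 + 1 == 2
    print("PASS: test_first")


def test_second():
    assert "a" in {"a": 1}
    print("PASS: test_second")


# Hypothetical explicit list; the order here is the execution order.
ALL_TESTS = [test_first, test_second]


def run_all(tests=ALL_TESTS):
    """Run each test in order; the first AssertionError stops the run."""
    for test in tests:
        test()
    return len(tests)


if __name__ == "__main__":
    count = run_all()
    print(f"All {count} tests passed")
```

Because the functions keep the `test_` prefix, pytest would still collect them, while the `__main__` guard keeps the runner from firing on import.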
The two recovery tests (`test_hint_files_and_startup`, `test_startup_recovery_no_hints`) have a two-phase flow: write data with one instance, close it, then open a second instance and verify the data survived the restart. This tests the startup index-rebuild path.
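
The two-phase flow can be illustrated with a deliberately tiny stand-in. `TinyLogStore` is hypothetical and is not the real `BitcaskStore`: it uses a single JSON-lines data file and, for brevity, writes its hint file on `close()`, whereas the suite described here gets hint files from `compact()`. The file names `0.data` and `0.hint` are also assumptions.

```python
import json
import os
import tempfile


class TinyLogStore:
    """Hypothetical single-file log store; NOT the real BitcaskStore."""

    def __init__(self, directory):
        self.data_path = os.path.join(directory, "0.data")
        self.hint_path = os.path.join(directory, "0.hint")
        self.index = {}  # key -> byte offset of the latest record
        if os.path.exists(self.hint_path):
            # Fast path: load the saved index instead of reading the log.
            self.recovered_from = "hint"
            with open(self.hint_path) as f:
                self.index = json.load(f)
        else:
            # Slow path: rebuild the index by scanning every record.
            self.recovered_from = "scan"
            self._scan()
        self._log = open(self.data_path, "ab")
        self._end = os.path.getsize(self.data_path)

    def _scan(self):
        if not os.path.exists(self.data_path):
            return
        offset = 0
        with open(self.data_path, "rb") as f:
            for line in f:
                # Later records overwrite earlier offsets: last write wins.
                self.index[json.loads(line)["k"]] = offset
                offset += len(line)

    def put(self, key, value):
        record = (json.dumps({"k": key, "v": value}) + "\n").encode()
        self._log.write(record)
        self._log.flush()
        self.index[key] = self._end
        self._end += len(record)

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.data_path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())["v"]

    def close(self):
        self._log.close()
        with open(self.hint_path, "w") as f:
            json.dump(self.index, f)


def two_phase_check(delete_hints):
    with tempfile.TemporaryDirectory() as d:
        # Phase 1: write with one instance, then close it.
        s1 = TinyLogStore(d)
        s1.put("a", "1")
        s1.put("a", "2")
        s1.put("b", "3")
        s1.close()
        if delete_hints:
            # Force the full-scan path by removing every hint file.
            for name in os.listdir(d):
                if name.endswith(".hint"):
                    os.remove(os.path.join(d, name))
        # Phase 2: a second instance must see the same data.
        s2 = TinyLogStore(d)
        assert s2.get("a") == "2"
        assert s2.get("b") == "3"
        path = s2.recovered_from
        s2.close()
        return path
```

Calling `two_phase_check(False)` exercises the hint-based path and `two_phase_check(True)` the scan path, mirroring the split between the two recovery tests.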

- Last write wins: `test_overwrite` and `test_compaction` verify that repeated `put` calls to the same key always yield the most recent value.
- Delete semantics: after `delete("a")`, `get("a")` returns `None`, `"a" not in s` holds, and `len` decrements. `test_edge_cases` verifies that a subsequent `put` resurrects the key.
- Compaction: after `compact()`, deleted keys stay deleted, and the latest value for live keys is unchanged.
- Rotation: once the file-size limit is exceeded, multiple `.data` files exist.

There is no explicit error handling: tests rely on assertions and raise `AssertionError` on failure. The design assumes `BitcaskStore` doesn't raise on delete of a nonexistent key (verified in `test_edge_cases`). The `with tempfile.TemporaryDirectory()` blocks ensure cleanup even if a test fails mid-execution.
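
The cleanup guarantee comes from the standard library's context manager, not from the store. A small sketch, using a hypothetical `failing_test` function and an arbitrary `0.data` file name, shows the directory disappearing even when the body raises:

```python
import os
import tempfile


def failing_test(seen):
    with tempfile.TemporaryDirectory() as d:
        seen.append(d)  # remember the path so the caller can inspect it later
        with open(os.path.join(d, "0.data"), "wb") as f:
            f.write(b"payload")
        # Simulate an assertion failing in the middle of a test.
        raise AssertionError("simulated test failure")


seen_dirs = []
failure = None
try:
    failing_test(seen_dirs)
except AssertionError as exc:
    failure = str(exc)

# The directory and its contents are gone despite the failure.
directory_removed = not os.path.exists(seen_dirs[0])
```

The exception still propagates to the caller, so a failing test is reported as a failure while its directory is removed.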

- `hash-index-storage/bitcask.py`: The implementation under test; understanding put/get/compact/hint-file format is essential context
- `hash-index-storage/test_bitcask.py`: The implementation's own test suite; compare coverage and approach with this tester file
- `bitcask-hint-file-format`: How hint files encode the keydir for fast startup without scanning all data files
- `hash-index-storage/bitcask.py:compact`: The compaction algorithm that merges data files, removes dead entries, and emits hint files
- `log-structured-hash-table/bitcask.py`: A second Bitcask implementation in the repo; compare design choices between the two
- `tester-bitcask-sync-writes-disabled`: All tester tests pass `sync_writes=False` to `BitcaskStore`, testing logical correctness without fsync overhead
- `tester-bitcask-recovery-two-paths`: The test suite verifies two distinct recovery paths: hint-file-based rebuild and full data-file scan (with hints deleted)
- `tester-bitcask-compaction-removes-tombstones`: After `compact()`, deleted keys remain inaccessible and `len` reflects only live keys
- `tester-bitcask-file-rotation-transparent`: File rotation (triggered by `max_file_size`) is invisible to readers; all keys remain retrievable across multiple `.data` files
- `tester-bitcask-delete-nonexistent-safe`: Calling `delete()` on a key that doesn't exist must not raise an exception