File: hash-index-storage/testertestbitcask.py

Date: 2026-05-29

Time: 06:56

Purpose

This is the independent verification test suite for the Bitcask hash-index storage engine. The tester prefix distinguishes it from the implementation's own testbitcask.py: a separate "tester" agent wrote this file to validate the implementation against its spec, independently of the implementation's author. It exercises BitcaskStore end-to-end: CRUD, file rotation, compaction, hint-file recovery, crash recovery, scale, and edge cases.

The file is designed to run both as a pytest module and standalone via its __main__ guard, printing PASS: <name> per test for quick visual confirmation.
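
The dual-mode layout described above can be sketched as follows. This is a minimal illustration, not the file's actual contents: the two test functions and the run_all helper are hypothetical names chosen for the example. pytest would collect the test_ functions on its own, while the __main__ guard runs them in order when the file is executed directly.

```python
def test_addition():
    assert 1 + 1 == 2
    print("PASS: test_addition")


def test_string_upper():
    assert "ok".upper() == "OK"
    print("PASS: test_string_upper")


def run_all(tests):
    """Call each test in the given order and return how many ran.

    An AssertionError from any test propagates and stops the run,
    so reaching the return statement means every test passed.
    """
    for test in tests:
        test()
    return len(tests)


if __name__ == "__main__":
    # Standalone mode: explicit list, executed in declaration order.
    run_all([test_addition, test_string_upper])
```

Listing the tests explicitly keeps the standalone order obvious, at the cost of having to update the list whenever a test is added.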

Key Components

Test Functions

| Function | What it validates |
|---|---|
| testbasiccrud | Full lifecycle: put, get, delete, keys, len, __contains__ on a fresh store |
| test_overwrite | Multiple writes to the same key return the latest value; len stays 1 |
| testfilerotation | When maxfilesize=256 is exceeded, multiple .data files appear on disk; all keys remain readable |
| test_compaction | After 100 overwrites + a delete, compact() preserves only live data and removes tombstones |
| testhintfilesandstartup | compact() produces .hint files; a new BitcaskStore instance on the same directory rebuilds correctly from hints |
| teststartuprecoverynohints | Manually deletes .hint files, then reopens, forcing a full data-file scan recovery, and verifies correctness |
| testlargedataset | 10k keys with 5k overwrites; spot-checks both updated and untouched keys |
| testedgecases | Empty-store operations, delete of nonexistent key (no crash), put-delete-put cycle |
| testexamplefrom_spec | Runs the exact scenario from the task specification as a golden-path regression |

Shared Setup Pattern

Every test follows the same structure:

1. Create a tempfile.TemporaryDirectory

2. Instantiate BitcaskStore(dir, sync_writes=False, ...)

3. Exercise operations and assert

4. Explicitly call s.close()

5. Print PASS: <name>

sync_writes=False is used everywhere to avoid fsync overhead in tests — the tests care about logical correctness, not durability guarantees.
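
The five steps above might look roughly like the sketch below. Because the real BitcaskStore is not reproduced here, the example uses StandInStore, a hypothetical dict-backed class that only mimics the constructor shape and a few methods. Its behaviour (for example, get returning None for a missing key) is an assumption for illustration, not a statement about the actual engine.

```python
import tempfile


class StandInStore:
    """Hypothetical in-memory stand-in; NOT the real BitcaskStore."""

    def __init__(self, directory, sync_writes=True):
        self.directory = directory
        self.sync_writes = sync_writes  # accepted but unused in this stand-in
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Assumption: a missing key yields None rather than raising.
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

    def close(self):
        self._data = None


def test_put_get_delete():
    with tempfile.TemporaryDirectory() as d:      # 1. temporary directory
        s = StandInStore(d, sync_writes=False)    # 2. instantiate the store
        s.put("k", "v1")                          # 3. operate and assert
        s.put("k", "v2")
        assert s.get("k") == "v2"
        s.delete("k")
        assert s.get("k") is None
        s.close()                                 # 4. explicit close
    print("PASS: test_put_get_delete")            # 5. report success
```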

Patterns

Dependencies

Imports:

Imported by: Nothing imports this file. It's a leaf test module.

Flow

When run standalone (python testertestbitcask.py), tests execute sequentially in declaration order. Each test is self-contained: create dir → open store → operate → assert → close → print. No shared state between tests.

The two recovery tests (testhintfilesandstartup, teststartuprecoverynohints) have a two-phase flow: write data with one instance, close it, then open a second instance and verify the data survived the restart. This tests the startup index-rebuild path.
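
A two-phase restart test of this kind can be sketched with a toy append-only log. LogStore below is a hypothetical stand-in, not the real BitcaskStore: it writes JSON lines to a single file and keeps values directly in its in-memory index, whereas a Bitcask-style engine would typically index file offsets and use a binary record format. The file name 0.data and the record layout are assumptions made for the example.

```python
import json
import os
import tempfile


class LogStore:
    """Hypothetical append-only stand-in; NOT the real BitcaskStore."""

    def __init__(self, directory):
        self._path = os.path.join(directory, "0.data")
        self._index = {}
        # Startup path: replay the log (if any) to rebuild the index.
        if os.path.exists(self._path):
            with open(self._path, "r", encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    if record["deleted"]:
                        self._index.pop(record["key"], None)
                    else:
                        self._index[record["key"]] = record["value"]
        self._file = open(self._path, "a", encoding="utf-8")

    def _append(self, record):
        self._file.write(json.dumps(record) + "\n")
        self._file.flush()

    def put(self, key, value):
        self._append({"key": key, "value": value, "deleted": False})
        self._index[key] = value

    def delete(self, key):
        # A tombstone record marks the key as removed.
        self._append({"key": key, "value": None, "deleted": True})
        self._index.pop(key, None)

    def get(self, key):
        return self._index.get(key)

    def close(self):
        self._file.close()


def test_restart_recovery():
    with tempfile.TemporaryDirectory() as d:
        # Phase 1: write with one instance, then close it.
        first = LogStore(d)
        first.put("a", "1")
        first.put("a", "2")
        first.put("b", "3")
        first.delete("b")
        first.close()

        # Phase 2: a fresh instance must rebuild the same state from disk.
        second = LogStore(d)
        assert second.get("a") == "2"
        assert second.get("b") is None
        second.close()
    print("PASS: test_restart_recovery")
```

The second instance shares nothing with the first except the directory, so any state it reports must have come from the startup scan.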

Invariants

Error Handling

There is no explicit error handling — tests rely on assertions and will raise AssertionError on failure. The design assumes BitcaskStore doesn't raise on delete of a nonexistent key (verified in testedgecases). The with tempfile.TemporaryDirectory() blocks ensure cleanup even if a test fails mid-execution.
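
The cleanup guarantee can be checked in isolation with the standard library alone. The sketch below uses hypothetical helper names and raises a deliberate AssertionError inside the with block, then confirms that the directory no longer exists afterwards.

```python
import os
import tempfile


def failing_test(seen):
    """Record the temp directory path, then fail on purpose."""
    with tempfile.TemporaryDirectory() as d:
        seen.append(d)
        assert os.path.isdir(d)
        raise AssertionError("deliberate failure")


def directory_removed_after_failure():
    """Return True if the temp directory is gone after the failure."""
    seen = []
    try:
        failing_test(seen)
    except AssertionError:
        pass  # expected; we only care about the cleanup side effect
    return not os.path.exists(seen[0])
```

Calling directory_removed_after_failure() shows that the context manager's exit handler runs during exception propagation, which is what lets a failed test leave no files behind.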

Topics to Explore

Beliefs