maxfilesize affects recovery time (more segments mean faster truncation but more files to scan on replay) and write amplification
Date: 2026-05-29
Time: 08:27
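The tradeoff in the title line can be sketched with a minimal segmented log. The sketch assumes a newline-delimited record format. The class `SegmentedLog`, its methods, and the `max_file_size` parameter (standing in for maxfilesize) are hypothetical and are not the API of wal.py or either bitcask.py. The sketch shows three effects: a smaller limit produces more segment files, replay opens every one of them, and truncation can only drop whole segments.

```python
import os
import tempfile


class SegmentedLog:
    """Hypothetical append-only log split into segments of at most max_file_size bytes."""

    def __init__(self, directory: str, max_file_size: int = 10 * 1024 * 1024):
        self.directory = directory
        self.max_file_size = max_file_size
        self.segment_ids = []
        self._active = None
        self._rotate()

    def _path(self, seg_id: int) -> str:
        return os.path.join(self.directory, f"{seg_id:08d}.log")

    def _rotate(self) -> None:
        # Freeze the active segment (close it) and open a fresh one.
        if self._active is not None:
            self._active.close()
        seg_id = self.segment_ids[-1] + 1 if self.segment_ids else 0
        self.segment_ids.append(seg_id)
        self._active = open(self._path(seg_id), "ab")

    def append(self, record: bytes) -> None:
        line = record + b"\n"
        pos = self._active.tell()
        # Rotate when this write would push a non-empty segment past the limit.
        if pos > 0 and pos + len(line) > self.max_file_size:
            self._rotate()
        self._active.write(line)
        self._active.flush()

    def replay(self) -> list:
        # Recovery opens and scans every remaining segment, oldest first.
        records = []
        for seg_id in self.segment_ids:
            with open(self._path(seg_id), "rb") as f:
                records.extend(line.rstrip(b"\n") for line in f)
        return records

    def truncate_before(self, seg_id: int) -> None:
        # Truncation deletes whole frozen files (never the active one),
        # so it reclaims space one segment at a time.
        for old in [s for s in self.segment_ids[:-1] if s < seg_id]:
            os.remove(self._path(old))
            self.segment_ids.remove(old)

    def close(self) -> None:
        self._active.close()


if __name__ == "__main__":
    for limit in (64, 1024):
        with tempfile.TemporaryDirectory() as d:
            log = SegmentedLog(d, max_file_size=limit)
            for i in range(20):
                log.append(f"record-{i:02d}".encode())  # 10 bytes per line
            print(limit, len(log.segment_ids))  # prints "64 4", then "1024 1"
            log.close()
```

With the 64-byte limit, the same 200 bytes of records land in four files. Each of those files is one more open-and-scan step on replay, but each is also a smaller unit that truncation can reclaim.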
- write-ahead-log/wal.py:replay: How WAL replay filters records by after_seq and how checkpoint records interact with recovery
- hash-index-storage/bitcask.py:compact: The full compaction loop, including how it handles tombstones and produces hint files, directly affecting write amplification
- log-structured-hash-table/bitcask.py:compact: Compare the log-structured variant's compaction with the hash-index version: different merge strategies, same segment-size pressure
- log-structured-merge-tree/lsm.py: How LSM trees take the segment-size tradeoff further with tiered compaction and a compaction_threshold parameter (line 204)
- hint-file-effectiveness: Measure how much hint files actually reduce recovery time by comparing loadhintfile vs scandatafile on realistic data sizes
---
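The first follow-up item asks how replay filters records by after_seq. One plausible shape is sketched below: replay yields only records with a sequence number above after_seq, and recovery takes after_seq from the newest checkpoint record. The `Record` layout, the "put"/"checkpoint" kinds, and the convention that a checkpoint's payload holds the last persisted sequence number are all assumptions for illustration. None of this is taken from wal.py:replay.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Record:
    seq: int
    kind: str      # hypothetical kinds: "put" or "checkpoint"
    payload: str


def replay(segments: Iterable[List[Record]], after_seq: int = 0) -> Iterator[Record]:
    # Every segment is still iterated; after_seq filters individual records,
    # it does not let this sketch skip whole files.
    for segment in segments:
        for rec in segment:
            if rec.seq > after_seq:
                yield rec


def last_checkpoint_seq(segments: Iterable[List[Record]]) -> int:
    # Hypothetical convention: a checkpoint's payload is the highest seq whose
    # effects are already persisted elsewhere (e.g. in a snapshot).
    latest = 0
    for segment in segments:
        for rec in segment:
            if rec.kind == "checkpoint":
                latest = max(latest, int(rec.payload))
    return latest


def recover(segments: List[List[Record]]) -> List[str]:
    # Pass 1 finds the newest checkpoint; pass 2 applies only "put" records after it.
    start = last_checkpoint_seq(segments)
    return [r.payload for r in replay(segments, after_seq=start) if r.kind == "put"]
```

Because this version filters record by record, it still reads every segment. An implementation that tracked each segment's highest seq could skip whole files below after_seq, and at that point segment size would start to shape replay cost.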
- wal-default-segment-10mb: WAL and hash-index Bitcask both default maxfilesize to 10 MB (wal.py:65, bitcask.py:29), while the log-structured Bitcask defaults to 1 MB (log-structured-hash-table/bitcask.py:37)
- recovery-scans-all-segments: All three implementations scan every segment file on startup to rebuild state (wal.py:85-96, bitcask.py:109-114, log-structured-hash-table/bitcask.py:64-83), so recovery time grows both with the number of segment files (per-file open overhead) and with the total bytes they hold
- compaction-respects-size-limit: Bitcask compaction output is itself subject to maxfilesize rotation (hash-index-storage/bitcask.py:259), so smaller limits produce more files after compaction
- auto-compact-triggers-on-frozen-count: The log-structured Bitcask triggers compaction when the frozen segment count exceeds autocompactthreshold (log-structured-hash-table/bitcask.py:160-162). Because a smaller maxfilesize freezes segments sooner, segment size directly drives compaction frequency
- hint-files-skip-value-scanning: Both Bitcask variants use hint files to bypass full data-file scans during index rebuild, but the number of hint files still scales with segment count
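Several of these findings fit into one minimal in-memory sketch. It contains:

- a merge-style compaction that keeps the newest value per key and drops tombstones,
- output that rotates at `max_file_size`,
- hint entries recorded as each live entry is written,
- a frozen-count trigger like autocompactthreshold.

Entry sizes, the hint tuple layout, and the names `compact`, `should_auto_compact`, `max_file_size` and `auto_compact_threshold` are hypothetical. Neither Bitcask variant is claimed to work exactly this way.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, Optional[str]]   # (key, value); value None marks a tombstone
Hint = Tuple[str, int, int, int]    # (key, output segment index, entry offset, entry size)


def entry_size(key: str, value: str) -> int:
    # Hypothetical on-disk size: key plus value bytes (real formats add headers and CRCs).
    return len(key.encode()) + len(value.encode())


def should_auto_compact(frozen_count: int, auto_compact_threshold: int) -> bool:
    # Compact once the number of frozen segments exceeds the threshold.
    return frozen_count > auto_compact_threshold


def compact(frozen: List[List[Entry]], max_file_size: int) -> Tuple[List[List[Entry]], List[Hint]]:
    # Merge oldest to newest so later writes (including tombstones) win.
    latest: Dict[str, Optional[str]] = {}
    for segment in frozen:
        for key, value in segment:
            latest[key] = value

    outputs: List[List[Entry]] = [[]]
    hints: List[Hint] = []
    offset = 0
    for key, value in latest.items():
        # Dropping tombstones is safe here only because every frozen segment is merged at once.
        if value is None:
            continue
        size = entry_size(key, value)
        # The output obeys the same size limit, so compaction rotates files too.
        if outputs[-1] and offset + size > max_file_size:
            outputs.append([])
            offset = 0
        hints.append((key, len(outputs) - 1, offset, size))
        outputs[-1].append((key, value))
        offset += size
    if not outputs[-1]:
        outputs.pop()  # no live keys at all: produce no output segment
    return outputs, hints
```

Lowering `max_file_size` in this sketch splits the same live data across more output segments. With one hint file per data file, it also produces more hint files. This matches the compaction-respects-size-limit and hint-file scaling observations above.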