Date: 2026-05-29
Time: 07:02
LSMTree._flush — Flush Memtable to SSTableflush persists the current in-memory sorted buffer (the memtable) to disk as a new SSTable file. This is the core mechanism that converts volatile writes into durable storage. It exists because the memtable is bounded in size — once it hits memtablethreshold entries, the data must move to disk to keep memory usage bounded while preserving all writes.
In the LSM tree architecture from DDIA Chapter 3, this is the transition from the "in-memory balanced tree" layer to the "on-disk sorted segment" layer.
Preconditions:
self.dir exists on the filesystem (guaranteed by init_ calling os.makedirs)self._wal is open and writableself._seq is monotonically increasing (no two SSTables share a sequence number)Postconditions:
self._memtable is a fresh, empty SortedDict.sst file exists on disk containing all entries from the old memtableSSTable is appended to self._sstables (newest-last ordering preserved)compactionthreshold, compaction runs immediatelyInvariant: After _flush, the union of memtable + immutable memtables + SSTables still contains every non-compacted write. No data is lost.
None — this is a private method operating entirely on instance state.
None. The caller (put, delete, close) does not check a return value. Success is implicit; failure would raise an exception from I/O operations.
1. Guard clause: If the memtable is empty, return immediately. This prevents creating zero-entry SSTable files (which would waste disk and confuse compaction).
2. Freeze the memtable: Save a reference to the current SortedDict in frozen, then replace self._memtable with a new empty SortedDict. This is a pointer swap — frozen and the new memtable are independent objects. New writes arriving after this point go into the fresh memtable.
3. Allocate a sequence number: nextseq() returns the current self._seq and increments it. This gives the SSTable a unique, monotonically increasing identity used for both the filename and merge ordering during reads/compaction.
4. Construct the file path: Format as sst000042.sst (zero-padded to 6 digits). The zero-padding ensures lexicographic sort of filenames matches numeric sort — important for loadexistingsstables which uses sorted() on filenames at recovery time.
5. Write the SSTable: SSTable.write() serializes all entries in sorted order (guaranteed by SortedDict) into a binary file with a sparse index footer, and returns the in-memory SSTable handle with the sparse index already populated.
6. Register the SSTable: Append to self._sstables. Since SSTables are appended in creation order and sequence numbers are monotonic, the newest-last invariant is maintained.
7. Truncate the WAL: The WAL's purpose is crash recovery for unflushed memtable data. Once the data is persisted to an SSTable, the WAL entries are redundant and can be discarded. WAL.truncate() opens the file in write mode (zeroing it) then reopens in append mode.
8. Trigger compaction if needed: If the total SSTable count meets or exceeds compactionthreshold, call self.compact() to merge all SSTables into one, eliminating duplicates and tombstones. This is size-triggered compaction (as opposed to tiered or leveled strategies).
| Side Effect | Details |
|---|---|
| Disk I/O | Creates a new .sst file via SSTable.write |
| WAL truncation | Zeros the WAL file — data only in the WAL before flush is now only in the SSTable |
| State mutation | Replaces memtable, increments seq, appends to _sstables |
| Cascading compaction | May trigger compact(), which deletes old SSTable files and creates a merged one |
There is no explicit error handling. If SSTable.write or the filesystem operations fail (disk full, permission denied), the exception propagates to the caller (put, delete, or close). This leaves the system in a partially-flushed state:
frozen would be lost if the write fails partway through.This is a known simplification: a production LSM tree would use immutable memtables (note immutablememtables exists but _flush doesn't use it) and atomic file rename to handle this window.
_flush is called from three sites:
put() — after inserting a key, if len(self.memtable) >= self.thresholddelete() — same threshold check after writing a tombstoneclose() — unconditionally, to persist any remaining memtable entries before shutdownIt is a private method (underscore prefix). External callers are expected to use put/delete/close and let flush happen automatically.
| Dependency | Role |
|---|---|
| SortedDict (sortedcontainers) | Provides the sorted memtable; list(frozen.items()) yields entries in key order |
| SSTable.write | Serializes entries to the binary SSTable format with sparse index |
| WAL.truncate | Clears the write-ahead log after successful persistence |
| os.path.join | Constructs the SSTable file path |
| self.compact | Called conditionally to merge SSTables |
1. Single-threaded access. The memtable swap (frozen = self.memtable; self.memtable = SortedDict()) is not atomic. Concurrent put() calls during flush would race on the memtable reference.
2. immutablememtables is unused during flush. The frozen memtable is written directly to disk rather than being staged as an immutable memtable first. This means reads during flush could miss keys that are between the old memtable (now frozen, not checked by get) and the new SSTable (not yet written). A production implementation would append frozen to immutablememtables before writing.
3. WAL covers exactly one memtable. Truncating the WAL assumes all WAL entries correspond to the flushed memtable. If a crash occurs after the memtable swap but before WAL truncation, replay would re-insert entries into a fresh memtable — correct behavior.
4. SSTable.write is synchronous and infallible. No retry or partial-write recovery.
flush-creates-sequential-sstables — Each flush call creates an SSTable file with a strictly increasing sequence number, preserving newest-last ordering in self.sstableswal-truncated-after-sstable-write — The WAL is truncated only after the SSTable is fully written and appended to the in-memory list, ensuring crash recovery can replay unflushed dataflush-skips-immutable-memtable-stage — flush writes directly to disk without staging the frozen memtable in immutable_memtables, creating a brief window where in-flight keys are invisible to get()compaction-triggered-by-sstable-count — Compaction runs automatically when len(self.sstables) >= self.compaction_threshold, which defaults to 4