Date: 2026-05-29
Time: 11:27
unbundled-database/unbundled_database.py:StorageEngine.rebuild — Demonstrates that the storage engine is truly derived data: it can be destroyed and reconstructed entirely from the WAL, proving the log is the source of truth

stream-join-processor/streamjoinprocessor.py — The expiration/watermark logic (lines 155-200) implements the hard part of stream processing: deciding when an unmatched event will *never* match and should emit a miss

event-sourcing-store/eventstore.py:Projection.catchup — Compare this with DerivedSystem.process_event in the unbundled module to see how event sourcing and CDC converge on the same catch-up-from-position pattern

flush-semantics-across-modules — The unbundled database requires explicit db.flush() to push CDC events to derived systems (visible in every test), while the event store uses synchronous subscriber notification; explore the tradeoffs between push and pull delivery

change-data-capture/cdc.py — The CDCLog.compact() method (line 69) implements log compaction; compare it with Kafka's compaction semantics and consider what guarantees it provides to new consumers bootstrapping from the compacted log

---
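As a reference point for the compaction comparison above, here is a minimal sketch of key-based log compaction. It is not the repository's `CDCLog.compact()`; the `Event` type, the `compact` and `replay` helpers, and the `drop_tombstones` flag are all hypothetical names chosen for illustration. The sketch assumes the input list is already in LSN order and that a `None` value marks a delete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical log entry; not the repository's CDC event type."""
    lsn: int               # position in the log
    key: str
    value: Optional[str]   # None marks a delete (tombstone)


def compact(events: list[Event], drop_tombstones: bool = False) -> list[Event]:
    """Keep only the newest event per key, preserving LSN order.

    Assumes `events` is sorted by LSN, so a later entry always wins.
    """
    latest: dict[str, Event] = {}
    for ev in events:
        latest[ev.key] = ev
    kept = sorted(latest.values(), key=lambda ev: ev.lsn)
    if drop_tombstones:
        # Safe only once no consumer still needs to observe the delete.
        kept = [ev for ev in kept if ev.value is not None]
    return kept


def replay(events: list[Event]) -> dict[str, str]:
    """Fold a log into the key/value state it describes."""
    state: dict[str, str] = {}
    for ev in events:
        if ev.value is None:
            state.pop(ev.key, None)
        else:
            state[ev.key] = ev.value
    return state


log = [
    Event(1, "a", "1"),
    Event(2, "b", "2"),
    Event(3, "a", "3"),   # supersedes LSN 1
    Event(4, "b", None),  # tombstone, supersedes LSN 2
    Event(5, "c", "5"),
]
compacted = compact(log)

# A consumer bootstrapping from the compacted log reaches the same final state.
assert replay(compacted) == replay(log)
```

The guarantee illustrated is the one to check against the real method: a new consumer reading the compacted log ends in the same final state as one reading the full log, but it does not see intermediate values. Keeping tombstones for a while matters for consumers that already hold the deleted key, since they learn about the delete only if the tombstone is still in the log when they read it.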
wal-is-source-of-truth — The StorageEngine is fully derivable from the WriteAheadLog; calling rebuild() replays the entire WAL and reproduces identical state, making the log the authoritative record

cdc-old-value-required-for-consistency — SecondaryIndex.process_event depends on CDCEvent.old_value to remove stale index entries during updates and deletes; without before-images, incremental index maintenance would produce phantom references

derived-systems-are-position-tracked — Every DerivedSystem tracks a position (LSN) representing how far it has consumed the CDC stream, enabling independent catch-up without coordinator state

derived-systems-rebuildable-from-log — All derived systems implement rebuild(events) which clears internal state and replays from an event list, allowing new consumers to be added at any time without schema migration or backfill jobs

unbundled-db-write-path-decoupled-from-reads — Adding a new DerivedSystem via add_derived_system() requires zero changes to the write path (put/delete); the CDC stream is the only integration point between writers and readers
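
The invariants above can be illustrated with a small, self-contained sketch. This is not the repository's code: the class bodies, the `Database` wrapper, the `rebuild_storage` helper, and the field layout of `CDCEvent` are assumptions made for illustration, and the underscored spellings (`old_value`, `process_event`, `add_derived_system`) are assumed as well. LSNs are taken to start at 1 and increase by one per write.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CDCEvent:
    lsn: int
    key: str
    old_value: Optional[str]  # before-image; None if the key did not exist
    new_value: Optional[str]  # after-image; None for a delete


class SecondaryIndex:
    """Maps value -> keys, maintained incrementally from CDC events."""

    def __init__(self) -> None:
        self.position = 0  # highest LSN applied so far
        self.by_value: dict[str, set[str]] = {}

    def process_event(self, ev: CDCEvent) -> None:
        if ev.lsn <= self.position:
            return  # already applied, so redelivery is harmless
        if ev.old_value is not None:
            # The before-image tells us which stale entry to remove.
            keys = self.by_value.get(ev.old_value, set())
            keys.discard(ev.key)
            if not keys:
                self.by_value.pop(ev.old_value, None)
        if ev.new_value is not None:
            self.by_value.setdefault(ev.new_value, set()).add(ev.key)
        self.position = ev.lsn

    def rebuild(self, events: list[CDCEvent]) -> None:
        self.position = 0
        self.by_value.clear()
        for ev in events:
            self.process_event(ev)


class Database:
    """Write path appends to the log; derived systems only read the log."""

    def __init__(self) -> None:
        self.log: list[CDCEvent] = []
        self.data: dict[str, str] = {}
        self.derived: list[SecondaryIndex] = []

    def put(self, key: str, value: str) -> None:
        self._append(key, value)

    def delete(self, key: str) -> None:
        self._append(key, None)

    def _append(self, key: str, new_value: Optional[str]) -> None:
        ev = CDCEvent(len(self.log) + 1, key, self.data.get(key), new_value)
        self.log.append(ev)
        if new_value is None:
            self.data.pop(key, None)
        else:
            self.data[key] = new_value

    def add_derived_system(self, system: SecondaryIndex) -> None:
        system.rebuild(self.log)  # bootstrap from the full log
        self.derived.append(system)

    def flush(self) -> None:
        for system in self.derived:
            # LSN n lives at index n - 1, so slicing at `position` resumes there.
            for ev in self.log[system.position:]:
                system.process_event(ev)


def rebuild_storage(log: list[CDCEvent]) -> dict[str, str]:
    """Reconstruct key/value state purely from the log."""
    state: dict[str, str] = {}
    for ev in log:
        if ev.new_value is None:
            state.pop(ev.key, None)
        else:
            state[ev.key] = ev.new_value
    return state


db = Database()
db.put("u1", "red")
db.put("u2", "red")
idx = SecondaryIndex()
db.add_derived_system(idx)  # added late, with no change to put/delete
db.put("u1", "blue")
db.delete("u2")
db.flush()

fresh = SecondaryIndex()
fresh.rebuild(db.log)
assert fresh.by_value == idx.by_value
assert rebuild_storage(db.log) == db.data
```

In this sketch the index added after the first two writes catches up from the log alone, and a second index rebuilt from scratch agrees with the incrementally maintained one. If `process_event` ignored `old_value`, the update of `u1` from `red` to `blue` would leave `u1` listed under `red`, which is the phantom reference the second invariant describes.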