Scan: ddia-implementations

Date: 2026-05-28

Time: 18:05

I don't have read access to the target repo, so this analysis is inferred from the directory layout and file naming conventions, together with knowledge of the DDIA concepts involved.

Module Map

The ~35 modules group naturally into the same layers as the book:

Storage Engines & Data Structures (Part I: Foundations)

| Module | DDIA Chapter |
|--------|-------------|
| hash-index-storage / log-structured-hash-table | Ch 3 — Bitcask-style hash index |
| sstable-and-compaction | Ch 3 — Sorted String Tables |
| log-structured-merge-tree | Ch 3 — LSM-Tree |
| b-tree-storage-engine | Ch 3 — B-Tree |
| write-ahead-log | Ch 3 — WAL for crash recovery |
| bloom-filter | Ch 3 — Probabilistic membership |
| avro-serializer | Ch 4 — Schema evolution |
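
To make the first row concrete, here is a minimal sketch of a Bitcask-style store: an append-only log file plus an in-memory dict that maps each key to the byte offset of its latest record. It is not taken from the repo, which was not readable. The class name `HashIndexStore` and the length-prefixed record layout are assumptions chosen for illustration.

```python
import os
import struct


class HashIndexStore:
    """Append-only log plus in-memory hash index (hypothetical sketch)."""

    # Record layout (an assumption): key length, value length, key, value.
    HEADER = struct.Struct(">II")

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of that key's latest record
        open(path, "ab").close()  # create the log file if it does not exist
        self._rebuild_index()

    def _rebuild_index(self):
        """Scan the whole log once; later records win over earlier ones."""
        with open(self.path, "rb") as log:
            while True:
                offset = log.tell()
                header = log.read(self.HEADER.size)
                if len(header) < self.HEADER.size:
                    break  # reached the end of the log
                key_len, val_len = self.HEADER.unpack(header)
                key = log.read(key_len)
                log.seek(val_len, os.SEEK_CUR)  # skip over the value bytes
                self.index[key] = offset

    def put(self, key: bytes, value: bytes) -> None:
        """Append a record and point the index at it; never overwrite in place."""
        with open(self.path, "ab") as log:
            log.seek(0, os.SEEK_END)
            offset = log.tell()
            log.write(self.HEADER.pack(len(key), len(value)) + key + value)
        self.index[key] = offset

    def get(self, key: bytes):
        """Return the latest value for key, or None if the key is unknown."""
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as log:
            log.seek(offset)
            key_len, val_len = self.HEADER.unpack(log.read(self.HEADER.size))
            log.seek(key_len, os.SEEK_CUR)  # skip over the key bytes
            return log.read(val_len)
```

The sketch leaves out segment files, compaction, tombstones for deletes, and fsync, all of which a real engine would need. Reopening the same path rebuilds the index from the log, which is the property the WAL and hash-index modules presumably share.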

Replication & Consensus (Part II: Distributed Data)

| Module | DDIA Chapter |
|--------|-------------|
| leader-follower-replication | Ch 5 — Single-leader |
| multi-leader-replication | Ch 5 — Multi-leader |
| leaderless-replication | Ch 5 — Dynamo-style |
| read-repair | Ch 5 — Read repair and anti-entropy |
| hinted-handoff | Ch 5 — Availability during partitions |
| leader-election | Ch 8 — Leader election |
| raft-consensus | Ch 9 — Raft |
| two-phase-commit | Ch 9 — 2PC |
| total-order-broadcast | Ch 9 — Total order broadcast |

Partitioning (Part II)

| Module | DDIA Chapter |
|--------|-------------|
| consistent-hashing | Ch 6 — Hash partitioning |
| range-partitioning | Ch 6 — Range partitioning |
| secondary-index-partitioning | Ch 6 — Partitioned indexes |
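
As an illustration of what a consistent-hashing module typically contains, the sketch below builds a hash ring with virtual nodes and looks up the owner of a key. It is an assumption about the technique, not the repo's code. The names `HashRing`, `vnodes`, and the `node#i` token scheme are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical sketch)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes  # tokens per physical node (assumed default)
        self._ring = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        # Use the first 8 bytes of SHA-256 as a 64-bit position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(token, n) for token, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # Find the first token at or after the key's hash, wrapping around.
        i = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[i % len(self._ring)][1]
```

The property worth checking when reading the real module is that removing a node only reassigns the keys that node owned; every other key keeps its owner.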

Transactions & Isolation (Part II)

| Module | DDIA Chapter |
|--------|-------------|
| snapshot-isolation | Ch 7 — MVCC |
| write-skew-detection | Ch 7 — SSI |
| fencing-tokens | Ch 8 — Fencing for distributed locks |

Clocks & Ordering (Part II)

| Module | DDIA Chapter |
|--------|-------------|
| lamport-clocks | Ch 9 — Lamport timestamps (logical clocks) |
| vector-clocks | Ch 5 — Version vectors / vector clocks |
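
The difference between the two modules is easiest to see in code. The sketch below shows a Lamport clock, which yields a single counter, next to vector-clock helpers that can also detect concurrency. It illustrates the general technique under assumed names (`LamportClock`, `vc_increment`, `vc_merge`, `vc_compare`) and is not the repo's implementation.

```python
class LamportClock:
    """Scalar logical clock (hypothetical sketch)."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance and return the timestamp."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Message receipt: jump past the sender's timestamp."""
        self.time = max(self.time, remote_time) + 1
        return self.time


def vc_increment(clock: dict, node: str) -> dict:
    """Return a copy of the vector clock with node's entry advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def vc_merge(a: dict, b: dict) -> dict:
    """Element-wise maximum of two vector clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def vc_compare(a: dict, b: dict) -> str:
    """Classify a relative to b: 'before', 'after', 'equal', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

Lamport timestamps give an ordering consistent with causality but cannot tell whether two events were concurrent; the `"concurrent"` result from `vc_compare` is what vector clocks add.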

Fault Tolerance (Part II)

| Module | DDIA Chapter |
|--------|-------------|
| gossip-protocol | Ch 5/8 — Failure detection |
| byzantine-fault-tolerance | Ch 8 — PBFT |
| merkle-tree | Ch 5 — Anti-entropy verification |
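
For the merkle-tree row, a minimal sketch of how two replicas could compare their contents by exchanging a single root hash is shown below. The function name `merkle_root`, the leaf and interior prefixes, and the rule of duplicating the last hash on odd levels are assumptions made for illustration; the repo's module may differ.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> bytes:
    """Compute a Merkle root over a list of byte strings (hypothetical sketch)."""
    if not leaves:
        return _h(b"")
    # Prefix leaves and interior nodes differently so they cannot be confused.
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Two replicas holding the same records agree on the root; one stale record
# changes it, which signals that a repair pass is needed.
replica_a = [b"k1=v1", b"k2=v2", b"k3=v3"]
replica_b = [b"k1=v1", b"k2=stale", b"k3=v3"]
in_sync = merkle_root(replica_a) == merkle_root(replica_b)
```

A fuller implementation would keep the intermediate levels so that replicas can descend the tree and locate the differing range instead of only detecting that a difference exists.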

Stream & Batch Processing (Part III: Derived Data)

| Module | DDIA Chapter |
|--------|-------------|
| mapreduce-framework | Ch 10 — MapReduce |
| batch-word-count | Ch 10 — Batch processing |
| map-side-join | Ch 10 — Join strategies |
| change-data-capture | Ch 11 — CDC |
| event-sourcing-store | Ch 11 — Event sourcing |
| stream-join-processor | Ch 11 — Stream joins |
| partitioned-log | Ch 11 — Log-based messaging |

Integration (Part III)

| Module | DDIA Chapter |
|--------|-------------|
| unbundled-database | Ch 12 — Composing data systems |
| conflict-free-replicated-data-types | Ch 5/12 — CRDTs |
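
As a small example of what the CRDT module likely covers, the sketch below implements a grow-only counter (G-Counter), one of the simplest CRDTs. The class name `GCounter` and its methods are hypothetical and chosen for illustration; nothing here is taken from the repo.

```python
class GCounter:
    """Grow-only counter CRDT (hypothetical sketch)."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}  # node id -> total increments seen from that node

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        """Take the per-node maximum; safe to apply repeatedly, in any order."""
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
```

Because the merge is a per-node maximum, it is commutative, associative, and idempotent, so replicas converge regardless of how often or in what order they exchange state.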

---

Critical Files & Exploration Strategy

The exploration should follow the book's pedagogical arc: storage foundations first, then distribution, then derived data. Within each layer, start with the simpler concept and build toward the complex ones.

Topics to Explore

Layer 1: Storage Foundations (understand the data structures everything else builds on)

Layer 2: Replication & Consensus (the hardest concepts; the core of distributed systems)

Layer 3: Transactions & Clocks (correctness guarantees)

Layer 4: Partitioning & Fault Tolerance (scaling out)

Layer 5: Stream & Batch (derived data systems)

Cross-cutting

---

Recommended Exploration Order

1. WAL then Bitcask — establish the log-append primitive

2. LSM then B-Tree — the two storage engine families

3. Leader-follower then Leaderless then Raft — replication spectrum

4. 2PC then MVCC then Vector clocks — correctness mechanisms

5. Consistent hashing then CRDTs — partitioning and conflict resolution

6. CDC then Unbundled database — integration and capstone

7. Tester pattern — cross-cutting infrastructure understanding

This order follows the dependency chain between concepts rather than strict chapter order: each concept builds on the ones before it, so beliefs extracted early will serve as antecedents for derived beliefs later.