Date: 2026-05-29
Time: 13:22
The implementation in leader-election/leaderelection.py follows the classic Bully Algorithm. The core insight is visible in startelection (line ~155): when a node initiates an election, it sends ELECTION messages to every node with a higher ID:
higher_nodes = [nid for nid in self.all_node_ids if nid > self.node_id]
Each of those higher-ID nodes responds with an ALIVE message (line ~72) and then starts its *own* election by messaging *its* higher-ID peers. This creates a cascade of elections that converges on the highest available node.
The winner then broadcasts a COORDINATOR message to all other nodes (declare_victory, line ~172):
for nid in self.all_node_ids:
if nid != self.node_id:
messages.append(Message("COORDINATOR", self.node_id, nid, ...))
Consider the worst case: the lowest-ID node detects a leader failure and starts an election in a 5-node cluster.
1. Node 1 sends ELECTION to nodes 2, 3, 4, 5 → 4 messages
2. Nodes 2, 3, 4, 5 each send ALIVE back to node 1 → 4 messages
3. Node 2 starts its own election, sends to 3, 4, 5 → 3 messages
4. Nodes 3, 4, 5 each send ALIVE back → 3 messages
5. Node 3 sends to 4, 5 → 2 messages, gets 2 ALIVEs back
6. Node 4 sends to 5 → 1 message, gets 1 ALIVE back
7. Node 5 wins, sends COORDINATOR to all 4 → 4 messages
That's roughly n + (n-1) + (n-2) + ... + 1 ELECTION messages, the same number of ALIVE responses, plus n-1 COORDINATOR messages. The ELECTION+ALIVE cascade alone is O(n²). You can see this cascade in the code: receivemessage at line ~72 both sends ALIVE *and* calls self.startelection(), propagating the wave upward.
The Bully Algorithm's advantage is conceptual directness. Look at what each node needs to know:
There's no need to maintain a logical ring topology, no token to pass, no worry about ring breaks. The state machine in receivemessage (line ~65) handles just four message types: ELECTION, ALIVE, COORDINATOR, HEARTBEAT. Recovery is automatic — testrecoverednodetakes_over (test file line ~34) shows that a recovered high-ID node simply bullies its way back to leadership by winning the next election.
A ring-based algorithm (like Chang-Roberts) would work differently:
| Property | Bully (this impl) | Ring-based (Chang-Roberts) |
|---|---|---|
| Topology | Fully connected — every node can message any other (line ~155, messages sent to arbitrary nid) | Logical ring — each node only messages its successor |
| Messages per election (worst) | O(n²) — cascade of ELECTION + ALIVE pairs | O(n) — election message travels the ring once |
| Messages per election (best) | O(n) — highest node detects failure, broadcasts COORDINATOR directly | O(n) — still must traverse the full ring |
| Failure handling | Simple: unavailable nodes just don't respond (if not self._available: return [], line ~66) | Complex: ring must be repaired if a node fails mid-election |
| Recovery | Automatic: recovered node wins next election (test line ~35) | Requires ring re-insertion protocol |
| Leader selection | Always highest available ID (line ~279: max(leaders)) | Can elect highest ID, but ring traversal required |
The critical structural difference: this implementation assumes full connectivity — startelection iterates over self.allnodeids and can send to any of them. A ring algorithm would restrict communication to self.nextnode only, cutting per-node fan-out from O(n) to O(1), but requiring the ring to be maintained.
For small-to-moderate clusters (the 5-node clusters used throughout the tests), O(n²) messages is perfectly acceptable — 25 messages vs. 5 isn't meaningful. The simplicity payoff is real: the entire implementation is ~200 lines with a clean state machine. A ring-based algorithm would need additional complexity for ring construction, ring repair on failure, and handling the case where the election token is lost because a node failed while holding it.
The Bully Algorithm becomes costly in large clusters (hundreds of nodes), where the quadratic message overhead dominates. That's where ring-based or tree-based algorithms earn their keep — but at the cost of maintaining the overlay topology.
leader-election/leader_election.py:tick — The timer-driven state machine that triggers elections on heartbeat timeout and leader heartbeat broadcasts; this is where liveness detection livesleader-election/leaderelection.py:receivemessage — The cascade mechanism where receiving an ELECTION triggers a new election, the key source of O(n²) behaviorchang-roberts-ring-election — Implementing Chang-Roberts alongside this Bully implementation would make the complexity tradeoff concrete and testableleader-election/testleaderelection.py — The test suite covers failure, recovery, and multi-failure scenarios; worth studying to understand the edge cases the protocol handlesraft-vs-bully-comparison — Raft (used in etcd, Consul) uses randomized timeouts instead of ID ordering; comparing it with this Bully implementation would show a third point in the election algorithm design spacebully-sends-election-to-higher-only — start_election sends ELECTION messages exclusively to nodes with IDs higher than the sender, not to all peersbully-cascade-causes-quadratic-messages — Receiving an ELECTION from a lower-ID node triggers both an ALIVE response and a new start_election call, creating the O(n²) message cascadebully-coordinator-is-broadcast — declare_victory sends a COORDINATOR message to every other node in the cluster (O(n) messages), not just to nodes that participated in the electionbully-assumes-full-connectivity — The algorithm requires any node to message any other node directly; there is no ring or tree overlay topologybully-highest-available-always-wins — The algorithm guarantees the highest available node ID becomes leader, enforced by the ALIVE suppression mechanism and confirmed by all recovery tests