Database Replication

Maintaining copies of the same data on multiple database nodes to improve read throughput, provide fault tolerance, and reduce read latency by serving reads from geographically closer replicas.

When NOT to Use

Write throughput exceeds what a single primary can handle — replication scales reads, not writes; use Database-Sharding to scale writes. A single primary is the write bottleneck regardless of how many replicas exist. Replication is the wrong tool when the constraint is write capacity.
Writes require cross-region low latency — single-leader from a remote region has unacceptable write latency for users in that region; consider multi-leader or geo-partitioning (sharding by region). Replication does not solve write latency for geographically distributed users who need to write locally.

Core Mechanism

Replication keeps multiple copies of the same data on different nodes. The defining question is: which nodes are allowed to accept writes?

Single-Leader (Primary-Replica)

All writes go to one primary; replicas receive changes via replication log. Simple mental model. Write throughput limited by single primary. Read throughput scales horizontally via replicas — adding replicas adds read capacity linearly. Most common model:

PostgreSQL streaming replication — physical WAL (Write-Ahead Log) streaming
MySQL binlog replication — logical binlog events
MongoDB replica sets — primary election with Raft-based consensus

The primary maintains a replication log; replicas replay it to stay current. The lag between primary write and replica application is replication lag.

Multi-Leader (Multi-Master)

Multiple nodes accept writes simultaneously; changes propagate between leaders via asynchronous replication. Enables geo-distributed writes — each region has a local leader, so writes do not cross region boundaries. Requires write conflict resolution because two leaders can accept conflicting writes for the same key:

Last-write-wins (LWW): each write is timestamped; the write with the later timestamp wins. Simple but risks discarding recent writes on nodes with skewed clocks.
Application-specific merge: application defines a merge function for conflicting writes (e.g., union of sets for collaborative editing).
CRDTs (Conflict-free Replicated Data Types): data structures that commute — concurrent writes always merge to a deterministic result without conflict.

Complexity is high; use only when single-leader write latency from remote regions is unacceptable.

Leaderless (Dynamo-Style)

Any replica accepts reads and writes; coordinator routes to N replicas using quorum (W writes + R reads, W+R > N for strong consistency). No single point of failure — no primary election is required on failure; the coordinator simply routes to healthy replicas.

Quorum formula: With N total replicas, W write acknowledgements, R read replicas:

W + R > N → at least one replica overlaps between write and read set → strong consistency
Typical defaults: N=3, W=2, R=2 (strong); N=3, W=1, R=1 (low latency, eventual)

Products: Cassandra, Riak, DynamoDB

Conflict resolution:

Last-write-wins (timestamp) — risk of discarding writes on nodes with clock skew
Read-repair: coordinator detects stale replica during a read, writes back the latest value
Anti-entropy: background process continuously compares replicas and repairs divergence

Component Diagram

Database-Replication-diagram.excalidraw

Key Variants

Replication models differ by which nodes accept writes. Replication methods differ by when the primary considers a write acknowledged.

Method	Mechanism	Consistency	Write latency	Use when
Synchronous	Primary waits for replica acknowledgement before acknowledging client	Strong	Higher	Replica is a hot standby; zero data loss required
Asynchronous	Primary acknowledges client immediately; replicas receive changes eventually	Eventual	Lower	Default for most read-scaling; acceptable replica lag
Semi-synchronous	One replica is synchronous; others are async	One strong replica; others eventual	Medium	Balance between durability and latency (MySQL default)

Key insight: Synchronous replication trades write latency for durability — if the synchronous replica is unavailable, the primary must wait or refuse writes. Asynchronous replication achieves lower write latency but accepts a data loss window equal to the replication lag if the primary fails.

Design Decisions

Replication Lag Consequences

Async replication introduces lag between the primary and its replicas. Two consistency anomalies emerge when reads are served from replicas:

Read-your-writes violation: user writes a value then reads a stale replica — perceives their write was lost. Common in social features: user posts an update, refreshes page, sees stale content.
Monotonic reads violation: reading from different replicas at different lag states — perceives data moving backward in time. User queries a value, refreshes, and gets an older value because the second read hit a more-lagged replica.

Lag Mitigations

Violation	Mitigation	Tradeoff
Read-your-writes	Route each user's reads to the same replica (sticky session)	Uneven replica load if user access is skewed
Read-your-writes	Route writes and immediately-subsequent reads to the primary	Increases primary read load
Both	Use timestamp-based reads (read replica only after it has caught up past the write timestamp)	Read may stall waiting for replica to catch up
Monotonic reads	Consistent replica assignment per user session	Requires session-to-replica mapping state

Failure Modes

Split-brain risk during automatic failover

Two nodes simultaneously believe they are primary; both accept writes; data diverges. Prevention: fencing (STONITH — Shoot The Other Node In The Head); quorum-based election (only elect if majority of replicas agree). Automatic failover (Patroni, MHA) is faster but carries split-brain risk if network partition makes primary appear dead while still running. Manual failover is slower but reduces split-brain risk.

Data loss window with asynchronous replication

If the primary fails before un-replicated writes propagate to any replica, those writes are lost. The window size equals the replication lag. For zero data loss, use synchronous replication to at least one replica — at the cost of higher write latency.

Failover strategies:

Manual failover: operations team promotes replica to primary after verifying replica lag is acceptable. Slower to execute; reduces risk of split-brain. Requires on-call human in the recovery path.
Automatic failover: orchestrator (Patroni, MHA) detects primary failure via heartbeat; elects a new primary from the replica set. Faster recovery (seconds vs minutes); split-brain risk requires fencing or quorum-based election to mitigate. Typical production choice for high-availability systems.

Failover edge cases:

Replica with stale data is promoted — lost writes from un-replicated lag become the new baseline
New primary is announced, old primary recovers and rejoins — two nodes believe they are primary simultaneously
Network partition heals, old primary sees the new primary — requires fencing to prevent old primary from continuing to serve writes

Existing Pattern Connections

CAP-Theorem — asynchronous replication is an AP choice (availability over consistency); synchronous replication is a CP choice (consistency at the cost of write availability if replica is unreachable)
Consistent-Hashing — leaderless replication uses a hash ring to determine which N replicas own each key; the ring mechanics directly apply
CQRS-Pattern — database replication (read replicas) is an infrastructure-layer implementation of CQRS read/write separation at the storage tier; CQRS provides the application-level pattern on top

Database Replication

Database Replication

When NOT to Use

Core Mechanism

Single-Leader (Primary-Replica)

Multi-Leader (Multi-Master)

Leaderless (Dynamo-Style)

Component Diagram

Key Variants

Design Decisions

Replication Lag Consequences

Lag Mitigations

Failure Modes

Existing Pattern Connections

Backlinks

Linked mentions

Database Replication

Tags

Database Replication

When NOT to Use

Core Mechanism

Single-Leader (Primary-Replica)

Multi-Leader (Multi-Master)

Leaderless (Dynamo-Style)

Component Diagram

Key Variants

Design Decisions

Replication Lag Consequences

Lag Mitigations

Failure Modes

Existing Pattern Connections

Backlinks

Linked mentions