CAP Theorem

CAP Theorem

A distributed system can guarantee at most two of Consistency, Availability, and Partition Tolerance — but since network partitions are inevitable, the practical choice during a partition is between consistency and availability.

When NOT to Use

  • CAP is not a design checklist — it is a reasoning framework for understanding consistency/availability tradeoffs during network partitions; using it as a binary classification ("choose CP or AP") oversimplifies the actual per-operation decision space
  • Do not use CAP to evaluate performance — CAP addresses behaviour during network partitions, not during normal operations; for latency/consistency tradeoffs during normal operations, use the PACELC framework: "Else (no partition), choose between Latency and Consistency"
  • Do not label systems as "CP" or "AP" as absolute properties — consistency level is often configurable per read/write operation; "Cassandra is AP" is only true at default consistency level — with QUORUM reads and writes, Cassandra behaves more like CP for those operations

Source note: This note follows the consistency model taxonomy from DDIA 2nd ed (Kleppmann & Riccomini, 2026), Ch. 9. The 1st ed (2017) taxonomy has been revised in the 2nd ed — the classification table in Key Variants reflects the 2nd ed framing. The original Gilbert/Lynch (2002) formal proof is referenced in Core Mechanism; the PACELC extension is from Abadi (2012).

Core Mechanism

The Theorem

Gilbert and Lynch (2002) formalised Brewer's 2000 conjecture. The theorem states:

A distributed system can guarantee at most two of:

  • Consistency (C): Every read receives the most recent write or an error; all nodes see the same data at the same time
  • Availability (A): Every request receives a response (not necessarily the most recent data); the system does not return errors to indicate unavailability
  • Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped or delayed between nodes

The key practical insight: In any real distributed system deployed over a network, partition tolerance is not optional — network partitions happen due to hardware failures, network splits, and packet loss. The real choice is between consistency and availability during a partition:

  • Choosing C over A during a partition: The system refuses to answer (or returns an error) rather than risk returning stale data. When the partition heals, consistency is restored. This is the CP behaviour.
  • Choosing A over C during a partition: The system continues to accept reads and writes, but different nodes may see different data until the partition heals and replicas converge. This is the AP behaviour.

The CA choice (Consistency + Availability, no Partition Tolerance) applies only to single-node systems or systems that assume a perfectly reliable network — not practical for distributed systems at scale.

Why CP vs AP matters in interviews: When an interviewer asks "how does your system handle a network partition?", the CAP theorem gives the vocabulary to answer precisely. A CP answer is: "writes will fail until quorum is restored; reads will return an error rather than stale data." An AP answer is: "reads and writes continue; nodes may see different data; when the partition heals, reconciliation runs." Both are valid design choices — the key is acknowledging the tradeoff explicitly rather than assuming the problem does not exist.

Brewer's Retraction and PACELC

Brewer himself acknowledged in 2012 that the "2 of 3" framing is misleading in several ways:

"2 of 3" is misleading because:

  • Systems are not statically CP or AP — they make per-operation tradeoffs depending on quorum settings, consistency levels configured per read/write, and whether a partition is actually occurring
  • The original framing treats availability as binary (available or not), but real systems have degrees of availability and graceful degradation
  • Partitions are rare in practice; optimising only for partition behaviour ignores the dominant performance constraint (latency vs consistency during normal operations)

PACELC extension (Abadi 2012): A more complete framework:

During Partition (P):    choose between Availability (A) and Consistency (C)
Else (normal operation): choose between Latency (L) and Consistency (C)

The "Else" dimension is critical for real system design: even without a partition, there is a tradeoff between lower-latency reads (which may read from a replica that is slightly behind) and higher-latency reads (which must read from the leader or reach quorum). This is the practical decision most distributed systems make constantly, not just during the rare partition event.

DDIA 2nd ed (Kleppmann & Riccomini, 2026) recommendation: Reason about consistency guarantees per operation rather than labelling systems as CP or AP. Different operations within the same system can have different consistency requirements: a user profile read might tolerate eventual consistency, while a payment write must be linearisable.

Component Diagram

CAP-Theorem-diagram.excalidraw

Key Variants

Consistency Models

Taxonomy from DDIA 2nd ed, Ch. 9. Listed in descending order of strength (strongest to weakest guarantee):

ModelGuaranteeExamples
Linearizability (strong)All reads reflect the most recent write; operations appear instantaneous — the strongest possible guaranteeZookeeper (reads via leader), etcd, Google Spanner
Sequential consistencyAll operations appear in a single total order across all processes; may lag real time by bounded amountSome configurations of Cassandra with quorum
Causal consistencyOperations causally related maintain order; concurrent operations may differ across nodesMongoDB causal sessions, Cosmos DB session consistency
Eventual consistencyReplicas converge given no new writes; no ordering guarantee between concurrent operationsCassandra (default), DynamoDB (default), DNS

Why this table matters: Choosing a consistency model is a first-class system design decision. Linearizability costs latency (requires coordination on every read/write). Eventual consistency enables low-latency reads from any replica but requires the application to handle stale reads. Causal consistency is a practical middle ground for social and collaborative features where "my writes are visible to me" is required but total ordering is not.

Real-System Classification

Classification is per-default-configuration; most databases offer tunable consistency. The "Why" column explains the mechanism.

SystemClassificationWhy
ZookeeperCPRefuses writes during partition; strong consistency for leader election; readers directed to leader
etcdCPRaft consensus; halts writes if quorum lost; designed for configuration coordination
MySQL (single primary)CA (no P)Assumes reliable network; not designed for partition recovery; a partition causes the replica to lag with no automatic resolution
CassandraAP (tunable)Continues accepting writes during partition; eventual consistency default with configurable consistency levels (ONE, QUORUM, ALL)
DynamoDBAP (tunable)Eventual consistency default; strong consistency available at higher latency/cost via ConsistentRead=true
CockroachDBCPSerializable isolation; pauses during partition; built on Raft consensus
MongoDBCP (default)Primary election via Raft; reads from primary enforce strong consistency; replica reads are eventually consistent

Requirement-to-choice decision table:

Requirement                             -> Choose
Reads must always return latest write   -> Linearizability (CP); accept unavailability during partition
System must stay available under split  -> Eventual consistency (AP); accept stale reads
Causally related ops must stay ordered  -> Causal consistency; cross-service chat, comment threads
Single-node app; no distribution needed -> N/A; no CAP tradeoff applies

PACELC Classification of Real Systems

Extending the classification table above with the Else (latency/consistency) dimension. This is the more useful column for everyday system design decisions because normal-operation latency tradeoffs occur far more frequently than network partitions.

SystemP: PartitionEL: Else
ZookeeperCP — refuses writes during partitionEC — consistent reads from leader; higher latency than replica reads
etcdCP — halts writes if quorum lostEC — Raft leader reads enforce consistency at latency cost
Cassandra (QUORUM)CP — quorum required for reads/writesEC — quorum is slower than eventual reads; tunable per-operation
Cassandra (default)AP — accepts writes during partitionEL — low-latency reads from any replica; may be stale
DynamoDB (default)AP — available during partitionEL — eventual consistency reads are lower latency and cheaper
DynamoDB (ConsistentRead=true)CPEC — consistent reads cost twice the read capacity
Google SpannerCP — TrueTime; refuses when clock bounds too largeEC — linearisable reads at global-sync latency

Reading the table: A system like DynamoDB has two PACELC profiles depending on operation parameters. This is why static "DynamoDB is AP" labelling is imprecise — the correct statement is "DynamoDB defaults to AP/EL but supports CP/EC on a per-request basis."

Practical Application: Classifying Your Own System

The Gilbert/Lynch theorem and PACELC framework are most useful when applied to concrete design decisions. Ask three questions:

  1. What is the unit of consistency? Row-level (most databases), record-level, session-level, or global linearizability? A weaker consistency unit enables higher availability.
  2. Which operations must be consistent? Payment debits must be linearisable. Profile photo updates can be eventual. Identify the consistency requirement per operation before choosing a database.
  3. What is the partition recovery behaviour? When a partition heals, how does the system reconcile divergent writes? Last-write-wins (LWW), merge functions (CRDTs), or manual conflict resolution? The reconciliation strategy is the implicit contract that the AP choice creates.

Pitfalls

Static CP/AP labeling

Do not label a system as "CP" or "AP" without qualification. Most distributed databases (Cassandra, DynamoDB, MongoDB) offer tunable consistency per operation. "Cassandra is AP" is only true at default consistency level — with QUORUM reads and writes it behaves more like CP for those operations. Always state which consistency level a classification assumes; state whether the classification applies during partitions or normal operation.

Confusing CAP with performance

CAP addresses behavior during network partitions, not during normal operations. For latency/consistency tradeoffs during normal operations, use the PACELC framework: "Else (no partition), choose between Latency and Consistency." A system can be CP (consistent under partition) and still offer low-latency reads by serving from a local replica during normal operation; these are different dimensions.

Two additional pitfalls:

  • Treating the theorem as a database selection criterion alone: CAP applies to any distributed system, not just databases. A microservices architecture that replicates data across services faces the same CP/AP tradeoff. When Service A and Service B each hold a copy of a customer record (for autonomy), the system is making an AP choice — eventual consistency between services is the operational contract.
  • Ignoring the CP cost in interviews: Candidates often say "I'll use Zookeeper/etcd for strong consistency" without acknowledging the cost: the system will be unavailable (returning errors) during a partition. If the interviewer asks "what happens when the Zookeeper quorum is split?", the answer must include "writes fail until quorum is restored." Availability guarantees must be stated alongside consistency guarantees.

Existing Pattern Connections

  • CQRS-Pattern — CQRS read model accepts eventual consistency (AP choice at the read model layer); the command side may require strong consistency (CP) for writes that must be linearisable; this is the PACELC "Else" dimension applied architecturally: write commands enforce consistency at higher latency, read queries accept eventual consistency at lower latency
  • Event-Sourcing-Pattern — event log is append-only; projections are eventually consistent (AP for reads), making Event Sourcing a natural fit for AP-tolerant architectures; the event store itself may be CP (strong consistency for the log) while projections rebuilt from the log are eventually consistent
  • Database-Replication — replication lag is the practical manifestation of the AP tradeoff: a replica that is 500ms behind the primary is "eventually consistent" in practice; understanding CAP explains why replication lag exists and when it matters; the primary/replica topology is itself an architectural CAP choice (forward link to Phase 29 note)
  • Database-Sharding — consistency during resharding is CAP-theorem territory; shard rebalancing temporarily violates availability or consistency as data moves between nodes; the choice between stopping writes during rebalance (CP behaviour) and accepting potential stale reads (AP behaviour) is a direct CAP decision (forward link to Phase 29 note)

Key insight from the connections above: CAP is not a one-time database selection decision — it is an ongoing operational contract that appears at the replication layer (Database-Replication), the partitioning layer (Database-Sharding), the command/query separation layer (CQRS), and the event propagation layer (Event-Sourcing). Every system design that involves data replication — which is most systems at scale — is implicitly making a CAP decision. Making that decision explicit, per operation, with the PACELC framework, is the engineering discipline the theorem demands.