SQL vs NoSQL

SQL vs NoSQL

Decision framework for choosing between relational (SQL) and non-relational (NoSQL) storage based on access patterns, consistency requirements, and schema flexibility — not a blanket recommendation for either.

When NOT to Use NoSQL

  • Complex relationships requiring multi-table JOINs and complex integrity constraints — NoSQL databases do not enforce foreign keys; enforcing referential integrity in application code is fragile and error-prone at scale
  • ACID transactions spanning multiple entities/aggregates — most NoSQL databases do not provide cross-partition transactions at low latency; if your workload requires financial-grade consistency across entities, start with relational
  • When the team's strongest expertise is relational — operational maturity matters; a team unfamiliar with NoSQL consistency semantics (eventual consistency, tunable quorum, conflict resolution) will produce incorrect systems

Core Mechanism

ACID vs BASE

ACID (Atomicity, Consistency, Isolation, Durability): Transaction semantics. Either all operations in a transaction succeed or all are rolled back — the database never leaves partial state. Isolation prevents dirty reads between concurrent transactions. Durability guarantees persistence after commit, even on crash. Relational databases (PostgreSQL, MySQL, CockroachDB). Suitable for financial transactions, order management, inventory — any workload requiring invariant enforcement across multiple rows or tables.

BASE (Basically Available, Soft state, Eventually consistent): Availability over consistency. The system accepts writes even during network partitions; replicas converge over time without hard transaction boundaries. Soft state means data may change without a client write (as replicas sync). NoSQL databases (Cassandra, DynamoDB, MongoDB). Suitable for high-availability, write-heavy, geographically distributed workloads where temporary inconsistency is tolerable (social feeds, user preferences, analytics).

The ACID/BASE distinction maps directly to the CAP-Theorem consistency-availability tradeoff: during a network partition, ACID databases typically choose consistency (CP), while BASE databases typically choose availability (AP). Consistency level in many NoSQL systems (e.g., Cassandra) is tunable per operation — "AP" is the default, not an absolute label.

Data Model Comparison

DimensionRelational (SQL)DocumentKey-ValueColumn-familyGraph
Data modelRows + columns, normalizedJSON documents, nestedArbitrary value by keyRows with dynamic columns grouped by column familyNodes + edges
Query modelSQL: JOIN, aggregation, arbitrary predicatesDocument field queries; limited JOINKey lookup onlyRow key + column range scanGraph traversal (Cypher, Gremlin)
ConsistencyACID transactionsVaries by product (MongoDB: single-doc atomic; multi-doc transactions available)Eventual by default (tunable)Tunable (quorum)ACID for embedded graph (Neo4j)
Scale modelVertical + read replicasHorizontal (sharding)HorizontalHorizontal (wide-column, partition key)Vertical (large graphs require specialized partitioning)
ExamplesPostgreSQL, MySQL, CockroachDBMongoDB, Couchbase, FirestoreRedis, DynamoDB, RiakCassandra, HBase, ScyllaDBNeo4j, Amazon Neptune

Component Diagram

SQL-vs-NoSQL-diagram.excalidraw

Key Variants

Three axes drive the storage selection decision. Evaluate in order: access pattern first, consistency second, schema flexibility third.

Access pattern:

  • Key-value or document retrieval by known key → NoSQL key-value (DynamoDB, Redis) or document (MongoDB, Firestore)
  • Complex queries with multiple predicates and JOINs → relational (PostgreSQL, MySQL)
  • Full-text search with ranking and faceting → dedicated search engine (Elasticsearch, OpenSearch); not a primary storage choice but a query layer on top of a primary store
  • Graph traversal (friend-of-friend, recommendation engine, knowledge graph) → graph DB (Neo4j, Amazon Neptune)
  • Time-series (append-only, range scan by timestamp) → column-family (Cassandra, ScyllaDB) or dedicated TSDB (InfluxDB, TimescaleDB)

Consistency requirement:

  • Strict consistency required (financial transactions, inventory management, booking systems where double-booking must be impossible) → relational with ACID
  • Eventual consistency acceptable (social feeds, analytics, user preferences, cache-backed reads, activity logs) → NoSQL
  • Mixed: use relational for the write model (invariants enforced), NoSQL for the read model (fast retrieval) — see CQRS-Pattern

Schema flexibility:

  • Schema evolves rapidly (early-stage product, polymorphic entities, event payloads with varying fields) → document DB; no migration overhead, flexible document shape per record
  • Schema is stable and well-defined with known relationships and foreign key constraints → relational; schema enforcement is a feature, not a limitation
  • Schema-less key access with no query requirements beyond lookup → key-value; no schema definition required at all

Team and operational factors (secondary axis — apply as a tiebreaker):

  • Team expertise in SQL → prefer relational unless access pattern clearly mandates NoSQL
  • Managed cloud services preferred → DynamoDB (AWS), Firestore (GCP), Cosmos DB (Azure) remove operational overhead for NoSQL; Cloud SQL/RDS for relational
  • Multi-region writes with geo-latency constraint → multi-leader or leaderless NoSQL (Cassandra, DynamoDB global tables) or distributed SQL (CockroachDB, Spanner)

Design Decisions

Plain-text decision flowchart for primary storage selection:

What is your primary access pattern?
+-- Key lookup (get by ID, session, user profile)
|   +-- Key-Value: DynamoDB, Redis
+-- Document (hierarchical, nested data, moderate query complexity)
|   +-- Document: MongoDB, Firestore
+-- Wide query with JOINs and arbitrary predicates
|   +-- Relational: PostgreSQL, MySQL
+-- Time-series (append-only, range by time)
|   +-- Column-family: Cassandra, ScyllaDB -- or dedicated TSDB (InfluxDB)
+-- Graph traversal (social graph, recommendations)
    +-- Graph: Neo4j, Amazon Neptune

If access pattern -> NoSQL:
  Does your workload require strict ACID transactions?
  +-- Yes -> consider CockroachDB (SQL + horizontal scale) or relational
  +-- No -> NoSQL is appropriate

Note on CockroachDB: it is a distributed SQL database that bridges the SQL/NoSQL divide — provides PostgreSQL-compatible SQL and horizontal scaling via distributed transactions (using a consensus protocol per range of key-space). Use when you need both horizontal scale and ACID semantics, and are willing to accept the operational complexity and higher per-transaction latency of a distributed SQL cluster. Google Spanner is the managed cloud equivalent (Google Cloud).

CQRS architecture commonly uses this framework to select different storage per model: a relational write model (ACID, normalized, enforces all invariants on writes) and a denormalized NoSQL read model (fast retrieval, pre-joined for query patterns). The write path takes the consistency penalty; the read path takes the availability benefit. See CQRS-Pattern for the architectural separation and command/query handler design.

The Repository pattern is the implementation boundary at which this selection decision is enforced — domain logic above the repository is agnostic to whether the backing store is PostgreSQL or DynamoDB.

Pitfalls

Choosing NoSQL for ACID-required workloads

NoSQL databases (except some distributed SQL like CockroachDB) do not provide cross-partition ACID transactions at low latency. If your workload requires financial-grade consistency, start with relational. Bolting ACID onto an eventually consistent store via application code is fragile and error-prone. The cost of retrofitting consistency guarantees into a NoSQL system is higher than the cost of scaling a relational database.

Ignoring access pattern axis in decision

ACID vs BASE is the most commonly cited differentiator, but access pattern drives most real storage selection decisions. A system needing key-value lookups at scale should use a key-value store even if ACID is not required — not a relational database with single-column lookups. SQL joins on a 1TB table are not a performance solution regardless of consistency requirements. Evaluate access pattern first.

Existing Pattern Connections

  • CAP-Theorem — ACID vs BASE maps directly to the CAP consistency-availability tradeoff; relational databases are typically CP (consistency preserved during partition); NoSQL databases are typically AP (availability preserved, tunable consistency); CAP-Theorem provides the formal distributed systems framework behind this selection axis
  • CQRS-Pattern — CQRS read and write models often use different storage technologies; a common pattern is a relational write model (ACID, normalized, enforces invariants) and a denormalized NoSQL read model (fast key-based retrieval); SQL-vs-NoSQL selection is applied independently to each model
  • Repository — the Repository pattern abstracts storage technology from domain logic; the SQL-vs-NoSQL choice is made at the Repository implementation (infrastructure layer), invisible to domain logic above it; a Repository interface can be implemented against either relational or NoSQL storage without changing the domain model