Chat System Design

System that enables real-time messaging between users via persistent WebSocket connections, with message ordering guarantees, presence tracking, and offline delivery.

Clarify First

Before designing, lock these assumptions with the interviewer:

1:1 chat only or group chat support? — Group chat introduces fan-out amplification; groups with 1K+ members require the same hybrid push/pull strategy as the news feed celebrity problem — see News-Feed-Design for the parallel.
Message types? — Text only vs text + images + files + voice; media handling requires a separate blob storage path and CDN; scope to text-only initially and note extensions.
Presence required? — Online/offline/last-seen indicator requires a heartbeat service and pub-sub fan-out for status updates; confirm if this is in scope.
Message retention? — 30 days vs indefinite; storage sizing diverges substantially; determine before committing to a storage tier.
End-to-end encryption? — Signal protocol (keys held by clients; server cannot read) vs server-side at-rest encryption; E2E changes the key management architecture significantly.

Capacity Estimation

Derivation chain for a large-scale messaging platform (2026):

Assumption: 500M DAU (WhatsApp scale, 2026 estimate)

Concurrent connections:
  peak_concurrent_users = 500M x 0.1 (10% online simultaneously) = 50M connections
  connections_per_server = 50,000 (WebSocket server capacity estimate)
  chat_servers_needed = 50M / 50,000 = 1,000 chat servers

Message volume:
  messages_per_DAU_per_day = 40
  total_messages_per_day = 500M x 40 = 20B messages/day
  message_QPS = 20B / 86,400 ~ 231,000 QPS

Storage (1-year retention):
  message_size ~ 100 bytes (text only)
  daily_storage = 20B x 100 bytes = 2 TB/day
  yearly_storage = 2 TB x 365 = 730 TB ~ 1 PB/year
  -> NoSQL (wide-column) preferred; message reads are by (conversation_id, seq_id range)

Cross-reference: Capacity-Estimation for the shared DAU-to-QPS-to-storage methodology. Wide-column vs relational tradeoff is covered in SQL-vs-NoSQL.

Conclusion: 1,000 WebSocket chat servers needed at peak. 231K message QPS requires sharded NoSQL storage. 1 PB/year for 1-year retention demands wide-column architecture with efficient range queries.

Central Technical Problem

Message ordering guarantees and offline delivery in a distributed chat system.

Messages from different senders arrive at servers in non-deterministic order. Without monotonic sequence numbers, recipients see messages out of order. Offline users need messages queued and delivered in order upon reconnection. These two problems are coupled — the same seq_id that enforces ordering also identifies message delivery gaps.

Message ordering

Two approaches:

Server-assigned seq_id (chosen approach): The chat server assigns a monotonically increasing seq_id per conversation. In-order delivery is guaranteed by seq_id. Requires a sequence number service — either a single-writer (bottleneck) or a distributed Snowflake-style ID. See Unique-ID-Generator-Design for Snowflake mechanics. Gap detection: client tracks last_received_seq_id; any gap in the sequence triggers a re-sync request to the server.

Client-assigned timestamps (rejected): Clients attach logical timestamps to messages. Subject to clock skew — two messages sent simultaneously on different clients may have identical or inverted timestamps. Mitigating clock skew requires NTP or vector clocks, adding coordination complexity without eliminating the problem.

Decision: Server-assigned seq_id per conversation is the standard approach. Snowflake-style distributed IDs handle ID assignment at scale without a coordination bottleneck. Re-delivered messages on reconnect are deduplicated by seq_id — see Idempotent-Consumer for the idempotent delivery pattern.

Offline delivery

When a recipient is offline, messages are stored in a durable offline queue per recipient. See Message-Queue for at-least-once delivery guarantees.

On reconnect, the client sends its last_received_seq_id to the server. The server delivers all messages with seq_id > last_received in order. Message retention window (e.g., 30 days) bounds offline queue storage. After the retention window, undelivered messages are dropped — this must be communicated in the product (e.g., "Messages older than 30 days may not have been delivered").

WebSocket connection management

Each online user maintains a long-lived WebSocket to a chat server. Chat servers are stateful — they hold the active connection-to-user mapping. A connection service maps user_id -> chat_server_id, enabling message routing across the server pool. Connection routing uses Consistent-Hashing to distribute users across chat servers without hotspot routing.

Message routing: sender's server receives the message → looks up recipient's chat_server_id via the connection service → forwards the message to the correct server via an internal service bus → recipient's server delivers over the open WebSocket.

The chat server acts as a Mediator-Pattern — senders and receivers do not connect directly; all message exchange is mediated through the chat server pool.

Presence service

Heartbeat-based: client sends a heartbeat every N seconds (e.g., 5 seconds). The presence service marks a user offline after a missed heartbeat timeout (e.g., 3 missed heartbeats = 15 seconds). Presence updates are published via pub-sub — see Observer-Pattern for the publish-subscribe model. Subscribed contacts receive status change notifications. The presence service can be AP under CAP-Theorem — slightly stale online status is acceptable; message ordering is CP (strong consistency on seq_id).

Group chat fan-out

Small groups (< 100 members): fan-out-on-write — message is written to all member offline queues. Write amplification is bounded by group size.

Large groups (1K+ members): the same hybrid push/pull decision as the news feed celebrity problem applies — see News-Feed-Design for the hybrid fan-out strategy. Above a group size threshold, messages are written to a shared group message store and members pull on read rather than receiving individual queue entries.

Component Design

[Client] <--WebSocket--> [Chat Server Pool]
                               |
                    [Connection Service]
                    (user_id -> server_id)
                               |
              +----------------+----------------+
              |                |                |
     [Message Service]  [Presence Service] [Group Service]
              |                |                |
     [Message Store]    [Status Cache]   [Member Store]
     (NoSQL, sharded    (Redis)          (relational)
      by conversation_id)
              |
     [Offline Queue]
     (per recipient)

Component responsibilities:

Chat Server Pool — maintains long-lived WebSocket connections; routes messages between senders and recipients; stateful per connection
Connection Service — service registry mapping user_id -> chat_server_id; queried on each message send to route to the correct chat server
Message Service — persists messages to durable store; assigns seq_id; manages offline queue writes for recipients who are not connected
Presence Service — tracks online/offline status via heartbeats; publishes status change events to subscriber contacts
Group Service — manages group membership; handles fan-out decisions (push for small groups, pull for large groups)
Message Store — wide-column NoSQL store sharded by conversation_id; range queries by (conversation_id, seq_id) retrieve conversation history efficiently — see Database-Sharding
Offline Queue — durable per-recipient message queue; at-least-once delivery on reconnect
Status Cache — Redis store for live presence state; TTL matches heartbeat timeout

System Diagram

Chat-System-Design-diagram.excalidraw

Alternatives Considered

Decision	Alternative A	Alternative B	Why Chosen Approach Wins
Connection protocol	HTTP polling (client polls every N seconds)	HTTP long-polling (server holds request open)	WebSocket provides true bidirectional communication with low overhead; polling adds latency; long-polling is a workaround that wastes connections
Message ordering	Client-assigned timestamps	Vector clocks (distributed logical time)	Server-assigned `seq_id` eliminates clock skew without coordination overhead; vector clocks are complex to implement and debug; Snowflake IDs provide distributed assignment without bottleneck
Message storage	SQL relational (messages table)	Key-value store (message per key)	Wide-column NoSQL enables efficient range queries by `(conversation_id, seq_id)`; SQL struggles at 1 PB scale; key-value has no efficient range query
Group fan-out	Fan-out-on-write for all groups	Fan-out-on-read for all groups	Write-only fails for large groups (1K+ members); read-only fails for small groups (high read latency); hybrid bounded by group size threshold

Likely Follow-Up Questions

How do you handle read receipts (delivered vs read)? — Delivered status: confirmed when the offline queue is drained on reconnect. Read status: client sends a read receipt event when the user views the message; server updates a read_at timestamp; recipient sees the "read" indicator.
How do you support message search across conversations? — Full-text search requires an inverted index (Elasticsearch-style); message store write path also writes to a search index; query path routes search requests to the search cluster. Cross-link: search indexing is covered in the Search-Autocomplete-Design note.
How would you add end-to-end encryption? — Signal protocol: each client holds a private key; server stores only public keys and encrypted message blobs; server cannot decrypt. Key exchange uses the Double Ratchet algorithm. Server-side search becomes impossible with E2E encryption.
What happens when a chat server crashes (connection migration)? — Clients detect WebSocket disconnect and reconnect to any available chat server; the connection service updates the user_id -> chat_server_id mapping on reconnect; in-flight messages are re-sent from the offline queue.
How do you handle message deletion/editing after send? — Deletion: a message_deleted event is broadcast to all conversation participants; clients hide the message. Editing: a message_edited event carries the new content and a version number; seq_id remains unchanged.
How do you scale the presence service (millions of subscribers)? — Presence fan-out is bounded by the contact list size, not global user count. Shard the presence service by user_id; each shard handles pub-sub for its assigned users. Consistent-Hashing routes presence queries to the correct shard.

Existing Pattern Connections

Design Decision	Existing Pattern	Relationship
WebSocket connection multiplexed to chat server	Mediator-Pattern	Chat server mediates all message exchange; senders and receivers do not connect directly — classic Mediator topology
Offline message queue per recipient	Message-Queue	Undelivered messages queued in durable store; at-least-once delivery guarantee; client deduplicates via seq_id
Presence service (online/offline status)	Observer-Pattern	Presence is a publish-subscribe pattern: user status changes published; subscribed contacts receive updates
Message sharding by conversation_id	Database-Sharding	All messages for a conversation co-located on one shard; range queries by (conversation_id, seq_id) are efficient
seq_id gap detection + resync	Idempotent-Consumer	Re-delivered messages on reconnect must be idempotent; client deduplicates by seq_id to avoid duplicate display

Chat System Design

Chat System Design

Clarify First

Capacity Estimation

Central Technical Problem

Message ordering

Offline delivery

WebSocket connection management

Presence service

Group chat fan-out

Component Design

System Diagram

Alternatives Considered

Likely Follow-Up Questions

Existing Pattern Connections

Backlinks

Linked mentions

Chat System Design

Tags

Chat System Design

Clarify First

Capacity Estimation

Central Technical Problem

Message ordering

Offline delivery

WebSocket connection management

Presence service

Group chat fan-out

Component Design

System Diagram

Alternatives Considered

Likely Follow-Up Questions

Existing Pattern Connections

Backlinks

Linked mentions