Notification System Design

A system that delivers time-sensitive messages across push, SMS, and email channels with priority-based processing, retry logic, and per-user delivery controls.

Scope note: This note covers the system design of a notification delivery service — the flow from trigger event to provider delivery, multi-channel fan-out, priority queues, and retry logic. Infrastructure-level queue mechanics are in Message-Queue. Application-layer duplicate delivery prevention is in Idempotent-Consumer. This note cross-links both rather than re-explaining them.

Clarify First

Before designing, lock these assumptions — each drives a different architectural decision:

Which channels? Push (iOS APNs / Android FCM), SMS (Twilio), email (SendGrid/SES), in-app? Each channel has a distinct provider API, SLA, and failure mode.
Delivery guarantees? At-least-once (most notification systems; tolerate duplicate push) or at-most-once (marketing where duplicates damage trust)?
Priority tiers? Security alerts and OTPs vs marketing promotions have different latency SLAs. Separate queues vs a single priority queue.
Per-user preferences? Users opt out of specific channels; the system must respect subscription state before dispatching to providers.
Scale? 1M notifications/day (startup, synchronous delivery viable) vs 1B/day (fully async, dedicated worker pools per channel).
Template management? Dynamic content (user name, order ID substituted at send time) vs static messages baked in at trigger time.

Capacity Estimation

Assumption: mid-scale consumer app, 2026.

DAU:                          50M users
Notifications per user/day:   5 (mix of transactional + promotional)
Total notifications/day:      50M * 5 = 250M notifications/day
avg_QPS:                      250M / 86,400 ≈ 2,900 notification events/sec
peak_QPS:                     2,900 * 3 = 8,700/sec (3x multiplier for campaign bursts)

Per-channel split (typical consumer app):
  Push:   70% = 6,100 push notifications/sec at peak
  Email:  20% = 1,740 emails/sec at peak
  SMS:    10% =   870 SMS/sec at peak

Provider constraints (2026):
  FCM:        default quota of 600K messages/min per project on the HTTP v1 API (about 10K/sec), above the 6,100/sec push peak
  SendGrid:   100 req/sec (free tier) to 3,000+ req/sec (paid); scale horizontally
  Twilio SMS: 100 messages/sec per long code; use short code for higher throughput

Cross-link: Capacity-Estimation for the shared derivation methodology.
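
The arithmetic above can be reproduced with a short script. This is a minimal sketch using only the inputs stated in this section; the variable names are illustrative. It keeps unrounded values, so its results sit slightly below the rounded figures in the estimate (for example, peak QPS comes out near 8,680 rather than 8,700).

```python
# Back-of-envelope capacity estimate; every input is taken from the figures above.
SECONDS_PER_DAY = 86_400

dau = 50_000_000          # daily active users
per_user_per_day = 5      # notifications per user per day
peak_multiplier = 3       # campaign-burst multiplier
channel_share = {"push": 0.70, "email": 0.20, "sms": 0.10}

total_per_day = dau * per_user_per_day
avg_qps = total_per_day / SECONDS_PER_DAY
peak_qps = avg_qps * peak_multiplier
peak_by_channel = {ch: peak_qps * share for ch, share in channel_share.items()}

print(f"total/day: {total_per_day:,}")
print(f"avg QPS:   {avg_qps:,.0f}")
print(f"peak QPS:  {peak_qps:,.0f}")
for ch, qps in peak_by_channel.items():
    print(f"  {ch:<5} {qps:,.0f}/sec at peak")
```

Changing `peak_multiplier` or the channel shares shows quickly which provider constraint becomes the bottleneck first.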

Central Technical Problem

Reliable cross-channel fan-out at scale with per-user rate limiting. Three distinct hard problems:

Fan-out without losing delivery: One NotificationTriggered event must fan out to push, SMS, and email workers in parallel. If the SMS worker fails, push and email must still succeed — per-channel retry independence is required. A single retry queue that blocks all channels on one failure is insufficient.
Per-user rate limiting: A single user must not receive 200 promotional notifications per minute even if 200 campaigns trigger simultaneously. Enforcement happens at the queue consumption layer (worker level), not at the API layer — the notification service applies per-user quota before dispatching to providers. See Rate-Limiter-Design for the counter algorithm details.
Provider unreliability: Third-party providers (FCM, Twilio, SendGrid) typically offer availability SLAs of about 99.9%, which allows roughly 8.8 hours of downtime per year (0.1% of 8,760 hours). The notification service must implement retry with exponential backoff and a Circuit-Breaker-Pattern per provider, not a shared circuit breaker that takes down all channels when one provider degrades.
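
To make the per-provider isolation concrete, here is a minimal sketch of a failure-rate circuit breaker with one instance per provider. The 30-second window and 50% threshold follow the Circuit-Breaker-Pattern row in the pattern table later in this note; `min_calls`, `cooldown_s`, and the class name are assumptions introduced for illustration. A fuller breaker would add a half-open probing state, which this sketch leaves out.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, Tuple


class ProviderCircuitBreaker:
    """Failure-rate breaker for a single provider (one instance per provider)."""

    def __init__(self, window_s: float = 30.0, failure_ratio: float = 0.5,
                 min_calls: int = 10, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_s = window_s
        self.failure_ratio = failure_ratio
        self.min_calls = min_calls      # assumption: do not trip on tiny samples
        self.cooldown_s = cooldown_s    # assumption: how long to stay open
        self.clock = clock
        self._calls: Deque[Tuple[float, bool]] = deque()  # (timestamp, succeeded)
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return False while the breaker is open."""
        if self._opened_at is None:
            return True
        if self.clock() - self._opened_at >= self.cooldown_s:
            # Cooldown elapsed: close the breaker and start a fresh window.
            self._opened_at = None
            self._calls.clear()
            return True
        return False

    def record(self, succeeded: bool) -> None:
        """Record one provider call outcome and open the breaker if needed."""
        now = self.clock()
        self._calls.append((now, succeeded))
        # Drop samples that have fallen out of the sliding window.
        while self._calls and now - self._calls[0][0] > self.window_s:
            self._calls.popleft()
        failures = sum(1 for _, ok in self._calls if not ok)
        if (len(self._calls) >= self.min_calls
                and failures / len(self._calls) > self.failure_ratio):
            self._opened_at = now


# One breaker per provider, so an FCM outage never blocks Twilio or SendGrid.
breakers = {name: ProviderCircuitBreaker() for name in ("fcm", "twilio", "sendgrid")}
```

A worker would call `breakers["fcm"].allow()` before each FCM request and `record(...)` afterwards; the other providers' breakers are unaffected by FCM failures.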

Component Design

End-to-End Flow

[Event Source] --> [Notification Service API]
                         |
                         v
                  [Message Queue] -- topics: notification.critical
                  /        |        \          notification.standard
                 /         |         \         notification.promotional
         [Push Worker] [SMS Worker] [Email Worker]
               |              |            |
          [FCM/APNs]       [Twilio]    [SendGrid]
               \              |            /
                \-- [Delivery Receipt Store] --/
                         |
                  [Analytics / Audit Log]

Each channel worker subscribes independently to the notification topics. Workers consume from their priority queue tier and apply per-user rate limiting before forwarding to the provider.
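
The fan-out and per-channel independence described above can be sketched with an in-memory stand-in for the broker. Everything here is illustrative: `InMemoryBroker`, `drain`, and the event fields are hypothetical names, and a real deployment would use the queue infrastructure covered in Message-Queue.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class NotificationTriggered:
    notification_id: str
    user_id: str
    channels: Tuple[str, ...]   # e.g. ("push", "sms", "email")
    priority: str               # "critical" | "standard" | "promotional"


class InMemoryBroker:
    """Stand-in for the message queue: one list per (topic, channel)."""

    def __init__(self) -> None:
        self.queues: Dict[Tuple[str, str], List[NotificationTriggered]] = defaultdict(list)

    def publish(self, event: NotificationTriggered) -> None:
        topic = f"notification.{event.priority}"
        for channel in event.channels:
            # Each channel gets its own copy, so retries are tracked per channel.
            self.queues[(topic, channel)].append(event)


def drain(broker: InMemoryBroker, topic: str, channel: str,
          send: Callable[[NotificationTriggered], None]) -> List[str]:
    """Run one channel's worker over its queue; failed messages stay queued."""
    delivered: List[str] = []
    failed: List[NotificationTriggered] = []
    for event in broker.queues[(topic, channel)]:
        try:
            send(event)
            delivered.append(event.notification_id)
        except Exception:
            failed.append(event)   # in practice: re-queue with a backoff delay
    broker.queues[(topic, channel)] = failed
    return delivered
```

If the SMS `send` callable raises, only the SMS queue retains the event; push and email drain normally because they hold separate copies.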

Priority Queue Design

Three queue topics with distinct consumer SLAs:

Queue	Use Case	Latency SLA	Per-User Rate Limiting	Consumer Pool
`notification.critical`	Security alerts, OTP, payment failures	100ms	None — critical notifications always go through	Dedicated, higher provisioned throughput
`notification.standard`	Transactional (order shipped, password reset)	1s	Minimal guard only	Standard worker pool
`notification.promotional`	Marketing campaigns, recommendations	5 minutes	Enforced — per-user quota checked before dispatch	Shared worker pool; lowest priority

The producer classifies notifications at trigger time and routes to the appropriate topic. Workers never re-classify.
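
Producer-side classification can be as simple as a lookup table. The sketch below assumes hypothetical event-type names; only the three topic names and the use cases assigned to each tier come from the table above. Rejecting unknown types, rather than defaulting them to a tier, is a design assumption made here so that misclassification fails loudly at trigger time.

```python
from enum import Enum


class Topic(str, Enum):
    CRITICAL = "notification.critical"
    STANDARD = "notification.standard"
    PROMOTIONAL = "notification.promotional"


# Hypothetical event-type names; the tier assignments follow the table above.
TOPIC_BY_EVENT_TYPE = {
    "security_alert": Topic.CRITICAL,
    "otp": Topic.CRITICAL,
    "payment_failed": Topic.CRITICAL,
    "order_shipped": Topic.STANDARD,
    "password_reset": Topic.STANDARD,
    "campaign": Topic.PROMOTIONAL,
    "recommendation": Topic.PROMOTIONAL,
}


def classify(event_type: str) -> Topic:
    """Pick the topic once, at trigger time; unknown types are rejected."""
    try:
        return TOPIC_BY_EVENT_TYPE[event_type]
    except KeyError:
        raise ValueError(f"unclassified event type: {event_type!r}") from None
```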

Retry with Exponential Backoff

Failed provider calls (5xx response or timeout) return the message to the queue with a delivery delay:

attempt 1: retry after 1s
attempt 2: retry after 2s
attempt 3: retry after 4s
attempt N: retry after min(2^(N-1) seconds, 60s)   // cap at 60s
dead letter after 5 failed attempts → Dead-Letter-Queue → alert + manual review

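Each retry must be idempotent: pass notification_id to the provider as an idempotency key wherever the provider API accepts one, so that the provider can deduplicate on its side. Idempotency-key support varies by provider and API version, so confirm it for SendGrid and Twilio before relying on it. For queue-level deduplication, see Idempotent-Consumer.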
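
The backoff schedule above reduces to one small function. This is a sketch that matches the listed delays (1s, 2s, 4s) by doubling from a 1-second base; the function and constant names are illustrative. Random jitter, which production systems commonly add to avoid synchronized retries, is omitted to keep the schedule identical to the one listed.

```python
from typing import Optional

BASE_DELAY_S = 1     # delay after the first failed attempt
MAX_DELAY_S = 60     # cap on any single delay
MAX_ATTEMPTS = 5     # after this many failures the message is dead-lettered


def retry_delay(attempt: int) -> Optional[int]:
    """Delay in seconds before retrying after failed attempt `attempt` (1-based).

    Returns None once the retry budget is spent, meaning the message
    should go to the dead-letter queue instead of being retried.
    """
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    if attempt >= MAX_ATTEMPTS:
        return None
    return min(BASE_DELAY_S * 2 ** (attempt - 1), MAX_DELAY_S)
```

With a budget of five attempts the longest delay is 8 seconds, so the 60-second cap only takes effect if `MAX_ATTEMPTS` is raised to 8 or more.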

Per-User Rate Limiting

Implemented at the worker consumption layer, not at the API layer. Before dispatching to a provider, the worker checks:

key = "notif_rate:{user_id}:{channel}:{window}"
count = Redis INCR key
IF count == 1: EXPIRE key window_seconds
IF count > per_user_limit: drop the notification, or re-enqueue it on the promotional queue with a delivery delay

This is distinct from API-level rate limiting (which limits inbound request volume). This limits outbound notification delivery per user. The algorithm follows the same Redis counter pattern as Rate-Limiter-Design but applied to outbound delivery.
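
The counter check can be exercised without a Redis server by substituting a small in-memory store. The sketch below is an assumption-laden illustration: `FakeCounterStore` mimics only the INCR and EXPIRE behaviour used in the pseudocode, and `allow_delivery` and its parameters are hypothetical names. The window component of the key is taken to be the index of the current fixed window.

```python
import time
from typing import Callable, Dict, Tuple


class FakeCounterStore:
    """In-memory stand-in for the two Redis commands used above (INCR, EXPIRE)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self._data: Dict[str, Tuple[int, float]] = {}   # key -> (count, expires_at)

    def incr(self, key: str) -> int:
        count, expires_at = self._data.get(key, (0, float("inf")))
        if self.clock() >= expires_at:
            count, expires_at = 0, float("inf")         # key has expired
        self._data[key] = (count + 1, expires_at)
        return count + 1

    def expire(self, key: str, seconds: int) -> None:
        count, _ = self._data[key]
        self._data[key] = (count, self.clock() + seconds)


def allow_delivery(store: FakeCounterStore, user_id: str, channel: str,
                   limit: int, window_s: int, now: float) -> bool:
    """Fixed-window check: True if this delivery is within the user's quota."""
    window = int(now // window_s)                       # index of the current window
    key = f"notif_rate:{user_id}:{channel}:{window}"
    count = store.incr(key)
    if count == 1:
        store.expire(key, window_s)                     # first hit sets the TTL
    return count <= limit
```

Against a real Redis instance, INCR and EXPIRE issued as two separate commands are not atomic: a worker crash between them leaves a key with no TTL. Running both in a single transaction or server-side script is a common remedy.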

Device Token Staleness

FCM and APNs return specific error codes for expired or invalid device tokens: UNREGISTERED on the FCM HTTP v1 API (NotRegistered on the legacy API), and BadDeviceToken or Unregistered (HTTP 410) on APNs. On receipt of these errors, the notification service must deregister the token from the user's device registry. Retrying a stale token wastes provider API quota and will never succeed.
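
A sketch of the stale-token branch in a push worker follows. `DeviceRegistry` and `handle_push_error` are hypothetical names. NotRegistered and BadDeviceToken come from the text; UNREGISTERED (FCM HTTP v1) and Unregistered (APNs) are added here as an assumption and should be checked against the provider documentation for the API version in use.

```python
from typing import Dict, Set

# Error identifiers treated as "this token will never work again".
# Assumption: verify the exact strings against the provider API version in use.
STALE_TOKEN_ERRORS: Dict[str, Set[str]] = {
    "fcm": {"NotRegistered", "UNREGISTERED"},
    "apns": {"BadDeviceToken", "Unregistered"},
}


class DeviceRegistry:
    """Hypothetical in-memory registry: user_id -> set of device tokens."""

    def __init__(self) -> None:
        self.tokens: Dict[str, Set[str]] = {}

    def register(self, user_id: str, token: str) -> None:
        self.tokens.setdefault(user_id, set()).add(token)

    def deregister(self, user_id: str, token: str) -> None:
        self.tokens.get(user_id, set()).discard(token)


def handle_push_error(registry: DeviceRegistry, provider: str, error_code: str,
                      user_id: str, token: str) -> bool:
    """Return True if the token was stale and has been removed (do not retry)."""
    if error_code in STALE_TOKEN_ERRORS.get(provider, set()):
        registry.deregister(user_id, token)
        return True
    return False
```

When the function returns False the error is not token-related, and the caller falls back to the normal retry path.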

System Diagram

Notification-System-Design-diagram.excalidraw

Alternatives Considered

Decision	Alternative	Why Chosen Approach Wins
Async fan-out via message queue	Synchronous HTTP calls to providers at trigger time	Synchronous blocks the API request on provider latency; async decouples trigger from delivery and supports per-channel retry independence
Separate queue topics per priority	Single queue with priority field	Single queue cannot enforce separate consumer SLAs or dedicated worker pools; critical notifications can be starved by promotional volume
Per-channel workers	Single worker handling all channels	Single worker serialises channel processing; one slow provider (e.g., Twilio) blocks push and email delivery
Template engine at worker level	Template resolution at API trigger time	Worker-level resolution allows late-binding of user data (e.g., resolve current display name at send time, not at trigger time); more accurate for delayed promotional sends

Likely Follow-Up Questions

How do you handle notification preferences (opt-in/opt-out per channel)? Subscription state service: each user has a preferences record {user_id, channel, subscribed: bool}. Workers query preferences before dispatching; unsubscribed channels are skipped, not errored.
How do you prevent duplicate notifications to the same user? Idempotency key notification_id passed to provider. For in-house deduplication across queue retries, see Idempotent-Consumer — persistent deduplication store keyed on notification_id.
How do you handle provider-specific rate limits (Twilio long code vs short code)? Provider adapter layer: each provider implementation enforces its own rate limit internally. Twilio long code limited to 100 msg/sec — use short code (up to 3,000 msg/sec) for campaign volumes. The notification service does not need to know the difference; the provider adapter does.
How do you track end-to-end delivery latency? Trace IDs propagated from trigger event through queue to provider call. Delivery receipt timestamps stored in the receipt store. Analytics projection calculates p50/p99 latency per channel per priority tier.
How do you handle marketing campaign bursts (millions of notifications at once)? Rate-limit the campaign producer at the API layer — accept the trigger event, but drip the notification.promotional topic at a controlled rate (token bucket at the producer side). The queue absorbs the burst; workers consume at a sustainable rate.
How do you support internationalised notification content? Template engine resolves the user's locale preference at worker dispatch time. Templates stored per {template_id, locale}. Fallback chain: requested locale → default locale → hardcoded English fallback.
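
For the campaign-burst question above, the producer-side token bucket might look like the following sketch. `TokenBucket` and `drip` are illustrative names, and the rate and capacity values would be tuned to what the promotional workers can sustain; none of these specifics come from the note itself.

```python
import time
from typing import Callable, List


class TokenBucket:
    """Refills at `rate_per_s` tokens per second, up to `capacity`."""

    def __init__(self, rate_per_s: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self.clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self, n: int = 1) -> bool:
        """Take n tokens if available; never blocks."""
        now = self.clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate_per_s)
        self._last = now
        if self._tokens >= n:
            self._tokens -= n
            return True
        return False


def drip(pending: List, bucket: TokenBucket, publish: Callable) -> List:
    """Publish as many pending notifications as the bucket allows; return the rest."""
    i = 0
    while i < len(pending) and bucket.try_acquire():
        publish(pending[i])
        i += 1
    return pending[i:]
```

A scheduler would call `drip` periodically with the remaining backlog, so the `notification.promotional` topic fills at a bounded rate regardless of campaign size.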

Existing Pattern Connections

Design Decision	Existing Pattern	Relationship
Async fan-out via message queue	Message-Queue	Notification triggers are pub/sub events; worker pool subscribes to topics per channel; scope: infrastructure delivery mechanics live in Message-Queue, not here
Duplicate delivery prevention	Idempotent-Consumer	Notification workers are idempotent consumers; provider idempotency key = notification_id; deduplication store pattern documented in Idempotent-Consumer
Failed retries after N attempts	Dead-Letter-Queue	Notifications exhausting retry budget go to DLQ for manual review and recovery; DLQ recovery paths apply
Priority queue design	Message-Queue	Separate queue topics per priority tier; critical queue has dedicated consumer pool; queue mechanics in Message-Queue
Per-user rate limiting	Rate-Limiter-Design	Per-user notification rate limiting follows the same Redis counter pattern as API rate limiting but applied to outbound delivery
Provider circuit breaker	Circuit-Breaker-Pattern	Per-provider circuit breaker: if FCM returns 5xx for >50% of requests in a 30s window, stop calling FCM; prevents cascade failure across channels
Delivery receipt analytics	CQRS-Pattern	Delivery receipts are write-path events; analytics queries (delivery rates, open rates, p99 latency) read from a separate projection — CQRS read/write separation

Backlinks

(populated in Phase 32 backlink sweep)
