Notification System Design
A system that delivers time-sensitive messages across push, SMS, and email channels with priority-based processing, retry logic, and per-user delivery controls.
Scope note: This note covers the system design of a notification delivery service — the flow from trigger event to provider delivery, multi-channel fan-out, priority queues, and retry logic. Infrastructure-level queue mechanics are in Message-Queue. Application-layer duplicate delivery prevention is in Idempotent-Consumer. This note cross-links both rather than re-explaining them.
Clarify First
Before designing, lock these assumptions — each drives a different architectural decision:
- Which channels? Push (iOS APNs / Android FCM), SMS (Twilio), email (SendGrid/SES), in-app? Each channel has a distinct provider API, SLA, and failure mode.
- Delivery guarantees? At-least-once (most notification systems; tolerate duplicate push) or at-most-once (marketing where duplicates damage trust)?
- Priority tiers? Security alerts and OTPs vs marketing promotions have different latency SLAs. The answer decides between separate queues per tier and a single priority queue.
- Per-user preferences? Users opt out of specific channels; the system must respect subscription state before dispatching to providers.
- Scale? 1M notifications/day (startup, synchronous delivery viable) vs 1B/day (fully async, dedicated worker pools per channel).
- Template management? Dynamic content (user name, order ID substituted at send time) vs static messages baked in at trigger time.
Capacity Estimation
Assumption: mid-scale consumer app, 2026.
DAU: 50M users
Notifications per user/day: 5 (mix of transactional + promotional)
Total notifications/day: 50M * 5 = 250M notifications/day
avg_QPS: 250M / 86,400 ≈ 2,900 notification events/sec
peak_QPS: 2,900 * 3 = 8,700/sec (3x multiplier for campaign bursts)
Per-channel split (typical consumer app):
Push: 70% = 6,100 push notifications/sec at peak
Email: 20% = 1,740 emails/sec at peak
SMS: 10% = 870 SMS/sec at peak
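The arithmetic above can be reproduced in a few lines. This sketch only re-derives the note's own figures from its stated assumptions (50M DAU, 5 notifications per user per day, 3x peak multiplier, 70/20/10 channel split). The function name is illustrative.

```python
SECONDS_PER_DAY = 86_400

def capacity(dau: int, per_user_per_day: int, peak_multiplier: float,
             channel_split: dict) -> dict:
    """Derive daily volume, average/peak QPS and per-channel peak QPS."""
    total_per_day = dau * per_user_per_day
    avg_qps = total_per_day / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_multiplier
    result = {"total_per_day": total_per_day, "avg_qps": avg_qps, "peak_qps": peak_qps}
    for channel, share in channel_split.items():
        result[f"{channel}_peak_qps"] = peak_qps * share
    return result

est = capacity(50_000_000, 5, 3, {"push": 0.70, "email": 0.20, "sms": 0.10})
for name, value in est.items():
    print(f"{name}: {value:,.0f}")
```

Without rounding avg_QPS to 2,900 first, peak comes out at roughly 8,681/sec and push at roughly 6,076/sec, so the rounded figures above are slightly conservative.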
Provider constraints (2026):
FCM: the HTTP v1 API has a documented default quota of 600K send requests/minute per project (~10K/sec), above the 6,100/sec push peak
SendGrid: 100 req/sec (free tier) to 3,000+ req/sec (paid); scale horizontally
Twilio SMS: 100 messages/sec per long code; use short code for higher throughput
Cross-link: Capacity-Estimation for the shared derivation methodology.
Central Technical Problem
Reliable cross-channel fan-out at scale with per-user rate limiting. Three distinct hard problems:
- Fan-out without losing delivery: One NotificationTriggered event must fan out to push, SMS, and email workers in parallel. If the SMS worker fails, push and email must still succeed, so per-channel retry independence is required. A single retry queue that blocks all channels on one failure is insufficient.
- Per-user rate limiting: A single user must not receive 200 promotional notifications per minute even if 200 campaigns trigger simultaneously. Enforcement happens at the queue consumption layer (worker level), not at the API layer: the notification service applies the per-user quota before dispatching to providers. See Rate-Limiter-Design for the counter algorithm details.
- Provider unreliability: Third-party providers (FCM, Twilio, SendGrid) have availability SLAs of ~99.9% (~8.7 hours downtime/year). The notification service must implement retry with exponential backoff and a circuit breaker per provider (see Circuit-Breaker-Pattern), not a shared circuit breaker that takes down all channels when one provider degrades.
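To make "one breaker per provider, not a shared one" concrete, here is a minimal sliding-window breaker instantiated once per provider. The >50% failure ratio and 30s window follow the example in the Existing Pattern Connections table; ProviderCircuitBreaker, min_requests and open_seconds are hypothetical. For simplicity the breaker closes again after the cool-down instead of running a dedicated half-open probe.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, Tuple

class ProviderCircuitBreaker:
    """Opens when the failure ratio inside a sliding window exceeds a threshold."""

    def __init__(self, window_seconds: float = 30.0, failure_ratio: float = 0.5,
                 min_requests: int = 20, open_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.window_seconds = window_seconds
        self.failure_ratio = failure_ratio
        self.min_requests = min_requests      # assumption: avoid tripping on tiny samples
        self.open_seconds = open_seconds      # assumption: cool-down before retrying provider
        self.clock = clock
        self.events: Deque[Tuple[float, bool]] = deque()  # (timestamp, succeeded)
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.open_seconds:
            # Cool-down elapsed: close and re-evaluate on fresh data.
            self.opened_at = None
            self.events.clear()
            return True
        return False

    def record(self, succeeded: bool) -> None:
        now = self.clock()
        self.events.append((now, succeeded))
        # Drop outcomes that fell out of the sliding window.
        while self.events and now - self.events[0][0] > self.window_seconds:
            self.events.popleft()
        failures = sum(1 for _, ok in self.events if not ok)
        if (len(self.events) >= self.min_requests
                and failures / len(self.events) > self.failure_ratio):
            self.opened_at = now

# Usage with a fake clock: Twilio degrades, FCM keeps working.
now = [0.0]
breakers = {p: ProviderCircuitBreaker(clock=lambda: now[0]) for p in ("fcm", "twilio", "sendgrid")}
for _ in range(20):
    breakers["twilio"].record(succeeded=False)
    breakers["fcm"].record(succeeded=True)
print(breakers["twilio"].allow_request(), breakers["fcm"].allow_request())  # False True
```

Because each provider owns its breaker, an open Twilio breaker stops only SMS calls; push and email keep flowing.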
Component Design
End-to-End Flow
[Event Source] --> [Notification Service API]
                             |
                             v
                     [Message Queue]  -- topics: notification.critical
                     /       |       \           notification.standard
                    /        |        \          notification.promotional
       [Push Worker]   [SMS Worker]   [Email Worker]
             |               |               |
        [FCM/APNs]       [Twilio]       [SendGrid]
              \              |              /
               \-- [Delivery Receipt Store] --/
                             |
                  [Analytics / Audit Log]
Each channel worker subscribes independently to the notification topics. Workers consume from their priority queue tier and apply per-user rate limiting before forwarding to the provider.
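The per-channel independence described above can be sketched with an in-memory stand-in for the broker, where each channel has its own subscription and a failure is parked only for that channel's retry. InMemoryBroker, drain_channel and the event fields are hypothetical; a real deployment would use broker subscriptions or consumer groups (see Message-Queue).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class NotificationTriggered:
    notification_id: str
    user_id: str
    channels: List[str]   # e.g. ["push", "sms", "email"]
    topic: str            # e.g. "notification.standard"

class InMemoryBroker:
    """Stand-in for a broker where each channel worker has its own subscription."""

    def __init__(self, channels: List[str]):
        self.queues: Dict[str, List[NotificationTriggered]] = {c: [] for c in channels}

    def publish(self, event: NotificationTriggered) -> None:
        # Each targeted channel gets its own copy, so channels are consumed independently.
        for channel in event.channels:
            self.queues[channel].append(event)

def drain_channel(broker: InMemoryBroker, channel: str,
                  send: Callable[[NotificationTriggered], None],
                  retry_queue: List[Tuple[str, NotificationTriggered]]) -> List[str]:
    """Consume one channel's queue; failures go to that channel's retry path only."""
    delivered = []
    queue = broker.queues[channel]
    while queue:
        event = queue.pop(0)
        try:
            send(event)
            delivered.append(event.notification_id)
        except Exception:
            retry_queue.append((channel, event))
    return delivered

def failing_sms(event: NotificationTriggered) -> None:
    raise ConnectionError("SMS provider timeout")

broker = InMemoryBroker(["push", "sms", "email"])
broker.publish(NotificationTriggered("n-1", "u-42", ["push", "sms", "email"], "notification.standard"))
retries: List[Tuple[str, NotificationTriggered]] = []
push_ok = drain_channel(broker, "push", lambda e: None, retries)
sms_ok = drain_channel(broker, "sms", failing_sms, retries)
email_ok = drain_channel(broker, "email", lambda e: None, retries)
print(push_ok, sms_ok, email_ok, [c for c, _ in retries])
```

The SMS failure ends up in the retry queue tagged with its channel, while push and email record the delivery.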
Priority Queue Design
Three queue topics with distinct consumer SLAs:
| Queue | Use Case | Latency SLA | Per-User Rate Limiting | Consumer Pool |
|---|---|---|---|---|
| notification.critical | Security alerts, OTP, payment failures | 100ms | None: critical notifications always go through | Dedicated, higher provisioned throughput |
| notification.standard | Transactional (order shipped, password reset) | 1s | Minimal guard only | Standard worker pool |
| notification.promotional | Marketing campaigns, recommendations | 5 minutes | Enforced: per-user quota checked before dispatch | Shared worker pool; lowest priority |
The producer classifies notifications at trigger time and routes to the appropriate topic. Workers never re-classify.
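A minimal sketch of producer-side classification, assuming a static lookup from notification type to topic. The type names ("otp", "order_shipped", ...) are hypothetical labels grouped as in the table above; unknown types are rejected rather than guessed, since workers never re-classify.

```python
from enum import Enum

class Topic(str, Enum):
    CRITICAL = "notification.critical"
    STANDARD = "notification.standard"
    PROMOTIONAL = "notification.promotional"

# Hypothetical notification types, grouped per the priority table.
TOPIC_BY_TYPE = {
    "security_alert": Topic.CRITICAL,
    "otp": Topic.CRITICAL,
    "payment_failure": Topic.CRITICAL,
    "order_shipped": Topic.STANDARD,
    "password_reset": Topic.STANDARD,
    "marketing_campaign": Topic.PROMOTIONAL,
    "recommendation": Topic.PROMOTIONAL,
}

def route(notification_type: str) -> str:
    """Classify once at trigger time and return the topic to publish to."""
    try:
        return TOPIC_BY_TYPE[notification_type].value
    except KeyError:
        # Fail loudly instead of silently picking a tier for an unknown type.
        raise ValueError(f"unclassified notification type: {notification_type!r}") from None

print(route("otp"))  # notification.critical
```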
Retry with Exponential Backoff
Failed provider calls (5xx response or timeout) return the message to the queue with a delivery delay:
attempt 1: retry after 1s
attempt 2: retry after 2s
attempt 3: retry after 4s
attempt N: retry after min(2^(N-1) seconds, 60s)   // cap at 60s
dead letter after 5 failed attempts → Dead-Letter-Queue → alert + manual review
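The schedule above (1s, 2s, 4s, ..., i.e. min(2^(N-1), 60) seconds, dead-lettered after 5 failures) fits in one function. next_action and its constants are illustrative names; the function returns None when the message should go to the dead-letter queue.

```python
from typing import Optional

MAX_ATTEMPTS = 5   # dead-letter after this many failed attempts
BASE_DELAY_S = 1   # delay after the first failure
MAX_DELAY_S = 60   # cap on any single delay

def next_action(failed_attempts: int, max_attempts: int = MAX_ATTEMPTS) -> Optional[int]:
    """Redelivery delay in seconds after `failed_attempts` failures, or None to dead-letter."""
    if failed_attempts >= max_attempts:
        return None
    return min(BASE_DELAY_S * 2 ** (failed_attempts - 1), MAX_DELAY_S)

print([next_action(n) for n in range(1, 6)])  # [1, 2, 4, 8, None]
```

With a five-attempt budget the 60s cap is never reached; it only matters if the budget is raised (the 7th failure would otherwise wait 64s). Many systems also add random jitter so retries after a provider outage do not arrive in lockstep; that is omitted here.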
Each retry must be idempotent: pass notification_id to the provider as an idempotency key. Providers that accept an idempotency key deduplicate on their side; confirm support in each provider's API (e.g., SendGrid, Twilio) rather than assuming it. For queue-level deduplication, see Idempotent-Consumer.
Per-User Rate Limiting
Implemented at the worker consumption layer, not at the API layer. Before dispatching to a provider, the worker checks:
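key = "notif_rate:{user_id}:{channel}:{window_index}"   // window_index = floor(now / window_seconds)
count = Redis INCR key
IF count == 1: Redis EXPIRE key window_seconds           // first hit in the window sets the TTL
IF count > per_user_limit: drop the notification or defer it to the promotional queue
// run INCR + conditional EXPIRE in a Lua script so a crash between them cannot leave a key without a TTL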
This is distinct from API-level rate limiting (which limits inbound request volume). This limits outbound notification delivery per user. The algorithm follows the same Redis counter pattern as Rate-Limiter-Design but applied to outbound delivery.
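The counter check above can be sketched as follows. To keep it runnable without a server, FakeRedis is a hypothetical in-memory stand-in for just INCR and EXPIRE; its incr/expire method names mirror redis-py's Redis.incr and Redis.expire, so a real client could be substituted (with the atomicity caveat noted in the pseudocode). allow_delivery and its parameters are illustrative.

```python
from typing import Callable, Dict, Optional, Tuple

class FakeRedis:
    """In-memory stand-in for Redis INCR/EXPIRE, with keys expiring on a supplied clock."""

    def __init__(self, clock: Callable[[], float]):
        self.clock = clock
        self.data: Dict[str, Tuple[int, float]] = {}  # key -> (value, expires_at)

    def _live(self, key: str) -> Optional[Tuple[int, float]]:
        entry = self.data.get(key)
        if entry and entry[1] <= self.clock():
            del self.data[key]  # expired keys behave as missing
            return None
        return entry

    def incr(self, key: str) -> int:
        entry = self._live(key)
        value = (entry[0] if entry else 0) + 1
        expires_at = entry[1] if entry else float("inf")  # INCR keeps an existing TTL
        self.data[key] = (value, expires_at)
        return value

    def expire(self, key: str, seconds: int) -> None:
        entry = self._live(key)
        if entry:
            self.data[key] = (entry[0], self.clock() + seconds)

def allow_delivery(r, user_id: str, channel: str, limit: int, window_s: int, now: float) -> bool:
    """Fixed-window per-user, per-channel outbound quota check."""
    window_index = int(now // window_s)
    key = f"notif_rate:{user_id}:{channel}:{window_index}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_s)  # first hit in this window sets the TTL
    return count <= limit

now = [0.0]
r = FakeRedis(clock=lambda: now[0])
results = [allow_delivery(r, "u42", "push", limit=3, window_s=60, now=now[0]) for _ in range(5)]
print(results)  # [True, True, True, False, False]
now[0] = 61.0
print(allow_delivery(r, "u42", "push", limit=3, window_s=60, now=now[0]))  # True (new window)
```

A fixed window can let up to twice the limit through around a window boundary; for promotional quotas that is usually acceptable.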
Device Token Staleness
FCM and APNs return specific error codes for expired or invalid device tokens: UNREGISTERED on the FCM HTTP v1 API (NotRegistered on the legacy API), and BadDeviceToken or Unregistered (HTTP 410) on APNs. On receipt of these errors, the notification service must deregister the token from the user's device registry. Retrying a stale token wastes provider API quota and will never succeed.
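A sketch of how a push worker might map a provider response to an action so stale tokens are deregistered instead of retried. The Action enum and classify are hypothetical; the error strings are the FCM/APNs codes named above (verify against current provider docs). A status of None stands for a timeout, and treating HTTP 429 as retryable is an assumption beyond the note's 5xx/timeout rule.

```python
from enum import Enum, auto
from typing import Optional

class Action(Enum):
    DELIVERED = auto()
    DEREGISTER_TOKEN = auto()  # stale token: remove from device registry, never retry
    RETRY = auto()             # transient: back off and redeliver
    DEAD_LETTER = auto()       # permanent failure unrelated to the token

STALE_TOKEN_ERRORS = {"UNREGISTERED", "NotRegistered", "BadDeviceToken", "Unregistered"}

def classify(status: Optional[int], error_code: Optional[str]) -> Action:
    if status is None:                 # timeout: no response received
        return Action.RETRY
    if 200 <= status < 300:
        return Action.DELIVERED
    if error_code in STALE_TOKEN_ERRORS:
        return Action.DEREGISTER_TOKEN # checked before status: these arrive as 4xx
    if status >= 500 or status == 429:
        return Action.RETRY
    return Action.DEAD_LETTER

print(classify(410, "Unregistered"))  # Action.DEREGISTER_TOKEN
```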
System Diagram
Notification-System-Design-diagram.excalidraw
Alternatives Considered
| Decision | Alternative | Why Chosen Approach Wins |
|---|---|---|
| Async fan-out via message queue | Synchronous HTTP calls to providers at trigger time | Synchronous blocks the API request on provider latency; async decouples trigger from delivery and supports per-channel retry independence |
| Separate queue topics per priority | Single queue with priority field | Single queue cannot enforce separate consumer SLAs or dedicated worker pools; critical notifications can be starved by promotional volume |
| Per-channel workers | Single worker handling all channels | Single worker serialises channel processing; one slow provider (e.g., Twilio) blocks push and email delivery |
| Template engine at worker level | Template resolution at API trigger time | Worker-level resolution allows late-binding of user data (e.g., resolve current display name at send time, not at trigger time); more accurate for delayed promotional sends |
Likely Follow-Up Questions
- How do you handle notification preferences (opt-in/opt-out per channel)? Subscription state service: each user has a preferences record {user_id, channel, subscribed: bool}. Workers query preferences before dispatching; unsubscribed channels are skipped, not errored.
- How do you prevent duplicate notifications to the same user? Pass notification_id to the provider as an idempotency key. For in-house deduplication across queue retries, see Idempotent-Consumer: a persistent deduplication store keyed on notification_id.
- How do you handle provider-specific rate limits (Twilio long code vs short code)? Provider adapter layer: each provider implementation enforces its own rate limit internally. Twilio long code limited to 100 msg/sec; use a short code (up to 3,000 msg/sec) for campaign volumes. The notification service does not need to know the difference; the provider adapter does.
- How do you track end-to-end delivery latency? Trace IDs propagated from trigger event through queue to provider call. Delivery receipt timestamps stored in the receipt store. Analytics projection calculates p50/p99 latency per channel per priority tier.
- How do you handle marketing campaign bursts (millions of notifications at once)? Throttle the campaign producer: accept the trigger event at the API layer, but drip it onto the notification.promotional topic at a controlled rate (token bucket on the producer side). The queue absorbs the burst; workers consume at a sustainable rate.
- How do you support internationalised notification content? The template engine resolves the user's locale preference at worker dispatch time. Templates are stored per {template_id, locale}. Fallback chain: requested locale → default locale → hardcoded English fallback.
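The producer-side drip for campaign bursts can be sketched with a token bucket, assuming a single producer process. TokenBucket, drip_campaign, rate_per_s, capacity and the publish callable are hypothetical; the clock and sleep functions are injectable so the example runs in simulated time.

```python
import time
from typing import Callable, Iterable, List, Tuple

class TokenBucket:
    """Refills at rate_per_s up to capacity; each publish spends one token."""

    def __init__(self, rate_per_s: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_take(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def drip_campaign(recipients: Iterable[str], bucket: TokenBucket,
                  publish: Callable[[str, str], None],
                  sleep: Callable[[float], None] = time.sleep, poll_s: float = 0.01) -> None:
    """Publish one promotional notification per token, waiting when the bucket is empty."""
    for user_id in recipients:
        while not bucket.try_take():
            sleep(poll_s)
        publish("notification.promotional", user_id)

# Simulated time: sleeping advances a fake clock.
now = [0.0]
def fake_sleep(seconds: float) -> None:
    now[0] += seconds

published: List[Tuple[float, str]] = []
bucket = TokenBucket(rate_per_s=100, capacity=10, clock=lambda: now[0])
drip_campaign([f"user{i}" for i in range(50)], bucket,
              lambda topic, uid: published.append((now[0], uid)), sleep=fake_sleep)
print(len(published), round(published[-1][0], 2))
```

The first 10 recipients go out immediately (the burst capacity); the remaining 40 are spread at roughly 100/sec, so the queue sees a steady rate instead of the full campaign at once.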
Existing Pattern Connections
| Design Decision | Existing Pattern | Relationship |
|---|---|---|
| Async fan-out via message queue | Message-Queue | Notification triggers are pub/sub events; worker pool subscribes to topics per channel; scope: infrastructure delivery mechanics live in Message-Queue, not here |
| Duplicate delivery prevention | Idempotent-Consumer | Notification workers are idempotent consumers; provider idempotency key = notification_id; deduplication store pattern documented in Idempotent-Consumer |
| Failed retries after N attempts | Dead-Letter-Queue | Notifications exhausting retry budget go to DLQ for manual review and recovery; DLQ recovery paths apply |
| Priority queue design | Message-Queue | Separate queue topics per priority tier; critical queue has dedicated consumer pool; queue mechanics in Message-Queue |
| Per-user rate limiting | Rate-Limiter-Design | Per-user notification rate limiting follows the same Redis counter pattern as API rate limiting but applied to outbound delivery |
| Provider circuit breaker | Circuit-Breaker-Pattern | Per-provider circuit breaker: if FCM returns 5xx for >50% of requests in a 30s window, stop calling FCM; prevents cascade failure across channels |
| Delivery receipt analytics | CQRS-Pattern | Delivery receipts are write-path events; analytics queries (delivery rates, open rates, p99 latency) read from a separate projection — CQRS read/write separation |