Notification System Design

A system that delivers time-sensitive messages across push, SMS, and email channels with priority-based processing, retry logic, and per-user delivery controls.

Scope note: This note covers the system design of a notification delivery service — the flow from trigger event to provider delivery, multi-channel fan-out, priority queues, and retry logic. Infrastructure-level queue mechanics are in Message-Queue. Application-layer duplicate delivery prevention is in Idempotent-Consumer. This note cross-links both rather than re-explaining them.

Clarify First

Before designing, lock these assumptions — each drives a different architectural decision:

  1. Which channels? Push (iOS APNs / Android FCM), SMS (Twilio), email (SendGrid/SES), in-app? Each channel has a distinct provider API, SLA, and failure mode.
  2. Delivery guarantees? At-least-once (most notification systems; tolerate duplicate push) or at-most-once (marketing where duplicates damage trust)?
  3. Priority tiers? Security alerts and OTPs vs marketing promotions have different latency SLAs. Separate queues vs a single priority queue.
  4. Per-user preferences? Users opt out of specific channels; the system must respect subscription state before dispatching to providers.
  5. Scale? 1M notifications/day (startup, synchronous delivery viable) vs 1B/day (fully async, dedicated worker pools per channel).
  6. Template management? Dynamic content (user name, order ID substituted at send time) vs static messages baked in at trigger time.

Capacity Estimation

Assumption: mid-scale consumer app, 2026.

DAU:                          50M users
Notifications per user/day:   5 (mix of transactional + promotional)
Total notifications/day:      50M * 5 = 250M notifications/day
avg_QPS:                      250M / 86,400 ≈ 2,900 notification events/sec
peak_QPS:                     2,900 * 3 = 8,700/sec (3x multiplier for campaign bursts)

Per-channel split (typical consumer app):
  Push:   70% = 6,100 push notifications/sec at peak
  Email:  20% = 1,740 emails/sec at peak
  SMS:    10% =   870 SMS/sec at peak

Provider constraints (2026):
  FCM:        default quota of 600K messages/min per project (HTTP v1 API)
  SendGrid:   100 req/sec (free tier) to 3,000+ req/sec (paid); scale horizontally
  Twilio SMS: 100 messages/sec per long code; use short code for higher throughput

Cross-link: Capacity-Estimation for the shared derivation methodology.
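The arithmetic above can be re-derived in a few lines, which makes it easy to vary the inputs. This is only a back-of-envelope sketch; the variable names are illustrative, and the rounding step mirrors how the note rounds avg_QPS before applying the peak multiplier.

```python
# Back-of-envelope check of the capacity figures in this section.
SECONDS_PER_DAY = 86_400

dau = 50_000_000
notifications_per_user_per_day = 5
peak_multiplier = 3
channel_split_pct = {"push": 70, "email": 20, "sms": 10}

total_per_day = dau * notifications_per_user_per_day
avg_qps = total_per_day / SECONDS_PER_DAY

# The note rounds avg_QPS to the nearest hundred before applying the multiplier.
avg_qps_rounded = int(round(avg_qps, -2))
peak_qps = avg_qps_rounded * peak_multiplier

# Integer percentages avoid floating-point noise in the per-channel split.
peak_by_channel = {ch: peak_qps * pct // 100 for ch, pct in channel_split_pct.items()}

print(f"total/day: {total_per_day:,}")
print(f"avg QPS:   {avg_qps:,.0f} (rounded to {avg_qps_rounded:,})")
print(f"peak QPS:  {peak_qps:,}")
for channel, qps in peak_by_channel.items():
    print(f"  {channel}: {qps:,}/sec at peak")
```

The push figure comes out at 6,090/sec, which the table above rounds to 6,100.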

Central Technical Problem

Reliable cross-channel fan-out at scale with per-user rate limiting. Three distinct hard problems:

  1. Fan-out without losing delivery: One NotificationTriggered event must fan out to push, SMS, and email workers in parallel. If the SMS worker fails, push and email must still succeed — per-channel retry independence is required. A single retry queue that blocks all channels on one failure is insufficient.

  2. Per-user rate limiting: A single user must not receive 200 promotional notifications per minute even if 200 campaigns trigger simultaneously. Enforcement happens at the queue consumption layer (worker level), not at the API layer — the notification service applies per-user quota before dispatching to providers. See Rate-Limiter-Design for the counter algorithm details.

  3. Provider unreliability: Third-party providers (FCM, Twilio, SendGrid) have availability SLAs of ~99.9% (~8.8 hours downtime/year). The notification service must implement retry with exponential backoff and a Circuit-Breaker-Pattern per provider, not a shared circuit breaker that takes down all channels when one provider degrades.
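To make the per-provider circuit breaker in point 3 concrete, here is a minimal sketch. The 50% failure ratio and 30s window come from the pattern table later in this note. The class name, the `min_calls` floor, the cooldown length, and the injectable clock are assumptions added for illustration. A production breaker would usually add a half-open probing state, which this sketch omits.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, Tuple


class ProviderCircuitBreaker:
    """Opens when the failure ratio in a sliding window exceeds a threshold."""

    def __init__(
        self,
        failure_ratio: float = 0.5,      # from the note: >50% failures
        window_seconds: float = 30.0,    # from the note: 30s window
        min_calls: int = 10,             # assumption: ignore tiny samples
        cooldown_seconds: float = 30.0,  # assumption: how long to stay open
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_ratio = failure_ratio
        self.window_seconds = window_seconds
        self.min_calls = min_calls
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp, failed)
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Return False while the breaker is open."""
        if self._opened_at is None:
            return True
        if self.clock() - self._opened_at < self.cooldown_seconds:
            return False
        # Cooldown elapsed: close the breaker and start from a clean window.
        self._opened_at = None
        self._events.clear()
        return True

    def record(self, failed: bool) -> None:
        """Record one provider call outcome and open the breaker if needed."""
        now = self.clock()
        self._events.append((now, failed))
        # Drop outcomes that have fallen out of the sliding window.
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()
        total = len(self._events)
        failures = sum(1 for _, was_failure in self._events if was_failure)
        if total >= self.min_calls and failures / total > self.failure_ratio:
            self._opened_at = now


# One breaker per provider, so a degraded provider only pauses its own channel.
breakers = {name: ProviderCircuitBreaker() for name in ("fcm", "twilio", "sendgrid")}
```

A worker would call `allow_request()` before each provider call and `record(failed=...)` afterwards. Because each provider has its own instance, an open FCM breaker leaves Twilio and SendGrid traffic untouched.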

Component Design

End-to-End Flow

[Event Source] --> [Notification Service API]
                         |
                         v
                  [Message Queue] -- topics: notification.critical
                  /        |        \          notification.standard
                 /         |         \         notification.promotional
         [Push Worker] [SMS Worker] [Email Worker]
               |              |            |
          [FCM/APNs]       [Twilio]    [SendGrid]
               \              |            /
                \-- [Delivery Receipt Store] --/
                         |
                  [Analytics / Audit Log]

Each channel worker subscribes independently to the notification topics. Workers consume from their priority queue tier and apply that tier's per-user rate limit (none for critical, enforced for promotional) before forwarding to the provider.
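The independence property in this flow can be illustrated with an in-memory sketch: one event is copied to every channel, and a failure in one channel only affects that channel's retry queue. The `Notification` and `ChannelWorker` types, and the in-process deques standing in for queue topics, are hypothetical simplifications, not the real queue infrastructure.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass(frozen=True)
class Notification:
    notification_id: str
    user_id: str
    body: str


@dataclass
class ChannelWorker:
    name: str
    send: Callable[[Notification], None]  # provider call; raises on failure
    pending: Deque[Notification] = field(default_factory=deque)
    retry: Deque[Notification] = field(default_factory=deque)
    delivered: List[str] = field(default_factory=list)

    def drain(self) -> None:
        """Process this channel's queue; failures stay local to the channel."""
        while self.pending:
            notification = self.pending.popleft()
            try:
                self.send(notification)
                self.delivered.append(notification.notification_id)
            except Exception:
                self.retry.append(notification)


def fan_out(notification: Notification, workers: Dict[str, ChannelWorker]) -> None:
    """Copy one trigger event onto every channel's own queue."""
    for worker in workers.values():
        worker.pending.append(notification)


def ok_send(notification: Notification) -> None:
    return None  # stand-in for a successful provider call


def failing_sms_send(notification: Notification) -> None:
    raise ConnectionError("simulated SMS provider outage")


workers = {
    "push": ChannelWorker("push", ok_send),
    "sms": ChannelWorker("sms", failing_sms_send),
    "email": ChannelWorker("email", ok_send),
}
fan_out(Notification("n-1", "u-42", "Your order shipped"), workers)
for worker in workers.values():
    worker.drain()

for name, worker in workers.items():
    print(name, "delivered:", worker.delivered, "retry:", len(worker.retry))
```

Here the SMS failure lands only in the SMS retry queue, while push and email complete normally.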

Priority Queue Design

Three queue topics with distinct consumer SLAs:

| Queue | Use Case | Latency SLA | Per-User Rate Limiting | Consumer Pool |
|---|---|---|---|---|
| notification.critical | Security alerts, OTP, payment failures | 100ms | None; critical notifications always go through | Dedicated, higher provisioned throughput |
| notification.standard | Transactional (order shipped, password reset) | 1s | Minimal guard only | Standard worker pool |
| notification.promotional | Marketing campaigns, recommendations | 5 minutes | Enforced; per-user quota checked before dispatch | Shared worker pool; lowest priority |

The producer classifies notifications at trigger time and routes to the appropriate topic. Workers never re-classify.
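Producer-side classification can be as simple as a lookup table. The sketch below assumes a fixed mapping from event type to topic. The event type strings are hypothetical names derived from the use cases in the table above, and rejecting unknown types is one possible policy rather than something this note prescribes.

```python
from enum import Enum


class Topic(Enum):
    CRITICAL = "notification.critical"
    STANDARD = "notification.standard"
    PROMOTIONAL = "notification.promotional"


# Hypothetical event type names; the tiers follow the use cases in the table.
EVENT_TOPIC = {
    "security_alert": Topic.CRITICAL,
    "otp": Topic.CRITICAL,
    "payment_failed": Topic.CRITICAL,
    "order_shipped": Topic.STANDARD,
    "password_reset": Topic.STANDARD,
    "campaign": Topic.PROMOTIONAL,
    "recommendation": Topic.PROMOTIONAL,
}


def topic_for(event_type: str) -> str:
    """Classify once at trigger time; workers never re-classify."""
    try:
        return EVENT_TOPIC[event_type].value
    except KeyError:
        # Assumed policy: force every new event type to be classified explicitly.
        raise ValueError(f"unclassified event type: {event_type!r}") from None


print(topic_for("otp"))
print(topic_for("campaign"))
```

Failing loudly on an unknown type keeps a new event from silently landing in the wrong tier. Defaulting to the standard topic would be a reasonable alternative.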

Retry with Exponential Backoff

Failed provider calls (5xx response or timeout) return the message to the queue with a delivery delay:

attempt 1: retry after 1s
attempt 2: retry after 2s
attempt 3: retry after 4s
attempt N: retry after min(2^(N-1) seconds, 60s)   // cap at 60s
dead letter after 5 failed attempts → Dead-Letter-Queue → alert + manual review

Each retry must be idempotent: pass notification_id to the provider as an idempotency key. Providers supporting idempotency keys (SendGrid, Twilio) deduplicate on their side. For queue-level deduplication, see Idempotent-Consumer.
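The schedule above reduces to a small pure function. This sketch follows the listed delays (1s, 2s, 4s, doubling per attempt), the 60s cap, and the 5-attempt dead-letter cutoff. The function and constant names are illustrative, and jitter is left out for clarity.

```python
from typing import Optional, Tuple

BASE_DELAY_SECONDS = 1
MAX_DELAY_SECONDS = 60
MAX_ATTEMPTS = 5


def retry_delay_seconds(failed_attempt: int) -> int:
    """Delay before the retry that follows the given failed attempt (1-based)."""
    if failed_attempt < 1:
        raise ValueError("failed_attempt is 1-based")
    return min(BASE_DELAY_SECONDS * 2 ** (failed_attempt - 1), MAX_DELAY_SECONDS)


def next_action(failed_attempts: int) -> Tuple[str, Optional[int]]:
    """Decide whether to re-queue with a delay or route to the dead-letter queue."""
    if failed_attempts >= MAX_ATTEMPTS:
        return ("dead_letter", None)
    return ("retry", retry_delay_seconds(failed_attempts))


for attempt in range(1, MAX_ATTEMPTS + 1):
    print(attempt, next_action(attempt))
```

With a 5-attempt budget the longest delay actually used is 8s, so the 60s cap only matters if the retry budget is raised.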

Per-User Rate Limiting

Implemented at the worker consumption layer, not at the API layer. Before dispatching to a provider, the worker checks:

key = "notif_rate:{user_id}:{channel}:{window}"
count = Redis INCR key
IF count == 1: EXPIRE key window_seconds
IF count > per_user_limit: drop notification or defer to promotional queue

This is distinct from API-level rate limiting (which limits inbound request volume). This limits outbound notification delivery per user. The algorithm follows the same Redis counter pattern as Rate-Limiter-Design but applied to outbound delivery.
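A runnable version of the counter check might look like the following. To stay self-contained it uses an in-memory stand-in for Redis (`InMemoryCounterStore`, a hypothetical helper that mimics INCR and EXPIRE). It also assumes `{window}` in the key means the index of the current fixed window, which is one reading of the pseudocode above.

```python
import time
from typing import Callable, Dict, Tuple


class InMemoryCounterStore:
    """Stand-in for Redis INCR/EXPIRE so the sketch runs without a server."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[int, float]] = {}  # key -> (count, expires_at)

    def incr(self, key: str) -> int:
        count, expires_at = self._data.get(key, (0, float("inf")))
        if self._clock() >= expires_at:
            count, expires_at = 0, float("inf")  # key expired: start over
        count += 1
        self._data[key] = (count, expires_at)
        return count

    def expire(self, key: str, seconds: float) -> None:
        count, _ = self._data[key]
        self._data[key] = (count, self._clock() + seconds)


def allow_notification(
    store: InMemoryCounterStore,
    user_id: str,
    channel: str,
    per_user_limit: int,
    window_seconds: int,
    clock: Callable[[], float] = time.time,
) -> bool:
    """Fixed-window counter: True if this notification is within quota."""
    window = int(clock() // window_seconds)  # assumed meaning of {window}
    key = f"notif_rate:{user_id}:{channel}:{window}"
    count = store.incr(key)
    if count == 1:
        store.expire(key, window_seconds)  # first hit sets the TTL
    return count <= per_user_limit


store = InMemoryCounterStore()
results = [allow_notification(store, "u-42", "push", 2, 60) for _ in range(3)]
print(results)
```

Against a real Redis server, INCR and EXPIRE sent as two separate commands are not atomic: a crash between them leaves a counter with no TTL. Wrapping both in a Lua script or a MULTI/EXEC transaction is a common way to close that gap.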

Device Token Staleness

FCM and APNs return specific error codes for expired or invalid device tokens (UNREGISTERED on the FCM HTTP v1 API, NotRegistered on the legacy FCM API; BadDeviceToken or Unregistered on APNs). On receipt of these errors, the notification service must deregister the token from the user's device registry rather than retry: a stale token will never succeed, and retrying it wastes provider API quota.
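The token cleanup rule can be sketched as a result handler that separates permanent token errors from retryable failures. The `DeviceRegistry` class, the `handle_push_result` function, and the returned status strings are hypothetical. The error code sets should be verified against current provider documentation before use.

```python
from typing import Dict, Optional, Set

# Error codes that mean the token is permanently invalid (verify per provider).
STALE_TOKEN_ERRORS: Dict[str, Set[str]] = {
    "fcm": {"NotRegistered", "UNREGISTERED"},
    "apns": {"BadDeviceToken", "Unregistered"},
}


class DeviceRegistry:
    """Hypothetical in-memory registry of device tokens per user."""

    def __init__(self) -> None:
        self._tokens: Dict[str, Set[str]] = {}

    def register(self, user_id: str, token: str) -> None:
        self._tokens.setdefault(user_id, set()).add(token)

    def deregister(self, user_id: str, token: str) -> None:
        self._tokens.get(user_id, set()).discard(token)

    def tokens_for(self, user_id: str) -> Set[str]:
        return set(self._tokens.get(user_id, set()))


def handle_push_result(
    registry: DeviceRegistry,
    user_id: str,
    token: str,
    provider: str,
    error_code: Optional[str],
) -> str:
    """Map a provider response to an action: delivered, token_removed, or retry."""
    if error_code is None:
        return "delivered"
    if error_code in STALE_TOKEN_ERRORS.get(provider, set()):
        # Permanent failure: remove the token and do not retry.
        registry.deregister(user_id, token)
        return "token_removed"
    # Anything else is treated as transient and goes through normal retry.
    return "retry"


registry = DeviceRegistry()
registry.register("u-42", "token-a")
registry.register("u-42", "token-b")
print(handle_push_result(registry, "u-42", "token-a", "fcm", "NotRegistered"))
print(sorted(registry.tokens_for("u-42")))
```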

System Diagram

Notification-System-Design-diagram.excalidraw

Alternatives Considered

| Decision | Alternative | Why Chosen Approach Wins |
|---|---|---|
| Async fan-out via message queue | Synchronous HTTP calls to providers at trigger time | Synchronous blocks the API request on provider latency; async decouples trigger from delivery and supports per-channel retry independence |
| Separate queue topics per priority | Single queue with priority field | Single queue cannot enforce separate consumer SLAs or dedicated worker pools; critical notifications can be starved by promotional volume |
| Per-channel workers | Single worker handling all channels | Single worker serialises channel processing; one slow provider (e.g., Twilio) blocks push and email delivery |
| Template engine at worker level | Template resolution at API trigger time | Worker-level resolution allows late-binding of user data (e.g., resolve current display name at send time, not at trigger time); more accurate for delayed promotional sends |

Likely Follow-Up Questions

  1. How do you handle notification preferences (opt-in/opt-out per channel)? Subscription state service: each user has a preferences record {user_id, channel, subscribed: bool}. Workers query preferences before dispatching; unsubscribed channels are skipped, not errored.

  2. How do you prevent duplicate notifications to the same user? Idempotency key notification_id passed to provider. For in-house deduplication across queue retries, see Idempotent-Consumer — persistent deduplication store keyed on notification_id.

  3. How do you handle provider-specific rate limits (Twilio long code vs short code)? Provider adapter layer: each provider implementation enforces its own rate limit internally. Twilio long code limited to 100 msg/sec — use short code (up to 3,000 msg/sec) for campaign volumes. The notification service does not need to know the difference; the provider adapter does.

  4. How do you track end-to-end delivery latency? Trace IDs propagated from trigger event through queue to provider call. Delivery receipt timestamps stored in the receipt store. Analytics projection calculates p50/p99 latency per channel per priority tier.

  5. How do you handle marketing campaign bursts (millions of notifications at once)? Rate-limit the campaign producer at the API layer: accept the trigger event, but drip messages into the notification.promotional topic at a controlled rate (token bucket on the producer side). The queue absorbs the burst; workers consume at a sustainable rate.

  6. How do you support internationalised notification content? Template engine resolves the user's locale preference at worker dispatch time. Templates stored per {template_id, locale}. Fallback chain: requested locale → default locale → hardcoded English fallback.
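The locale fallback chain from question 6 is short enough to show directly. In this sketch the template store is a plain dict keyed on (template_id, locale), as described above. The default locale value, the fallback string, and the function name are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

DEFAULT_LOCALE = "en-US"  # assumed default locale
HARDCODED_ENGLISH_FALLBACK = "You have a new notification."  # assumed text

TemplateStore = Dict[Tuple[str, str], str]


def resolve_template(
    templates: TemplateStore,
    template_id: str,
    requested_locale: Optional[str],
) -> str:
    """Fallback chain: requested locale, then default locale, then hardcoded English."""
    for locale in (requested_locale, DEFAULT_LOCALE):
        if locale is not None and (template_id, locale) in templates:
            return templates[(template_id, locale)]
    return HARDCODED_ENGLISH_FALLBACK


templates: TemplateStore = {
    ("order_shipped", "en-US"): "Your order {order_id} has shipped.",
    ("order_shipped", "de-DE"): "Ihre Bestellung {order_id} wurde versandt.",
}

print(resolve_template(templates, "order_shipped", "de-DE").format(order_id="A1"))
print(resolve_template(templates, "order_shipped", "fr-FR").format(order_id="A1"))
print(resolve_template(templates, "missing_template", "fr-FR"))
```

Resolving at worker dispatch time, as the answer describes, means a locale change made after the trigger still takes effect for delayed sends.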

Existing Pattern Connections

| Design Decision | Existing Pattern | Relationship |
|---|---|---|
| Async fan-out via message queue | Message-Queue | Notification triggers are pub/sub events; worker pool subscribes to topics per channel; scope: infrastructure delivery mechanics live in Message-Queue, not here |
| Duplicate delivery prevention | Idempotent-Consumer | Notification workers are idempotent consumers; provider idempotency key = notification_id; deduplication store pattern documented in Idempotent-Consumer |
| Failed retries after N attempts | Dead-Letter-Queue | Notifications exhausting retry budget go to DLQ for manual review and recovery; DLQ recovery paths apply |
| Priority queue design | Message-Queue | Separate queue topics per priority tier; critical queue has dedicated consumer pool; queue mechanics in Message-Queue |
| Per-user rate limiting | Rate-Limiter-Design | Per-user notification rate limiting follows the same Redis counter pattern as API rate limiting but applied to outbound delivery |
| Provider circuit breaker | Circuit-Breaker-Pattern | Per-provider circuit breaker: if FCM returns 5xx for >50% of requests in a 30s window, stop calling FCM; prevents cascade failure across channels |
| Delivery receipt analytics | CQRS-Pattern | Delivery receipts are write-path events; analytics queries (delivery rates, open rates, p99 latency) read from a separate projection (CQRS read/write separation) |

(populated in Phase 32 backlink sweep)