Circuit Breaker Pattern

A resilience pattern that stops calling a failing downstream service when a failure threshold is exceeded, giving the service time to recover and preventing cascading failures from propagating through the system.


Core Idea

Named after electrical circuit breakers, the software circuit breaker monitors calls to a downstream service. When failures exceed a threshold, it "trips" (opens), and subsequent calls are immediately rejected (fast-fail) without attempting the network call. After a wait period, it enters a half-open state to test if the service has recovered. This prevents a slow or failed downstream service from exhausting connection pools, blocking threads, or cascading failures upward.

In a BFF-Pattern context, circuit breakers are critical: the BFF calls multiple downstream services, and one degraded service can block all of its connections if unprotected.

The primary Spring implementation is Resilience4j, integrated into Spring-Cloud-Gateway via spring-cloud-starter-circuitbreaker-reactor-resilience4j.


Key Principles

  1. Fail fast, not slow — an open circuit returns an error or fallback immediately; clients get a quick degraded response instead of waiting for a timeout.
  2. Protect the caller, not just the callee — circuit breakers guard the BFF's resources (threads, connections) from being exhausted by a slow downstream.
  3. Graceful degradation via fallback — when the circuit is open, serve cached data, empty defaults, or a user-friendly error rather than a 500.
  4. State machine governs behaviour — the circuit moves through CLOSED → OPEN → HALF-OPEN → CLOSED (or OPEN again) based on configurable thresholds.

How It Works

State Machine

          failure rate >= threshold
CLOSED ─────────────────────────────▶ OPEN
  ▲             wait duration elapsed │  ▲
  │  test calls succeed               │  │  test calls fail
  │                                   ▼  │
  └──────────────────────────────── HALF-OPEN
  (HALF-OPEN permits limited calls to test recovery)

State      Behaviour
CLOSED     All calls pass through. Failures are counted. Normal operation.
OPEN       All calls are rejected immediately (no network call). Fallback is returned.
HALF-OPEN  A limited number of test calls are permitted. If they succeed, the circuit transitions to CLOSED. If they fail, it goes back to OPEN.
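
The three states can be sketched in plain Java without any library. This is a minimal illustration, not Resilience4j's implementation: SimpleBreaker, its constructor parameters, and the injected clock are hypothetical names, and it trips on a count of consecutive failures rather than a failure rate to keep the sketch short.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Demo: trips the breaker, waits, then recovers through HALF_OPEN. */
class StateMachineDemo {
    public static void main(String[] args) {
        long[] now = {0};  // fake clock in milliseconds, so the demo needs no sleeping
        SimpleBreaker cb = new SimpleBreaker(3, 1_000, 2, () -> now[0]);
        Supplier<String> failing = () -> { throw new IllegalStateException("down"); };

        for (int i = 0; i < 3; i++) {
            cb.call(failing, () -> "fallback");
        }
        System.out.println(cb.state());         // OPEN
        now[0] = 1_000;                         // wait duration has elapsed
        cb.call(() -> "ok", () -> "fallback");  // first test call
        System.out.println(cb.state());         // HALF_OPEN
        cb.call(() -> "ok", () -> "fallback");  // second test call
        System.out.println(cb.state());         // CLOSED
    }
}

/** Hypothetical minimal breaker: opens after N consecutive failures. */
class SimpleBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureLimit;   // consecutive failures that open the circuit
    private final long waitMillis;    // time to stay OPEN before testing
    private final int testCalls;      // successful test calls needed to close again
    private final LongSupplier clock;

    private State state = State.CLOSED;
    private int failures;
    private int testSuccesses;
    private long openedAt;

    SimpleBreaker(int failureLimit, long waitMillis, int testCalls, LongSupplier clock) {
        this.failureLimit = failureLimit;
        this.waitMillis = waitMillis;
        this.testCalls = testCalls;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    // Holding the lock during the call serializes callers; acceptable for a sketch only.
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < waitMillis) {
                return fallback.get();   // fast-fail: the downstream call is skipped
            }
            state = State.HALF_OPEN;     // wait elapsed: let test calls through
            testSuccesses = 0;
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private void onSuccess() {
        if (state == State.HALF_OPEN) {
            if (++testSuccesses >= testCalls) {
                state = State.CLOSED;
                failures = 0;
            }
        } else {
            failures = 0;
        }
    }

    private void onFailure() {
        // Any failure in HALF_OPEN reopens the circuit immediately.
        if (state == State.HALF_OPEN || ++failures >= failureLimit) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
            failures = 0;
        }
    }
}
```

Injecting the clock is a design choice for the sketch: it makes the OPEN to HALF-OPEN transition testable without real waiting.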

Failure Rate Threshold

The circuit opens when the failure rate (failed calls / total calls) within a sliding window reaches or exceeds a configured threshold (e.g., 50%). Resilience4j uses a greater-than-or-equal comparison.

Two sliding window types:

  • Count-based — evaluate the last N calls (e.g., last 10 calls).
  • Time-based — evaluate calls within the last N seconds.
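
A count-based window can be sketched as a ring buffer of the last N outcomes. The class and method names here (CountWindow, record, failureRate, shouldOpen) are hypothetical, and returning -1 while fewer than the minimum number of calls have been recorded is a convention chosen for this sketch. A time-based window would aggregate outcomes into per-second buckets instead; that variant is not shown.

```java
/** Demo: the failure rate is only evaluated once enough calls are recorded. */
class SlidingWindowDemo {
    public static void main(String[] args) {
        CountWindow window = new CountWindow(10, 5);  // last 10 calls, evaluate after 5
        for (int i = 0; i < 4; i++) {
            window.record(false);                     // four failures
        }
        System.out.println(window.shouldOpen(50f));   // false: only 4 of the 5 required calls
        window.record(true);
        window.record(true);
        System.out.println(window.shouldOpen(50f));   // true: 4 failures out of 6 calls
    }
}

/** Hypothetical count-based sliding window over the last N call outcomes. */
class CountWindow {
    private final boolean[] outcomes;  // ring buffer; true = success
    private final int minimumCalls;
    private int next;                  // slot that the next outcome overwrites
    private int recorded;              // number of filled slots
    private int failures;              // failures currently inside the window

    CountWindow(int size, int minimumCalls) {
        this.outcomes = new boolean[size];
        this.minimumCalls = minimumCalls;
    }

    void record(boolean success) {
        if (recorded == outcomes.length) {
            if (!outcomes[next]) {
                failures--;            // the evicted (oldest) outcome was a failure
            }
        } else {
            recorded++;
        }
        outcomes[next] = success;
        if (!success) {
            failures++;
        }
        next = (next + 1) % outcomes.length;
    }

    /** Failure rate in percent, or -1 if too few calls have been recorded. */
    float failureRate() {
        if (recorded < minimumCalls) {
            return -1f;
        }
        return 100f * failures / recorded;
    }

    boolean shouldOpen(float thresholdPercent) {
        float rate = failureRate();
        return rate >= 0 && rate >= thresholdPercent;
    }
}
```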

Configuration Parameters (Resilience4j)

Parameter                              Default  Description
failureRateThreshold                   50       Percentage of failed calls that opens the circuit
slowCallRateThreshold                  100      Percentage of slow calls that opens the circuit
slowCallDurationThreshold              60s      Duration above which a call is classified as "slow"
waitDurationInOpenState                60s      How long the circuit stays OPEN before testing
permittedNumberOfCallsInHalfOpenState  10       Test calls allowed in HALF-OPEN
slidingWindowSize                      100      Number of calls (or seconds) in the window
minimumNumberOfCalls                   100      Minimum calls before the circuit can open

Examples

  • User Service degraded: the userServiceCB circuit breaker trips open once the failure rate reaches 50%. The BFF returns a cached user profile for 30 seconds while the circuit is open.
  • Payment Service slow: with slowCallDurationThreshold: 3s and slowCallRateThreshold: 80, the circuit opens once 80% of calls take longer than 3 seconds, preventing timeout pile-up.
  • Order Service restart: the circuit opens during a rolling restart. After waitDurationInOpenState: 30s, it enters HALF-OPEN and allows 5 test calls. All succeed, so the circuit closes.

See YAML configuration and Java fallback handler in P3-BFF-Implementation-Patterns — Example 4.


Circuit Breaker in Spring Cloud Gateway

SCG integrates circuit breakers as a per-route filter:

filters:
  - name: CircuitBreaker
    args:
      name: orderServiceCB         # references Resilience4j config by name
      fallbackUri: forward:/fallback/orders  # internal fallback endpoint

The fallbackUri: forward:/fallback/orders forwards the request internally (no HTTP redirect is sent to the client) to a local controller method that returns the degraded response. This keeps fallback logic in Java code, not YAML.


Common Misconceptions

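  • Circuit breaker replaces retry: They are complementary. Retry handles transient failures (network blip, momentary 503). Circuit breaker handles sustained failures (service is down). When retry wraps the circuit breaker, each retry attempt passes through the breaker and counts toward its failure rate; once the circuit opens, further attempts are rejected immediately instead of reaching the downstream service.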
  • Opening the circuit immediately on one failure: minimumNumberOfCalls prevents premature tripping. The circuit needs enough data before making a statistical decision.
  • Circuit breaker state is shared across instances: By default, Resilience4j state is in-process. Each BFF pod has its own circuit breaker state. Sharing state across pods requires an external store such as Redis plus custom integration code; Resilience4j does not provide this out of the box.
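
How retry and circuit breaker compose can be sketched in plain Java. TinyBreaker, Retry, and CircuitOpenException are hypothetical names for this illustration, the breaker counts consecutive failures rather than a failure rate, it never leaves the open state, and the retry loop has no backoff. The point is only the ordering: retry is the outer layer, and it stops as soon as the breaker rejects a call.

```java
import java.util.function.Supplier;

/** Demo: retry gives up early once the circuit opens. */
class RetryDemo {
    public static void main(String[] args) {
        TinyBreaker breaker = new TinyBreaker(3);
        int[] attempts = {0};
        Supplier<String> downstream = () -> {
            attempts[0]++;
            throw new IllegalStateException("service down");
        };
        try {
            Retry.<String>run(5, () -> breaker.call(downstream));
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());  // circuit open
        }
        // 3: the fourth attempt was rejected by the open circuit, so retrying stopped
        System.out.println(attempts[0]);
    }
}

class CircuitOpenException extends RuntimeException {
    CircuitOpenException() {
        super("circuit open");
    }
}

/** Hypothetical breaker: opens after N consecutive failures and stays open. */
class TinyBreaker {
    private final int failureLimit;
    private int consecutiveFailures;
    private boolean open;

    TinyBreaker(int failureLimit) {
        this.failureLimit = failureLimit;
    }

    <T> T call(Supplier<T> action) {
        if (open) {
            throw new CircuitOpenException();  // fast-fail, downstream is not called
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            if (++consecutiveFailures >= failureLimit) {
                open = true;
            }
            throw e;
        }
    }
}

/** Hypothetical retry helper: outer layer around a breaker-guarded call. */
class Retry {
    static <T> T run(int maxAttempts, Supplier<T> guardedCall) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return guardedCall.get();
            } catch (CircuitOpenException e) {
                throw e;      // retrying against an open circuit is pointless
            } catch (RuntimeException e) {
                last = e;     // possibly transient failure: try again
            }
        }
        throw last;
    }
}
```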

Why It Matters

In a microservices architecture, a single failing service can cascade: Service A is slow → Service B waits → Service B threads fill up → Service B is now slow → Service C waits → total outage. The circuit breaker pattern breaks this cascade by failing fast at each hop. For a BFF that calls 5 services, protecting each call with a circuit breaker ensures one degraded service produces one degraded response field, not a full 500.
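
The "one degraded field, not a full 500" behaviour can be sketched with CompletableFuture. FanOut and its aggregate method are hypothetical names, and each Supplier stands in for a circuit-breaker-protected downstream call, where a rejected or failed call surfaces as an exception. This is a sketch under those assumptions, not the BFF's actual aggregation code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Demo: one failing branch degrades one field of the aggregated response. */
class FanOutDemo {
    public static void main(String[] args) {
        Map<String, Supplier<String>> calls = new LinkedHashMap<>();
        calls.put("user", () -> "alice");
        calls.put("orders", () -> { throw new IllegalStateException("order service down"); });
        calls.put("payments", () -> "2 cards");

        System.out.println(FanOut.aggregate(calls, "unavailable"));
        // {user=alice, orders=unavailable, payments=2 cards}
    }
}

/** Hypothetical aggregator: runs all branches in parallel, with a per-branch fallback. */
class FanOut {
    static Map<String, String> aggregate(Map<String, Supplier<String>> calls, String fallback) {
        // Start every branch first so that the calls run concurrently.
        Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
        calls.forEach((field, call) -> pending.put(
                field,
                CompletableFuture.supplyAsync(call).exceptionally(error -> fallback)));

        // Join afterwards; a failed branch has already been replaced by the fallback.
        Map<String, String> response = new LinkedHashMap<>();
        pending.forEach((field, future) -> response.put(field, future.join()));
        return response;
    }
}
```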


Concept               Relationship
Resilience4j          Primary Java implementation of circuit breaker for Spring
BFF-Pattern           Circuit breakers protect each downstream call in a BFF fan-out
Spring-Cloud-Gateway  The CircuitBreaker GatewayFilter factory applies circuit breakers per route
Request-Aggregation   Individual aggregation branches should each be circuit-broken
Service-Mesh-Pattern  Service Mesh subsumes circuit breaking; in a mesh architecture, the sidecar proxy (Envoy) handles circuit breaking without Resilience4j in the application
Ambassador-Pattern    Ambassador externalises circuit breaking to the sidecar proxy; per-service alternative to library-level circuit breaking
  • Load-Balancer — circuit breaker at the service layer complements load balancer health checks at the infrastructure layer; they are independent resilience mechanisms operating at different scopes
  • Rate-Limiter-Design — the rate limiter uses a circuit breaker on the Redis state store: if Redis becomes unavailable, the circuit breaker trips and the rate limiter fails open or closed per policy
  • Notification-System-Design — per-provider circuit breaker prevents cascade failure when a third-party provider (FCM, Twilio, SendGrid) degrades; each channel has its own circuit breaker to isolate failure
  • SLO-SLI-SLA — circuit breaker trip rate is an SLI input; when circuit breakers trip frequently, the error budget is being consumed — SLO burn rate determines whether the circuit breaker behavior warrants an alert
  • Alerting-Strategies — circuit breaker state transitions (CLOSED -> OPEN) are symptoms that feed into symptom-based alerting; Alerting-Strategies covers multi-window multi-burn-rate rules that determine alert severity

Sources

  • Martin Fowler: "CircuitBreaker" (martinfowler.com/bliki/CircuitBreaker.html)
  • Resilience4j docs: resilience4j.readme.io/docs/circuitbreaker
  • Spring Cloud CircuitBreaker reference
  • P3-BFF-Implementation-Patterns (Phase 3 research, IMPL-07)