Resilience4j

A lightweight, modular fault tolerance library for Java, providing circuit breakers, rate limiters, retries, bulkheads, and time limiters — the standard resilience layer in the Spring Cloud ecosystem.


Core Idea

Resilience4j is the recommended replacement for Netflix Hystrix (in maintenance mode since 2018) in the Spring ecosystem. It is built around Java functional interfaces (the 1.x line targets Java 8; 2.x requires Java 17) and integrates natively with Reactor types (Mono, Flux) through its resilience4j-reactor module. In the Spring Cloud Gateway context, Resilience4j circuit breakers are configured per route and react to downstream failures without custom code.

The key integration point for BFF development is spring-cloud-starter-circuitbreaker-reactor-resilience4j, which wires Resilience4j into Spring Cloud's ReactiveCircuitBreaker abstraction and into SCG's CircuitBreaker gateway filter (SpringCloudCircuitBreakerResilience4JFilterFactory).


Key Principles

  1. Decorator, not framework: Resilience4j wraps individual function calls; it does not require a different threading model or framework.
  2. Modular: each resilience pattern (circuit breaker, retry, rate limiter, bulkhead, time limiter) is an independent module; use only what you need.
  3. Reactive-first: io.github.resilience4j:resilience4j-reactor provides CircuitBreakerOperator, RetryOperator, etc., for wrapping Reactor pipelines.
  4. Observable state: circuit breaker state, call metrics, and events are exposed via Micrometer gauges (resilience4j-micrometer) and Spring Boot Actuator health indicators.
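A minimal sketch of principles 1 and 3, assuming resilience4j-circuitbreaker, resilience4j-reactor, and Reactor are on the classpath: an existing Mono is decorated with CircuitBreakerOperator, and nothing else about the pipeline changes. DecoratorDemo and guarded are illustrative names, not library API.

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.reactor.circuitbreaker.operator.CircuitBreakerOperator;
import reactor.core.publisher.Mono;

// Hypothetical helper: decorates any Mono with a circuit breaker.
// The operator checks permission on subscribe and records the outcome;
// the wrapped pipeline keeps its own threading model.
class DecoratorDemo {

    static <T> Mono<T> guarded(CircuitBreaker circuitBreaker, Mono<T> call) {
        // transformDeferred applies the operator once per subscription
        return call.transformDeferred(CircuitBreakerOperator.of(circuitBreaker));
    }

    public static void main(String[] args) {
        CircuitBreaker cb = CircuitBreaker.ofDefaults("demo");
        String result = guarded(cb, Mono.just("ok")).block();
        System.out.println(result + " / state=" + cb.getState()
                + " / successful=" + cb.getMetrics().getNumberOfSuccessfulCalls());
    }
}
```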

How It Works

Module Overview

Module | Artifact | Purpose
Circuit Breaker | resilience4j-circuitbreaker | Stop calling failing services
Retry | resilience4j-retry | Retry transient failures with backoff
Rate Limiter | resilience4j-ratelimiter | Limit call rate to a service
Bulkhead | resilience4j-bulkhead | Limit concurrent calls (semaphore or thread pool)
Time Limiter | resilience4j-timelimiter | Timeout wrapper for async calls
Spring Boot Starter | resilience4j-spring-boot3 | Auto-configuration for Spring Boot 3.x

Spring Cloud Integration

The dependency that wires everything together for a BFF on Spring Cloud Gateway:

<!-- No <version> needed when the spring-cloud-dependencies BOM is imported -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-circuitbreaker-reactor-resilience4j</artifactId>
</dependency>

This provides:

  • ReactiveCircuitBreakerFactory bean for programmatic use
  • Resilience4JCircuitBreakerFactory auto-configuration
  • SCG's CircuitBreaker gateway filter (the CircuitBreaker filter in route YAML, implemented by SpringCloudCircuitBreakerResilience4JFilterFactory)
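For illustration, the same CircuitBreaker filter can also be attached through SCG's Java route DSL. This sketch assumes Spring Cloud Gateway plus the starter above; the route id, path, and the forward:/fallback/users handler (which would be a separate controller, not shown) are hypothetical.

```java
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Sketch: Java DSL equivalent of a YAML route using the CircuitBreaker filter.
@Configuration
class BffRoutes {

    @Bean
    RouteLocator userRoutes(RouteLocatorBuilder builder) {
        return builder.routes()
                .route("user-service", r -> r.path("/api/users/**")
                        .filters(f -> f.circuitBreaker(c -> c
                                // Hypothetical id; intended to match the userServiceCB instance
                                .setName("userServiceCB")
                                // Hypothetical local handler used when the call fails or the circuit is open
                                .setFallbackUri("forward:/fallback/users")))
                        .uri("http://user-service"))
                .build();
    }
}
```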

Circuit Breaker Configuration (application.yml)

resilience4j:
  circuitbreaker:
    instances:
      userServiceCB:
        registerHealthIndicator: true
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 10
        minimumNumberOfCalls: 5
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
        permittedNumberOfCallsInHalfOpenState: 3
        slowCallRateThreshold: 80
        slowCallDurationThreshold: 3s
        automaticTransitionFromOpenToHalfOpenEnabled: true
 
      orderServiceCB:
        registerHealthIndicator: true
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 20
        minimumNumberOfCalls: 10
        failureRateThreshold: 60
        waitDurationInOpenState: 60s
        permittedNumberOfCallsInHalfOpenState: 5
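One way to read the userServiceCB settings is to rebuild them with Resilience4j's CircuitBreakerConfig builder and drive an always-failing call through them. This sketch assumes resilience4j-circuitbreaker on the classpath; UserServiceCbDemo and callAlwaysFailing are hypothetical names, and automaticTransitionFromOpenToHalfOpenEnabled is left out to keep it minimal.

```java
import io.github.resilience4j.circuitbreaker.CallNotPermittedException;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig.SlidingWindowType;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class UserServiceCbDemo {

    // Programmatic mirror of the userServiceCB YAML
    // (automaticTransitionFromOpenToHalfOpenEnabled omitted).
    static CircuitBreakerConfig userServiceConfig() {
        return CircuitBreakerConfig.custom()
                .slidingWindowType(SlidingWindowType.COUNT_BASED)
                .slidingWindowSize(10)
                .minimumNumberOfCalls(5)
                .failureRateThreshold(50)
                .waitDurationInOpenState(Duration.ofSeconds(30))
                .permittedNumberOfCallsInHalfOpenState(3)
                .slowCallRateThreshold(80)
                .slowCallDurationThreshold(Duration.ofSeconds(3))
                .build();
    }

    // Calls a hypothetical always-failing downstream `calls` times and
    // returns how many invocations actually reached it.
    static int callAlwaysFailing(CircuitBreaker cb, int calls) {
        AtomicInteger reachedDownstream = new AtomicInteger();
        Supplier<String> decorated = CircuitBreaker.decorateSupplier(cb, () -> {
            reachedDownstream.incrementAndGet();
            throw new IllegalStateException("user-service unavailable");
        });
        for (int i = 0; i < calls; i++) {
            try {
                decorated.get();
            } catch (CallNotPermittedException e) {
                // Circuit OPEN: rejected without invoking the downstream call
            } catch (IllegalStateException e) {
                // Downstream failure, recorded in the sliding window
            }
        }
        return reachedDownstream.get();
    }

    public static void main(String[] args) {
        CircuitBreaker cb = CircuitBreaker.of("userServiceCB", userServiceConfig());
        int reached = callAlwaysFailing(cb, 8);
        System.out.println("state=" + cb.getState() + ", reached downstream=" + reached);
    }
}
```

With minimumNumberOfCalls: 5, the failure rate is not evaluated until the fifth recorded call; at that point it is 100%, above the 50% threshold, so the breaker opens and the remaining three calls are rejected without reaching the downstream lambda.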

Retry Configuration (application.yml)

resilience4j:
  retry:
    instances:
      userServiceRetry:
        maxAttempts: 3
        waitDuration: 100ms
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2
        retryExceptions:
          - org.springframework.web.reactive.function.client.WebClientResponseException.ServiceUnavailable
          - java.util.concurrent.TimeoutException
        ignoreExceptions:
          - org.springframework.web.reactive.function.client.WebClientResponseException.BadRequest
          - org.springframework.web.reactive.function.client.WebClientResponseException.NotFound
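These retry settings imply a concrete wait schedule. The sketch below is a simplified plain-Java model, not Resilience4j's implementation (which derives waits through its IntervalFunction), and assumes no randomized wait is configured. maxAttempts counts the initial call, so maxAttempts: 3 means at most two waits. BackoffSchedule and waits are hypothetical names.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Simplified model of the waits implied by:
// maxAttempts: 3, waitDuration: 100ms, exponentialBackoffMultiplier: 2
class BackoffSchedule {

    // maxAttempts includes the first call, so there are (maxAttempts - 1) waits.
    static List<Duration> waits(int maxAttempts, Duration initialWait, double multiplier) {
        List<Duration> result = new ArrayList<>();
        double waitMillis = initialWait.toMillis();
        for (int retry = 1; retry < maxAttempts; retry++) {
            result.add(Duration.ofMillis(Math.round(waitMillis)));
            waitMillis *= multiplier;
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the wait before each retry: [PT0.1S, PT0.2S]
        System.out.println(waits(3, Duration.ofMillis(100), 2.0));
    }
}
```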

Programmatic Usage with Reactive Chains

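import java.time.Duration;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cloud.client.circuitbreaker.ReactiveCircuitBreaker;
import org.springframework.cloud.client.circuitbreaker.ReactiveCircuitBreakerFactory;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@Service
public class ResilientUserService {

    private static final Logger log = LoggerFactory.getLogger(ResilientUserService.class);

    private final WebClient webClient;
    private final ReactiveCircuitBreakerFactory<?, ?> circuitBreakerFactory;

    public ResilientUserService(WebClient.Builder builder,
                                ReactiveCircuitBreakerFactory<?, ?> cbFactory) {
        this.webClient = builder.baseUrl("http://user-service").build();
        this.circuitBreakerFactory = cbFactory;
    }

    // UserProfile is the BFF's own DTO (not shown), assumed to offer a static empty(String) factory.
    public Mono<UserProfile> getProfile(String userId) {
        // The id selects the userServiceCB settings from application.yml
        ReactiveCircuitBreaker cb = circuitBreakerFactory.create("userServiceCB");

        return cb.run(
            webClient.get()
                .uri("/users/{id}", userId)
                .retrieve()
                .bodyToMono(UserProfile.class)
                .timeout(Duration.ofSeconds(3)),
            throwable -> {
                // Fallback runs on any failure: open circuit (CallNotPermittedException),
                // timeout, or downstream error. Return a degraded profile instead of an error.
                log.warn("user-service call failed ({}), returning fallback. userId={}",
                         throwable.getClass().getSimpleName(), userId);
                return Mono.just(UserProfile.empty(userId));
            }
        );
    }
}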
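The retry instance configured earlier is separate from the circuit breaker. One way to compose the two in a reactive pipeline is to apply resilience4j-reactor's operators directly, with Retry outside the CircuitBreaker (the same nesting as the default order of Resilience4j's Spring Boot annotations). This is a sketch assuming resilience4j-retry, resilience4j-circuitbreaker, and resilience4j-reactor on the classpath; instances are built inline here, whereas a Spring Boot app would typically obtain them from the registries populated from application.yml. RetryAroundCircuitBreaker, resilient, and demoRetry are hypothetical names.

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.reactor.circuitbreaker.operator.CircuitBreakerOperator;
import io.github.resilience4j.reactor.retry.RetryOperator;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import reactor.core.publisher.Mono;

class RetryAroundCircuitBreaker {

    static <T> Mono<T> resilient(Mono<T> call, CircuitBreaker cb, Retry retry) {
        return call
                // inner: every attempt is permission-checked and recorded by the breaker
                .transformDeferred(CircuitBreakerOperator.of(cb))
                // outer: retry resubscribes to the breaker-guarded call
                .transformDeferred(RetryOperator.of(retry));
    }

    static Retry demoRetry() {
        return Retry.of("userServiceRetry", RetryConfig.custom()
                .maxAttempts(3)                      // initial call + 2 retries
                .waitDuration(Duration.ofMillis(10)) // shortened for the demo
                .retryExceptions(IOException.class)
                .build());
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        // Hypothetical flaky downstream: fails twice, then succeeds.
        Mono<String> flaky = Mono.defer(() -> attempts.incrementAndGet() < 3
                ? Mono.<String>error(new IOException("transient"))
                : Mono.just("profile"));

        CircuitBreaker cb = CircuitBreaker.ofDefaults("userServiceCB");
        String result = resilient(flaky, cb, demoRetry()).block();
        System.out.println(result + " after " + attempts.get() + " attempts; failed calls recorded="
                + cb.getMetrics().getNumberOfFailedCalls());
    }
}
```

Because each attempt passes through the circuit breaker, both transient failures land in its sliding window even though the caller sees a single successful result.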

Actuator Health Integration

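With registerHealthIndicator: true, each circuit breaker exposes its state in /actuator/health. Showing the per-instance details below also requires management.endpoint.health.show-details (for example, always), and recent resilience4j-spring-boot3 versions additionally require management.health.circuitbreakers.enabled: true: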

{
  "status": "UP",
  "components": {
    "circuitBreakers": {
      "status": "UP",
      "details": {
        "userServiceCB": {
          "status": "UP",
          "details": {
            "failureRate": "0.0%",
            "slowCallRate": "0.0%",
            "state": "CLOSED",
            "bufferedCalls": 5,
            "failedCalls": 0
          }
        }
      }
    }
  }
}

When a circuit opens, the health status changes to DOWN or OUT_OF_SERVICE, triggering alerts in monitoring systems.
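Besides the health endpoint, each circuit breaker publishes events in-process. A minimal sketch, assuming resilience4j-circuitbreaker on the classpath, that records state transitions through getEventPublisher().onStateTransition(...); a real BFF might log or alert there instead. StateTransitionLogger, smallWindowConfig, watch, and simulate are illustrative names, and the four-call window is chosen only to keep the demo short.

```java
import io.github.resilience4j.circuitbreaker.CallNotPermittedException;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class StateTransitionLogger {

    // Deliberately tiny window so the demo needs only four calls.
    static CircuitBreakerConfig smallWindowConfig() {
        return CircuitBreakerConfig.custom()
                .slidingWindowSize(4)
                .minimumNumberOfCalls(4)
                .failureRateThreshold(50)
                .build();
    }

    // Collects every state transition of the given breaker.
    static List<CircuitBreaker.StateTransition> watch(CircuitBreaker cb) {
        List<CircuitBreaker.StateTransition> seen = new CopyOnWriteArrayList<>();
        cb.getEventPublisher()
          .onStateTransition(event -> seen.add(event.getStateTransition()));
        return seen;
    }

    // Runs `successes` successful calls followed by `failures` failing calls.
    static void simulate(CircuitBreaker cb, int successes, int failures) {
        for (int i = 0; i < successes + failures; i++) {
            boolean fail = i >= successes;
            try {
                cb.executeSupplier(() -> {
                    if (fail) {
                        throw new IllegalStateException("downstream error");
                    }
                    return "ok";
                });
            } catch (IllegalStateException | CallNotPermittedException e) {
                // failures are recorded by the breaker; rejections occur once it is OPEN
            }
        }
    }

    public static void main(String[] args) {
        CircuitBreaker cb = CircuitBreaker.of("userServiceCB", smallWindowConfig());
        List<CircuitBreaker.StateTransition> transitions = watch(cb);
        simulate(cb, 2, 2); // 50% failure rate reaches the 50% threshold
        System.out.println(cb.getState() + " " + transitions);
    }
}
```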


Examples

  • SCG route with circuit breaker: the CircuitBreaker filter wraps the user-service route; the fallback URI serves a 503 with a cached response body.
  • Programmatic usage in an aggregation service: ReactiveCircuitBreakerFactory.create("orderServiceCB").run(webClientCall, fallback) wraps each branch of a Mono.zip fan-out.
  • Retry + circuit breaker composition: up to 3 attempts (the initial call plus two retries) on ServiceUnavailable; the circuit breaker opens after sustained failures.

See complete configuration and fallback handlers in P3-BFF-Implementation-Patterns — Example 4.
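The aggregation example can be sketched without a Spring context by using resilience4j-reactor directly instead of ReactiveCircuitBreakerFactory: each branch of the Mono.zip fan-out gets its own circuit breaker and fallback, so a failing order-service degrades only its part of the response. This assumes resilience4j-circuitbreaker, resilience4j-reactor, and Java 16+ (for the record); Dashboard, DashboardAggregator, and the fallback values are hypothetical.

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.reactor.circuitbreaker.operator.CircuitBreakerOperator;
import java.util.List;
import reactor.core.publisher.Mono;

class DashboardAggregator {

    // Hypothetical aggregate returned to the client.
    record Dashboard(String profile, List<String> orders) {}

    static Mono<Dashboard> dashboard(Mono<String> profileCall,
                                     Mono<List<String>> ordersCall,
                                     CircuitBreaker userCb,
                                     CircuitBreaker orderCb) {
        // Branch 1: user-service, guarded by its own breaker, with a degraded default.
        Mono<String> profile = profileCall
                .transformDeferred(CircuitBreakerOperator.of(userCb))
                .onErrorResume(e -> Mono.just("anonymous"));
        // Branch 2: order-service, independent breaker and fallback.
        Mono<List<String>> orders = ordersCall
                .transformDeferred(CircuitBreakerOperator.of(orderCb))
                .onErrorResume(e -> Mono.just(List.<String>of()));
        // With both fallbacks in place, an error in one branch no longer fails the whole zip.
        return Mono.zip(profile, orders)
                .map(t -> new Dashboard(t.getT1(), t.getT2()));
    }

    public static void main(String[] args) {
        Dashboard d = dashboard(
                Mono.just("alice"),
                Mono.error(new IllegalStateException("order-service down")),
                CircuitBreaker.ofDefaults("userServiceCB"),
                CircuitBreaker.ofDefaults("orderServiceCB")).block();
        System.out.println(d);
    }
}
```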


Common Misconceptions

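  • Resilience4j requires Hystrix Dashboard: Hystrix is a different library (Netflix, in maintenance mode since 2018). Resilience4j publishes metrics through Micrometer (the resilience4j-micrometer module), which Prometheus can scrape directly.
  • Retry and circuit breaker are redundant: Retry handles transient failures (brief blips); circuit breaker handles sustained failures (service is down). Use both, but note that when Retry wraps the CircuitBreaker (the default order for Resilience4j's Spring Boot annotations), every attempt is recorded in the circuit breaker's sliding window. With maxAttempts: 3, just two consistently failing user requests reach the five recorded failures that trip userServiceCB (minimumNumberOfCalls: 5, failureRateThreshold: 50), so keep maxAttempts small and retryExceptions narrow.
  • Circuit breaker state is shared across pods: Resilience4j state is in-process. In Kubernetes with 3 BFF pods, each pod keeps its own circuit breaker state and opens independently. This is usually acceptable; Resilience4j ships no distributed (for example, Redis-backed) state store, so strict cross-pod coordination would require custom work.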

Why It Matters

The BFF-Pattern calls N downstream services. Without circuit breakers, a degraded downstream service slowly consumes all available connections and threads on the BFF. With Resilience4j:

  • Failing calls are cut off fast (open circuit = no network call)
  • BFF resources are freed for other routes
  • Clients receive degraded-but-functional responses rather than cascading 500s
  • Operations teams see circuit state in health endpoints and dashboards

Concept | Relationship
Circuit-Breaker-Pattern | Resilience4j is the primary Java implementation
Spring-Cloud-Gateway | SCG's CircuitBreaker filter is backed by Resilience4j via Spring Cloud CircuitBreaker
BFF-Pattern | Resilience4j protects BFF fan-out from downstream failure
Request-Aggregation | Each aggregation branch should be wrapped with a circuit breaker
Project-Reactor | Resilience4j's Reactor module wraps Mono/Flux pipelines

Sources

  • Resilience4j official documentation: resilience4j.readme.io
  • Spring Cloud CircuitBreaker reference: spring.io/projects/spring-cloud-circuitbreaker
  • spring-cloud-starter-circuitbreaker-reactor-resilience4j README
  • P3-BFF-Implementation-Patterns (Phase 3 research, IMPL-07)