Resilience4j
A lightweight, modular fault tolerance library for Java, providing circuit breakers, rate limiters, retries, bulkheads, and time limiters — the standard resilience layer in the Spring Cloud ecosystem.
Core Idea
Resilience4j is the replacement for Netflix Hystrix (maintenance mode since 2018) in the Spring ecosystem. It is designed around Java 8+ functional interfaces and integrates natively with reactive types (Mono, Flux) via the Reactor adapter. In the Spring Cloud Gateway context, Resilience4j circuit breakers are configured per-route and react to downstream failures without writing custom code.
The key integration point for BFF development is spring-cloud-starter-circuitbreaker-reactor-resilience4j, which wires Resilience4j into Spring Cloud's ReactiveCircuitBreaker abstraction and into SCG's CircuitBreaker gateway filter (implemented by SpringCloudCircuitBreakerFilterFactory).
Key Principles
- Decorator, not framework: Resilience4j wraps individual function calls; it does not require a different threading model or framework.
- Modular: each resilience pattern (circuit breaker, retry, rate limiter, bulkhead, time limiter) is an independent module; use only what you need.
- Reactive-first: io.github.resilience4j:resilience4j-reactor provides CircuitBreakerOperator, RetryOperator, etc., for wrapping Reactor pipelines.
- Observable state: circuit breaker state, call metrics, and events are exposed via Micrometer gauges and Spring Boot Actuator health indicators.
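The "decorator, not framework" principle can be illustrated without the library at all. The following is a minimal plain-Java sketch, not Resilience4j's API: the class `Decorators` and the methods `withRetry` and `withFallback` are hypothetical names chosen for this example. It only shows that a resilience behaviour can be added by wrapping a `Supplier` and returning another `Supplier`.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of Resilience4j.
final class Decorators {
    private Decorators() {}

    /** Wraps a supplier so that a failing call is attempted up to maxAttempts times in total. */
    static <T> Supplier<T> withRetry(Supplier<T> delegate, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        return () -> {
            RuntimeException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return delegate.get();
                } catch (RuntimeException e) {
                    last = e; // remember the failure and try again
                }
            }
            throw last; // every attempt failed
        };
    }

    /** Wraps a supplier so that any failure yields the given fallback value instead. */
    static <T> Supplier<T> withFallback(Supplier<T> delegate, T fallback) {
        return () -> {
            try {
                return delegate.get();
            } catch (RuntimeException e) {
                return fallback;
            }
        };
    }
}
```

Because each decorator returns the same functional type it accepts, decorators compose by nesting, for example `withFallback(withRetry(call, 3), cachedValue)`, and the wrapped call itself stays unchanged.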
How It Works
Module Overview
| Module | Artifact | Purpose |
|---|---|---|
| Circuit Breaker | resilience4j-circuitbreaker | Stop calling failing services |
| Retry | resilience4j-retry | Retry transient failures with backoff |
| Rate Limiter | resilience4j-ratelimiter | Limit call rate to a service |
| Bulkhead | resilience4j-bulkhead | Limit concurrent calls (semaphore or thread pool) |
| Time Limiter | resilience4j-timelimiter | Timeout wrapper for async calls |
| Spring Boot Starter | resilience4j-spring-boot3 | Auto-configuration for Spring Boot 3.x |
Spring Cloud Integration
The dependency that wires everything together for a BFF on Spring Cloud Gateway:
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-circuitbreaker-reactor-resilience4j</artifactId>
</dependency>
This provides:
- ReactiveCircuitBreakerFactory bean for programmatic use
- ReactiveResilience4JCircuitBreakerFactory auto-configuration (the Resilience4j implementation of that bean)
- SCG's SpringCloudCircuitBreakerFilterFactory (the CircuitBreaker filter in YAML)
Circuit Breaker Configuration (application.yml)
resilience4j:
  circuitbreaker:
    instances:
      userServiceCB:
        registerHealthIndicator: true
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 10
        minimumNumberOfCalls: 5
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
        permittedNumberOfCallsInHalfOpenState: 3
        slowCallRateThreshold: 80
        slowCallDurationThreshold: 3s
        automaticTransitionFromOpenToHalfOpenEnabled: true
      orderServiceCB:
        registerHealthIndicator: true
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 20
        minimumNumberOfCalls: 10
        failureRateThreshold: 60
        waitDurationInOpenState: 60s
        permittedNumberOfCallsInHalfOpenState: 5
Retry Configuration (application.yml)
resilience4j:
  retry:
    instances:
      userServiceRetry:
        maxAttempts: 3
        waitDuration: 100ms
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2
        retryExceptions:
          - org.springframework.web.reactive.function.client.WebClientResponseException.ServiceUnavailable
          - java.util.concurrent.TimeoutException
        ignoreExceptions:
          - org.springframework.web.reactive.function.client.WebClientResponseException.BadRequest
          - org.springframework.web.reactive.function.client.WebClientResponseException.NotFound
Programmatic Usage with Reactive Chains
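import java.time.Duration;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cloud.client.circuitbreaker.ReactiveCircuitBreaker;
import org.springframework.cloud.client.circuitbreaker.ReactiveCircuitBreakerFactory;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@Service
public class ResilientUserService {
    private static final Logger log = LoggerFactory.getLogger(ResilientUserService.class);

    private final WebClient webClient;
    private final ReactiveCircuitBreakerFactory circuitBreakerFactory;

    public ResilientUserService(WebClient.Builder builder,
                                ReactiveCircuitBreakerFactory cbFactory) {
        this.webClient = builder.baseUrl("http://user-service").build();
        this.circuitBreakerFactory = cbFactory;
    }

    public Mono<UserProfile> getProfile(String userId) {
        // Looks up (or creates) the breaker configured under this instance name
        ReactiveCircuitBreaker cb = circuitBreakerFactory.create("userServiceCB");
        return cb.run(
            webClient.get()
                .uri("/users/{id}", userId)
                .retrieve()
                .bodyToMono(UserProfile.class)
                .timeout(Duration.ofSeconds(3)),
            throwable -> {
                // Fallback: runs on any failure (error, timeout, or open circuit)
                // and returns a cached/empty profile instead of propagating the error
                log.warn("user-service call failed or circuit open, returning fallback. userId={}, cause={}",
                        userId, throwable.toString());
                return Mono.just(UserProfile.empty(userId));
            }
        );
    }
}
Actuator Health Integration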
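With registerHealthIndicator: true, each circuit breaker exposes its state in /actuator/health (the circuit breaker health indicator typically also has to be enabled with management.health.circuitbreakers.enabled: true, and health details must be shown):
{
  "status": "UP",
  "components": {
    "circuitBreakers": {
      "status": "UP",
      "details": {
        "userServiceCB": {
          "status": "UP",
          "details": {
            "failureRate": "0.0%",
            "slowCallRate": "0.0%",
            "state": "CLOSED",
            "bufferedCalls": 5,
            "failedCalls": 0
          }
        }
      }
    }
  }
}
When a circuit opens, its entry stops reporting UP. With allowHealthIndicatorToFail: true the status becomes DOWN; otherwise recent Resilience4j versions report a custom status such as CIRCUIT_OPEN. Monitoring systems can alert on either change.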
Examples
- SCG route with circuit breaker: the CircuitBreaker filter wraps the user-service route; the fallback URI serves a 503 with a cached response body.
- Programmatic usage in aggregation service: ReactiveCircuitBreakerFactory.create("orderServiceCB").run(webClientCall, fallback) wraps each branch of a Mono.zip fan-out.
- Retry + circuit breaker composition: retry 3 times on ServiceUnavailable, then the circuit breaker opens after sustained failures.
See complete configuration and fallback handlers in P3-BFF-Implementation-Patterns — Example 4.
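For the retry example, it helps to work out how long a caller can end up waiting. The sketch below computes the wait schedule implied by an exponential backoff, assuming the formula wait(n) = initial × multiplier^(n-1), that maxAttempts counts the first call, and that no random jitter is applied. `BackoffSchedule` is a hypothetical helper written for this note, not a Resilience4j class.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; assumes plain exponential backoff without jitter.
final class BackoffSchedule {
    private BackoffSchedule() {}

    /** Returns the waits between attempts; maxAttempts includes the initial call. */
    static List<Duration> waits(int maxAttempts, Duration initial, double multiplier) {
        List<Duration> waits = new ArrayList<>();
        double millis = initial.toMillis();
        // N attempts are separated by N - 1 waits.
        for (int retry = 1; retry < maxAttempts; retry++) {
            waits.add(Duration.ofMillis(Math.round(millis)));
            millis *= multiplier; // next wait grows by the multiplier
        }
        return waits;
    }
}
```

With the userServiceRetry values (maxAttempts: 3, waitDuration: 100ms, multiplier 2), `BackoffSchedule.waits(3, Duration.ofMillis(100), 2.0)` gives waits of 100 ms and 200 ms, so a request that fails all three attempts adds about 300 ms of waiting on top of the three call durations.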
Common Misconceptions
| Misconception | Reality |
|---|---|
| Resilience4j requires Hystrix Dashboard | Hystrix is a different library (Netflix, now retired). Resilience4j exposes metrics through Micrometer, which can publish them to Prometheus. |
| Retry and circuit breaker are redundant | Retry handles transient failures (brief blips); circuit breaker handles sustained failures (service is down). Use both, with retry configured to give up before the circuit breaker window fills. |
| Circuit breaker state is shared across pods | Resilience4j state is in-process. In Kubernetes with 3 BFF pods, each pod has its own circuit breaker state. This is usually acceptable; strict coordination needs an external shared store (for example Redis), which Resilience4j does not appear to provide out of the box. |
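
Two of these points are easier to see in code: a breaker reacts to a sustained failure rate over a window rather than to a single blip, and its state is just in-memory fields, which is why each pod keeps its own. The following is a simplified, single-threaded sketch of a count-based breaker, not Resilience4j's implementation. `SimpleCircuitBreaker` and its members are hypothetical names, and the half-open phase is reduced to a single trial call instead of permittedNumberOfCallsInHalfOpenState.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Simplified illustration only; not thread-safe and not Resilience4j's implementation.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;              // plays the role of slidingWindowSize
    private final int minimumCalls;            // plays the role of minimumNumberOfCalls
    private final double failureRateThreshold; // percent, like failureRateThreshold
    private final long openMillis;             // plays the role of waitDurationInOpenState
    private final LongSupplier clock;          // injected so time can be controlled in tests

    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAt;

    SimpleCircuitBreaker(int windowSize, int minimumCalls, double failureRateThreshold,
                         long openMillis, LongSupplier clock) {
        this.windowSize = windowSize;
        this.minimumCalls = minimumCalls;
        this.failureRateThreshold = failureRateThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    State state() {
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get(); // open circuit: the action is never invoked
            }
            state = State.HALF_OPEN;   // wait elapsed: let one trial call through
        }
        try {
            T result = action.get();
            record(false);
            return result;
        } catch (RuntimeException e) {
            record(true);
            return fallback.get();
        }
    }

    private void record(boolean failed) {
        if (state == State.HALF_OPEN) {
            // Simplification: a single trial call decides the next state.
            if (failed) {
                open();
            } else {
                outcomes.clear();
                state = State.CLOSED;
            }
            return;
        }
        outcomes.addLast(failed);
        if (outcomes.size() > windowSize) {
            outcomes.removeFirst(); // keep only the most recent windowSize outcomes
        }
        if (outcomes.size() >= minimumCalls) {
            long failures = outcomes.stream().filter(f -> f).count();
            if (100.0 * failures / outcomes.size() >= failureRateThreshold) {
                open();
            }
        }
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        outcomes.clear();
    }
}
```

Constructed with the userServiceCB values (window 10, minimum 5 calls, threshold 50, 30 s open), this sketch stays CLOSED for the first four failures, opens on the fifth, and then answers from the fallback without invoking the action until the wait has elapsed.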
Why It Matters
The BFF-Pattern calls N downstream services. Without circuit breakers, a degraded downstream service slowly consumes all available connections and threads on the BFF. With Resilience4j:
- Failing calls are cut off fast (open circuit = no network call)
- BFF resources are freed for other routes
- Clients receive degraded-but-functional responses rather than cascading 500s
- Operations teams see circuit state in health endpoints and dashboards
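The "degraded-but-functional" outcome can be sketched with a plain-Java fan-out. This is an analogue of a Mono.zip aggregation using CompletableFuture rather than Reactor or Resilience4j; `DashboardAggregator`, `Dashboard`, `guarded`, and the fallback strings are hypothetical names for this example. Each branch carries its own fallback, so one failing downstream call degrades only its part of the response.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Plain-Java analogue of a guarded fan-out; illustration only.
final class DashboardAggregator {
    private DashboardAggregator() {}

    /** Aggregated response; a field holds its fallback text when that branch failed. */
    static final class Dashboard {
        final String profile;
        final String orders;

        Dashboard(String profile, String orders) {
            this.profile = profile;
            this.orders = orders;
        }
    }

    /** Runs one branch asynchronously and substitutes the fallback if it fails. */
    static CompletableFuture<String> guarded(Supplier<String> call, String fallback) {
        return CompletableFuture.supplyAsync(call).exceptionally(t -> fallback);
    }

    static Dashboard load(Supplier<String> profileCall, Supplier<String> ordersCall) {
        CompletableFuture<String> profile = guarded(profileCall, "profile-unavailable");
        CompletableFuture<String> orders = guarded(ordersCall, "orders-unavailable");
        // Each branch already has a fallback, so a branch error cannot fail the combination.
        return profile.thenCombine(orders, Dashboard::new).join();
    }
}
```

In a real BFF the `guarded` step is where a circuit breaker would sit, so that a branch whose circuit is open returns its fallback immediately instead of holding a connection.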
Related Concepts
| Concept | Relationship |
|---|---|
| Circuit-Breaker-Pattern | Resilience4j is the primary Java implementation |
| Spring-Cloud-Gateway | SCG's CircuitBreaker filter uses Resilience4j |
| BFF-Pattern | Resilience4j protects BFF fan-out from downstream failure |
| Request-Aggregation | Each aggregation branch should be wrapped with a circuit breaker |
| Project-Reactor | Resilience4j Reactor module wraps Mono/Flux pipelines |
Sources
- resilience4j.readme.io — official documentation
- Spring Cloud CircuitBreaker reference: spring.io/projects/spring-cloud-circuitbreaker
- spring-cloud-starter-circuitbreaker-reactor-resilience4j README
- P3-BFF-Implementation-Patterns (Phase 3 research, IMPL-07)