Choreography Saga Pattern
"In the choreography approach, each local transaction publishes domain events that trigger local transactions in other services." — Chris Richardson, microservices.io
Intent
The Choreography Saga is a decentralised approach to distributed transaction coordination. There is no central orchestrator. Instead, each participating service listens for domain events emitted by the previous service, performs its own local transaction, and then emits an outcome event. The collective behaviour of the services forms the saga workflow — the control flow is implicit in the event chain.
Choreography works well when the services have independent, event-driven reactions and the number of participants is small (2-3 services). Each service only knows about the events it subscribes to and the events it emits — it has no knowledge of other services. This loose coupling is the primary advantage of choreography over orchestration.
Every event in a choreography saga must carry a sagaId (or correlationId) that is propagated unchanged from the triggering event to the outcome event. Without this identifier, correlating related events across services means searching every service's logs by payload fields and timestamps; omitting it is a common observability failure in choreography implementations.
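One way to make sagaId propagation hard to forget is to build every outcome event from the event that triggered it. The sketch below assumes a hypothetical envelope shape (SagaEvent) and helper (deriveOutcome); only the sagaId field comes from the text above, the other names are illustrative.

```typescript
// Hypothetical event envelope; every field except sagaId is an illustrative assumption.
interface SagaEvent<T = unknown> {
  type: string;
  sagaId: string;     // correlation identifier, copied unchanged from event to event
  occurredAt: string; // ISO-8601 timestamp
  payload: T;
}

// Builds an outcome event from its trigger so the handler cannot drop or alter sagaId.
function deriveOutcome<T>(trigger: SagaEvent, type: string, payload: T): SagaEvent<T> {
  return { type, sagaId: trigger.sagaId, occurredAt: new Date().toISOString(), payload };
}

const orderCreated: SagaEvent<{ orderId: string; amount: number }> = {
  type: 'OrderCreated',
  sagaId: 'saga-42',
  occurredAt: new Date().toISOString(),
  payload: { orderId: 'order-1', amount: 100 },
};

// The outcome event inherits the sagaId of the event that caused it.
const paymentReserved = deriveOutcome(orderCreated, 'PaymentReserved', { orderId: 'order-1' });
```

Routing all event construction through a helper like this keeps the correlation rule in one place instead of relying on each handler to copy the field by hand.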
When NOT to Use
- More than 3-4 participating services — control flow becomes an implicit event mesh; there is no single place to observe the overall saga state, and debugging requires correlating events across many service logs
- Flows with strict ordering requirements — choreography cannot enforce that step 3 waits for step 2 to complete across service boundaries; services react to events as they arrive
- Complex rollback scenarios — distributed compensation events are hard to reason about when multiple services may compensate simultaneously or when compensation events trigger unexpected reactions in other listeners
- Trivial two-service flows: when the caller can invoke the second service directly and compensate on failure (retry-with-compensation in the caller), the overhead of saga event infrastructure exceeds the benefit
- When workflow visibility is critical — choreography has no orchestrator to query for current saga state; visibility requires a dedicated Saga State Tracker subscriber
When to Use
- 2-3 services with independent event reactions — each service reacts to an event and emits an outcome; no strict ordering across service boundaries
- Loose coupling is the primary design goal — services only know about events, not about each other; this maximises deployment independence
- Simple compensation paths — each service compensates its own local transaction by emitting a compensation event or handling a compensation event topic; compensation does not require cross-service coordination
How It Works
Each participating service is both a consumer and a producer. It subscribes to the event topic relevant to its step (e.g., order-events), performs its local transaction (e.g., reserve payment), and then produces an outcome event to the next topic (e.g., payment-events). Other services subscribe to this outcome event and react accordingly.
Compensation follows the same event-driven pattern but in the failure direction. When a service receives a message on the saga-compensation-events topic for its step, it executes its local compensation action (e.g., releasing a payment reservation) and may emit a compensation outcome event to trigger the previous step's compensation.
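The forward and compensation flows can be sketched without a broker. The example below is a minimal in-memory simulation, not a Kafka implementation: InMemoryBus and runSaga are hypothetical names, delivery is synchronous, and compensation is triggered by the InventoryReservationFailed event on inventory-events (as in the sequence diagram) rather than by a dedicated compensation topic.

```typescript
type SagaEvent = { type: string; sagaId: string; orderId: string };
type Handler = (event: SagaEvent) => void;

// Hypothetical synchronous stand-in for a topic-based broker.
class InMemoryBus {
  private handlers = new Map<string, Handler[]>();
  readonly log: string[] = [];

  subscribe(topic: string, handler: Handler): void {
    const list = this.handlers.get(topic) ?? [];
    list.push(handler);
    this.handlers.set(topic, list);
  }

  publish(topic: string, event: SagaEvent): void {
    this.log.push(`${topic}:${event.type}`);
    for (const handler of this.handlers.get(topic) ?? []) handler(event);
  }
}

function runSaga(inventoryAvailable: boolean): { log: string[]; paymentStatus?: string } {
  const bus = new InMemoryBus();
  const reservations = new Map<string, 'reserved' | 'available'>();

  // Payment Service, forward step: reserve, then emit the outcome event.
  bus.subscribe('order-events', (e) => {
    if (e.type !== 'OrderCreated') return;
    reservations.set(e.orderId, 'reserved');
    bus.publish('payment-events', { ...e, type: 'PaymentReserved' });
  });

  // Inventory Service, forward step: emit success or failure.
  bus.subscribe('payment-events', (e) => {
    if (e.type !== 'PaymentReserved') return;
    const type = inventoryAvailable ? 'InventoryReserved' : 'InventoryReservationFailed';
    bus.publish('inventory-events', { ...e, type });
  });

  // Payment Service, compensation step: release only if still reserved.
  bus.subscribe('inventory-events', (e) => {
    if (e.type !== 'InventoryReservationFailed') return;
    if (reservations.get(e.orderId) !== 'reserved') return; // already released: no-op
    reservations.set(e.orderId, 'available');
    bus.publish('payment-events', { ...e, type: 'PaymentReleased' });
  });

  bus.publish('order-events', { type: 'OrderCreated', sagaId: 'saga-1', orderId: 'order-1' });
  return { log: bus.log, paymentStatus: reservations.get('order-1') };
}

const happy = runSaga(true);   // ends with inventory-events:InventoryReserved
const failed = runSaga(false); // ends with payment-events:PaymentReleased
```

No handler refers to another service by name; each one knows only the topics it reads and writes, which is the loose coupling the pattern relies on.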
The sagaId field is mandatory on every event. It allows log correlation, dead letter queue analysis, and (optionally) a dedicated Saga State Tracker service to reconstruct the full execution trace without modifying any participating service.
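A Saga State Tracker can be as small as a subscriber that groups events by sagaId. The sketch below is one possible shape under stated assumptions: the class and method names are hypothetical, events are fed in by hand instead of consumed from topics, and the terminal event names (InventoryReserved, PaymentReleased) are taken from the order example used in this note.

```typescript
interface TrackedEvent {
  sagaId: string;
  type: string;
  topic: string;
}

type SagaStatus = 'unknown' | 'in-progress' | 'completed' | 'compensated';

// Hypothetical tracker: it would subscribe to every saga topic and record what it sees.
class SagaStateTracker {
  private traces = new Map<string, TrackedEvent[]>();

  record(event: TrackedEvent): void {
    const events = this.traces.get(event.sagaId) ?? [];
    events.push(event);
    this.traces.set(event.sagaId, events);
  }

  // Event types for one saga, in the order the tracker received them.
  trace(sagaId: string): string[] {
    return (this.traces.get(sagaId) ?? []).map((e) => e.type);
  }

  // Status is derived from the last event seen; the terminal names are assumptions.
  status(sagaId: string): SagaStatus {
    const types = this.trace(sagaId);
    if (types.length === 0) return 'unknown';
    const last = types[types.length - 1];
    if (last === 'InventoryReserved') return 'completed';
    if (last === 'PaymentReleased') return 'compensated';
    return 'in-progress';
  }
}

const tracker = new SagaStateTracker();
tracker.record({ sagaId: 'saga-1', type: 'OrderCreated', topic: 'order-events' });
tracker.record({ sagaId: 'saga-2', type: 'OrderCreated', topic: 'order-events' });
tracker.record({ sagaId: 'saga-1', type: 'PaymentReserved', topic: 'payment-events' });
tracker.record({ sagaId: 'saga-1', type: 'InventoryReserved', topic: 'inventory-events' });
```

Because the tracker reads from several topics, arrival order is not guaranteed to match causal order; a production version would likely sort by an event timestamp or sequence number rather than trust insertion order.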
Sequence Diagram
sequenceDiagram
participant OS as Order Service
participant OT as order-events topic
participant PS as Payment Service
participant PT as payment-events topic
participant IS as Inventory Service
participant IT as inventory-events topic
Note over OS,IT: Happy Path (forward)
OS->>OT: OrderCreated {sagaId, orderId}
OT->>PS: consume OrderCreated
PS->>PS: reservePayment()
PS->>PT: PaymentReserved {sagaId}
PT->>IS: consume PaymentReserved
IS->>IS: reserveInventory()
IS->>IT: InventoryReserved {sagaId}
Note over OS,IT: Failure Path (compensation)
IS->>IT: InventoryReservationFailed {sagaId}
IT->>PS: consume InventoryReservationFailed
PS->>PS: releasePayment() [idempotent]
PS->>PT: PaymentReleased {sagaId}
Note over OS,IT: sagaId propagated on every event<br/>for cross-service correlation
Choreography vs Orchestration
Selection guide — use this to choose between the two saga variants:
| Dimension | Choreography | Orchestration |
|---|---|---|
| Service count | 2-3 (sweet spot) | 3+ with ordering |
| Ordering requirement | None or flexible | Strict ("B after A") |
| Rollback complexity | Simple / few steps | Multi-step branching |
| Visibility | Low (distributed logs) | High (orchestrator state) |
| Coupling | Loose (event-driven) | Tighter (orchestrator knows all) |
| Debugging | Hard (event mesh) | Easier (central orchestrator) |
| Initial complexity | Low | Higher (pays back with complexity) |

Rule of thumb: Choreography for 2-3 step flows with independent event reactions; orchestration (Temporal SDK or Axon Saga) for branching flows, strict ordering, or complex rollback.
TypeScript Example
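// Choreography Saga: TypeScript (kafkajs)
// Source: kafkajs.org; microservices.io/patterns/data/saga.html
// Each service listens for events, performs its local transaction, emits outcome event
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ clientId: 'payment-service', brokers: ['localhost:9092'] });
const consumer = kafka.consumer({ groupId: 'payment-service' });
const producer = kafka.producer();

// Local transaction of this service; its implementation is not shown here.
declare function reservePayment(orderId: string, amount: number): Promise<boolean>;

async function main(): Promise<void> {
  // Both clients must be connected before subscribing or sending.
  await consumer.connect();
  await producer.connect();
  await consumer.subscribe({ topic: 'order-events' });
  await consumer.run({
    eachMessage: async ({ message }) => {
      if (!message.value) return; // skip messages without a payload
      const event = JSON.parse(message.value.toString());
      if (event.type !== 'OrderCreated') return; // ignore unrelated events on the topic
      const success = await reservePayment(event.orderId, event.amount);
      await producer.send({
        topic: 'payment-events',
        messages: [{ value: JSON.stringify({
          type: success ? 'PaymentReserved' : 'PaymentReservationFailed',
          orderId: event.orderId,
          sagaId: event.sagaId, // propagate for correlation
        })}],
      });
    },
  });
}

main().catch((err) => {
  console.error('payment-service consumer failed', err);
  process.exit(1);
});
Java Example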
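// Choreography Saga: Java (Spring Kafka)
// Source: Spring Kafka docs; microservices.io/patterns/data/saga.html
// Event classes and the two local-transaction methods are application code, not shown here.
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    private final KafkaTemplate<String, Object> kafkaTemplate;

    // Constructor injection of the template used to publish outcome events.
    public PaymentService(KafkaTemplate<String, Object> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Forward step: reserve the payment, then emit the outcome with the same sagaId.
    @KafkaListener(topics = "order-events")
    public void onOrderCreated(OrderCreatedEvent event) {
        boolean success = reservePayment(event.getOrderId(), event.getAmount());
        PaymentEvent response = success
            ? new PaymentReservedEvent(event.getOrderId(), event.getSagaId())
            : new PaymentReservationFailedEvent(event.getOrderId(), event.getSagaId());
        kafkaTemplate.send("payment-events", response);
    }

    // Compensation step: may be delivered more than once, so it must be idempotent.
    @KafkaListener(topics = "saga-compensation-events")
    public void onPaymentCompensate(ReleasePaymentCommand cmd) {
        // IDEMPOTENT: UPDATE SET status='available' WHERE status='reserved' AND id=?
        releasePaymentReservation(cmd.getOrderId());
    }
}
Anti-pattern: Non-idempotent compensation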
Compensation handlers must be idempotent: the consumer reacting to a compensation event may receive it more than once (Kafka at-least-once delivery, network retries, consumer group rebalance). A non-idempotent compensation action executed twice corrupts data: a payment reservation released twice may credit the held amount twice, leaving the reserved total negative or the available balance overstated.
Prefer semantic idempotency:
UPDATE SET status='available' WHERE status='reserved' AND id=?
rather than an unconditional UPDATE SET status='available'. The WHERE clause makes repeated execution a no-op when the row is already in the target state.
See Idempotent-Consumer for the full deduplication strategy catalogue.
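The same status guard can be expressed in application code. The sketch below is an in-memory illustration only: the Reservation and Account shapes, the starting balance, and the boolean return value are assumptions made for the example, and a real service would apply the guard inside the database update itself.

```typescript
type ReservationStatus = 'reserved' | 'available';

interface Reservation {
  status: ReservationStatus;
  amount: number;
}

// Hypothetical state: 100 in the account, of which 50 is currently held for order-1.
const account = { availableBalance: 50 };
const reservations = new Map<string, Reservation>([
  ['order-1', { status: 'reserved', amount: 50 }],
]);

// Returns true when the release was applied, false when it was a no-op.
function releasePaymentReservation(orderId: string): boolean {
  const reservation = reservations.get(orderId);
  // Mirrors "WHERE status='reserved'": anything else means there is nothing to release.
  if (!reservation || reservation.status !== 'reserved') return false;
  reservation.status = 'available';
  account.availableBalance += reservation.amount;
  return true;
}

const firstDelivery = releasePaymentReservation('order-1');
const redelivery = releasePaymentReservation('order-1'); // same event delivered again
```

The second call changes nothing because the reservation has already left the reserved state, so the held amount is credited back exactly once regardless of how many times the event is delivered.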
Lineage Backward
- Domain-Events — choreography sagas are a reactive domain event chain; each saga step is triggered by and emits a domain event
- Idempotent-Consumer — each participating service is an idempotent consumer of the previous service's events; the saga only works correctly if every step is idempotent
Lineage Forward
- Compensating-Transactions — shared failure-handling mechanism for both choreography and orchestration variants
- Orchestration-Saga-Pattern — sibling variant; use when strict ordering, branching logic, or complex rollback is required
Related Concepts
| Pattern | Relationship |
|---|---|
| Orchestration-Saga-Pattern | Sibling saga variant — centralised coordinator instead of event chain; preferred for complex flows |
| Domain-Events | Choreography saga is built from domain events; each saga step publishes and consumes domain events |
| Idempotent-Consumer | Prerequisite — every saga step must be an idempotent consumer; compensation steps doubly so |
| Dead-Letter-Queue | Unprocessable saga events (poison messages) must be routed to DLQ to prevent saga stall |
| CQRS-Pattern | CQRS write side emits domain events that choreography sagas can consume to trigger saga steps |
| Event-Sourcing-Pattern | Event Sourcing appends all state changes as events; choreography sagas add cross-service coordination on top |
Related System Design
- Message-Queue — choreography sagas use pub/sub topics as the event bus; Message-Queue covers the infrastructure topology (fan-out to N consumer groups); Choreography-Saga covers the distributed workflow coordination pattern built on top
- News-Feed-Design — feed update propagation across the fan-out worker fleet follows event-driven choreography; the news feed is a real-world choreography saga where each fan-out step is an event handler
Sources
- microservices.io/patterns/data/saga.html — Chris Richardson, Saga pattern, choreography vs orchestration comparison
- Chris Richardson, Microservices Patterns, 2018 — Saga chapter, compensating transactions, choreography mechanics
- kafkajs.org — kafkajs consumer/producer API
- Spring Kafka docs — @KafkaListener, KafkaTemplate