Distributed Tracing Patterns
Distributed-Tracing covers the implementation layer: span structure, W3C traceparent, Micrometer Tracing bridge, and Spring Boot auto-configuration. This note covers the pattern layer: sampling strategy selection criteria, async context propagation failure modes, and the decisions that determine whether tracing delivers value or just cost.
This note covers: sampling strategy selection criteria, tail-based sampling trade-offs, context budget constraints, and async context propagation failure modes with prevention patterns. Out of scope (covered in Distributed-Tracing): trace ID / span ID data model, W3C traceparent header format, OTel SDK setup, Micrometer Tracing bridge configuration, Spring Boot auto-configuration, reactive pipeline Reactor Context propagation, and BFF-specific traceparent forwarding.
When NOT to Use Distributed Tracing
Distributed tracing is frequently cargo-culted into systems where it provides minimal value relative to its operational cost. Three cases where it is not warranted:
Single-process monoliths with fewer than 5 external calls per request. Structured logging with correlation IDs provides complete request visibility without trace backend overhead. The cross-service visibility that justifies tracing does not apply when there are no service boundaries to cross. See Structured-Logging for the correlation ID propagation pattern.
High-frequency, low-latency pipelines above 100k req/s. A trace averages 500 bytes to 2 KB, so 100k req/s at 100% sampling produces roughly 180–720 GB of trace data per hour (4.3–17 TB per day). Even at 1% sampling that is 43–172 GB per day, and a conservative backend cost estimate at 1% sampling exceeds $5,000/month for high-traffic services. Tail-based or probabilistic sampling is required, which adds collector infrastructure complexity. Do this cost analysis before adopting tracing in any pipeline above 10k req/s.
When the team cannot maintain a trace backend. A trace collector receiving data it cannot drain causes out-of-memory failures. Tracing infrastructure without query capability (no one opening Jaeger or Tempo) produces noise, not signal. Structured logging with a log query tool (Loki, CloudWatch Insights, Datadog Logs) is almost always the better investment for a team without a dedicated observability practice.
Sampling Strategies
The three strategies differ in when the sampling decision is made and what information is available at decision time.
Head-Based Sampling
The sampling decision is made at root span creation — before the request executes. Every child span in the trace inherits the parent's decision: if the root is sampled, all descendants are sampled; if not sampled, none are.
Advantages:
- Near-zero overhead for unsampled requests: their spans are non-recording, so no attributes are stored and nothing is exported
- Simple to reason about: the decision is deterministic from the trace ID
Disadvantages:
- The decision is blind to request outcome — you cannot head-sample based on whether the request will return an error or exceed a latency threshold
- High-value events (rare errors, latency spikes) may fall into the unsampled fraction
Use when: Cost constraint is the primary driver; most requests are statistically representative; missing occasional tail-latency cases is acceptable; no error-capture SLA.
Typical rate: 1–10% in production for high-traffic services. See Distributed-Tracing for OTel SDK TraceIdRatioBased sampler configuration.
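To make the "deterministic from the trace ID" property concrete, here is a minimal sketch of a ratio decision. It is not the OTel SDK's algorithm: the function name `shouldSampleByTraceId` and the choice to read the last 8 hex characters of the trace ID are assumptions made for illustration. The point is only that any service applying the same function and the same ratio to the same trace ID reaches the same decision.

```typescript
// Sketch of a deterministic ratio decision. Hypothetical helper, not an SDK API.
// Reads the last 8 hex characters of a 32-char trace ID as an unsigned 32-bit
// integer and samples the trace when that value falls below ratio * 2^32.
function shouldSampleByTraceId(traceId: string, ratio: number): boolean {
  if (!/^[0-9a-f]{32}$/.test(traceId)) {
    return false; // malformed trace ID: do not sample
  }
  const clamped = Math.min(Math.max(ratio, 0), 1);
  const value = parseInt(traceId.slice(-8), 16); // 0 .. 2^32 - 1
  return value < clamped * 4294967296;           // 4294967296 = 2^32
}

const exampleTraceId = "4bf92f3577b34da6a3ce929d0e0e4736";
const firstDecision = shouldSampleByTraceId(exampleTraceId, 0.1);
const secondDecision = shouldSampleByTraceId(exampleTraceId, 0.1);
console.log(firstDecision === secondDecision); // true: same input, same decision
```

Because the decision depends only on the trace ID and the ratio, no coordination between services is needed as long as they agree on both.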
Tail-Based Sampling
The decision is deferred until the full trace completes — all spans have been received by the collector. The collector can then decide based on the actual outcome: was there an error span? Did any span exceed the latency threshold?
Advantages:
- Always captures traces that contain ERROR-status spans, regardless of overall sample rate
- Always captures traces that exceed latency thresholds (e.g., p99 budget)
- Delivers the highest-value sample set: errors + slow traces + a configured % of healthy traces
Disadvantages:
- All spans for an in-flight trace must be buffered in collector memory until the trace is complete — significant memory pressure for high-traffic services
- Long-running traces (minutes, hours) extend the buffer window substantially
- Tail-based sampling logic lives in the OTel Collector configuration, not in application code — adds infrastructure ownership responsibility
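The buffering cost in the list above can be approximated before any infrastructure is deployed: buffered bytes are roughly traces per second, times how long the collector waits before deciding, times average trace size. The sketch below is a back-of-the-envelope estimate only. The function name `estimateTailBufferBytes`, the 30-second wait, and the 1 KB average are illustrative assumptions, not Collector defaults, and real memory use will be higher because of in-memory object overhead.

```typescript
// Rough lower bound on collector memory held by in-flight traces.
// All inputs are assumptions to be replaced with measured values.
function estimateTailBufferBytes(
  tracesPerSecond: number,
  decisionWaitSeconds: number,
  avgTraceBytes: number,
): number {
  return tracesPerSecond * decisionWaitSeconds * avgTraceBytes;
}

// 10k traces/s held for 30 s at 1 KB (1000 bytes) each
const bufferedBytes = estimateTailBufferBytes(10000, 30, 1000);
console.log(`${(bufferedBytes / 1e6).toFixed(0)} MB buffered`); // 300 MB buffered
```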
Use when: Debugging production error cases is the primary goal; SLO violation investigation requires complete traces; "always sample errors, sample 1% of successes" is the target policy.
Rule of thumb:
- Always sample any trace containing an ERROR-status span
- Always sample any trace where any span duration exceeded the p99 latency budget
- Sample 1–5% of fully healthy traces for baseline coverage
Implementation note: OTel Collector tail sampling processor implements this. Policy is Collector YAML configuration — not application code. See Distributed-Tracing for Collector setup patterns.
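The rule of thumb can be written as a pure decision function to make the policy order explicit. This is only a sketch of the logic; in a real deployment the equivalent policy lives in Collector configuration, as noted above. The `SpanSummary` shape, the `keepTrace` name, and the injectable `random` source are assumptions introduced for illustration.

```typescript
// Hypothetical minimal view of a finished span.
interface SpanSummary {
  status: "OK" | "ERROR" | "UNSET";
  durationMs: number;
}

interface TailPolicy {
  latencyBudgetMs: number; // e.g. the p99 latency budget
  healthyRate: number;     // fraction of healthy traces to keep, 0..1
}

// Decide whether to keep a completed trace.
// `random` is injectable so the decision can be tested deterministically.
function keepTrace(
  spans: SpanSummary[],
  policy: TailPolicy,
  random: () => number = Math.random,
): boolean {
  // Rule 1: any ERROR-status span keeps the whole trace.
  if (spans.some((s) => s.status === "ERROR")) return true;
  // Rule 2: any span over the latency budget keeps the whole trace.
  if (spans.some((s) => s.durationMs > policy.latencyBudgetMs)) return true;
  // Rule 3: otherwise keep a configured fraction of healthy traces.
  return random() < policy.healthyRate;
}

const examplePolicy: TailPolicy = { latencyBudgetMs: 500, healthyRate: 0.02 };
console.log(keepTrace([{ status: "ERROR", durationMs: 12 }], examplePolicy)); // true
console.log(keepTrace([{ status: "OK", durationMs: 900 }], examplePolicy));   // true
```

The first two rules need the finished trace, which is exactly why this decision cannot be made at the head.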
Probabilistic (Rate-Limiting) Sampling
A consistent probability applied uniformly across all requests, making the sampling decision at the head with a fixed rate. Two variants exist:
Fixed-rate probabilistic: 10% means roughly 1 in 10 traces is sampled on average, not exactly every tenth trace. The decision is derived from the trace ID, so every service applying the same rate reaches the same decision for a given trace.
Rate-limiting variant: Cap at a fixed number of traces per second regardless of traffic volume (e.g., 100 traces/sec). At low traffic the effective rate may be 100%; at high traffic the rate drops to keep within the cap.
Use when: Even coverage across all operations matters more than error capture; traffic volume is moderate (under 10k req/s); the team wants a simple "set and forget" policy without collector tail-buffering infrastructure.
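The rate-limiting variant is commonly built on a token bucket. The sketch below shows that idea under stated assumptions: `TokenBucketSampler` is a hypothetical class, not an OpenTelemetry SDK export, and the injectable clock exists only so the behavior can be checked deterministically.

```typescript
// Token-bucket sketch of a rate-limiting sampling decision.
// Hypothetical class for illustration, not an SDK API.
class TokenBucketSampler {
  private readonly maxTracesPerSecond: number;
  private readonly now: () => number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(maxTracesPerSecond: number, now: () => number = Date.now) {
    this.maxTracesPerSecond = maxTracesPerSecond;
    this.now = now;
    this.tokens = maxTracesPerSecond; // start with a full bucket
    this.lastRefillMs = now();
  }

  shouldSample(): boolean {
    const nowMs = this.now();
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.lastRefillMs = nowMs;
    // Refill in proportion to elapsed time, never above the cap.
    this.tokens = Math.min(
      this.maxTracesPerSecond,
      this.tokens + elapsedSeconds * this.maxTracesPerSecond,
    );
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

let fakeNowMs = 0;
const limiter = new TokenBucketSampler(2, () => fakeNowMs);
const burst = [limiter.shouldSample(), limiter.shouldSample(), limiter.shouldSample()];
console.log(burst); // [ true, true, false ]
fakeNowMs = 1000;   // one second later the bucket has refilled
console.log(limiter.shouldSample()); // true
```

At low traffic the bucket never empties, so every trace is sampled; at high traffic the effective rate falls as the cap takes over.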
Sampling Decision Flowchart
flowchart TD
A[New service — choose sampling strategy] --> B{Error capture required?\nMust not miss errors in traces}
B -- Yes --> C{Can afford collector\nbuffering overhead?\nMemory + infra ownership}
C -- Yes --> D[Tail-Based Sampling\nPolicy: always ERROR, sample N% of OK]
C -- No --> E[Head-Based low rate\n+ alert on error rate metrics\nAccept missing error traces]
B -- No --> F{Traffic above 10k req/s?}
F -- Yes --> G{Even coverage needed\nor cost cap?}
G -- Cost cap --> H[Rate-Limiting: cap at\nfixed traces per second]
G -- Even coverage --> I[Probabilistic: 1-5%\nof all requests]
F -- No --> J{Need representative\ncoverage of all operations?}
J -- Yes --> K[Probabilistic: 1-10%\nDefault for most services]
J -- No --> L[Head-Based: 10% default\nSimplest operational footprint]
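The same decision tree can be expressed as a function, which is handy for encoding the choice in a service template or a review checklist. This is a sketch that mirrors the flowchart branch by branch; the `ServiceProfile` fields and the strategy labels are names invented here, not part of any SDK.

```typescript
type Strategy =
  | "tail-based"
  | "head-based-low-rate-with-error-alerts"
  | "rate-limiting"
  | "probabilistic-1-5-percent"
  | "probabilistic-1-10-percent"
  | "head-based-10-percent";

// Hypothetical inputs corresponding to the questions in the flowchart.
interface ServiceProfile {
  errorCaptureRequired: boolean;
  canAffordCollectorBuffering: boolean;
  requestsPerSecond: number;
  needsEvenCoverage: boolean;           // high traffic: even coverage (true) vs cost cap (false)
  needsRepresentativeCoverage: boolean; // lower traffic: coverage of all operations
}

function chooseSamplingStrategy(p: ServiceProfile): Strategy {
  if (p.errorCaptureRequired) {
    return p.canAffordCollectorBuffering
      ? "tail-based"
      : "head-based-low-rate-with-error-alerts";
  }
  if (p.requestsPerSecond > 10000) {
    return p.needsEvenCoverage ? "probabilistic-1-5-percent" : "rate-limiting";
  }
  return p.needsRepresentativeCoverage
    ? "probabilistic-1-10-percent"
    : "head-based-10-percent";
}

console.log(
  chooseSamplingStrategy({
    errorCaptureRequired: true,
    canAffordCollectorBuffering: true,
    requestsPerSecond: 2000,
    needsEvenCoverage: false,
    needsRepresentativeCoverage: false,
  }),
); // tail-based
```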
Sampling Rate Tuning
Choosing the right rate requires estimating storage cost before committing to a strategy.
Starting points by traffic volume:
| Traffic | Recommended Start | Rationale |
|---|---|---|
| < 1k req/s | 10% | Low volume; representative sample; low cost |
| 1k–10k req/s | 5% | Moderate volume; manageable storage |
| 10k–100k req/s | 1% | Cost constraint applies; consider rate-limiting |
| > 100k req/s | < 1% or rate-limit | Tail-based or fixed cap required |
Absolute floor: Never drop below 0.1%. You need at least 1 sample per minute for pattern detection and anomaly baselines. At 0.1% of 1k req/s, that is 1 trace/sec — still sufficient.
Storage cost estimation:
- 1 trace ≈ 500 bytes to 2 KB (depends on span count and attribute density)
- At 1% of 10k req/s = 100 traces/sec × 1 KB average = 100 KB/sec = ~8.6 GB/day
- At 1% of 100k req/s = ~86 GB/day — cost analysis mandatory before deployment
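The arithmetic in the list above generalizes to a one-line formula: bytes per day = requests per second × sample rate × average trace size × 86,400. A small sketch, with `estimateTraceBytesPerDay` as a hypothetical helper name and 1 KB taken as 1000 bytes to match the figures above:

```typescript
const SECONDS_PER_DAY = 86400;

// Estimated trace volume per day, in bytes, before backend compression or indexing.
function estimateTraceBytesPerDay(
  requestsPerSecond: number,
  sampleRate: number,    // 0..1, e.g. 0.01 for 1%
  avgTraceBytes: number, // roughly 500 to 2000 depending on span count
): number {
  return requestsPerSecond * sampleRate * avgTraceBytes * SECONDS_PER_DAY;
}

const toGb = (bytes: number): string => (bytes / 1e9).toFixed(1);
console.log(toGb(estimateTraceBytesPerDay(10000, 0.01, 1000)));  // 8.6
console.log(toGb(estimateTraceBytesPerDay(100000, 0.01, 1000))); // 86.4
```

Running the estimate at both ends of the trace-size range (500 and 2000 bytes) gives a cost band rather than a single number, which is usually the more honest input to a budget discussion.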
Dynamic sampling: When SLO burn rate spikes (burn rate alerting covered in SLO-SLI-SLA), temporarily increasing the sample rate gives higher trace density for incident investigation. Vendor-specific dynamic sampling implementation is out of scope here.
Sampling Strategy Comparison
| Strategy | Decision Point | Memory Overhead | Best For | Captures Errors By Default? |
|---|---|---|---|---|
| Head-Based | Root span creation | None | Cost-constrained; representative workloads | No — blind to outcome |
| Tail-Based | After trace completes | High (buffer all spans) | Error investigation; SLO violation analysis | Yes — selects by outcome |
| Probabilistic | Root span creation | None | Even coverage; moderate traffic | No — statistical only |
| Rate-Limiting | Root span creation | None | Bursty traffic; fixed cost ceiling | No — statistical only |
Async Context Propagation Failure Modes
Context propagation is the mechanism by which a trace ID and span ID travel across async boundaries — thread pools, message queues, fire-and-forget calls, reactive pipelines, and HTTP clients. When propagation breaks, spans appear as disconnected root spans in the trace backend, making the trace useless for root cause analysis.
The five failure modes below are the most common causes of broken traces in production systems.
Failure Mode 1: Thread Pool Hop Without Context Propagation
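What happens: A task is handed to a thread pool, a hand-rolled task queue, or a worker without capturing the active span context. The task has no parent span reference and starts a new root span. In Node.js this mainly affects setups where no context manager is registered, or callbacks that run outside the request's async chain; with the AsyncLocalStorage-based context manager that @opentelemetry/sdk-node registers by default, plain setTimeout and promise callbacks normally keep the context.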
Symptom: Spans from the async task appear as disconnected root spans in the trace UI. The trace looks incomplete — work was done but is invisible in the main trace.
Prevention: Capture the active context before the async boundary and restore it inside the callback.
import { context } from '@opentelemetry/api'; // @opentelemetry/sdk-node 0.213.0
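// BAD: context is lost if nothing propagates it across this boundary
setTimeout(() => doWork(), 100);
// GOOD: active context captured before the boundary, restored inside
const captured = context.active();
setTimeout(() => context.with(captured, () => doWork()), 100);
The same capture-and-restore pattern applies to any callback that runs outside the original async chain, such as a custom task queue or a listener registered elsewhere. Worker threads are different: a Context object cannot be shared across threads, so serialize it with propagation.inject on the sending side and rebuild it with propagation.extract inside the worker, as in Failure Mode 2. For Java, use Context.current().wrap(runnable) from the OTel Java SDK to wrap Runnable or Callable instances submitted to ExecutorService.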
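The capture-and-restore idea can be seen in isolation with a toy model that has no dependencies. Everything in this sketch is a stand-in: `activeTraceId`, `withTrace`, and `bindActive` are hypothetical names that mimic an active-context API and are not OpenTelemetry code. The hand-rolled queue plays the role of the async boundary.

```typescript
// Toy "active context": a single module-level slot.
let activeTraceId: string | undefined;

// Run fn with traceId active, then restore whatever was active before.
function withTrace<T>(traceId: string | undefined, fn: () => T): T {
  const previous = activeTraceId;
  activeTraceId = traceId;
  try {
    return fn();
  } finally {
    activeTraceId = previous;
  }
}

// Capture the context active right now and restore it when the task runs.
function bindActive<T>(fn: () => T): () => T {
  const captured = activeTraceId;
  return () => withTrace(captured, fn);
}

// A hand-rolled task queue: tasks run later, outside the submitting scope.
const taskQueue: Array<() => string | undefined> = [];

withTrace("trace-abc", () => {
  taskQueue.push(() => activeTraceId);             // BAD: sees whatever is active at run time
  taskQueue.push(bindActive(() => activeTraceId)); // GOOD: context captured at submit time
});

const seenTraceIds = taskQueue.map((task) => task());
console.log(seenTraceIds); // [ undefined, 'trace-abc' ]
```

The unbound task runs after the submitting scope has exited and finds no context, which is the disconnected-root-span symptom in miniature.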
Failure Mode 2: Message Queue Context Loss
What happens: A producer creates a span for the publish operation but does not inject the trace context into the message headers. The consumer starts a new root span with no link to the producer's trace.
Symptom: The publish trace and the consume trace appear as separate, unrelated traces in the trace backend. You cannot follow a request from API call through to message processing in a single trace view.
Prevention: Inject the W3C traceparent header into message headers on the producer side; extract it on the consumer side before creating the child span.
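Pattern (pseudocode; applies to Kafka, RabbitMQ, SQS; producer, topic, value, and message come from the messaging client):
import { context, propagation, trace } from '@opentelemetry/api';
const tracer = trace.getTracer('messaging-example'); // tracer name is illustrative
// Producer: inject context into message headers
const headers: Record<string, string> = {};
propagation.inject(context.active(), headers);
await producer.send({ topic, messages: [{ value, headers }] });
// Consumer: extract context from headers, create child span
// Some clients deliver header values as Buffers; convert them to strings before extracting
const parentContext = propagation.extract(context.active(), message.headers);
context.with(parentContext, () => {
  const span = tracer.startSpan('message.process');
  try {
    // ... process message ...
  } finally {
    span.end(); // end the span even if processing throws
  }
});
Failure Mode 3: Fire-and-Forget Without Span Link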
What happens: A parent request triggers an async side-effect (email send, audit log write, webhook delivery) and returns a response before the side-effect completes. The side-effect is launched without any trace context.
Symptom: Side-effect failures are invisible in the parent request's trace. When the email send fails, there is no way to find the originating request in the trace backend.
Prevention: Use span links (not parent-child relationship) for causally related but asynchronously scoped work. A span link records the causal relationship without imposing the parent-child latency semantics.
import { trace } from '@opentelemetry/api';
const tracer = trace.getTracer('email-example'); // tracer name is illustrative
// Capture the link before starting the async work
const parentSpan = trace.getActiveSpan();
const links = parentSpan ? [{ context: parentSpan.spanContext() }] : [];
// root: true starts a new trace; without it the span would still become a child
// of whatever span is active. The side-effect gets a link, not a parent.
const emailSpan = tracer.startSpan('async.email.send', { links, root: true });
emailSpan.setAttribute('email.recipient_count', recipients.length);
// ... perform async work ...
emailSpan.end();
Failure Mode 4: Reactive Pipeline Context Loss (Java)
What happens: A flatMap or switchMap operator creates a new reactive chain that does not carry the parent span context from the Reactor Context. Manual Reactor operators that do not propagate Reactor Context silently drop the trace context.
Symptom: Child spans created inside reactive operators appear as root spans. The flatMap work is invisible in the parent request's trace.
Prevention: Micrometer micrometer-tracing-bridge-otel handles this automatically for Spring WebClient, @Observed, and WebFlux request handling. The failure occurs in manually-written Reactor operators that create new spans without reading from the Reactor Context.
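For manual operators: read the active span from the ContextView (for example via Mono.deferContextual) before the reactive boundary, and make it available to the inner chain with contextWrite(Context.of(...)). Reactor Context flows from the subscriber upward, so contextWrite only affects operators declared above it in the chain. See Distributed-Tracing for the MdcContextLifter pattern, which applies the same principle to MDC propagation.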
This failure mode is specific to reactive Java. Node.js does not use a comparable reactive model — async/await with context.with() is sufficient (see Failure Mode 1).
Failure Mode 5: Uninstrumented HTTP Client
What happens: An HTTP call is made via a client that does not inject the W3C traceparent header into the outgoing request. The downstream service receives no trace context and starts a new root span.
Symptom: The downstream service's spans appear as a separate trace. In a multi-hop call chain (A → B → C), the trace fragments at the first uninstrumented hop.
Prevention: Use OTel-instrumented HTTP clients exclusively.
| Runtime | Instrumented client | How |
|---|---|---|
| Node.js | http / https (native) | @opentelemetry/instrumentation-http auto-instruments |
| Node.js | fetch (Node 18+) | @opentelemetry/instrumentation-undici |
| Java | RestTemplate, WebClient | OTel Java agent auto-instruments; micrometer-tracing-bridge-otel instruments WebClient |
| Java | Raw HttpURLConnection | OTel Java agent auto-instruments |
Verification: Check outgoing requests for the traceparent header in browser dev tools (for client-side calls) or in service access logs. If the header is absent on an outgoing HTTP call, the client is not instrumented.
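The verification step can also be automated in an integration test or a debug hook on the outgoing request. The sketch below checks that a header map carries a structurally valid traceparent in the version 00 layout (version, 32-hex trace ID, 16-hex parent ID, flags). `hasValidTraceparent` is a hypothetical helper, and it assumes header names have already been lower-cased.

```typescript
// version - trace-id - parent-id - trace-flags, all lowercase hex
const TRACEPARENT_PATTERN =
  /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Returns true when the headers contain a structurally valid traceparent.
function hasValidTraceparent(headers: Record<string, string | undefined>): boolean {
  const value = headers["traceparent"];
  if (value === undefined) return false; // client is not injecting context
  const match = TRACEPARENT_PATTERN.exec(value);
  if (match === null) return false;
  const version = match[1];
  const traceId = match[2];
  const parentId = match[3];
  if (version === "ff") return false;      // version ff is invalid
  if (/^0+$/.test(traceId)) return false;  // all-zero trace ID is invalid
  if (/^0+$/.test(parentId)) return false; // all-zero parent ID is invalid
  return true;
}

console.log(
  hasValidTraceparent({
    traceparent: "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
  }),
); // true
console.log(hasValidTraceparent({ "content-type": "application/json" })); // false
```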
Context Budget
Every serialized span context added to HTTP headers, Kafka records, or gRPC metadata costs bytes on every hop downstream.
- W3C traceparent header: ~55 bytes
- W3C tracestate header: variable, typically 0–200 bytes
- W3C baggage header: each key-value pair propagated to ALL downstream services in the call chain
Baggage is the footgun. Baggage key-value pairs travel in every request header to every downstream service for the lifetime of the trace. A single large baggage value can add kilobytes to every HTTP request in a fan-out pattern.
Rules:
- Limit baggage to 3–5 fields maximum
- Never put user PII, session tokens, or large payloads in baggage
- Never put data in baggage that only one downstream service needs — use span attributes instead (attributes stay local to the span, not propagated)
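These rules are easy to enforce mechanically, for example in a unit test or in middleware that logs a warning. The sketch below counts entries and bytes in a raw baggage header value. `checkBaggageBudget` is a hypothetical helper; the 5-entry limit follows the rule above, while the 256-byte ceiling is an assumed team budget and not a value taken from the W3C specification.

```typescript
interface BaggageCheck {
  entryCount: number;
  byteLength: number;
  withinBudget: boolean;
}

// Inspect a raw baggage header value such as "tenant=acme,region=eu-west-1".
function checkBaggageBudget(
  baggageHeader: string,
  maxEntries: number = 5,
  maxBytes: number = 256,
): BaggageCheck {
  const entries = baggageHeader
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
  // Header values are ASCII on the wire (non-ASCII is percent-encoded),
  // so the character count equals the byte count.
  const byteLength = baggageHeader.length;
  return {
    entryCount: entries.length,
    byteLength,
    withinBudget: entries.length <= maxEntries && byteLength <= maxBytes,
  };
}

const baggageReport = checkBaggageBudget("tenant=acme,region=eu-west-1");
console.log(baggageReport.entryCount, baggageReport.withinBudget); // 2 true
```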
TypeScript SDK Configuration Example
Configuring sampling strategy in the OTel SDK (@opentelemetry/sdk-node 0.213.0, requires Node.js >= 18.19, TypeScript >= 5.0.4):
import { NodeSDK } from '@opentelemetry/sdk-node';
import { TraceIdRatioBasedSampler, ParentBasedSampler } from '@opentelemetry/sdk-trace-base';
// Head-based: 10% of new traces, respects parent sampling decision for child services
const sdk = new NodeSDK({
sampler: new ParentBasedSampler({
root: new TraceIdRatioBasedSampler(0.1), // 10% of root spans
// remoteParentSampled: AlwaysOnSampler (default) — respect upstream decision to sample
// remoteParentNotSampled: AlwaysOffSampler (default) — respect upstream decision to drop
}),
});
sdk.start();
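// Rate-limiting variant: @opentelemetry/sdk-trace-base exports no rate-limiting sampler
// (its built-in samplers are AlwaysOn, AlwaysOff, ParentBased, and TraceIdRatioBased).
// Implement the Sampler interface with a token bucket, or enforce the cap in the Collector.
// sampler: new MyRateLimitingSampler(100) // hypothetical custom class, max 100 traces per second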
// Tail-based sampling: NOT configured in the SDK
// Tail-based runs in the OTel Collector (collector.yaml tail_sampling processor)
// See [[Distributed-Tracing]] for Collector setup patterns
The ParentBasedSampler wrapper is important in multi-service architectures: it ensures that a downstream service respects the sampling decision made by the upstream service that created the trace root. Without it, each service samples independently, and whenever their rates or samplers differ the trace fragments (upstream sampled, downstream did not, or vice versa).
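The parent-based rule reduces to a small decision function: follow the caller's sampled flag when there is a parent, and consult the root sampler only when the service is starting a new trace. The sketch below illustrates that rule and is not SDK code; `ParentInfo` and `parentBasedDecision` are names invented here.

```typescript
// Hypothetical view of what arrives with an incoming request.
interface ParentInfo {
  sampled: boolean; // the sampled flag set by the upstream service
}

// Follow the parent when one exists; otherwise ask the root sampler.
function parentBasedDecision(
  parent: ParentInfo | undefined,
  rootSampler: () => boolean,
): boolean {
  return parent === undefined ? rootSampler() : parent.sampled;
}

const tenPercentRoot = (): boolean => Math.random() < 0.1;
console.log(parentBasedDecision({ sampled: true }, tenPercentRoot));  // true
console.log(parentBasedDecision({ sampled: false }, tenPercentRoot)); // false
```

Only the service that creates the root span rolls the dice; every service after it inherits the result, which is what keeps a trace whole.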
Related Concepts
- Distributed-Tracing — OTel SDK implementation, Micrometer Tracing bridge, W3C traceparent format, Spring Boot auto-configuration, reactive pipeline context via Reactor Context
- Structured-Logging — correlation ID propagation; complement to trace context for log-level request linkage
- Metrics-and-Dashboards — RED method latency percentiles; spans feed histogram metrics for SLO measurement
- SLO-SLI-SLA — trace latency data as SLI input for error budget and burn rate alerting