Structured Logging
Structured Logging
Structured logging emits log entries as machine-parseable records (typically JSON) rather than free-form text strings. Each field is a typed key-value pair rather than an embedded substring. Log aggregators (Elasticsearch, Loki, Splunk) can index structured fields directly, enabling sub-second queries, aggregations, and alerting that are impossible or fragile against unstructured text.
Scope: This note covers logging structure, field schema, and context propagation patterns. For distributed trace implementation (OTel SDK, Micrometer Tracing bridge, span creation), see Distributed-Tracing. For metrics collection and Prometheus integration, see Micrometer.
When NOT to Use
Structured logging is not universally beneficial. Avoid it — or use it with awareness of its costs — in these contexts:
| Context | Reason |
|---|---|
| Local scripts and one-off CLIs | Human readability outweighs machine parsability; no log aggregator to benefit |
| Development console output | Pretty-printed, colourised text reduces feedback latency |
| Extremely high-throughput hot paths (>100k msg/s on single core) | JSON serialization adds ~5–15% CPU overhead vs raw string concatenation |
| Embedded or edge systems with constrained storage | JSON's ~3x storage overhead over raw text is prohibitive |
Default choice: use structured logging in any service deployed beyond a developer's laptop. The query and alerting benefits overwhelm the marginal serialization cost in all network-deployed contexts.
Mandatory JSON Field Schema
Every log line in a production service MUST contain these eight fields. Absence of any field breaks cross-service correlation and alerting pipelines.
| Field | Type | Example | Purpose |
|---|---|---|---|
timestamp | ISO 8601 with milliseconds | 2026-03-24T12:00:00.123Z | Chronological ordering across services (UTC only) |
level | enum | INFO | Severity routing; must be one of TRACE, DEBUG, INFO, WARN, ERROR |
service | string | order-service | Identifies origin service; read from env var, never hardcoded |
traceId | string | 4bf92f3577b34da6a3ce929d0e0e4736 | W3C 128-bit trace ID from active span; empty string if no trace context |
spanId | string | 00f067aa0ba902b7 | Current span ID; empty string if no trace context |
correlationId | string (UUID v4) | 550e8400-e29b-41d4-a716-446655440000 | Application-generated request linkage ID; persists across non-traced hops |
message | string | Order placed successfully | Human-readable description; must not contain PII |
context | object | { "userId": "usr_abc123", "orderId": "ord_456" } | Structured context fields; user identifiers must be internal IDs, never raw PII |
Minimal JSON example:
{
"timestamp": "2026-03-24T12:00:00.123Z",
"level": "INFO",
"service": "order-service",
"traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
"spanId": "00f067aa0ba902b7",
"correlationId": "550e8400-e29b-41d4-a716-446655440000",
"message": "Order placed successfully",
"context": {
"userId": "usr_abc123",
"orderId": "ord_456",
"totalCents": 4999
}
}Correlation ID vs. Trace ID
These two identifiers serve overlapping but distinct purposes. Confusing them leads to broken correlation in partially-instrumented systems.
| Attribute | Trace ID | Correlation ID |
|---|---|---|
| Managed by | Tracing infrastructure (OTel SDK) | Application code |
| Format | W3C 128-bit hex (32 chars) | UUID v4 (application-generated) |
| Auto-generated | Yes — by OTel SDK on first span | Yes — by application on request entry |
| HTTP propagation | traceparent header (W3C standard) | X-Correlation-ID header (custom) |
| Visible in | Tracing UI (Zipkin, Jaeger, Tempo) | Log aggregators (Kibana, Grafana Loki) |
| Survives non-traced hops | No — stops where tracing stops | Yes — propagates everywhere via custom header |
| Required tracing agent | Yes | No |
When to use each:
- Use
traceIdfor distributed trace stitching in APM tools where all services are instrumented with OTel. - Use
correlationIdfor log grep/search in Kibana or Loki, especially when tracing is partial, the call chain includes non-HTTP integrations (Kafka consumers, batch jobs), or a request predates OTel instrumentation. - In a fully instrumented system,
traceIdcan serve ascorrelationId— butcorrelationIdprovides the fallback for everything outside the OTel boundary. Emit both fields; the cost is negligible.
Pitfall: Do not pass
traceIdasX-Correlation-ID. Thetraceparentheader carries the trace ID in W3C format;X-Correlation-IDis a separate custom header. A service that readsX-Correlation-IDexpecting a UUID will receive malformed data if it instead receives a raw trace ID.
PII Exclusion Rules
Logging PII is a regulatory violation in most jurisdictions. These rules are mandatory, not optional.
Regulatory basis: GDPR Art. 5(1)(f) (integrity and confidentiality), HIPAA 45 CFR §164.312(b) (audit controls), PCI DSS Requirement 3.4 (protect stored cardholder data).
NEVER Log
| Category | Examples |
|---|---|
| Authentication credentials | Passwords, API keys, session tokens, JWTs |
| Payment data | Credit card numbers, CVV, bank account numbers |
| Government identifiers | SSNs, passport numbers, national ID numbers |
| Personal contact data | Email addresses, phone numbers, physical addresses |
| Personal identity | Full names, date of birth |
| Network identifiers | Client IP addresses (GDPR: personal data), MAC addresses |
ALWAYS Log (safe identifiers)
| Field | Why it is safe |
|---|---|
Internal user ID (userId, customerId) | Opaque identifier; lookup requires separate access-controlled store |
| Request/operation ID | No personal information |
Resource IDs (orderId, productId) | Business entity reference, not personal data |
| Error codes and operation names | Diagnostic, not personal |
The pattern: log the identifier that allows an authorised engineer to look up sensitive data in a separate access-controlled store. Never log the sensitive value itself.
Automated PII Masking
TypeScript (pino redact):
const logger = pino({
redact: {
paths: ['req.headers.authorization', 'body.password', 'body.email', 'body.creditCard'],
censor: '[REDACTED]',
},
});Java (Logback replace converter in logback.xml):
<conversionRule conversionWord="maskedMsg"
converterClass="ch.qos.logback.core.pattern.color.ReplaceConverter" />
<!-- In encoder pattern: replace email-like strings -->
<pattern>%replace(%maskedMsg){'[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}', '[EMAIL_REDACTED]'}</pattern>Automated masking is a backstop, not a substitute for disciplined coding. Write code that never populates PII into log fields in the first place.
Async Context Propagation
Correlation ID and trace ID must survive every hop in an async architecture. Three propagation mechanisms cover most cases.
HTTP: X-Correlation-ID Header
On incoming request entry: read X-Correlation-ID header; if absent, generate UUID v4. On every outgoing fan-out call: write the same correlationId to the outgoing X-Correlation-ID header.
Messaging (Kafka / RabbitMQ)
Embed correlationId in message headers at publish time. Extract on the consumer side and inject into MDC (or Reactor Context) before processing begins.
Reactive Pipelines (Project Reactor / WebFlux)
ThreadLocal does not propagate across Project Reactor operator boundaries. Use Reactor Context to carry correlationId. See Distributed-Tracing for the MdcContextLifter + Hooks.onEachOperator pattern that bridges Reactor Context into MDC for structured logging.
Propagation Sequence Diagram
sequenceDiagram
participant C as Client
participant A as Service A (entry)
participant B as Service B
participant D as Service C
C->>A: POST /orders (no X-Correlation-ID)
Note over A: generate correlationId = 550e8400-...
A->>A: log {correlationId: "550e8400-..."}<br/>message: "Received order request"
A->>B: POST /inventory<br/>X-Correlation-ID: 550e8400-...
B->>B: log {correlationId: "550e8400-..."}<br/>message: "Checking stock"
B->>D: GET /warehouse/stock<br/>X-Correlation-ID: 550e8400-...
D->>D: log {correlationId: "550e8400-..."}<br/>message: "Stock query executed"
D-->>B: 200 OK
B-->>A: 200 OK
A-->>C: 201 Created<br/>X-Correlation-ID: 550e8400-...
All three services log correlationId: "550e8400-...". A single Kibana/Loki query correlationId: "550e8400-..." returns all log lines for the full request, regardless of which service emitted them.
TypeScript Example — pino 10.3.1
Logger Setup with Redact
import pino from 'pino'; // pino 10.3.1
const logger = pino({
level: process.env.LOG_LEVEL ?? 'info',
redact: ['req.headers.authorization', 'body.password', 'body.email'],
formatters: {
level: (label) => ({ level: label }),
},
base: {
service: process.env.SERVICE_NAME ?? 'unknown-service',
},
});
export function withCorrelation(correlationId: string, traceId?: string) {
return logger.child({ correlationId, traceId: traceId ?? '' });
}Correlation Middleware (Express)
import { Request, Response, NextFunction } from 'express';
export function correlationMiddleware(
req: Request,
res: Response,
next: NextFunction,
): void {
// Propagate inbound or generate new
const correlationId =
(req.headers['x-correlation-id'] as string) ?? crypto.randomUUID();
// Extract traceId from W3C traceparent header (second segment)
const traceparent = req.headers['traceparent'] as string | undefined;
const traceId = traceparent?.split('-')[1];
// Attach child logger with correlation context to the request
(req as any).log = withCorrelation(correlationId, traceId);
// Propagate outbound so upstream callers can correlate
res.setHeader('X-Correlation-ID', correlationId);
next();
}Usage:
app.use(correlationMiddleware);
app.post('/orders', (req: any, res) => {
req.log.info({ context: { userId: req.user.id } }, 'Order request received');
// ... business logic
});Note: pino's
redactoption operates at serialization time, not at field-assignment time. Always pass sensitive data throughredactpaths rather than filtering before the log call — the pino serializer is faster and more reliable.
Java Example — SLF4J + Logback + logstash-logback-encoder 8.x
Dependencies (build.gradle)
dependencies {
implementation 'ch.qos.logback:logback-classic:1.5.6'
implementation 'net.logstash.logback:logstash-logback-encoder:8.0'
// SLF4J API is a transitive dependency of logback-classic
}logback-spring.xml — JSON Encoder
<configuration>
<appender name="JSON_STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="net.logstash.logback.encoder.LogstashEncoder">
<!-- logstash-logback-encoder 8.x auto-includes MDC fields as top-level JSON -->
<includeContext>false</includeContext>
<fieldNames>
<timestamp>timestamp</timestamp>
<version>[ignore]</version>
<levelValue>[ignore]</levelValue>
</fieldNames>
</encoder>
</appender>
<root level="INFO">
<appender-ref ref="JSON_STDOUT" />
</root>
</configuration>CorrelationFilter — MDC Propagation
@Component
@Order(Ordered.HIGHEST_PRECEDENCE)
public class CorrelationFilter extends OncePerRequestFilter {
private static final String CORRELATION_HEADER = "X-Correlation-ID";
@Override
protected void doFilterInternal(
HttpServletRequest req,
HttpServletResponse res,
FilterChain chain) throws ServletException, IOException {
String correlationId = Optional
.ofNullable(req.getHeader(CORRELATION_HEADER))
.orElse(UUID.randomUUID().toString());
MDC.put("correlationId", correlationId);
res.setHeader(CORRELATION_HEADER, correlationId);
try {
chain.doFilter(req, res);
} finally {
// MANDATORY: thread-pool reuse causes correlationId bleed without this
MDC.remove("correlationId");
}
}
}Pitfall:
MDC.remove()in thefinallyblock is mandatory. Spring Boot uses a thread pool for servlet requests. Without removal, a thread servicing request B will inherit request A'scorrelationIdfrom the pool. This is a silent data integrity bug — logs appear correlated when they are not.
WebFlux note: MDC does not propagate in Project Reactor's non-blocking scheduler. For reactive services, use the
MdcContextLifter+Hooks.onEachOperatorpattern described in Distributed-Tracing.
Logging with context in a service:
@Service
@Slf4j
public class OrderService {
public Order placeOrder(PlaceOrderCommand cmd) {
log.info("Order placement initiated",
StructuredArguments.keyValue("userId", cmd.userId()), // internal ID only
StructuredArguments.keyValue("productId", cmd.productId()),
StructuredArguments.keyValue("quantityCents", cmd.totalCents()));
// ...
}
}StructuredArguments (from logstash-logback-encoder) promotes key-value pairs to top-level JSON fields rather than embedding them in the message string.
Log Level Selection
| Level | Enable in Prod | When to Use | Example |
|---|---|---|---|
TRACE | Never | Internal algorithm steps; enable per-session for debugging only | Loop iteration counters, sub-expression values |
DEBUG | No | Function entry/exit with arguments, intermediate state | "Entering getUserById with id=42" |
INFO | Yes | Business events, state transitions | "Order placed", "User authenticated", "Payment processed" |
WARN | Yes | Recoverable error, degraded-but-functional state | "Retry succeeded on attempt 3", "Cache miss; falling back to DB" |
ERROR | Yes | Unrecoverable error requiring investigation | Exception with full stack trace; always include error.message and error.stack |
Rule: ERROR logs must always include a stack trace and all context required to reproduce the failure. An ERROR log without a stack trace is incomplete.
Log Pipeline Diagram
flowchart LR
A[Application\nNode.js / JVM] -->|JSON to stdout| B[Log Serializer\npino / Logback\n+logstash-logback-encoder]
B -->|JSON lines| C[Log Shipper\nFilebeat / Fluentd]
C -->|bulk ingest| D[Log Backend\nElasticsearch / Loki]
D -->|query by correlationId| E[Dashboard\nKibana / Grafana]
style A fill:#f0f4ff
style E fill:#f0f4ff
The correlationId field is the key that enables cross-service search: a single query correlationId:"550e8400-..." in Kibana or {correlationId="550e8400-..."} in Loki returns all log lines for that request from every service in the pipeline.
Log shippers (Filebeat/Fluentd) collect JSON lines from stdout/files and batch-ship to the backend. Log backends store and index structured fields; query latency for indexed fields is sub-second at billion-document scale.
Tradeoffs
| Dimension | Structured JSON | Unstructured Text | Notes |
|---|---|---|---|
| Throughput overhead | ~5–15% CPU (serialization) | Baseline | pino is the fastest Node.js JSON logger; overhead vs console.log varies by payload size |
| Storage cost | ~2–3x vs raw text | Baseline | JSON field names are repeated on every line; compensated by compression (gzip ~4:1) |
| Query power | Sub-second field aggregations | Full-text scan (slow) | Indexed fields enable level:ERROR AND service:order-service in milliseconds |
| Alerting reliability | High — field match is exact | Low — regex on free text breaks on format changes | Alert on level == "ERROR" is robust; alert on /ERROR/ breaks if format changes |
| Cold-path parse cost | None — already structured at write time | High — parse at query time | Query-time parsing is 10–100x slower and blocks the aggregation pipeline |
| Developer ergonomics | Lower (no interpolation syntax) | Higher (string templates) | pino's child logger and StructuredArguments restore ergonomics |
Related Concepts
- Distributed-Tracing — trace ID propagation and OTel SDK implementation; MdcContextLifter for reactive pipelines
- Micrometer — Java metrics and Spring Boot observability integration; Prometheus scrape config
- SLO-SLI-SLA — (upcoming) log-based SLI measurement using structured log fields as event sources
- Alerting-Strategies — (upcoming) log-based alerting patterns (error rate, latency percentile alerts via log queries)