Structured Logging

Structured Logging

Structured logging emits log entries as machine-parseable records (typically JSON) rather than free-form text strings. Each field is a typed key-value pair rather than an embedded substring. Log aggregators (Elasticsearch, Loki, Splunk) can index structured fields directly, enabling sub-second queries, aggregations, and alerting that are impossible or fragile against unstructured text.

Scope: This note covers logging structure, field schema, and context propagation patterns. For distributed trace implementation (OTel SDK, Micrometer Tracing bridge, span creation), see Distributed-Tracing. For metrics collection and Prometheus integration, see Micrometer.

When NOT to Use

Structured logging is not universally beneficial. Avoid it — or use it with awareness of its costs — in these contexts:

ContextReason
Local scripts and one-off CLIsHuman readability outweighs machine parsability; no log aggregator to benefit
Development console outputPretty-printed, colourised text reduces feedback latency
Extremely high-throughput hot paths (>100k msg/s on single core)JSON serialization adds ~5–15% CPU overhead vs raw string concatenation
Embedded or edge systems with constrained storageJSON's ~3x storage overhead over raw text is prohibitive

Default choice: use structured logging in any service deployed beyond a developer's laptop. The query and alerting benefits overwhelm the marginal serialization cost in all network-deployed contexts.

Mandatory JSON Field Schema

Every log line in a production service MUST contain these eight fields. Absence of any field breaks cross-service correlation and alerting pipelines.

FieldTypeExamplePurpose
timestampISO 8601 with milliseconds2026-03-24T12:00:00.123ZChronological ordering across services (UTC only)
levelenumINFOSeverity routing; must be one of TRACE, DEBUG, INFO, WARN, ERROR
servicestringorder-serviceIdentifies origin service; read from env var, never hardcoded
traceIdstring4bf92f3577b34da6a3ce929d0e0e4736W3C 128-bit trace ID from active span; empty string if no trace context
spanIdstring00f067aa0ba902b7Current span ID; empty string if no trace context
correlationIdstring (UUID v4)550e8400-e29b-41d4-a716-446655440000Application-generated request linkage ID; persists across non-traced hops
messagestringOrder placed successfullyHuman-readable description; must not contain PII
contextobject{ "userId": "usr_abc123", "orderId": "ord_456" }Structured context fields; user identifiers must be internal IDs, never raw PII

Minimal JSON example:

{
  "timestamp": "2026-03-24T12:00:00.123Z",
  "level": "INFO",
  "service": "order-service",
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "spanId": "00f067aa0ba902b7",
  "correlationId": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Order placed successfully",
  "context": {
    "userId": "usr_abc123",
    "orderId": "ord_456",
    "totalCents": 4999
  }
}

Correlation ID vs. Trace ID

These two identifiers serve overlapping but distinct purposes. Confusing them leads to broken correlation in partially-instrumented systems.

AttributeTrace IDCorrelation ID
Managed byTracing infrastructure (OTel SDK)Application code
FormatW3C 128-bit hex (32 chars)UUID v4 (application-generated)
Auto-generatedYes — by OTel SDK on first spanYes — by application on request entry
HTTP propagationtraceparent header (W3C standard)X-Correlation-ID header (custom)
Visible inTracing UI (Zipkin, Jaeger, Tempo)Log aggregators (Kibana, Grafana Loki)
Survives non-traced hopsNo — stops where tracing stopsYes — propagates everywhere via custom header
Required tracing agentYesNo

When to use each:

  • Use traceId for distributed trace stitching in APM tools where all services are instrumented with OTel.
  • Use correlationId for log grep/search in Kibana or Loki, especially when tracing is partial, the call chain includes non-HTTP integrations (Kafka consumers, batch jobs), or a request predates OTel instrumentation.
  • In a fully instrumented system, traceId can serve as correlationId — but correlationId provides the fallback for everything outside the OTel boundary. Emit both fields; the cost is negligible.

Pitfall: Do not pass traceId as X-Correlation-ID. The traceparent header carries the trace ID in W3C format; X-Correlation-ID is a separate custom header. A service that reads X-Correlation-ID expecting a UUID will receive malformed data if it instead receives a raw trace ID.

PII Exclusion Rules

Logging PII is a regulatory violation in most jurisdictions. These rules are mandatory, not optional.

Regulatory basis: GDPR Art. 5(1)(f) (integrity and confidentiality), HIPAA 45 CFR §164.312(b) (audit controls), PCI DSS Requirement 3.4 (protect stored cardholder data).

NEVER Log

CategoryExamples
Authentication credentialsPasswords, API keys, session tokens, JWTs
Payment dataCredit card numbers, CVV, bank account numbers
Government identifiersSSNs, passport numbers, national ID numbers
Personal contact dataEmail addresses, phone numbers, physical addresses
Personal identityFull names, date of birth
Network identifiersClient IP addresses (GDPR: personal data), MAC addresses

ALWAYS Log (safe identifiers)

FieldWhy it is safe
Internal user ID (userId, customerId)Opaque identifier; lookup requires separate access-controlled store
Request/operation IDNo personal information
Resource IDs (orderId, productId)Business entity reference, not personal data
Error codes and operation namesDiagnostic, not personal

The pattern: log the identifier that allows an authorised engineer to look up sensitive data in a separate access-controlled store. Never log the sensitive value itself.

Automated PII Masking

TypeScript (pino redact):

const logger = pino({
  redact: {
    paths: ['req.headers.authorization', 'body.password', 'body.email', 'body.creditCard'],
    censor: '[REDACTED]',
  },
});

Java (Logback replace converter in logback.xml):

<conversionRule conversionWord="maskedMsg"
    converterClass="ch.qos.logback.core.pattern.color.ReplaceConverter" />
<!-- In encoder pattern: replace email-like strings -->
<pattern>%replace(%maskedMsg){'[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}', '[EMAIL_REDACTED]'}</pattern>

Automated masking is a backstop, not a substitute for disciplined coding. Write code that never populates PII into log fields in the first place.

Async Context Propagation

Correlation ID and trace ID must survive every hop in an async architecture. Three propagation mechanisms cover most cases.

HTTP: X-Correlation-ID Header

On incoming request entry: read X-Correlation-ID header; if absent, generate UUID v4. On every outgoing fan-out call: write the same correlationId to the outgoing X-Correlation-ID header.

Messaging (Kafka / RabbitMQ)

Embed correlationId in message headers at publish time. Extract on the consumer side and inject into MDC (or Reactor Context) before processing begins.

Reactive Pipelines (Project Reactor / WebFlux)

ThreadLocal does not propagate across Project Reactor operator boundaries. Use Reactor Context to carry correlationId. See Distributed-Tracing for the MdcContextLifter + Hooks.onEachOperator pattern that bridges Reactor Context into MDC for structured logging.

Propagation Sequence Diagram

sequenceDiagram
    participant C as Client
    participant A as Service A (entry)
    participant B as Service B
    participant D as Service C

    C->>A: POST /orders (no X-Correlation-ID)
    Note over A: generate correlationId = 550e8400-...
    A->>A: log {correlationId: "550e8400-..."}<br/>message: "Received order request"
    A->>B: POST /inventory<br/>X-Correlation-ID: 550e8400-...
    B->>B: log {correlationId: "550e8400-..."}<br/>message: "Checking stock"
    B->>D: GET /warehouse/stock<br/>X-Correlation-ID: 550e8400-...
    D->>D: log {correlationId: "550e8400-..."}<br/>message: "Stock query executed"
    D-->>B: 200 OK
    B-->>A: 200 OK
    A-->>C: 201 Created<br/>X-Correlation-ID: 550e8400-...

All three services log correlationId: "550e8400-...". A single Kibana/Loki query correlationId: "550e8400-..." returns all log lines for the full request, regardless of which service emitted them.

TypeScript Example — pino 10.3.1

Logger Setup with Redact

import pino from 'pino'; // pino 10.3.1
 
const logger = pino({
  level: process.env.LOG_LEVEL ?? 'info',
  redact: ['req.headers.authorization', 'body.password', 'body.email'],
  formatters: {
    level: (label) => ({ level: label }),
  },
  base: {
    service: process.env.SERVICE_NAME ?? 'unknown-service',
  },
});
 
export function withCorrelation(correlationId: string, traceId?: string) {
  return logger.child({ correlationId, traceId: traceId ?? '' });
}

Correlation Middleware (Express)

import { Request, Response, NextFunction } from 'express';
 
export function correlationMiddleware(
  req: Request,
  res: Response,
  next: NextFunction,
): void {
  // Propagate inbound or generate new
  const correlationId =
    (req.headers['x-correlation-id'] as string) ?? crypto.randomUUID();
 
  // Extract traceId from W3C traceparent header (second segment)
  const traceparent = req.headers['traceparent'] as string | undefined;
  const traceId = traceparent?.split('-')[1];
 
  // Attach child logger with correlation context to the request
  (req as any).log = withCorrelation(correlationId, traceId);
 
  // Propagate outbound so upstream callers can correlate
  res.setHeader('X-Correlation-ID', correlationId);
  next();
}

Usage:

app.use(correlationMiddleware);
 
app.post('/orders', (req: any, res) => {
  req.log.info({ context: { userId: req.user.id } }, 'Order request received');
  // ... business logic
});

Note: pino's redact option operates at serialization time, not at field-assignment time. Always pass sensitive data through redact paths rather than filtering before the log call — the pino serializer is faster and more reliable.

Java Example — SLF4J + Logback + logstash-logback-encoder 8.x

Dependencies (build.gradle)

dependencies {
    implementation 'ch.qos.logback:logback-classic:1.5.6'
    implementation 'net.logstash.logback:logstash-logback-encoder:8.0'
    // SLF4J API is a transitive dependency of logback-classic
}

logback-spring.xml — JSON Encoder

<configuration>
  <appender name="JSON_STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <!-- logstash-logback-encoder 8.x auto-includes MDC fields as top-level JSON -->
      <includeContext>false</includeContext>
      <fieldNames>
        <timestamp>timestamp</timestamp>
        <version>[ignore]</version>
        <levelValue>[ignore]</levelValue>
      </fieldNames>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="JSON_STDOUT" />
  </root>
</configuration>

CorrelationFilter — MDC Propagation

@Component
@Order(Ordered.HIGHEST_PRECEDENCE)
public class CorrelationFilter extends OncePerRequestFilter {
 
    private static final String CORRELATION_HEADER = "X-Correlation-ID";
 
    @Override
    protected void doFilterInternal(
            HttpServletRequest req,
            HttpServletResponse res,
            FilterChain chain) throws ServletException, IOException {
 
        String correlationId = Optional
            .ofNullable(req.getHeader(CORRELATION_HEADER))
            .orElse(UUID.randomUUID().toString());
 
        MDC.put("correlationId", correlationId);
        res.setHeader(CORRELATION_HEADER, correlationId);
 
        try {
            chain.doFilter(req, res);
        } finally {
            // MANDATORY: thread-pool reuse causes correlationId bleed without this
            MDC.remove("correlationId");
        }
    }
}

Pitfall: MDC.remove() in the finally block is mandatory. Spring Boot uses a thread pool for servlet requests. Without removal, a thread servicing request B will inherit request A's correlationId from the pool. This is a silent data integrity bug — logs appear correlated when they are not.

WebFlux note: MDC does not propagate in Project Reactor's non-blocking scheduler. For reactive services, use the MdcContextLifter + Hooks.onEachOperator pattern described in Distributed-Tracing.

Logging with context in a service:

@Service
@Slf4j
public class OrderService {
    public Order placeOrder(PlaceOrderCommand cmd) {
        log.info("Order placement initiated",
            StructuredArguments.keyValue("userId", cmd.userId()),   // internal ID only
            StructuredArguments.keyValue("productId", cmd.productId()),
            StructuredArguments.keyValue("quantityCents", cmd.totalCents()));
        // ...
    }
}

StructuredArguments (from logstash-logback-encoder) promotes key-value pairs to top-level JSON fields rather than embedding them in the message string.

Log Level Selection

LevelEnable in ProdWhen to UseExample
TRACENeverInternal algorithm steps; enable per-session for debugging onlyLoop iteration counters, sub-expression values
DEBUGNoFunction entry/exit with arguments, intermediate state"Entering getUserById with id=42"
INFOYesBusiness events, state transitions"Order placed", "User authenticated", "Payment processed"
WARNYesRecoverable error, degraded-but-functional state"Retry succeeded on attempt 3", "Cache miss; falling back to DB"
ERRORYesUnrecoverable error requiring investigationException with full stack trace; always include error.message and error.stack

Rule: ERROR logs must always include a stack trace and all context required to reproduce the failure. An ERROR log without a stack trace is incomplete.

Log Pipeline Diagram

flowchart LR
    A[Application\nNode.js / JVM] -->|JSON to stdout| B[Log Serializer\npino / Logback\n+logstash-logback-encoder]
    B -->|JSON lines| C[Log Shipper\nFilebeat / Fluentd]
    C -->|bulk ingest| D[Log Backend\nElasticsearch / Loki]
    D -->|query by correlationId| E[Dashboard\nKibana / Grafana]

    style A fill:#f0f4ff
    style E fill:#f0f4ff

The correlationId field is the key that enables cross-service search: a single query correlationId:"550e8400-..." in Kibana or {correlationId="550e8400-..."} in Loki returns all log lines for that request from every service in the pipeline.

Log shippers (Filebeat/Fluentd) collect JSON lines from stdout/files and batch-ship to the backend. Log backends store and index structured fields; query latency for indexed fields is sub-second at billion-document scale.

Tradeoffs

DimensionStructured JSONUnstructured TextNotes
Throughput overhead~5–15% CPU (serialization)Baselinepino is the fastest Node.js JSON logger; overhead vs console.log varies by payload size
Storage cost~2–3x vs raw textBaselineJSON field names are repeated on every line; compensated by compression (gzip ~4:1)
Query powerSub-second field aggregationsFull-text scan (slow)Indexed fields enable level:ERROR AND service:order-service in milliseconds
Alerting reliabilityHigh — field match is exactLow — regex on free text breaks on format changesAlert on level == "ERROR" is robust; alert on /ERROR/ breaks if format changes
Cold-path parse costNone — already structured at write timeHigh — parse at query timeQuery-time parsing is 10–100x slower and blocks the aggregation pipeline
Developer ergonomicsLower (no interpolation syntax)Higher (string templates)pino's child logger and StructuredArguments restore ergonomics
  • Distributed-Tracing — trace ID propagation and OTel SDK implementation; MdcContextLifter for reactive pipelines
  • Micrometer — Java metrics and Spring Boot observability integration; Prometheus scrape config
  • SLO-SLI-SLA — (upcoming) log-based SLI measurement using structured log fields as event sources
  • Alerting-Strategies — (upcoming) log-based alerting patterns (error rate, latency percentile alerts via log queries)