Input Validation

Input validation is the practice of checking that all data entering a system conforms to an expected schema, type, format, and range before it is processed, stored, or forwarded. It helps prevent injection attacks by making it harder for attacker-controlled data to be interpreted as executable code, query syntax, or command sequences by downstream processors.


Intent

Input validation is a first line of defense against the injection class of vulnerabilities (OWASP Top 10 A03:2021). The goal is to ensure that data conforms to an expected contract before it is processed. Validation does not eliminate injection risk on its own: it narrows the attack surface, and parameterized queries, output encoding, and other context-specific mitigations close what remains. It is defense-in-depth rather than a standalone cure.


When NOT to Use

  • Do not substitute input validation for parameterized queries. Validation reduces the attack surface but parameterized queries (prepared statements) are the complete defense against SQL injection. Validation is defense-in-depth, not a replacement for parameterization.
  • Do not validate input only at the API entry point. The trust boundary principle requires validation at every boundary where data crosses from an untrusted to a trusted context: microservice A must validate data received from microservice B, even inside the same network. Validating only at the outer edge is a common root cause of second-order injection attacks.
  • Do not use denylist (blocklist) validation as a primary strategy. Denylists require enumerating all malicious patterns — incomplete lists are routinely bypassed via encoding variations, Unicode normalization, or novel attack vectors. Use allowlists as the default; add denylists only for known-bad patterns as an extra layer on top.
  • Do not conflate input validation with output encoding. Validation happens at ingress (before processing); output encoding happens at egress (before rendering). Both are required; neither replaces the other.

Allowlist vs Denylist

Allowlist (positive validation): Define the exact set of permitted values, patterns, or types. Reject anything that does not match. Example: accept only alphanumeric characters for a username field. Allowlist is the correct default — it fails closed. Any input that does not conform to the defined contract is rejected.

Denylist (negative validation): Define the set of forbidden values or patterns. Reject anything that matches. Example: block <script> tags in a text field. Denylist fails open — any pattern not on the list passes through. Denylist is appropriate only as a secondary defense for well-known attack signatures, applied on top of an allowlist layer.

Schema validation: Define the full data contract (field names, types, required/optional, min/max length, format constraints) and reject any input that violates it. JSON Schema, OpenAPI, or library-specific schema languages (zod, Joi, yup) provide schema validation. Schema validation is the highest-leverage allowlist pattern for structured data inputs — it encodes the entire expected contract in a single, reviewable definition.
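
The fail-closed versus fail-open difference can be sketched in a few lines. This is a minimal illustration, not a recommended denylist: the function names, the username pattern, and the single denylist entry are assumptions chosen for the example.

```typescript
// Allowlist: describe exactly what a valid username looks like; everything else is rejected.
const USERNAME_ALLOWLIST = /^[A-Za-z0-9_-]{3,50}$/;

function isAllowedUsername(input: string): boolean {
  return USERNAME_ALLOWLIST.test(input);
}

// Denylist: reject only what is explicitly listed (deliberately naive, for illustration).
const DENYLIST: RegExp[] = [/<script/i];

function passesDenylist(input: string): boolean {
  return !DENYLIST.some((pattern) => pattern.test(input));
}

// An XSS payload that does not use a <script> tag.
const payload = "<img src=x onerror=alert(1)>";
console.log(isAllowedUsername(payload)); // false: fails closed
console.log(passesDenylist(payload)); // true: fails open, the payload is not on the list
```

The allowlist rejects the payload without knowing anything about XSS, while the denylist accepts it because nobody anticipated that particular pattern.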

Validation Decision Flow

flowchart TD
    A[Input received] --> B{Schema valid?\ntype, required fields, length}
    B -- No --> Z[Reject: 400 Bad Request]
    B -- Yes --> C{Allowlist check\npattern / enum match}
    C -- No --> Z
    C -- Yes --> D{Denylist check\nknown-bad patterns}
    D -- Match --> Z
    D -- No match --> E{Trust boundary?\nIs data crossing to another service?}
    E -- Yes --> F[Re-validate at receiving boundary]
    E -- No --> G[Accept: proceed to business logic]
    F --> B
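
The schema, allowlist, and denylist steps of the flow above can be sketched as a single function. This is one possible shape under stated assumptions: the `ValidationResult` type, the `username` field, and the two denylist entries are hypothetical and exist only to make the ordering of the checks concrete.

```typescript
type ValidationResult =
  | { ok: true; value: { username: string } }
  | { ok: false; status: 400; reason: string };

const USERNAME_PATTERN = /^[A-Za-z0-9_-]+$/; // allowlist
const KNOWN_BAD: RegExp[] = [/--/, /^admin$/i]; // hypothetical denylist entries

function validateUsernameInput(input: unknown): ValidationResult {
  const reject = (reason: string): ValidationResult => ({ ok: false, status: 400, reason });

  // 1. Schema check: shape, type, required field, length.
  if (typeof input !== "object" || input === null) return reject("body must be an object");
  const username = (input as Record<string, unknown>).username;
  if (typeof username !== "string") return reject("username must be a string");
  if (username.length < 3 || username.length > 50) return reject("username length out of range");

  // 2. Allowlist check: only the permitted character set passes.
  if (!USERNAME_PATTERN.test(username)) return reject("username contains disallowed characters");

  // 3. Denylist check: an extra layer on top of the allowlist, never a replacement for it.
  if (KNOWN_BAD.some((pattern) => pattern.test(username))) {
    return reject("username matches a known-bad pattern");
  }

  // 4. Accept. Any downstream service that receives this value repeats its own validation.
  return { ok: true, value: { username } };
}
```

Each rejection maps to the 400 Bad Request node in the diagram; the re-validation loop is not modeled here because it runs in the receiving service.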

Trust Boundary Principle

A trust boundary is any point where data crosses from a context with lower trust to a context with higher trust — or where data crosses between independent trust domains. Validation must occur at every trust boundary, not just at the system's outer perimeter.

Trust boundary examples:

  • External HTTP request → API handler (outer perimeter)
  • API handler → database query (internal boundary)
  • Microservice A → microservice B (service-to-service boundary)
  • User-uploaded file → file processing pipeline (file boundary)
  • Message queue consumer → business logic processor (async boundary)

The "validate at the edge only" assumption is a common root cause of second-order injection attacks, where sanitized-looking data stored in a database is retrieved and processed unsafely by a second code path that assumes database content is trusted. Trust boundaries are not only external: every transition between processing contexts where the receiving component assumes the data is safe constitutes a trust boundary.
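
As an illustration of validating at an internal boundary, the sketch below shows a message-queue consumer that re-checks the contract rather than trusting the producer. The `OrderMessage` shape, its field rules, and the function names are hypothetical.

```typescript
// Hypothetical message contract for an internal queue.
interface OrderMessage {
  orderId: string;
  quantity: number;
}

// Type guard: the consumer verifies the contract itself.
function isOrderMessage(value: unknown): value is OrderMessage {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.orderId === "string" &&
    /^[A-Z0-9]{8}$/.test(v.orderId) &&
    typeof v.quantity === "number" &&
    Number.isInteger(v.quantity) &&
    v.quantity >= 1 &&
    v.quantity <= 1000
  );
}

// Entry point of the consumer: the raw body is untrusted even though it comes from "inside".
function handleQueueMessage(rawBody: string): OrderMessage {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    throw new Error("rejected: body is not valid JSON");
  }
  if (!isOrderMessage(parsed)) {
    throw new Error("rejected: message violates the OrderMessage contract");
  }
  return parsed; // narrowed to OrderMessage by the type guard
}
```

If an upstream service stored or forwarded a malicious `orderId`, it is stopped here instead of reaching business logic that assumes internal data is clean.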

The trust boundary concept maps directly to the Zero-Trust Architecture network segmentation principle applied to data flow — see Zero-Trust-Architecture.


Injection Taxonomy

Core injection types and their validation-level mitigations:

SQL Injection

User data is interpreted as SQL syntax. Primary defense: parameterized queries (prepared statements) — validation is defense-in-depth only. Validation layer: allowlist alphanumeric + safe punctuation for string fields; reject SQL keywords and operator sequences in fields where they have no legitimate purpose.
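
A minimal sketch of the two layers working together, assuming a driver that accepts SQL text and bound values separately (the `$1` placeholder follows PostgreSQL conventions; the `ParameterizedQuery` shape, table, and column names are hypothetical). Values go through placeholders; the sort column, which cannot be bound as a parameter, goes through an allowlist.

```typescript
// Hypothetical shape handed to a database driver: SQL text plus separately bound values.
interface ParameterizedQuery {
  text: string;
  values: unknown[];
}

const SORTABLE_COLUMNS = ["username", "created_at"] as const;
type SortColumn = (typeof SORTABLE_COLUMNS)[number];

function isSortColumn(value: string): value is SortColumn {
  return SORTABLE_COLUMNS.some((column) => column === value);
}

function buildUserLookup(username: string, sortBy: string): ParameterizedQuery {
  // Identifiers cannot be bound as parameters, so the column name must match an allowlist.
  if (!isSortColumn(sortBy)) throw new Error("unsupported sort column");
  return {
    // The user-supplied value never appears in the SQL text; only the placeholder does.
    text: `SELECT id, username FROM users WHERE username = $1 ORDER BY ${sortBy}`,
    values: [username],
  };
}
```

Even a classic payload such as `x' OR '1'='1` stays inside `values` and is treated as data by the driver.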

XSS (Cross-Site Scripting)

User data is rendered as HTML/JavaScript in a browser. Primary defense: output encoding (HTML entity encoding at render time). Validation layer: allowlist safe HTML tags if rich text is required; reject <script>, javascript:, and inline event handlers (onload=, onerror=). CSP is the browser-level mitigation layer that catches what slips through — see CORS-CSP.
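
Output encoding at render time can be sketched as follows. This is a minimal example for HTML element content and quoted attribute values only; it is not sufficient for JavaScript, CSS, or URL contexts, and a templating engine with auto-escaping would normally do this work. The function names are illustrative.

```typescript
const HTML_ENTITIES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replace the five characters that are significant in HTML text and quoted attributes.
function escapeHtml(untrusted: string): string {
  return untrusted.replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch] ?? ch);
}

// Encoding happens at egress, immediately before the value is placed into markup.
function renderComment(comment: string): string {
  return `<p>${escapeHtml(comment)}</p>`;
}

console.log(renderComment('<script>alert("x")</script>'));
// <p>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</p>
```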

Command Injection

User data is interpreted as OS shell commands. Primary defense: never construct shell commands from user input; use parameterized API calls (e.g., execFile with arguments array instead of exec with concatenated string). Validation layer: allowlist command parameters to safe character sets; avoid exec(), system(), Runtime.getRuntime().exec() with user-controlled strings.
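
The difference between a concatenated shell string and an argument array can be sketched with Node.js `execFile`. The `ping` use case, the `-c 1` flag (Linux/macOS syntax), the hostname pattern, and the function names are assumptions made for the example.

```typescript
import { execFile } from "node:child_process";

// Allowlist for the single user-controlled argument: hostname characters only.
const HOSTNAME_PATTERN = /^[A-Za-z0-9.-]{1,253}$/;

function buildPingArgs(host: string): string[] {
  // A leading "-" is rejected so the value cannot be read as a command-line option.
  if (!HOSTNAME_PATTERN.test(host) || host.startsWith("-")) {
    throw new Error("invalid host");
  }
  return ["-c", "1", host];
}

function ping(host: string, onDone: (error: Error | null, stdout: string) => void): void {
  // execFile starts the binary directly with an argument array; no shell parses the input,
  // so characters such as ";", "|", or "$(" have no special meaning.
  execFile("ping", buildPingArgs(host), (error, stdout) => onDone(error, stdout));
}
```

With `exec("ping -c 1 " + host)`, an input like `example.com; rm -rf /` would be parsed by the shell as two commands; here it is rejected by the allowlist and, even if it were not, it would reach `ping` as a single literal argument.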

Path Traversal

User data is used to construct file system paths. Primary defense: canonicalize the resolved path and assert that it equals the expected base directory or starts with the base directory followed by a path separator (a bare prefix check would accept /data-evil for the base /data). Validation layer: reject .., //, null bytes, and encoded variants (%2F, %00) in path segments before path construction.
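
A minimal sketch of the canonicalize-then-check step using Node.js `path`. The function name and the example base directory are hypothetical. `path.resolve` normalizes `..` segments but does not follow symlinks, so a deployment that allows symlinks inside the base directory would additionally need a real-path lookup.

```typescript
import * as path from "node:path";

function resolveInsideBase(baseDir: string, userPath: string): string {
  if (userPath.includes("\0")) throw new Error("null byte in path");

  const base = path.resolve(baseDir);
  const resolved = path.resolve(base, userPath); // collapses ".." and handles absolute input

  // Compare against base + separator so a sibling such as "/srv/uploads-evil"
  // is not accepted for the base "/srv/uploads".
  if (resolved !== base && !resolved.startsWith(base + path.sep)) {
    throw new Error("path escapes base directory");
  }
  return resolved;
}
```

For a base of `/srv/uploads`, the inputs `../etc/passwd`, `/etc/passwd`, and `../uploads-evil/x` are all rejected, while `reports/2024.txt` resolves to a path under the base.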

SSRF (Server-Side Request Forgery)

User data is used to construct outbound HTTP request URLs. Primary defense: URL allowlist restricting permitted hosts and schemes. Validation layer: parse and validate URL components (scheme, host, port) against an allowlist before making any outbound request; reject private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), link-local addresses (169.254.0.0/16), and loopback addresses (127.0.0.0/8).
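
The parse-then-allowlist step can be sketched as follows. The allowed scheme and host are hypothetical, and the check covers IPv4 literals only; it does not address IPv6 or hostnames that resolve to internal addresses via DNS, which would need a check at connection time.

```typescript
const ALLOWED_SCHEMES = new Set(["https:"]);
const ALLOWED_HOSTS = new Set(["api.example.com"]); // hypothetical allowlist

// True for loopback, RFC 1918 private, and link-local IPv4 literals.
function isPrivateOrLoopbackIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 10 || // 10.0.0.0/8
    a === 127 || // 127.0.0.0/8 loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) || // 192.168.0.0/16
    (a === 169 && b === 254) // 169.254.0.0/16 link-local
  );
}

function assertSafeOutboundUrl(raw: string): URL {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    throw new Error("not a valid URL");
  }
  if (!ALLOWED_SCHEMES.has(url.protocol)) throw new Error("scheme not allowed");
  // Extra layer: explicit rejection of internal addresses, independent of the host allowlist.
  if (isPrivateOrLoopbackIPv4(url.hostname)) throw new Error("private or loopback address");
  // Primary check: the host must be on the allowlist.
  if (!ALLOWED_HOSTS.has(url.hostname)) throw new Error("host not on allowlist");
  return url;
}
```

The host allowlist alone already rejects IP literals; the range check is kept as a separate layer so that it still applies if the allowlist is later widened.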


TypeScript Example

import { z } from 'zod';
 
// Allowlist schema: define the exact contract for user registration
const UserRegistrationSchema = z.object({
  username: z.string()
    .min(3).max(50)
    .regex(/^[a-zA-Z0-9_-]+$/, 'Username: alphanumeric, underscore, hyphen only'),
  email: z.string().email().max(255),
  age: z.number().int().min(13).max(120),
  // Optional free-text field: only length is constrained here, so it must be encoded on output
  bio: z.string().max(500).optional(),
});
 
type UserRegistration = z.infer<typeof UserRegistrationSchema>;
 
export function validateUserRegistration(input: unknown): UserRegistration {
  // parse() throws ZodError with field-level messages on invalid input
  // Never use input directly before calling parse()
  return UserRegistrationSchema.parse(input);
}

The unknown type for input forces explicit validation before any property access. Never type-cast user input — always parse through the schema first.


Java Example

import jakarta.validation.constraints.*;
import jakarta.validation.Valid;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
 
public class UserRegistrationRequest {
 
    @NotBlank
    @Size(min = 3, max = 50)
    @Pattern(regexp = "^[a-zA-Z0-9_-]+$", message = "alphanumeric, underscore, hyphen only")
    private String username;
 
    @NotBlank
    @Email
    @Size(max = 255)
    private String email;
 
    @NotNull
    @Min(13) @Max(120)
    private Integer age;
 
    // Getters/setters omitted for brevity
}
 
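// Controller method (belongs inside a @RestController class; userService is an injected service, not shown).
// @Valid triggers Bean Validation; Spring raises MethodArgumentNotValidException on failure.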
@PostMapping("/users")
public ResponseEntity<Void> register(@Valid @RequestBody UserRegistrationRequest req) {
    userService.register(req);
    return ResponseEntity.status(201).build();
}

Jakarta Bean Validation 3.x is the standard constraint API in Spring Boot 3.x and Jakarta EE 10. The @Valid annotation triggers validation before the handler method body executes — invalid input never reaches business logic.


Concept | Relationship
CORS-CSP | CSP is the browser-layer mitigation against XSS; input validation is a server-side defense-in-depth layer alongside output encoding
API-Key-Authentication | API key format (prefix + high-entropy string) must be validated at ingress using an allowlist pattern
Zero-Trust-Architecture | Trust boundary principle is ZTA's "verify explicitly" applied to data flow across service boundaries
Session-Management | Session tokens received in cookies must be validated at every trust boundary, not assumed safe inside the network