Operational API Patterns

Cross-protocol operational concerns — pagination, idempotency, rate limiting, and partial responses — appear in REST, GraphQL, and gRPC-transcoded APIs alike. These four patterns define the contract between a server and its clients for safe navigation, safe retries, quota enforcement, and bandwidth optimisation.

Intent

REST, GraphQL, and gRPC-transcoded APIs share a set of cross-cutting operational concerns that sit above the protocol layer: how to traverse large result sets without re-scanning data (pagination), how to make non-idempotent operations safe to retry (idempotency keys), how to signal quota exhaustion and guide client back-off (rate limiting), and how to reduce unnecessary payload transfer (partial responses). These four concerns are tightly related: idempotency keys interact with retry logic, rate limiting shapes retry cadence, and partial responses reduce bandwidth so clients stay within rate limits.

This note covers all four patterns in per-pattern H2 sections. Each section opens with "When NOT to Use" — the vault's suitability-first convention — before explaining how the pattern works and providing TypeScript (Express 5.x) and Java (Spring Boot 3.x) skeletal examples.

When NOT to Use (note-level)

Operational contracts add overhead without benefit in these contexts:

  • Simple internal APIs with known clients and small datasets — pagination over 50-100 records adds cursor logic, store lookups, and pageInfo encoding for no user-visible benefit. Full-list response is simpler and acceptable.
  • Read-only endpoints where idempotency is inherent — GET, HEAD, and OPTIONS are idempotent by HTTP spec (RFC 9110). Adding an idempotency key header on GET requests is meaningless overhead.
  • Low-traffic internal services where rate limiting adds complexity without protecting shared resources — rate limiting is a quota protection mechanism. If the service has a single trusted consumer and no shared quota to protect, the Retry-After header machinery adds latency with no protective benefit.

Pagination (OPER-01)

Pagination controls how a server breaks large result sets into sequential pages and how clients navigate those pages. Two primary models exist: offset pagination (simple, page-jump capable, performance-limited) and cursor pagination (stable, O(log n), sequential-only).

When NOT to Use

  • Small result sets (< 100 records) where a full-list response is acceptable — pagination adds cursor encoding, pageInfo construction, and limit-plus-one query tricks for no user-visible gain when the entire dataset fits in a single reasonably-sized payload.
  • Genuinely random-access UIs where offset page jumping is required and the dataset is small — if a UI needs "go to page 7 of 12" and the total record count stays under 10,000, offset pagination's page-jump capability is worth its tradeoffs. Above 10,000 records, switch to cursor pagination regardless.

Offset Pagination

Mechanism: Client sends ?page=3&limit=20 or ?offset=40&limit=20. Server translates to SQL: LIMIT 20 OFFSET 40.
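
As a minimal sketch of that translation, the page-number form reduces to an offset as shown here. The helper name toOffsetQuery, the default of 20, and the cap of 100 are illustrative assumptions, not part of any framework.

```typescript
// Hypothetical helper: converts ?page=&limit= into LIMIT/OFFSET values.
// The default page size (20) and the cap (100) are assumptions for illustration.
interface OffsetQuery { limit: number; offset: number; }

function toOffsetQuery(page: number, limit: number): OffsetQuery {
  const safeLimit = Math.min(Math.max(Math.trunc(limit) || 20, 1), 100);
  const safePage = Math.max(Math.trunc(page) || 1, 1); // pages are 1-based
  return { limit: safeLimit, offset: (safePage - 1) * safeLimit };
}

// ?page=3&limit=20 maps to LIMIT 20 OFFSET 40
const q = toOffsetQuery(3, 20);
console.log(`LIMIT ${q.limit} OFFSET ${q.offset}`);
```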

Pros: Simple to implement. Supports page jumping — clients can request any page by number. Easy to calculate total page count when combined with a COUNT(*) query.

Cons:

  • Data stability: Breaks on live data. If a row is inserted ahead of the client's position between page fetches, every later row shifts down by one and the next page repeats a row the client has already seen; if a row is deleted, rows shift up and the client skips one.
  • Performance cliff: OFFSET 100000 instructs the database to scan and discard 100,000 rows before returning the next 20. Performance degrades O(n) with offset value. Default to cursor pagination for any endpoint expected to serve > 10,000 records. The offset pagination performance cliff is the most common cause of list-endpoint timeouts in production.

Cursor Pagination

Mechanism: Client sends ?cursor=<opaque>&limit=20. Server decodes the cursor to recover the sort position and issues a keyset query:

WHERE (created_at, id) > ($last_created_at, $last_id)
ORDER BY created_at, id
LIMIT $limit

Pros:

  • Stable under mutations — cursor points to a specific sort position, not a row count. Inserts and deletes before the cursor position do not affect subsequent pages.
  • O(log n) index seek — the database uses a composite index on (created_at, id) to seek directly to the cursor position. No rows are scanned and discarded.

Cons:

  • No page jumping — a cursor encodes a position; there is no way to jump to page 7 without traversing pages 1-6.
  • Sequential traversal only — cursors are directional. Forward pagination (after) is universal; backward pagination (before) requires the sort order to be reversible.
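
One way to picture the reversibility requirement: a backward page flips both the comparison operator and the sort direction, and the server then reverses the rows before returning them. The sketch below only builds the SQL text. The table name orders and its columns follow the examples in this note; the helper names keysetQuery and normaliseRows are hypothetical.

```typescript
type Direction = 'forward' | 'backward';

// Builds a parameterised keyset query. $1/$2 are the cursor's sort key and id,
// $3 is the page size plus one (to detect whether another page exists).
function keysetQuery(direction: Direction): string {
  const op = direction === 'forward' ? '>' : '<';
  const order = direction === 'forward' ? 'ASC' : 'DESC';
  return [
    'SELECT id, created_at FROM orders',
    `WHERE (created_at, id) ${op} ($1, $2)`,
    `ORDER BY created_at ${order}, id ${order}`,
    'LIMIT $3',
  ].join('\n');
}

// Rows from a backward query arrive in descending order; reverse them so
// every page is presented in the same ascending order.
function normaliseRows<T>(rows: T[], direction: Direction): T[] {
  return direction === 'backward' ? [...rows].reverse() : rows;
}

console.log(keysetQuery('backward'));
```

Both directions can be served by the same composite index on (created_at, id), since an index can be scanned in either direction.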

Opaque cursor encoding: Clients must treat cursors as opaque strings — they must not parse, increment, or construct cursor values manually. Encode the sort key and unique ID as Base64 of JSON so clients cannot see a bare integer or timestamp and be tempted to construct values:

Buffer.from(JSON.stringify({created_at: "2024-01-01T12:00:00Z", id: 123})).toString('base64')
→ eyJjcmVhdGVkX2F0IjoiMjAyNC0wMS0wMVQxMjowMDowMFoiLCJpZCI6MTIzfQ==

Keyset Pagination

Keyset pagination is the query technique underneath cursor pagination: the server filters on the actual column values (sort key + unique ID) rather than on a row offset. Some implementations expose those values directly as the cursor, without a wrapper encoding; the keyset query is identical either way. Clients should still treat whatever they receive as an opaque cursor string, so the distinction is implementation-level, not API-contract-level. The vault uses "cursor pagination" as the general term covering both.

Relay Connection Spec

The GraphQL Relay Connection Specification (relay.dev/graphql/connections.htm) formalises cursor pagination as a typed GraphQL schema. It defines:

  • Connection type — has edges (list of Edge) and pageInfo (non-null PageInfo) fields
  • Edge type — has node (the result item) and cursor (opaque string) fields
  • PageInfo type — has hasPreviousPage, hasNextPage, startCursor, endCursor
  • Pagination arguments: first/after for forward pagination, last/before for backward pagination

The Relay spec is GraphQL-specific, but the cursor-as-opaque-string principle applies equally to REST pagination. REST endpoints should follow the same contract: return an opaque nextCursor string, a hasNextPage boolean, and accept ?cursor=<opaque> as the pagination argument.
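
To make the shape concrete, here is a hedged TypeScript sketch of the Relay types alongside a helper that builds them from a limit-plus-one result set. The interface fields mirror the spec's field names; toConnection and its cursorOf callback are hypothetical names introduced for illustration, and the hasPreviousPage rule is a stated simplification.

```typescript
interface PageInfo {
  hasPreviousPage: boolean;
  hasNextPage: boolean;
  startCursor: string | null;
  endCursor: string | null;
}
interface Edge<T> { node: T; cursor: string; }
interface Connection<T> { edges: Edge<T>[]; pageInfo: PageInfo; }

// rows: up to `first + 1` rows fetched after the `after` cursor (forward pagination).
// cursorOf: produces the opaque cursor for one row (the encoding is the server's choice).
function toConnection<T>(
  rows: T[],
  first: number,
  after: string | null,
  cursorOf: (row: T) => string,
): Connection<T> {
  const hasNextPage = rows.length > first;
  const edges = rows.slice(0, first).map((node) => ({ node, cursor: cursorOf(node) }));
  const firstEdge = edges[0];
  const lastEdge = edges[edges.length - 1];
  return {
    edges,
    pageInfo: {
      hasNextPage,
      // Simplification: assume earlier rows exist whenever a cursor was supplied.
      hasPreviousPage: after !== null,
      startCursor: firstEdge ? firstEdge.cursor : null,
      endCursor: lastEdge ? lastEdge.cursor : null,
    },
  };
}

const page = toConnection([{ id: 1 }, { id: 2 }, { id: 3 }], 2, null, (r) => `c${r.id}`);
console.log(page.pageInfo.endCursor, page.pageInfo.hasNextPage);
```

A REST endpoint can flatten the same result to { items, nextCursor, hasNextPage } by taking nextCursor from endCursor when hasNextPage is true.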

See GraphQL-API-Design for Relay Connection implementation in GraphQL resolvers (Phase 24 output).

Sequence Diagram

Cursor pagination flow with opaque next-page token propagation -- keyset query avoids OFFSET performance cliff.

sequenceDiagram
    participant C as Client
    participant API as API Server
    participant DB as Database

    Note over C,DB: Page 1 -- no cursor (initial request)

    C->>API: GET /orders?limit=20
    API->>DB: SELECT * FROM orders<br/>ORDER BY created_at, id<br/>LIMIT 21
    Note right of DB: limit + 1 trick:<br/>fetch 21 to detect hasNextPage
    DB-->>API: 21 rows returned
    API-->>C: 200 OK<br/>{ items: [...20],<br/>  nextCursor: "eyJjcmVhdGVkX2F0Ijo...",<br/>  hasNextPage: true }

    Note over C: Cursor is opaque --<br/>client must not parse or construct

    Note over C,DB: Page 2 -- cursor from previous response

    C->>API: GET /orders?cursor=eyJjcmVhdGVkX2F0Ijo...&limit=20
    API->>API: Decode cursor to<br/>{created_at, id}
    API->>DB: SELECT * FROM orders<br/>WHERE (created_at, id) > ($1, $2)<br/>ORDER BY created_at, id<br/>LIMIT 21
    Note right of DB: Keyset seek: O(log n)<br/>via composite index --<br/>no rows scanned and discarded
    DB-->>API: 15 rows returned
    API-->>C: 200 OK<br/>{ items: [...15],<br/>  nextCursor: null,<br/>  hasNextPage: false }

    Note over C: hasNextPage: false --<br/>pagination complete

TypeScript Example — Cursor Pagination (Express)

// Cursor pagination — opaque Base64 cursor encoding
// Concept from relay.dev/graphql/connections.htm — applies equally to REST
 
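import { Buffer } from 'node:buffer';
import type { Request, Response } from 'express';
// Assumes `app` is an Express application and `db.query` resolves to an array of rows.

interface CursorPayload { created_at: string; id: number; }

function encodeCursor(payload: CursorPayload): string {
  // base64url avoids '+', '/' and '=' so the cursor is safe in a query string
  return Buffer.from(JSON.stringify(payload)).toString('base64url');
}
function decodeCursor(cursor: string): CursorPayload | null {
  try {
    const p = JSON.parse(Buffer.from(cursor, 'base64url').toString('utf-8'));
    return typeof p?.created_at === 'string' && Number.isInteger(p?.id) ? p : null;
  } catch {
    return null; // malformed cursor
  }
}

// GET /orders?cursor=<opaque>&limit=20
app.get('/orders', async (req: Request, res: Response) => {
  // Clamp limit to 1..100; default to 20 when absent or non-numeric
  const requested = Math.trunc(Number(req.query.limit)) || 20;
  const limit = Math.min(Math.max(requested, 1), 100);
  const cursor = typeof req.query.cursor === 'string' ? req.query.cursor : undefined;
  const decoded = cursor ? decodeCursor(cursor) : null;
  if (cursor && !decoded) {
    res.status(400).json({ error: 'Invalid cursor' });
    return;
  }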
 
  const rows = await db.query(
    `SELECT id, created_at, ... FROM orders
     WHERE ($1::timestamptz IS NULL OR (created_at, id) > ($1, $2))
     ORDER BY created_at, id LIMIT $3`,
    [decoded?.created_at ?? null, decoded?.id ?? null, limit + 1]
  );
 
  const hasNextPage = rows.length > limit;
  const items = rows.slice(0, limit);
  const nextCursor = hasNextPage
    ? encodeCursor({ created_at: items.at(-1)!.created_at, id: items.at(-1)!.id })
    : null;
 
  res.json({ items, nextCursor, hasNextPage });
});

The limit + 1 trick: query for one extra row to determine hasNextPage without a separate COUNT(*). Discard the extra row before returning items.

Java Example — Cursor Pagination (Spring Boot)

// Cursor pagination — Java (Spring Boot)
// Opaque cursor using Base64 encoding of sort key + ID
 
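import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Base64;

record CursorPayload(Instant createdAt, Long id) {}

String encodeCursor(CursorPayload p) {
    // Escaped string literal: a text block's content must start on a new line
    String json = "{\"created_at\":\"%s\",\"id\":%d}".formatted(p.createdAt(), p.id());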
    return Base64.getUrlEncoder().encodeToString(json.getBytes(StandardCharsets.UTF_8));
}
 
// Repository query using keyset condition
// WHERE (created_at, id) > (:createdAt, :id) ORDER BY created_at, id LIMIT :limit
// Spring Data JPA equivalent: use @Query with named parameters

Idempotency Keys (OPER-02)

An idempotency key is a client-generated UUID sent in the Idempotency-Key request header on non-idempotent HTTP requests (POST, PATCH). The server stores the key and its associated response. On retry with the same key, the server returns the stored response without re-executing the operation.

This pattern makes unsafe retries safe — network failures, client timeouts, and mobile-app restarts can all cause a POST to be sent multiple times. Without idempotency keys, a payment is charged twice, an order is created twice, or an email is sent twice.
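
From the client's side, the important detail is that the key is generated once per logical operation and reused on every retry. A minimal sketch, assuming a hypothetical send function that performs the HTTP call and resolves to a status code (the names postWithRetry and SendFn are illustrative, and the retry-on-5xx rule is an assumption):

```typescript
type SendFn = (headers: Record<string, string>, body: unknown) => Promise<number>;

// Retries on network errors and 5xx responses, reusing the same key each time
// so the server can recognise the retries as one logical operation.
async function postWithRetry(
  send: SendFn,
  body: unknown,
  idempotencyKey: string, // e.g. crypto.randomUUID(), generated once by the caller
  maxAttempts = 3,
): Promise<number> {
  let lastError: unknown = new Error('no attempts made');
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const status = await send({ 'Idempotency-Key': idempotencyKey }, body);
      if (status < 500) return status; // success, or a client error not worth retrying
      lastError = new Error(`server responded ${status}`);
    } catch (err) {
      lastError = err; // network failure or timeout: retry with the same key
    }
  }
  throw lastError;
}
```

A real client would also wait between attempts; the sketch omits the delay to keep the key-reuse rule visible.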

When NOT to Use

  • GET, PUT, DELETE requests — already idempotent by HTTP spec (RFC 9110). GET retrieves without side effects. PUT replaces with the same value. DELETE removes once; subsequent calls return 404 but have no additional effect. Adding Idempotency-Key to these methods is meaningless.
  • Internal services where a message broker provides exactly-once delivery guarantees — if the calling system is a Kafka producer with EOS (Exactly-Once Semantics) enabled, the broker layer prevents duplicate delivery. Application-level idempotency keys add a second deduplication layer that may be redundant (though defence-in-depth is not harmful).
  • Endpoints where duplicate execution is harmless — for example, a counter-increment endpoint where over-counting is acceptable, or a read-heavy endpoint with no write side effects.

How It Works

  1. Client generates key: crypto.randomUUID() in Node.js; UUID.randomUUID() in Java. UUID v4 satisfies the IETF draft requirement for a structured string with sufficient entropy.
  2. Client sends key: POST /payments with header Idempotency-Key: <uuid>.
  3. Server checks store: Look up the key in a persistent store (Redis with 24h TTL, or a DB table with unique constraint on (idempotency_key, user_id)).
  4. If key found — response replay:
    • Request still processing → return 409 Conflict with message "Request in progress"
    • Request completed → return stored response verbatim (same status code, same body)
    • Request failed → return original error response
  5. If key not found — process and store: Execute the operation, store {key → {status, body}}, return response.

Header name: Idempotency-Key — the standardised name per IETF draft-ietf-httpapi-idempotency-key-header-07 (October 2025). Stripe popularised the pattern with this exact header name. X-Idempotency-Key is non-standard — do not use it.

TTL: 24 hours is the Stripe convention. It balances storage cost against the window during which a client might retry (mobile clients may be offline for hours).
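
The store semantics described above (lookup, in-progress marker, replay, 24-hour expiry, per-user scoping) can be sketched in memory as follows. The class name IdempotencyStore, its method names, and the injectable clock are assumptions for illustration; a production store would be Redis or a database table, as noted in step 3.

```typescript
type Stored =
  | { state: 'processing' }
  | { state: 'done'; status: number; body: unknown };
type Entry = { value: Stored; expiresAt: number };

class IdempotencyStore {
  private readonly entries = new Map<string, Entry>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // ttlMs defaults to 24 hours; `now` is injectable so expiry can be tested.
  constructor(ttlMs: number = 24 * 60 * 60 * 1000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Keys are scoped per user so two clients cannot collide on the same UUID.
  private scoped(userId: string, key: string): string {
    return `${userId}:${key}`;
  }

  // Returns the live entry if one exists; otherwise claims the key and returns null.
  begin(userId: string, key: string): Stored | null {
    const k = this.scoped(userId, key);
    const hit = this.entries.get(k);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    this.entries.set(k, { value: { state: 'processing' }, expiresAt: this.now() + this.ttlMs });
    return null;
  }

  // Records the final response so later retries can replay it verbatim.
  complete(userId: string, key: string, status: number, body: unknown): void {
    this.entries.set(this.scoped(userId, key), {
      value: { state: 'done', status, body },
      expiresAt: this.now() + this.ttlMs,
    });
  }
}
```

A handler would call begin first: null means "execute the operation", a processing entry maps to 409, and a done entry is replayed as-is.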

Idempotency Key (HTTP) vs Idempotent Consumer (EIP)

These are distinct patterns operating at different layers. The surface similarity — both use the word "idempotency" and both involve a "have I seen this before" store — causes frequent confusion.

| Dimension | Idempotency Key (HTTP) | Idempotent Consumer (EIP) |
| --- | --- | --- |
| Layer | HTTP request/response | Message queue consumer |
| Who generates the key | Client (UUID in request header) | Message broker (messageId on envelope) |
| Where key is stored | Server-side key-value store | Consumer-side persistent store (DB, Redis) |
| Protocol | HTTP POST/PATCH | AMQP, Kafka, SQS messages |
| Pattern source | Stripe API design, IETF draft | Hohpe & Woolf EIP (2003) |
| Vault note | This note | Idempotent-Consumer |

HTTP idempotency keys are client-driven: the HTTP client generates the UUID, sends it in a header, and the HTTP server stores the response. Idempotent Consumer is broker-driven: the message broker assigns a messageId to each envelope, and the message consumer checks its own persistent store before processing.

TypeScript Example — Idempotency Key (Express)

// Idempotency Key pattern — Stripe-style UUID header
// Source: stripe.com/blog/idempotency + draft-ietf-httpapi-idempotency-key-header-07
 
// Client generates key: const key = crypto.randomUUID();
// Client sends: POST /payments with header Idempotency-Key: <uuid>
 
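import type { Request, Response } from 'express';
// Assumes `app` (Express application) and `paymentService` are defined elsewhere.

type StoredResponse = { status: number; body: unknown };
// 'processing' marks a key whose request has started but not yet completed
const idempotencyStore = new Map<string, StoredResponse | 'processing'>();
// PRODUCTION NOTE: Replace Map with Redis (SET NX, TTL 24h) or DB unique constraint
// on (idempotency_key, user_id). In-memory Map is lost on restart, so it provides false safety.

app.post('/payments', async (req: Request, res: Response) => {
  const key = req.get('Idempotency-Key');

  if (key) {
    const cached = idempotencyStore.get(key);
    if (cached === 'processing') {
      // Same key still in flight: refuse rather than charge twice
      res.status(409).json({ error: 'Request in progress' });
      return;
    }
    if (cached) {
      // Response replay: return stored response verbatim
      res.status(cached.status).json(cached.body);
      return;
    }
    idempotencyStore.set(key, 'processing');
  }

  try {
    const result = await paymentService.charge(req.body);
    const responseBody = { paymentId: result.id, status: 'succeeded' };
    if (key) {
      idempotencyStore.set(key, { status: 201, body: responseBody });
      // PRODUCTION: set TTL of 24 hours on the Redis key
    }
    res.status(201).json(responseBody);
  } catch (err) {
    // No response was stored; release the key so the client can retry
    if (key) idempotencyStore.delete(key);
    throw err; // Express 5 forwards rejected promises to the error handler
  }
});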

Java Example — Idempotency Key (Spring Boot)

// Idempotency Key pattern — Spring Boot
// Source: draft-ietf-httpapi-idempotency-key-header-07
 
// PRODUCTION NOTE: Replace ConcurrentHashMap with Redis (SET NX, TTL 24h) or
// DB unique constraint on (idempotency_key, user_id). In-memory store is lost on restart.
private final Map<String, ResponseEntity<PaymentResponse>> idempotencyStore = new ConcurrentHashMap<>();

@PostMapping("/payments")
ResponseEntity<PaymentResponse> createPayment(
        @RequestHeader(value = "Idempotency-Key", required = false) String idempotencyKey,
        @RequestBody PaymentRequest request) {

    if (idempotencyKey != null) {
        // Typed map: no unchecked cast needed on replay
        ResponseEntity<PaymentResponse> cached = idempotencyStore.get(idempotencyKey);
        if (cached != null) {
            // Response replay: return stored response verbatim
            return cached;
        }
    }
 
    PaymentResponse result = paymentService.charge(request);
    ResponseEntity<PaymentResponse> response = ResponseEntity.status(201).body(result);
 
    if (idempotencyKey != null) {
        idempotencyStore.put(idempotencyKey, response);
        // PRODUCTION: set TTL of 24 hours on the Redis key
    }
 
    return response;
}

Rate Limiting Response Contract (OPER-03)

The rate limiting response contract defines the server's obligation when a client exceeds its quota: return 429 Too Many Requests with headers that tell the client when its quota resets and how to retry safely. A 429 without Retry-After forces clients to guess retry intervals, which causes thundering-herd retries when thousands of clients retry simultaneously after the same guess.

When NOT to Use

  • Internal services behind a service mesh where rate limiting is handled at the infrastructure layer — Envoy, Istio, and AWS API Gateway all support rate limiting at the mesh/gateway layer. Duplicating rate limiting in application code adds latency with no additional protection when the infrastructure layer already enforces quotas.
  • Single-consumer APIs where the consumer is trusted and rate limiting adds latency without protecting shared resources — rate limiting is a shared-resource protection mechanism. If there is one trusted consumer and no shared quota to protect, the Retry-After header machinery is operational overhead without benefit.

Response Headers

Mandatory status code: 429 Too Many Requests (RFC 6585, 2012). Do not use 503 Service Unavailable for rate limiting: 503 signals that the server as a whole is temporarily unable to handle requests (overload or maintenance), not that one client has exhausted its quota.

| Header | Standard | Semantics | Example Value |
| --- | --- | --- | --- |
| Retry-After | RFC 9110 | Seconds to wait before retrying (delta-seconds preferred over HTTP-date to avoid clock skew) | Retry-After: 60 |
| X-RateLimit-Limit | De-facto | Total requests allowed in the current window | X-RateLimit-Limit: 1000 |
| X-RateLimit-Remaining | De-facto | Requests remaining in the current window | X-RateLimit-Remaining: 0 |
| X-RateLimit-Reset | De-facto | Unix timestamp when the window resets | X-RateLimit-Reset: 1714000000 |

Retry-After is mandatory on every 429 response — it is the minimum information a client needs to retry correctly.
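
On the client side, Retry-After can arrive in either of the two forms RFC 9110 allows: delta-seconds or an HTTP-date. A small sketch of a parser that normalises both to seconds follows. The function name parseRetryAfter is hypothetical, and returning null for absent or unparseable values is a design assumption.

```typescript
// Returns the number of seconds to wait, or null if the header is absent or malformed.
function parseRetryAfter(value: string | undefined, nowMs: number = Date.now()): number | null {
  if (value === undefined) return null;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed); // delta-seconds form
  const dateMs = Date.parse(trimmed);                // HTTP-date form
  if (Number.isNaN(dateMs)) return null;
  // A date in the past means the client may retry immediately.
  return Math.max(0, Math.ceil((dateMs - nowMs) / 1000));
}

console.log(parseRetryAfter('60'));
```

Passing nowMs explicitly keeps the date branch testable; note that the HTTP-date form depends on the client clock, which is the clock-skew concern mentioned in the table.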

On standardisation: draft-ietf-httpapi-ratelimit-headers-10 (Standards Track, September 2025) proposes two consolidated headers — RateLimit-Policy and RateLimit — to replace the three X-RateLimit-* headers. As of March 2026, this draft has not been published as an RFC and tooling support is limited. Use X-RateLimit-* for now; note that the IETF draft proposes RateLimit-Policy / RateLimit as the eventual standard. Verify current draft status at https://datatracker.ietf.org/doc/draft-ietf-httpapi-ratelimit-headers/.

Client obligation: Treat Retry-After as a floor, not a fixed interval. Multiple clients receiving the same Retry-After: 60 and retrying at exactly 60 seconds create a thundering-herd burst that immediately triggers another 429. Use exponential backoff with jitter: wait Retry-After + random(0, base_delay * 2^attempt) seconds.
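
The formula above can be sketched directly. The function name retryDelaySeconds is hypothetical, and the maxJitterSec cap is an added assumption (not part of the formula) so that the jitter term cannot grow without bound at high attempt counts.

```typescript
// retryAfterSec: value from the Retry-After header, treated as a floor.
// attempt: 0 for the first retry, 1 for the second, and so on.
// rand: injectable source of randomness in [0, 1); defaults to Math.random.
function retryDelaySeconds(
  retryAfterSec: number,
  attempt: number,
  baseDelaySec: number = 1,
  maxJitterSec: number = 60, // assumption: cap on the jitter window
  rand: () => number = Math.random,
): number {
  const jitterCeiling = Math.min(baseDelaySec * 2 ** attempt, maxJitterSec);
  return retryAfterSec + rand() * jitterCeiling;
}

// Third retry after a 429 with Retry-After: 60 waits between 60 and 64 seconds.
console.log(retryDelaySeconds(60, 2));
```

Because the random term is only ever added, no client retries before the server's stated reset, and clients that received the same Retry-After spread out across the jitter window.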

Cross-link: For server-side failure protection that complements rate limiting, see Circuit-Breaker-Pattern.

Partial Responses

Partial responses allow clients to request only a subset of response fields, reducing payload size and server serialisation cost. The mechanism is a ?fields=id,name,email query parameter on any resource endpoint.

No RFC standard defines the parameter name. fields is the de-facto convention used by Google APIs, LinkedIn, and GitHub. JSON:API uses fields[type]= as a named variant (sparse fieldsets) — the principle is the same. Do not add a full JSON:API explanation to REST endpoints unless you are implementing the full JSON:API specification.

Relationship to rate limiting: Partial responses reduce bandwidth and server CPU. Clients that request only the fields they render consume less quota per request and are more likely to stay within their rate limit window. This is the grouping rationale for partial responses as a subsection of the rate limiting section.

When to use: Large response objects where clients consistently need only a few fields — for example, a list endpoint returning 50 items where each item has 30 fields but the mobile client only renders 3. Not worth the implementation overhead for small payloads (< 5 fields total).
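
Before the field list reaches any projection logic, it is usually worth normalising it against an allow-list. A minimal sketch, assuming a hypothetical parseFields helper; silently dropping unknown fields (rather than returning 400) is a design choice made here for brevity.

```typescript
// Parses ?fields=id,name,email against an allow-list.
// Returns null when the parameter is absent or not a single string (meaning "all fields").
function parseFields(raw: unknown, allowed: readonly string[]): string[] | null {
  if (typeof raw !== 'string' || raw.trim() === '') return null;
  const requested = raw
    .split(',')
    .map((f) => f.trim())
    .filter((f) => f.length > 0);
  const unique = Array.from(new Set(requested)); // drop duplicates, keep first-seen order
  return unique.filter((f) => allowed.includes(f)); // unknown fields are dropped silently
}

console.log(parseFields('id, name,secret,id', ['id', 'name', 'email']));
```

The allow-list also keeps internal properties out of reach of the query string, which a bare property lookup would not.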

TypeScript Example — Rate Limiting + Partial Response (Express)

// Rate limiting response contract — 429 + mandatory headers
// Source: RFC 6585 (429 status), RFC 9110 (Retry-After)
 
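// rateLimitMiddleware is assumed to be defined elsewhere and to call
// sendRateLimitExceeded when the caller's quota is exhausted.
app.use('/api', rateLimitMiddleware);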
 
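// resetTimestamp is in epoch milliseconds (same unit as Date.now())
function sendRateLimitExceeded(res: Response, resetTimestamp: number): void {
  // Clamp to at least 1 so a stale reset time never yields a zero or negative Retry-After
  const secondsUntilReset = Math.max(1, Math.ceil((resetTimestamp - Date.now()) / 1000));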
  res
    .status(429)
    .set('Retry-After', String(secondsUntilReset))          // RFC 9110 — mandatory
    .set('X-RateLimit-Limit', '1000')                       // quota per window
    .set('X-RateLimit-Remaining', '0')                      // none left
    .set('X-RateLimit-Reset', String(Math.floor(resetTimestamp / 1000))) // Unix epoch
    .json({
      type: 'https://api.example.com/errors/rate-limit-exceeded',
      title: 'Rate Limit Exceeded',
      status: 429,
      detail: `Retry after ${secondsUntilReset} seconds`,
    });
}
 
// Partial response — ?fields= query parameter convention
// Convention: Google APIs, LinkedIn, GitHub (no RFC standard)
 
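function pickFields<T extends object>(obj: T, fields: string[]): Partial<T> {
  return fields.reduce((acc, key) => {
    // Own properties only: `in` would also match inherited keys such as "constructor"
    if (Object.prototype.hasOwnProperty.call(obj, key)) acc[key as keyof T] = obj[key as keyof T];
    return acc;
  }, {} as Partial<T>);
}

app.get('/users/:id', async (req: Request, res: Response) => {
  const user = await userService.findById(req.params.id);
  if (!user) {
    res.status(404).json({ error: 'User not found' });
    return;
  }
  // Ignore ?fields= unless it is a single string (repeated params parse as arrays)
  const raw = req.query.fields;
  const fields = typeof raw === 'string' && raw !== '' ? raw.split(',') : null;
  const response = fields ? pickFields(user, fields) : user;
  res.json(response);
});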
// GET /users/42?fields=id,name,email → { id: 42, name: "Alice", email: "alice@example.com" }

Java Example — Rate Limit Response (Spring Boot)

// Rate limit response — Java (Spring Boot)
// Source: RFC 6585 (429), RFC 9110 (Retry-After), Spring Framework 6 ProblemDetail
 
@ResponseStatus(HttpStatus.TOO_MANY_REQUESTS)
@ExceptionHandler(RateLimitExceededException.class)
ResponseEntity<ProblemDetail> handleRateLimit(RateLimitExceededException ex) {
    ProblemDetail problem = ProblemDetail
        .forStatusAndDetail(HttpStatus.TOO_MANY_REQUESTS, "Rate limit exceeded");
    problem.setType(URI.create("https://api.example.com/errors/rate-limit-exceeded"));
    return ResponseEntity.status(429)
        .header("Retry-After", String.valueOf(ex.getSecondsUntilReset()))
        .header("X-RateLimit-Limit", "1000")
        .header("X-RateLimit-Remaining", "0")
        .header("X-RateLimit-Reset", String.valueOf(ex.getResetTimestampEpoch()))
        .body(problem);
}

Lineage

Lineage Backward: Idempotent-Consumer — messaging-layer idempotency (consumer-side deduplication store keyed on broker messageId) is the conceptual predecessor to HTTP idempotency keys. Both solve the "process this exactly once" problem; they differ in layer (HTTP vs messaging), key source (client vs broker), and storage location (server vs consumer).

Lineage Forward: Compensating-Transactions — saga compensation steps require idempotent retries. When a saga orchestrator retries a compensation step after a failure, the compensating action must not execute twice. Idempotency keys are the HTTP-layer mechanism that makes compensation retries safe.

This note is the middle node of Lineage Chain 16 (formalised in Phase 27 API-Protocol-Selection-MOC): [[Idempotent-Consumer]] → [[Operational-API-Patterns]] (idempotency keys) → [[Compensating-Transactions]]


| Concept | Relationship |
| --- | --- |
| REST-API-Design | REST is the primary protocol where these operational patterns apply; RMM, RFC 9457 error contracts, and versioning strategies in that note complement this one |
| Idempotent-Consumer | Messaging-layer counterpart to HTTP idempotency keys: distinct pattern, different layer, different key source |
| Circuit-Breaker-Pattern | Server-side failure protection that complements rate limiting; do not re-explain circuit breaker states here |
| Compensating-Transactions | Saga compensation requires idempotent retry capability; the forward-lineage target of this note |
| GraphQL-API-Design | Relay Connection Spec formalises cursor pagination for GraphQL; this note establishes the opaque cursor principle |
  • Rate-Limiter-Design — Rate-Limiter-Design owns the system design algorithms (token bucket, sliding window, distributed state); Operational-API-Patterns owns the HTTP contract layer (429 status, Retry-After, X-RateLimit headers) — explicit scope boundary
  • Load-Balancer — L7 load balancers can enforce simple rate limits per source IP at the infrastructure layer; application-aware rate limiting (per-user, per-endpoint) belongs to Operational-API-Patterns, not LB infrastructure
  • Input-Validation — Operational-API-Patterns covers rate limiting and idempotency at the HTTP contract layer; Input-Validation covers the trust boundary validation (schema validation, parameterized queries, output encoding) that must happen before any business logic including rate limit checks
  • API-Key-Authentication — API key scoping and rate limit tiers are operationally linked; API-Key-Authentication covers the key lifecycle (dual-active rotation, revocation) while this note covers the rate limit headers and 429 response contract
