Observability

Request IDs, structured logs, and distributed tracing - the three signals that turn a multi-hour incident into a five-minute one

When something goes wrong in production, the first question is always the same: which request, what did it do, where did it slow down or fail? Observability is how you answer that without guessing.

This page is the integrator's checklist. You don't need all of it on day one - the X-Request-Id baseline gets you most of the value - but every step compounds.

The three signals

Signal	Effort	Value
`X-Request-Id` per request	Trivial - log the response header	Identifies the exact request in our logs when you open a ticket
Structured error logs	Small - one log line per error response	Lets you correlate failures to your own business IDs
Distributed tracing (W3C TraceContext)	Moderate - a tracing SDK on your side	Single timeline across your service and ours; finds latency hot-spots and cross-service failures

If you do nothing else, do the first two. If you're running anything beyond a one-shot script, do the third.

`X-Request-Id`

Every request to the API carries an X-Request-Id. If you don't send one, we generate it (KSUID format) and return it in the response. Either way:

Log it on every request - success and failure - alongside your own correlation IDs (your order ID, user ID, etc.).
Echo it back to your callers if you sit between an end-user and our API - it shortens the chain when you escalate to us.
Include it in support tickets. This is the fastest way for us to find the exact request in our logs.

X-Request-Id: 2F7vTVG3Y8aZrqP1wJhdEkM9oYB

You can also send your own ID: any KSUID-compatible string under 64 characters works, and we preserve it. This is useful when request IDs are already minted earlier in your call chain.
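As a minimal sketch of minting your own ID, the Python below assumes the commonly documented KSUID layout: a 4-byte timestamp counted from epoch 1400000000, followed by 16 random bytes, base62-encoded to 27 characters. The helper names `new_ksuid` and `outgoing_headers` are illustrative and not part of any Octopus Cards SDK.

```python
import os
import struct
import time

KSUID_EPOCH = 1_400_000_000  # KSUID timestamps count seconds from this offset
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def new_ksuid(now=None):
    """Return a 27-character KSUID-style string (hypothetical helper)."""
    seconds = int(time.time() if now is None else now) - KSUID_EPOCH
    raw = struct.pack(">I", seconds) + os.urandom(16)  # 20 bytes in total
    number = int.from_bytes(raw, "big")
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(BASE62[remainder])
    # Left-pad with the zero digit so every ID has the same length.
    return "".join(reversed(digits)).rjust(27, "0")


def outgoing_headers(request_id=None):
    """Reuse an ID minted upstream, or mint one if none was passed in."""
    return {"X-Request-Id": request_id or new_ksuid()}
```

Pass the ID you already have when one exists upstream (`outgoing_headers(upstream_id)`); otherwise call `outgoing_headers()` and log the generated value alongside your own correlation IDs.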

Structured error logs

Every non-2xx response uses the same envelope:

{
  "error": {
    "name": "BadRequestError",
    "code": "BAD_REQUEST",
    "message": "Duplicate client_reference"
  }
}

Log it with structure - not as a stringified blob:

{
  "ts": "2026-04-22T17:40:00Z",
  "level": "error",
  "request_id": "2F7vTVG3Y8aZrqP1wJhdEkM9oYB",
  "your_order_id": "A-42",
  "api_error_code": "BAD_REQUEST",
  "api_error_message": "Duplicate client_reference",
  "api_error_name": "BadRequestError",
  "status": 400
}

Why structured: when you scan logs for "all BAD_REQUEST with Duplicate client_reference in the last hour", you want a query, not a regex. Full guidance on the error shape is in Handling Errors.
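A minimal Python sketch of turning an error response into that structured record. The function name `error_log_fields` and the keyword-argument convention for business IDs are assumptions made for illustration; `headers` is assumed to be a mapping such as the case-insensitive `response.headers` from the requests library.

```python
import json
from datetime import datetime, timezone


def error_log_fields(status, headers, body, **business_ids):
    """Flatten an error response into one structured log record."""
    error = (body or {}).get("error", {})
    return {
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "level": "error",
        "request_id": headers.get("X-Request-Id"),
        **business_ids,  # for example your_order_id="A-42"
        "api_error_code": error.get("code"),
        "api_error_message": error.get("message"),
        "api_error_name": error.get("name"),
        "status": status,
    }


record = error_log_fields(
    400,
    {"X-Request-Id": "2F7vTVG3Y8aZrqP1wJhdEkM9oYB"},
    {"error": {"name": "BadRequestError", "code": "BAD_REQUEST",
               "message": "Duplicate client_reference"}},
    your_order_id="A-42",
)
print(json.dumps(record))  # emit one JSON object per log line
```

Keeping each envelope field as its own key is what lets a log backend filter on `api_error_code` directly.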

Distributed tracing (W3C TraceContext)

We accept the W3C traceparent header on every API request. When your client sends one:

Your end-to-end trace shows the Octopus Cards call as a single span on your side, with accurate timing.
Internal spans on our side are tagged with your trace_id. Sharing that ID with support lets them pull every span that touched the request across the backend - auth, repo calls, fulfilment, the lot.
Stitching works with any OTel-compatible backend you run (SigNoz, Honeycomb, Datadog, Jaeger, Tempo, etc.). No special config on our side.

The traceparent header looks like this:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
              ^   ^                                ^                ^
              |   trace-id (32 hex)                span-id (16 hex) flags
              version

Almost every OTel SDK injects this automatically when you use its HTTP client wrapper. Examples in Go, Node.js, and Python:

Go (`otelhttp`)

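import (
    "net/http"
    "time"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// Assumes the global propagator was set to W3C TraceContext at start-up,
// e.g. otel.SetTextMapPropagator(propagation.TraceContext{}).
client := &http.Client{
    Transport: otelhttp.NewTransport(http.DefaultTransport),
    Timeout:   60 * time.Second,
}

// Pass a traced context - traceparent is auto-injected on the outgoing request
req, err := http.NewRequestWithContext(ctx, http.MethodPost, apiURL, body)
if err != nil {
    return err
}
resp, err := client.Do(req)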

Node.js (auto-instrumentation)

import { registerInstrumentations } from '@opentelemetry/instrumentation'
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http'
import { UndiciInstrumentation } from '@opentelemetry/instrumentation-undici'

registerInstrumentations({
  instrumentations: [
    new HttpInstrumentation(),   // http/https modules (axios and similar clients)
    new UndiciInstrumentation(), // undici and Node's built-in fetch
  ],
})

// With both instrumentations registered, traceparent is injected automatically
// once the SDK is initialised at process start.
await fetch('https://api.octopuscards.io/api/v1/orders', { /* ... */ })

Python (auto-instrumentation)

import requests
from opentelemetry.instrumentation.requests import RequestsInstrumentor
# For httpx, swap in opentelemetry.instrumentation.httpx.HTTPXClientInstrumentor.

# Patch the requests library once, at process start.
RequestsInstrumentor().instrument()

# Plain requests calls - traceparent is injected automatically once
# the SDK is initialised at process start.
requests.post('https://api.octopuscards.io/api/v1/orders', json={ ... })

If you don't run an OTel SDK and just want minimal stitching, you can construct the traceparent yourself:

00-{your-32-char-hex-trace-id}-{your-16-char-hex-span-id}-01

Even a hand-rolled trace_id (e.g. derived from your own request ID) is enough for us to correlate logs across the boundary.
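One way to hand-roll the header, sketched in Python. Hashing the request ID into the trace-id is an illustrative choice, not something the API requires, and `traceparent_from_request_id` is a hypothetical helper name. The W3C Trace Context format expects lowercase hex and treats an all-zero trace-id or span-id as invalid, so the sketch guards against both.

```python
import hashlib
import os
import re

TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-01$")


def traceparent_from_request_id(request_id):
    """Build a version-00, sampled traceparent from an existing request ID."""
    # Hashing keeps the trace-id stable for a given request ID.
    trace_id = hashlib.sha256(request_id.encode("utf-8")).hexdigest()[:32]
    span_id = os.urandom(8).hex()
    # All-zero IDs are invalid under W3C Trace Context.
    if trace_id == "0" * 32:
        trace_id = "0" * 31 + "1"
    while span_id == "0" * 16:
        span_id = os.urandom(8).hex()
    return f"00-{trace_id}-{span_id}-01"


header = traceparent_from_request_id("2F7vTVG3Y8aZrqP1wJhdEkM9oYB")
print({"traceparent": header})
```

Because the trace-id is derived deterministically, every hop that knows the request ID can compute the same trace-id, while each call still gets a fresh span-id.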

Putting all three together

A production-grade log line on the integrator side looks like:

{
  "ts": "2026-04-22T17:40:00Z",
  "level": "error",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "request_id": "2F7vTVG3Y8aZrqP1wJhdEkM9oYB",
  "your_order_id": "A-42",
  "api_error_code": "BAD_REQUEST",
  "api_error_message": "Duplicate client_reference",
  "status": 400,
  "duration_ms": 312
}

That single line, replicated across enough requests, is the difference between "the API is broken" (vague) and "5% of orders for client X failed with INSUFFICIENT_FUNDS between 14:02 and 14:07, all on wallet W-9, trace 4bf9..." (a ticket we can resolve in minutes).
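To illustrate why fielded log lines make that kind of statement cheap to produce, here is a small Python sketch that counts failures per error code inside a time window. The records are made-up sample data and `failures_by_code` is a hypothetical helper; in practice the same query would usually run in your log backend.

```python
from collections import Counter
from datetime import datetime


def failures_by_code(records, start, end):
    """Count error records per api_error_code with start <= ts < end."""
    counts = Counter()
    for record in records:
        ts = datetime.strptime(record["ts"], "%Y-%m-%dT%H:%M:%SZ")
        if record.get("level") == "error" and start <= ts < end:
            counts[record.get("api_error_code")] += 1
    return counts


# Illustrative sample data only.
records = [
    {"ts": "2026-04-22T14:03:10Z", "level": "error", "api_error_code": "INSUFFICIENT_FUNDS"},
    {"ts": "2026-04-22T14:05:42Z", "level": "error", "api_error_code": "INSUFFICIENT_FUNDS"},
    {"ts": "2026-04-22T14:06:01Z", "level": "error", "api_error_code": "BAD_REQUEST"},
    {"ts": "2026-04-22T17:40:00Z", "level": "error", "api_error_code": "BAD_REQUEST"},
]
window = (datetime(2026, 4, 22, 14, 2), datetime(2026, 4, 22, 14, 7))
print(failures_by_code(records, *window))
```

With this sample data the window from 14:02 to 14:07 contains two INSUFFICIENT_FUNDS failures and one BAD_REQUEST; the 17:40 record falls outside it.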

When you open a support ticket

Send us one of the following, in order of preference:

trace_id (W3C TraceContext) - we can pull every span across our backend that touched the request.
X-Request-Id - we can find the single request in our logs.
Order ID or client_reference, plus an approximate timestamp - workable but slower; we have to scan a window.

The richer the signal you give us, the faster the resolution. A trace_id usually shortcuts to a root cause within minutes; a vague "it didn't work yesterday" is a multi-hour archaeology session.

Sampling

Tracing is cheap until it isn't. For high-volume integrations:

Always sample errors. A 4xx or 5xx response is the one trace you actually need.
Tail-based sampling (sample after the trace completes, based on outcome) gives you 100% of failures and a small slice of successes. Most modern OTel collectors support this.
Head-based sampling (decide at trace start) is simpler, but the decision is made before the outcome is known, so it drops errors at the same rate as successes. If you go this route, sample 10–20% of successes and capture 100% of errors via a separate path.

Don't sample so aggressively that you can't reproduce a failed scenario. The cost saving isn't worth the blind spot.
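The keep-or-drop rule behind tail-based sampling can be sketched in a few lines of Python. This is an in-process illustration under assumed names (`keep_trace`, `success_rate`); production setups usually make this decision in the collector rather than in application code.

```python
import hashlib


def keep_trace(trace_id, status, success_rate=0.1):
    """Decide, after the response is known, whether to export a trace."""
    if status >= 400:
        return True  # always keep 4xx and 5xx responses
    # Map the trace ID to a bucket in [0, 1) so every service reaches the
    # same decision for the same trace without coordinating.
    digest = hashlib.sha256(trace_id.encode("ascii")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < success_rate


print(keep_trace("4bf92f3577b34da6a3ce929d0e0e4736", 400))  # errors are always kept
```

Hashing the trace ID instead of drawing a random number keeps the decision reproducible: a trace is either kept everywhere or dropped everywhere.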

Summary

X-Request-Id on every log line. Mandatory.
Structured logs - parse error.code, log it as a field, not a string.
W3C traceparent if you run any tracing at all - we honour it and tag our spans with your trace_id, so support tickets become a trace_id lookup instead of a forensic exercise.
Open tickets with a trace_id or request_id. Don't make us guess.
