Tracing

Kono uses OpenTelemetry for distributed tracing. Spans are exported via OTLP/HTTP to any OpenTelemetry-compatible backend — OTel Collector, Jaeger, Tempo, Datadog, Honeycomb.

W3C traceparent and tracestate headers are propagated automatically: incoming traces are continued, outgoing requests to upstreams carry the trace context. The propagator is installed regardless of whether tracing is enabled, so kono stays transparent for distributed trace context even with tracing turned off.

gateway:
  observability:
    tracing:
      enabled: true
      exporter: otlp
      sampling_ratio: 1.0
      otlp:
        endpoint: otel-collector:4318
        insecure: true
        interval: 5s

| Field | Type | Default | Description |
|---|---|---|---|
| tracing.enabled | bool | false | Enable tracing instrumentation |
| tracing.exporter | string | | Currently only otlp is supported |
| tracing.sampling_ratio | float | 1.0 | Fraction of new root traces to sample. 1.0 = all, 0.0 = none |
| tracing.otlp.endpoint | string | | OTLP HTTP endpoint to push spans to |
| tracing.otlp.insecure | bool | false | Disable TLS for the OTLP connection |
| tracing.otlp.interval | duration | 5s | Batch timeout: maximum time before a non-full batch is flushed |
Note: tracing.otlp.interval is the batch timeout, not a push interval. Spans are also flushed automatically when the batch reaches its size limit (512 spans). For low-traffic services, smaller values reduce visibility lag in the backend; for high-traffic services, the size limit dominates and the timeout rarely fires.

Span Hierarchy


A typical request to a fan-out flow produces this tree:

kono.request                   [SpanKindServer]
├── kono.plugin                request-phase plugin
├── kono.plugin                ...
├── kono.scatter
│   ├── kono.upstream          [SpanKindClient]
│   ├── kono.upstream          [SpanKindClient]
│   └── kono.upstream          [SpanKindClient]
└── kono.plugin                response-phase plugin

Passthrough flows skip kono.scatter and have a single kono.upstream span:

kono.request                   [SpanKindServer]
└── kono.upstream              [SpanKindClient, mode=passthrough]

| Span | When opened | When closed | Parent |
|---|---|---|---|
| kono.request | Request enters Router.ServeHTTP | Response written or rate-limit rejection | Remote (from traceparent) or none |
| kono.plugin | Before each plugin's Execute | After plugin returns | kono.request |
| kono.scatter | Beginning of scatter fan-out | All upstream goroutines completed | kono.request |
| kono.upstream | Beginning of upstream call (per upstream) | Upstream call returns, including all retries | kono.scatter (or kono.request for passthrough) |

Span Attributes


kono.request

| Attribute | Description |
|---|---|
| http.method | Request method |
| http.route | Matched flow path with parameter placeholders, e.g. /users/{id} |
| url.path | Raw request path |
| http.status_code | Final response status |
| kono.request.id | ULID identifying the request |
| kono.request.fingerprint | 16-char hex hash of method, route template, header names, and query parameter names |

kono.upstream

| Attribute | Description |
|---|---|
| http.method | Method used for the upstream call |
| http.url | Full upstream URL with parameters expanded |
| http.status_code | HTTP status returned by the upstream |
| server.address | Upstream host:port |
| kono.upstream.name | Configured upstream name |
| kono.upstream.host | Host selected by the load balancer |
| kono.upstream.wait_us | Microseconds spent waiting for the parallelism semaphore |
| kono.upstream.error_kind | Error classification on failure (see Metrics) |
| kono.upstream.mode | passthrough for passthrough flows; absent otherwise |
| kono.flow.path | Flow path the upstream was called from |

kono.scatter

| Attribute | Description |
|---|---|
| kono.upstream.count | Number of upstreams in the scatter |
| kono.aggregation.strategy | merge, array, or namespace |

kono.plugin

| Attribute | Description |
|---|---|
| kono.plugin.name | Configured plugin name |
| kono.plugin.type | request or response |

Span Events

| Event | When recorded |
|---|---|
| semaphore.acquired | Recorded on kono.upstream when the semaphore wait exceeded the configured threshold. Useful to visually distinguish wait time from actual upstream work in waterfall views |

Resource Attributes


Every exported span carries resource attributes describing the kono process. The same resource is attached to metrics, so traces and metrics from one process are correlated by service.name and service.instance.id in the backend.

| Attribute | Source |
|---|---|
| service.name | gateway.service.name (default: kono) |
| service.version | Build-time -ldflags "-X main.version=…" injection |
| host.name, process.pid, process.command_args | Auto-detected at startup |
| telemetry.sdk.* | OTel SDK metadata |

Additional attributes from the OTEL_RESOURCE_ATTRIBUTES environment variable are merged in.
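OTEL_RESOURCE_ATTRIBUTES holds a comma-separated list of key=value pairs. The sketch below shows roughly what merging such a list into a base resource looks like. Both functions are hypothetical helpers, the attribute values are made up, and the precedence on conflicting keys (environment wins here) is an assumption, since the text above does not say which side kono prefers.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseResourceAttributes parses a comma-separated key=value list in the
// style of OTEL_RESOURCE_ATTRIBUTES. Malformed pairs are skipped.
func parseResourceAttributes(raw string) map[string]string {
	attrs := make(map[string]string)
	for _, pair := range strings.Split(raw, ",") {
		key, value, ok := strings.Cut(pair, "=")
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			continue
		}
		value = strings.TrimSpace(value)
		// Values may be percent-encoded; keep the raw text if decoding fails.
		if decoded, err := url.PathUnescape(value); err == nil {
			value = decoded
		}
		attrs[key] = value
	}
	return attrs
}

// mergeAttributes overlays extra on top of base.
// ASSUMPTION: extra wins on conflicting keys; kono's precedence is not
// stated in this document.
func mergeAttributes(base, extra map[string]string) map[string]string {
	merged := make(map[string]string, len(base)+len(extra))
	for k, v := range base {
		merged[k] = v
	}
	for k, v := range extra {
		merged[k] = v
	}
	return merged
}

func main() {
	base := map[string]string{"service.name": "kono"}
	// In a real process this string would come from os.Getenv("OTEL_RESOURCE_ATTRIBUTES").
	env := parseResourceAttributes("deployment.environment=staging,team=edge%20platform")
	fmt.Println(mergeAttributes(base, env))
}
```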

Sampling


Sampling determines which traces are recorded. Kono uses a ParentBased sampler that respects the sampled flag of the incoming traceparent: if the calling service has already decided to sample a trace, kono honors that decision regardless of sampling_ratio. Only new root traces (requests without an incoming traceparent) are subject to ratio-based sampling.

| sampling_ratio | New root traces |
|---|---|
| 1.0 | All sampled |
| 0.1 | 10% sampled, deterministically by trace ID |
| 0.0 | None sampled, but incoming sampled traces still recorded |

The decision for TraceIDRatioBased(ratio) is made by hashing the trace ID — the same trace ID always yields the same decision across services, ensuring trace consistency.
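The decision logic described in this section can be sketched in a few lines. This is a simplified model, not the OTel SDK source: ratioSampled and shouldSample are hypothetical names, and the sketch compares bytes of the (already random) trace ID against a threshold instead of applying a separate hash, so the exact set of sampled IDs may differ from what a real sampler picks.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ratioSampled makes a deterministic decision from the trace ID alone, so
// every service that sees the same trace ID reaches the same answer.
// Simplified model: the real sampler's arithmetic may differ.
func ratioSampled(traceID [16]byte, ratio float64) bool {
	switch {
	case ratio >= 1:
		return true
	case ratio <= 0:
		return false
	}
	threshold := uint64(ratio * (1 << 63))
	// Interpret the low 8 bytes as a 63-bit number and compare to the threshold.
	value := binary.BigEndian.Uint64(traceID[8:]) >> 1
	return value < threshold
}

// shouldSample models the parent-based rule: an incoming decision wins,
// and only root traces fall through to the ratio.
func shouldSample(hasParent, parentSampled bool, traceID [16]byte, ratio float64) bool {
	if hasParent {
		return parentSampled
	}
	return ratioSampled(traceID, ratio)
}

func main() {
	var id [16]byte
	id[15] = 0x2a // arbitrary example trace ID

	fmt.Println("root trace, ratio 0.1:", shouldSample(false, false, id, 0.1))
	fmt.Println("sampled parent, ratio 0.0:", shouldSample(true, true, id, 0.0))
}
```

The second call illustrates the last table row: with sampling_ratio at 0.0, a trace that arrives already sampled is still recorded.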

Note: For development and staging, use sampling_ratio: 1.0 to capture all traces. For production at high RPS, lower values (e.g. 0.05) keep ingestion costs manageable while still providing statistical visibility.

Propagation


Kono propagates W3C trace context bidirectionally:

  • Inbound: traceparent and tracestate headers from incoming requests are extracted into the request context. The resulting kono.request span becomes a child of the caller's span.
  • Outbound: when calling an upstream, kono injects the current trace context into the outgoing request's traceparent header. If the upstream is OTel-instrumented, its handler will see kono's span as the parent.

baggage headers are also propagated, allowing cross-service key-value context (e.g. tenant_id) to flow through the gateway.

The propagator is installed unconditionally — even with tracing.enabled: false, kono still extracts and re-injects traceparent. This makes the gateway transparent to distributed tracing even when its own spans are not recorded.

Disabled Mode


When tracing.enabled: false:

  • No spans are exported.
  • No OTLP connection is opened.
  • The W3C propagator is still installed — incoming traceparent headers are forwarded to upstreams unchanged.
  • The internal otel.Tracer returns a no-op tracer; instrumented code paths run with minimal overhead.

Setup with OpenTelemetry Collector


A typical setup with the OTel Collector and Jaeger:

# docker-compose.yaml
services:
  kono:
    image: kono:latest
    volumes:
      - ./kono.yaml:/etc/kono/config.yaml
    ports: [ "7805:7805" ]
    depends_on: [ otel-collector ]

  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.115.0
    command: [ "--config=/etc/otel-collector.yaml" ]
    volumes:
      - ./otel-collector.yaml:/etc/otel-collector.yaml
    depends_on: [ jaeger ]

  jaeger:
    image: jaegertracing/all-in-one:1.62
    environment:
      COLLECTOR_OTLP_ENABLED: "true"
    ports: [ "16686:16686" ]

# otel-collector.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ batch ]
      exporters: [ otlp/jaeger ]

# kono.yaml
gateway:
  service:
    name: kono
  observability:
    tracing:
      enabled: true
      exporter: otlp
      sampling_ratio: 1.0
      otlp:
        endpoint: otel-collector:4318
        insecure: true
        interval: 1s

After a request, find the trace in Jaeger UI at http://localhost:16686 — service kono, operation kono.request.

Reading Waterfalls


A few patterns to recognize when looking at a kono trace:

Long kono.upstream.wait_us. The upstream span starts at the same time as its siblings, but most of its duration is the semaphore wait. Look for the semaphore.acquired event to see where actual work begins. To increase parallelism, raise max_parallel_upstreams on the flow.

kono.upstream span with error status and kono.upstream.error_kind=connection. The upstream was unreachable. Check the kono_circuit_breaker_state metric — if it is 1 (open), the breaker rejected subsequent requests without contacting the upstream.

Trace stops at kono.request with no upstream spans. The request was rejected before reaching the scatter — usually due to a payload-too-large error, or a plugin failure. Look at http.status_code on kono.request.

Single trace spanning multiple services. When upstreams are also OTel-instrumented, their spans appear as children of kono.upstream, giving end-to-end visibility from client to backend.