
Metrics

Kono uses OpenTelemetry for instrumentation. Metrics can be exported via two backends:

  • Prometheus: the OTel Prometheus exporter exposes a `/metrics` endpoint for Prometheus to scrape
  • OTLP: pushes metrics to any OpenTelemetry-compatible backend (OTel Collector, Grafana, Datadog, etc.)
```yaml
gateway:
  observability:
    metrics:
      enabled: true
      exporter: prometheus # or: otlp
      otlp:
        endpoint: otel-collector:4318
        insecure: true
        interval: 10s
```
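Field paths are relative to `gateway.observability`.

| Field | Type | Default | Description |
|---|---|---|---|
| `metrics.enabled` | bool | `false` | Enable metrics instrumentation |
| `metrics.exporter` | string | | `prometheus` or `otlp` |
| `metrics.otlp.endpoint` | string | | OTLP HTTP endpoint to push metrics to |
| `metrics.otlp.insecure` | bool | `false` | Disable TLS for the OTLP connection |
| `metrics.otlp.interval` | duration | `60s` | How often to push metrics to the OTLP endpoint |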
info

When using `exporter: prometheus`, the `/metrics` endpoint is served on the admin port (`server.admin_port`), not the data port. Prometheus can therefore scrape Kono over plain HTTP without a client certificate, even when the data port enforces mTLS. The admin port binds to `127.0.0.1` by default; see the Server configuration for details on exposing it to an external scraper.

When using `exporter: otlp`, no HTTP endpoint is exposed; metrics are pushed to `metrics.otlp.endpoint` every `metrics.otlp.interval`.

Available Metrics


| Metric | Type | Labels | Description |
|---|---|---|---|
| `kono_requests_total` | Counter | `route`, `method`, `status` | Total incoming requests that reached a flow, labeled by final HTTP status |
| `kono_requests_duration_seconds` | Histogram | `route`, `method` | End-to-end request latency from gateway entry to response write |
| `kono_requests_in_flight` | Gauge | | Current number of requests being processed |
| `kono_failed_requests_total` | Counter | `reason` | Requests rejected before reaching a flow (see reasons below) |
| `kono_upstream_requests_total` | Counter | `route`, `upstream` | Total requests dispatched to each upstream |
| `kono_upstream_errors_total` | Counter | `route`, `upstream`, `kind` | Upstream errors broken down by error kind |
| `kono_upstream_latency_seconds` | Histogram | `route`, `upstream` | Time from upstream request dispatch to response received |
| `kono_upstream_retries_total` | Counter | `route`, `upstream` | Number of retry attempts per upstream |
| `kono_circuit_breaker_state` | Gauge | `upstream` | Circuit breaker state: 0 = closed, 1 = open, 2 = half-open |

Histogram Buckets


Kono uses fixed bucket boundaries tuned for typical gateway latencies:

| Metric | Bucket boundaries (seconds) |
|---|---|
| `kono_requests_duration_seconds` | 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5 |
| `kono_upstream_latency_seconds` | 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5 |
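If you precompute latency percentiles in Prometheus, a recording rule along these lines is one option. This is a sketch that assumes you load a Prometheus rules file (via `rule_files`); the group and rule names are hypothetical. Summing by `le` before calling `histogram_quantile()` aggregates across methods while keeping the bucket boundaries above intact:

```yaml
# Hypothetical rules file, loaded through rule_files in prometheus.yml.
groups:
  - name: kono-latency                 # hypothetical group name
    rules:
      # p99 end-to-end latency per route, aggregated across HTTP methods.
      - record: route:kono_requests_duration_seconds:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum by (le, route) (rate(kono_requests_duration_seconds_bucket[5m])))
```

Because the largest finite boundary for `kono_requests_duration_seconds` is 2.5 s, slower requests land in the implicit `+Inf` bucket. When the requested quantile falls there, `histogram_quantile()` returns the highest finite bound (2.5), so a p99 pinned at 2.5 means "at least 2.5 s" rather than an exact value. The same ceiling applies at 5 s for `kono_upstream_latency_seconds`.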

Upstream Error Kinds


The kind label on kono_upstream_errors_total reflects the internal error classification:

| Kind | Description |
|---|---|
| `timeout` | Upstream did not respond within the configured timeout |
| `connection` | Failed to establish a connection to the upstream (includes TLS handshake failures) |
| `bad_status` | Upstream returned HTTP 5xx |
| `read_error` | Connection was closed while reading the response body |
| `body_too_large` | Response body exceeded `max_response_body_size` |
| `canceled` | Request was canceled by the client before a response was received |
| `circuit_open` | Request was rejected by an open circuit breaker; the upstream was not contacted |
| `policy_violation` | Response violated upstream policy (`allowed_statuses`, `require_body`) |
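Because `circuit_open` errors never reach the upstream, it can help to track them separately from errors the upstream actually produced. The following is a sketch of Prometheus recording rules that does this, assuming a Prometheus rules file; the group and rule names are hypothetical:

```yaml
groups:
  - name: kono-upstream-errors         # hypothetical group name
    rules:
      # Per-second upstream error rate, split by upstream and error kind.
      - record: upstream_kind:kono_upstream_errors:rate5m
        expr: sum by (upstream, kind) (rate(kono_upstream_errors_total[5m]))
      # Same rate without circuit_open, i.e. only attempts where the
      # upstream was actually contacted.
      - record: upstream:kono_upstream_errors_contacted:rate5m
        expr: sum by (upstream) (rate(kono_upstream_errors_total{kind!="circuit_open"}[5m]))
```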

Failure Reasons


kono_failed_requests_total tracks requests that never reach a flow:

| Reason | Description |
|---|---|
| `too_many_requests` | Rate limiter rejected the request |
| `no_matched_flow` | No flow matched the request path or method |
| `body_too_large` | Request body exceeded the gateway-wide limit (5 MB) |

Grafana


When using exporter: otlp, the recommended setup is:

kono → [OTLP HTTP] → OTel Collector → [remote_write] → Prometheus ← Grafana

The OTel Collector receives metrics from Kono over OTLP/HTTP, converts them to Prometheus time series, and pushes them to Prometheus via remote_write. Prometheus must be started with `--web.enable-remote-write-receiver` to accept these writes.
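A minimal Collector configuration for this pipeline might look like the following sketch. It assumes the Collector contrib distribution (which ships the `prometheusremotewrite` exporter) and a Prometheus instance reachable at the placeholder address `http://prometheus:9090`. Port 4318 matches the `metrics.otlp.endpoint` in the Kono example above:

```yaml
receivers:
  otlp:
    protocols:
      http:
        # Listen on all interfaces so Kono can reach the Collector from another host or container.
        endpoint: 0.0.0.0:4318

processors:
  batch: {}

exporters:
  prometheusremotewrite:
    # Placeholder address; Prometheus must run with --web.enable-remote-write-receiver.
    endpoint: http://prometheus:9090/api/v1/write

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
```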

When using `exporter: prometheus`, Prometheus scrapes Kono directly and no Collector is needed. Point the scrape target at the admin port (`server.admin_port`).
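A sketch of the matching `prometheus.yml` scrape job follows. The port `9091` is a hypothetical stand-in for whatever `server.admin_port` is set to. Since the admin port binds to `127.0.0.1` by default, this target only works when Prometheus runs on the same host; otherwise expose the admin port as described in the Server configuration and use that address instead:

```yaml
scrape_configs:
  - job_name: kono
    scrape_interval: 15s
    # metrics_path defaults to /metrics, which is where Kono serves metrics.
    static_configs:
      # Hypothetical admin port; replace with your server.admin_port value.
      - targets: ["127.0.0.1:9091"]
```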


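| Panel | Query |
|---|---|
| RPS | `rate(kono_requests_total[1m])` |
| p99 latency | `histogram_quantile(0.99, rate(kono_requests_duration_seconds_bucket[5m]))` |
| Error rate | `sum(rate(kono_requests_total{status=~"5.."}[1m])) / sum(rate(kono_requests_total[1m]))` |
| Upstream error rate | `rate(kono_upstream_errors_total[1m])` |
| Circuit breaker open | `kono_circuit_breaker_state == 1` |
| Retry pressure | `rate(kono_upstream_retries_total[5m])` |
| In-flight requests | `kono_requests_in_flight` |
| Upstream p95 latency | `histogram_quantile(0.95, rate(kono_upstream_latency_seconds_bucket[5m]))` |

The error-rate query wraps both sides in `sum()` because the numerator and denominator otherwise carry different `status` label values and would not match series by series. The `[1m]` windows assume a scrape or push interval well under one minute: `rate()` needs at least two samples inside the window, so with the default 60s OTLP push interval use `[2m]` or longer.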
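The same expressions can drive alerts. The following is a sketch of Prometheus alerting rules, assuming a Prometheus rules file; the alert names, the 5% threshold, and the `for` durations are arbitrary examples to tune for your traffic, and the 5-minute windows leave room for the default 60s OTLP push interval:

```yaml
groups:
  - name: kono-alerts                  # hypothetical group name
    rules:
      # Share of requests answered with 5xx; sum() on both sides so the
      # status label does not need to match between numerator and denominator.
      - alert: KonoHigh5xxRatio
        expr: |
          sum(rate(kono_requests_total{status=~"5.."}[5m]))
            / sum(rate(kono_requests_total[5m])) > 0.05
        for: 10m
      # State 1 means the breaker is open and rejecting traffic to that upstream.
      - alert: KonoCircuitBreakerOpen
        expr: kono_circuit_breaker_state == 1
        for: 2m
```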
info

Counter metrics such as `kono_requests_total` only ever increase, except that they reset to zero when Kono restarts. Always wrap them in `rate()` or `increase()` in Grafana queries, which account for these resets, rather than plotting the raw counter value.