Tracing

Aastro uses OpenTelemetry for distributed tracing. Spans are exported via OTLP/HTTP to any OpenTelemetry-compatible backend — OTel Collector, Jaeger, Tempo, Datadog, Honeycomb.

W3C traceparent and tracestate headers are propagated automatically: incoming traces are continued, outgoing requests to upstreams carry the trace context. The propagator is installed regardless of whether tracing is enabled, so aastro stays transparent for distributed trace context even with tracing turned off.

gateway:
  observability:
    tracing:
      enabled: true
      exporter: otlp
      sampling_ratio: 1.0
      otlp:
        endpoint: otel-collector:4318
        insecure: true
        interval: 5s

| Field | Type | Default | Description |
|---|---|---|---|
| tracing.enabled | bool | false | Enable tracing instrumentation |
| tracing.exporter | string | | Currently only otlp is supported |
| tracing.sampling_ratio | float | 1.0 | Fraction of new root traces to sample. 1.0 = all, 0.0 = none |
| tracing.otlp.endpoint | string | | OTLP HTTP endpoint to push spans to |
| tracing.otlp.insecure | bool | false | Disable TLS for the OTLP connection |
| tracing.otlp.interval | duration | 5s | Batch timeout: maximum time before a non-full batch is flushed |

info

tracing.otlp.interval is the batch timeout, not a push interval. Spans are also flushed automatically when the batch reaches its size limit (512 spans). For low-traffic services, smaller values reduce visibility lag in the backend; for high-traffic services, the size limit dominates and the timeout rarely fires.
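
The interplay between the batch timeout and the size limit can be illustrated with a toy model. This is a simplified sketch, not aastro's or the OpenTelemetry SDK's actual exporter: the `SpanBatcher` class and its methods are hypothetical, the timeout is measured from the oldest pending span (an assumption), and only the 512-span limit and the 5s default come from the text above.

```python
class SpanBatcher:
    """Toy model of a batching exporter: flush on size limit or on timeout."""

    def __init__(self, max_batch_size=512, timeout_s=5.0):
        self.max_batch_size = max_batch_size
        self.timeout_s = timeout_s
        self.pending = []
        self.batch_started_at = None  # arrival time of the oldest pending span
        self.flushes = []             # (reason, span_count), kept for inspection

    def _flush(self, reason):
        if self.pending:
            self.flushes.append((reason, len(self.pending)))
        self.pending = []
        self.batch_started_at = None

    def tick(self, now):
        """Flush a non-full batch once the timeout has elapsed."""
        if self.pending and now - self.batch_started_at >= self.timeout_s:
            self._flush("timeout")

    def add(self, span, now):
        self.tick(now)
        if not self.pending:
            self.batch_started_at = now
        self.pending.append(span)
        if len(self.pending) >= self.max_batch_size:
            self._flush("size")


# Low traffic: a single span waits for the timeout before it is exported.
low = SpanBatcher()
low.add("span-0", now=0.0)
low.tick(now=5.0)

# High traffic: 1024 spans in about one second, so the timeout never fires.
high = SpanBatcher()
for i in range(1024):
    high.add(f"span-{i}", now=i * 0.001)
```

In this model `low.flushes` records one timeout flush of a single span, while `high.flushes` records two size-triggered flushes of 512 spans each.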

Span Hierarchy


A typical request to a fan-out flow produces this tree:

aastro.request [SpanKindServer]
├── aastro.plugin request-phase plugin
├── aastro.plugin ...
├── aastro.scatter
│   ├── aastro.upstream [SpanKindClient]
│   ├── aastro.upstream [SpanKindClient]
│   └── aastro.upstream [SpanKindClient]
└── aastro.plugin response-phase plugin

Passthrough flows skip aastro.scatter and have a single aastro.upstream span:

aastro.request [SpanKindServer]
└── aastro.upstream [SpanKindClient, mode=passthrough]

| Span | When opened | When closed | Parent |
|---|---|---|---|
| aastro.request | Request enters Router.ServeHTTP | Response written or rate-limit rejection | Remote (from traceparent) or none |
| aastro.plugin | Before each plugin's Execute | After plugin returns | aastro.request |
| aastro.scatter | Beginning of scatter fan-out | All upstream goroutines completed | aastro.request |
| aastro.upstream | Beginning of upstream call (per upstream) | Upstream call returns, including all retries | aastro.scatter (or aastro.request for passthrough) |

Span Attributes


aastro.request

| Attribute | Description |
|---|---|
| http.method | Request method |
| http.route | Matched flow path with parameter placeholders, e.g. /users/{id} |
| url.path | Raw request path |
| http.status_code | Final response status |
| aastro.request.id | ULID identifying the request |
| aastro.request.fingerprint | 16-char hex hash of method, route template, header names, and query parameter names |
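
To make the idea behind aastro.request.fingerprint concrete, here is a minimal sketch of a fingerprint built from the same four inputs. The hash function (SHA-256), the normalization (lowercased, sorted names), and the separator are all assumptions chosen for illustration; the function name `request_fingerprint` is hypothetical, and the values it produces will not match the ones aastro emits.

```python
import hashlib


def request_fingerprint(method, route, header_names, query_param_names):
    """Return a 16-char hex digest over the *shape* of a request.

    Only names are hashed, never header or query values, so two requests
    to the same route with the same set of names share a fingerprint.
    """
    parts = [
        method.upper(),
        route,  # route template, e.g. /users/{id}, not the raw path
        ",".join(sorted(name.lower() for name in header_names)),
        ",".join(sorted(query_param_names)),
    ]
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]


a = request_fingerprint("GET", "/users/{id}", ["Accept", "Authorization"], ["fields"])
b = request_fingerprint("get", "/users/{id}", ["authorization", "accept"], ["fields"])
c = request_fingerprint("GET", "/orders/{id}", ["Accept", "Authorization"], ["fields"])
```

Here `a` and `b` are equal because they differ only in casing and ordering, while `c` differs because the route template differs.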

aastro.upstream

| Attribute | Description |
|---|---|
| http.method | Method used for the upstream call |
| http.url | Full upstream URL with parameters expanded |
| http.status_code | HTTP status returned by the upstream |
| server.address | Upstream host:port |
| aastro.upstream.name | Configured upstream name |
| aastro.upstream.host | Host selected by the load balancer |
| aastro.upstream.wait_us | Microseconds spent waiting for the parallelism semaphore |
| aastro.upstream.error_kind | Error classification on failure (see Metrics) |
| aastro.upstream.mode | passthrough for passthrough flows; absent otherwise |
| aastro.flow.path | Flow path the upstream was called from |

aastro.scatter

| Attribute | Description |
|---|---|
| aastro.upstream.count | Number of upstreams in the scatter |
| aastro.aggregation.strategy | merge, array, or namespace |

aastro.plugin

| Attribute | Description |
|---|---|
| aastro.plugin.name | Configured plugin name |
| aastro.plugin.type | request or response |

Span Events

| Event | When recorded |
|---|---|
| semaphore.acquired | Recorded on aastro.upstream when the semaphore wait exceeded the configured threshold. Useful to visually distinguish wait time from actual upstream work in waterfall views |

Resource Attributes


Every exported span carries resource attributes describing the aastro process. The same resource is attached to metrics, so traces and metrics from one process are correlated by service.name and service.instance.id in the backend.

| Attribute | Source |
|---|---|
| service.name | gateway.service.name (default: aastro) |
| service.version | Build-time -ldflags "-X main.version=…" injection |
| host.name, process.pid, process.command_args | Auto-detected at startup |
| telemetry.sdk.* | OTel SDK metadata |

Additional attributes from the OTEL_RESOURCE_ATTRIBUTES environment variable are merged in.
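
OTEL_RESOURCE_ATTRIBUTES holds comma-separated key=value pairs with percent-encoded values. The sketch below shows how such a string could be parsed and merged into a base resource. The function names are hypothetical, and the collision rule (the environment value wins) is an assumption; the text above does not say which side takes precedence in aastro.

```python
from urllib.parse import unquote


def parse_resource_attributes(raw):
    """Parse 'k1=v1,k2=v2' into a dict, skipping malformed entries."""
    attrs = {}
    for item in raw.split(","):
        key, sep, value = item.partition("=")
        key = key.strip()
        if not sep or not key:
            continue  # no '=' or empty key: ignore the entry
        attrs[key] = unquote(value.strip())
    return attrs


def merge_resource(base, env_value):
    """Merge environment attributes into the auto-detected resource."""
    merged = dict(base)
    # ASSUMPTION: on a key collision the environment value overrides the base.
    merged.update(parse_resource_attributes(env_value))
    return merged


base = {"service.name": "aastro", "host.name": "gw-1"}
merged = merge_resource(base, "deployment.environment=staging,team=platform%20edge")
```

With this input, `merged` keeps service.name and host.name and gains deployment.environment and team, the latter decoded to "platform edge".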

Sampling


Sampling determines which traces are recorded. Aastro uses a ParentBased sampler that respects the sampled flag in the incoming traceparent header: if the calling service has already decided to sample a trace, aastro honors that decision regardless of sampling_ratio. Only new root traces (requests without an incoming traceparent) are subject to ratio-based sampling.

| sampling_ratio | New root traces |
|---|---|
| 1.0 | All sampled |
| 0.1 | 10% sampled, deterministically by trace ID |
| 0.0 | None sampled, but incoming sampled traces still recorded |

TraceIDRatioBased(ratio) derives its decision deterministically from the trace ID, so the same trace ID always yields the same decision in every service that uses the same ratio, which keeps traces consistent.
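
The two-level decision can be sketched in a few lines. The threshold comparison below follows the approach used by the OpenTelemetry Go SDK's ratio sampler (low 8 bytes of the trace ID compared against ratio × 2^63); that aastro behaves bit-for-bit like this is an assumption, and the function names are hypothetical.

```python
def ratio_sampled(trace_id_hex, ratio):
    """Deterministic ratio decision derived from the trace ID alone."""
    if ratio >= 1.0:
        return True
    if ratio <= 0.0:
        return False
    upper_bound = int(ratio * (1 << 63))
    # Low 8 bytes of the 16-byte trace ID, reduced to a 63-bit number.
    x = int(trace_id_hex[16:32], 16) >> 1
    return x < upper_bound


def should_sample(trace_id_hex, ratio, parent_sampled=None):
    """ParentBased wrapper: inherit the caller's decision if there is one.

    parent_sampled is None for a new root trace (no incoming traceparent),
    otherwise the sampled flag taken from the incoming header.
    """
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sampled(trace_id_hex, ratio)


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
root_at_10_percent = should_sample(trace_id, 0.1)
inherited = should_sample(trace_id, 0.0, parent_sampled=True)
```

Because the decision depends only on the trace ID and the ratio, a trace that is sampled at a given ratio is also sampled at every higher ratio.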

info

For development and staging, use sampling_ratio: 1.0 to capture all traces. For production at high RPS, lower values (e.g. 0.05) keep ingestion costs manageable while still providing statistical visibility.

Propagation


Aastro propagates W3C trace context bidirectionally:

  • Inbound: traceparent and tracestate headers from incoming requests are extracted into the request context. The resulting aastro.request span becomes a child of the caller's span.
  • Outbound: when calling an upstream, aastro injects the current trace context into the outgoing request's traceparent and tracestate headers. If the upstream is OTel-instrumented, its handler will see aastro's span as the parent.

baggage headers are also propagated, allowing cross-service key-value context (e.g. tenant_id) to flow through the gateway.

The propagator is installed unconditionally — even with tracing.enabled: false, aastro still extracts and re-injects traceparent. This makes the gateway transparent to distributed tracing even when its own spans are not recorded.
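
The difference between the enabled and disabled cases is easiest to see on the header itself. The sketch below parses a version-00 traceparent value (`00-<trace-id>-<parent-id>-<flags>`, per the W3C Trace Context format) and builds the outbound value. The function names and the `upstream_span_id` parameter are hypothetical; the sketch ignores tracestate and baggage and preserves only the sampled flag bit.

```python
import re

# version 00: 32 hex chars of trace ID, 16 of parent span ID, 2 of flags
_TRACEPARENT_V00 = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header):
    """Return the trace context as a dict, or None if the header is invalid."""
    m = _TRACEPARENT_V00.fullmatch(header.strip())
    if m is None:
        return None
    trace_id, parent_id, flags = m.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def format_traceparent(trace_id, span_id, sampled):
    return "00-{}-{}-{}".format(trace_id, span_id, "01" if sampled else "00")


def outbound_traceparent(inbound_header, upstream_span_id=None):
    """Header value to send to an upstream.

    upstream_span_id=None models tracing.enabled: false: no span is created,
    so the inbound context is passed through. Otherwise the gateway's own
    aastro.upstream span ID becomes the parent ID seen by the upstream.
    """
    ctx = parse_traceparent(inbound_header)
    if ctx is None:
        return None  # a real gateway would start a new root trace here
    span_id = upstream_span_id or ctx["parent_id"]
    return format_traceparent(ctx["trace_id"], span_id, ctx["sampled"])


inbound = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
disabled = outbound_traceparent(inbound)
enabled = outbound_traceparent(inbound, upstream_span_id="b7ad6b7169203331")
```

In both cases the trace ID and the sampled flag are preserved; only the parent ID changes when the gateway records its own span.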

Disabled Mode


When tracing.enabled: false:

  • No spans are exported.
  • No OTLP connection is opened.
  • The W3C propagator is still installed — incoming traceparent headers are forwarded to upstreams unchanged.
  • The internal otel.Tracer returns a no-op tracer; instrumented code paths run with minimal overhead.

Setup with OpenTelemetry Collector


A typical setup with the OTel Collector and Jaeger:

# docker-compose.yaml
services:
  aastro:
    image: aastro:latest
    volumes:
      - ./aastro.yaml:/etc/aastro/config.yaml
    ports: [ "7805:7805" ]
    depends_on: [ otel-collector ]

  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.115.0
    command: [ "--config=/etc/otel-collector.yaml" ]
    volumes:
      - ./otel-collector.yaml:/etc/otel-collector.yaml
    depends_on: [ jaeger ]

  jaeger:
    image: jaegertracing/all-in-one:1.62
    environment:
      COLLECTOR_OTLP_ENABLED: "true"
    ports: [ "16686:16686" ]

# otel-collector.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [ otlp ]
      processors: [ batch ]
      exporters: [ otlp/jaeger ]

# aastro.yaml
gateway:
  service:
    name: aastro
  observability:
    tracing:
      enabled: true
      exporter: otlp
      sampling_ratio: 1.0
      otlp:
        endpoint: otel-collector:4318
        insecure: true
        interval: 1s

After a request, find the trace in Jaeger UI at http://localhost:16686 — service aastro, operation aastro.request.

Reading Waterfalls


A few patterns to recognize when looking at an aastro trace:

Long aastro.upstream.wait_us. The upstream span starts at the same time as its siblings, but most of its duration is the semaphore wait. Look for the semaphore.acquired event to see where actual work begins. To increase parallelism, raise max_parallel_upstreams on the flow.
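
When the semaphore.acquired event is not visible in a waterfall view, the same split can be computed from the span duration and aastro.upstream.wait_us. This is a small arithmetic sketch that assumes, as described above, that the span duration includes the semaphore wait; the function name is hypothetical.

```python
def split_upstream_time(span_duration_us, wait_us):
    """Return (work_us, wait_fraction) for one aastro.upstream span."""
    # Clamp so clock skew or rounding cannot produce negative work time.
    wait_us = min(max(wait_us, 0), span_duration_us)
    work_us = span_duration_us - wait_us
    wait_fraction = wait_us / span_duration_us if span_duration_us else 0.0
    return work_us, wait_fraction


# A 120 ms span that spent 90 ms waiting for the semaphore.
work_us, wait_fraction = split_upstream_time(120_000, 90_000)
```

Here the upstream did 30 ms of actual work and the span spent 75% of its duration waiting, which points at the parallelism limit rather than at a slow upstream.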

aastro.upstream span with error status and aastro.upstream.error_kind=connection. The upstream was unreachable. Check the aastro_circuit_breaker_state metric — if it is 1 (open), the breaker rejected subsequent requests without contacting the upstream.

Trace stops at aastro.request with no upstream spans. The request was rejected before reaching the scatter — usually due to a payload-too-large error, or a plugin failure. Look at http.status_code on aastro.request.

Single trace spanning multiple services. When upstreams are also OTel-instrumented, their spans appear as children of aastro.upstream, giving end-to-end visibility from client to backend.