Welcome. In this lesson you’ll learn how to add distributed tracing to AI agents running with KAgent and inspect traces in Jaeger. As agent ecosystems grow, visibility into agent actions and their LLM interactions is essential for debugging, cost analysis, and observability.
Objectives:
  • Introduce Jaeger and the minimal configuration used in this lab.
  • Install Jaeger (all-in-one, in-memory) and KAgent.
  • Configure KAgent to export OpenTelemetry (OTEL) traces to Jaeger.
  • Generate agent traffic and inspect the traces in the Jaeger UI.
This lab focuses on integrating KAgent with an OTEL backend; Jaeger is used as a simple example. KAgent supports exporting to any OTEL-compatible backend.
This lesson uses Jaeger in all-in-one (development) mode with in-memory storage. Traces are transient and will be lost if the Jaeger pod restarts.

1. Jaeger configuration (all-in-one, in-memory)

The Helm values below run Jaeger in all-in-one mode with in-memory storage, which suits development and short-lived labs. Because storage is in-memory, traces are not persisted across restarts.
# jaeger.yaml
# Jaeger Helm Values Configuration
# provisionDataStore: Controls whether to provision a data store (Cassandra/Elasticsearch)
# Setting to false means we'll use the built-in storage option
provisionDataStore:
  cassandra: false

# allInOne: Enables the all-in-one deployment mode
# This runs collector, query, and agent in a single pod - ideal for development
allInOne:
  enabled: true

# storage: Configures the storage backend for traces
# Using memory storage for simplicity - traces will be lost on pod restart
storage:
  type: memory

# agent: Jaeger agent component (disabled in all-in-one mode)
agent:
  enabled: false

# collector: Jaeger collector component (disabled in all-in-one mode)
collector:
  enabled: false

# query: Jaeger query component (disabled in all-in-one mode)
query:
  enabled: false
Ports and OTEL endpoints to be aware of:
  • Jaeger UI (query): port 16686.
  • Jaeger collector OTLP: port 4317 (gRPC) and 4318 (HTTP/protobuf). In this lab we point KAgent to the OTLP gRPC endpoint (4317).
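The two OTLP ports imply two different endpoint formats. The sketch below is a hypothetical helper (not part of KAgent or Jaeger) that builds both forms for a Kubernetes Service. It assumes the standard in-cluster DNS suffix `svc.cluster.local` and the default OTLP/HTTP traces path `/v1/traces`.

```python
def otlp_endpoint(service: str, namespace: str, protocol: str = "grpc") -> str:
    """Build the in-cluster OTLP endpoint for a Kubernetes Service.

    Hypothetical helper for illustration; port numbers follow the lab text.
    """
    host = f"{service}.{namespace}.svc.cluster.local"
    if protocol == "grpc":
        # gRPC exporters are typically given host:port with no URL scheme.
        return f"{host}:4317"
    if protocol == "http":
        # OTLP over HTTP/protobuf posts traces to the /v1/traces path.
        return f"http://{host}:4318/v1/traces"
    raise ValueError(f"unsupported protocol: {protocol!r}")


if __name__ == "__main__":
    print(otlp_endpoint("jaeger-collector", "jaeger"))
    print(otlp_endpoint("jaeger-collector", "jaeger", "http"))
```

The gRPC form is the one this lab uses when pointing KAgent at the Jaeger collector.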
Install Jaeger using Helm (example chart version 3.4.1 used in this lab):
# Add the Jaeger Helm repo and install Jaeger with the above values
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update
helm install jaeger jaegertracing/jaeger \
  --namespace jaeger --create-namespace \
  -f jaeger.yaml --version 3.4.1

2. KAgent values (enable OTEL tracing)

The trimmed KAgent values file below enables a minimal set of agents and configures OTEL tracing to send data to the Jaeger collector via OTLP gRPC. The critical section is otel.tracing.exporter.otlp.endpoint.
# kagent-values.yaml
agents:
  argo-rollouts-agent:
    enabled: false
  cilium-debug-agent:
    enabled: false
  cilium-manager-agent:
    enabled: false
  cilium-policy-agent:
    enabled: false
  helm-agent:
    enabled: false
  istio-agent:
    enabled: false
  k8s-agent:
    enabled: true
  kgateway-agent:
    enabled: false
  observability-agent:
    enabled: false
  promql-agent:
    enabled: false

kmcp:
  enabled: true

kagent-tools:
  enabled: true

tools:
  grafana-mcp:
    enabled: false
  querydoc:
    enabled: false

# OpenTelemetry configuration for distributed tracing
otel:
  tracing:
    enabled: true
    exporter:
      otlp:
        # OTLP gRPC endpoint pointing to Jaeger collector service (host:port, do not include http scheme for gRPC)
        endpoint: jaeger-collector.jaeger.svc.cluster.local:4317
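A malformed endpoint is an easy mistake to make in this file, so a quick pre-install check can help. The sketch below assumes the values file has already been loaded into a Python dict (for example with a YAML parser of your choice). The function `check_otel_tracing` is a hypothetical helper, not part of KAgent, and only checks the rules stated in the comment above: tracing is enabled, and the gRPC endpoint is `host:port` with no scheme.

```python
def check_otel_tracing(values: dict) -> list[str]:
    """Return a list of problems found in the otel.tracing section."""
    problems = []
    tracing = values.get("otel", {}).get("tracing", {})
    if not tracing.get("enabled", False):
        problems.append("otel.tracing.enabled is not true")

    endpoint = tracing.get("exporter", {}).get("otlp", {}).get("endpoint", "")
    if not endpoint:
        problems.append("otel.tracing.exporter.otlp.endpoint is missing")
    elif "://" in endpoint:
        # A scheme such as http:// is not expected for the gRPC endpoint.
        problems.append("endpoint should be host:port without a URL scheme")
    else:
        host, sep, port = endpoint.rpartition(":")
        if not sep or not host or not port.isdigit():
            problems.append("endpoint should end with a numeric port")
    return problems


if __name__ == "__main__":
    values = {
        "otel": {"tracing": {"enabled": True, "exporter": {"otlp": {
            "endpoint": "jaeger-collector.jaeger.svc.cluster.local:4317"}}}}
    }
    print(check_otel_tracing(values) or "otel.tracing looks consistent")
```

An empty list means the section matches the shape used in this lab; it does not prove that the collector is reachable.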
Install KAgent with Helm:
helm install kagent oci://ghcr.io/kagent-dev/kagent/helm/kagent \
  --namespace kagent \
  -f /root/kagent-values.yaml
CRDs and ModelConfig are assumed to be pre-installed for this lab.

3. Patch the KAgent UI service for external (lab) access

In the lab environment we expose the KAgent UI externally by changing the service type to NodePort and mapping port 8080 to node port 30080.
kubectl patch svc kagent-ui -n kagent -p '{
  "spec": {
    "type": "NodePort",
    "ports": [
      {
        "name": "ui",
        "port": 8080,
        "targetPort": 8080,
        "nodePort": 30080
      }
    ]
  }
}'
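Hand-written JSON inside shell quotes is easy to get wrong. As an illustration, the sketch below generates the same patch payload programmatically. `nodeport_patch` is a hypothetical helper; the range check assumes the default Kubernetes NodePort range of 30000 to 32767, which a cluster administrator can change.

```python
import json
import shlex


def nodeport_patch(name: str, port: int, node_port: int) -> str:
    """Return a JSON patch body that switches a Service to NodePort."""
    # Default NodePort range; clusters may be configured differently.
    if not 30000 <= node_port <= 32767:
        raise ValueError(f"nodePort {node_port} is outside 30000-32767")
    patch = {
        "spec": {
            "type": "NodePort",
            "ports": [
                {
                    "name": name,
                    "port": port,
                    "targetPort": port,
                    "nodePort": node_port,
                }
            ],
        }
    }
    return json.dumps(patch)


if __name__ == "__main__":
    payload = nodeport_patch("ui", 8080, 30080)
    # shlex.quote wraps the payload safely for a POSIX shell.
    print("kubectl patch svc kagent-ui -n kagent -p " + shlex.quote(payload))
```

The printed command is equivalent to the hand-written patch above, with the payload on a single line.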
Verify pods are running in the kagent namespace:
kubectl get pod -n kagent
Example expected output:
NAME                                              READY   STATUS    RESTARTS   AGE
k8s-agent-756dcd9c66-dkqtj                        1/1     Running   0          53s
k8s-agent-7784d598cc-xtmqz                        1/1     Running   0          53s
kagent-controller-7864494766-9htkv                1/1     Running   0          53s
kagent-kmcp-controller-manager-76645f577f-9xhts   1/1     Running   0          53s
kagent-tools-56c49d7d4d-2vw4b                     1/1     Running   0          53s
kagent-ui-59d5bbd564-2bjn2                        1/1     Running   0          53s
If any pod is not ready, investigate with kubectl describe and kubectl logs.
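To decide which pods need a closer look, the listing can be scanned for rows that are not fully ready. The sketch below is a hypothetical helper that parses the default column layout shown above (NAME, READY, STATUS as the first three columns). It treats anything other than `Running` with all containers ready as needing attention, so finished Job pods would also be flagged.

```python
def not_ready_pods(kubectl_output: str) -> list[str]:
    """Return names of pods that are not Running with all containers ready."""
    pods = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            pods.append(name)
    return pods


if __name__ == "__main__":
    sample = """\
NAME                         READY   STATUS              RESTARTS   AGE
k8s-agent-756dcd9c66-dkqtj   1/1     Running             0          53s
kagent-ui-59d5bbd564-2bjn2   0/1     ContainerCreating   0          53s
"""
    for pod in not_ready_pods(sample):
        print(f"kubectl describe pod {pod} -n kagent")
```

The sample input here is made up for the demonstration; in practice the text would come from the `kubectl get pod -n kagent` command.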

4. Generate agent traffic and inspect traces in Jaeger

With KAgent exporting OTEL traces to Jaeger, exercise the agent (for example: list pods, query deployments, or perform model calls via the KAgent UI). These activities generate traces that appear in Jaeger.
Steps to inspect traces:
  1. Open the KAgent UI and use the built-in link to the Jaeger UI (or open Jaeger at the cluster-exposed query endpoint).
  2. In Jaeger:
    • Select the kagent service (or KAgent depending on naming).
    • Filter by operation (e.g., openai.chat) and choose an appropriate time range.
    • Open individual traces to expand spans and view tags.
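The same service, operation, and time-range filters can be expressed as a link to the Jaeger search page. The sketch below is a hypothetical helper. The query parameter names (`service`, `operation`, `lookback`, `limit`) are an assumption based on the search URLs the Jaeger UI commonly produces, and the base URL is a placeholder for wherever the query service is exposed in your environment.

```python
from urllib.parse import urlencode


def jaeger_search_url(base_url: str, service: str, operation: str = "",
                      lookback: str = "1h", limit: int = 20) -> str:
    """Build a Jaeger UI search link (parameter names are assumed)."""
    params = {"service": service, "lookback": lookback, "limit": limit}
    if operation:
        params["operation"] = operation
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder base URL; substitute the address of the Jaeger query service.
    print(jaeger_search_url("http://localhost:16686", "kagent", "openai.chat"))
```

If the link does not pre-fill the search form on your Jaeger version, fall back to selecting the filters manually in the UI.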
When viewing a trace you can expect to see:
  • Services involved and span hierarchy.
  • Start timestamps and durations per span.
  • Tags containing LLM prompt content and token usage (e.g., gen_ai.usage.prompt_tokens).
  • OTEL instrumentation metadata (e.g., otel.library.name, otel.library.version).
Example system prompt captured in a trace:
You are KubeAssist, an advanced AI agent specialized in Kubernetes troubleshooting and operations. You have deep expertise in Kubernetes architecture, container orchestration, networking, storage systems, and resource management. Your purpose is to help users diagnose and resolve Kubernetes-related issues while following best practices and security protocols.

## Core Capabilities

- Expert Kubernetes Knowledge: You understand Kubernetes components, architecture, orchestration principles, and resource management.
- Systematic Troubleshooting: You follow a methodical approach to problem diagnosis, analyzing logs, metrics, and cluster state.
- Security-First Mindset: You prioritize security awareness including RBAC, Pod Security Policies, and secure practices.
- Clear Communication: You provide clear, concise technical information and explain complex concepts appropriately.
Example trace metadata snippet showing prompt content, timing, and OTEL library information:
kagent: POST /5aac84e
Trace Start December 19 2025, 23:41:33.516  Duration 7.2s  Services 1  Depth 8  Total Spans 95

gen_ai.prompt.11.tool_call_id    call_xBdHWLcmn90kzznydtv7kdnw
gen_ai.prompt.12.content         Get deployment status in kagent namespace
gen_ai.prompt.12.role            user
gen_ai.prompt.13.content         Here is the list of all pods across all namespaces with their status:
Namespace | Pod Name | Ready | Stat

> Process: otel.library.name = opentelemetry.instrumentation.openai.v1 | otel.library.version = 0.47.3
Note: token usage tags (gen_ai.usage.prompt_tokens, gen_ai.usage.completion_tokens, etc.) are useful for cost analysis and assessing how much context is being sent to the model.
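For cost analysis it can help to total the token tags across all spans of a trace. The sketch below is a minimal illustration. It assumes a trace exported as JSON with a `spans` list, where each span has a `tags` list of `key`/`value` entries; verify that shape against your own export before relying on it. The function matches on the tag suffix rather than the full key, so it does not depend on the exact prefix spelling.

```python
def token_usage(trace: dict) -> dict:
    """Sum prompt and completion token tags over every span in a trace."""
    totals = {"prompt": 0, "completion": 0}
    for span in trace.get("spans", []):
        for tag in span.get("tags", []):
            key = tag.get("key", "")
            if key.endswith("usage.prompt_tokens"):
                totals["prompt"] += int(tag["value"])
            elif key.endswith("usage.completion_tokens"):
                totals["completion"] += int(tag["value"])
    return totals


if __name__ == "__main__":
    # Made-up sample data in the assumed export shape.
    trace = {"spans": [
        {"tags": [{"key": "gen_ai.usage.prompt_tokens", "value": 1200},
                  {"key": "gen_ai.usage.completion_tokens", "value": 85}]},
        {"tags": [{"key": "gen_ai.usage.prompt_tokens", "value": 1450}]},
    ]}
    print(token_usage(trace))
```

Multiplying these totals by your model provider's per-token prices gives a rough per-trace cost estimate.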
This lab uses in-memory Jaeger storage. For production systems, do not use in-memory storage — switch to a persistent backend (e.g., Cassandra, Elasticsearch) or a managed OTEL backend to retain traces and ensure availability.

5. Inspect Jaeger services and endpoints

You can validate the services created by the Helm chart to confirm the collector and query endpoints:
Service          | Type / Ports                                                                    | Notes
jaeger-agent     | ClusterIP, Ports: 5775/UDP, 5778/TCP, 6831/UDP, 6832/UDP                        | Local agent ports for legacy Thrift/UDP
jaeger-collector | ClusterIP, Ports: 9411/TCP, 14250/TCP, 14267/TCP, 14268/TCP, 4317/TCP, 4318/TCP | OTLP gRPC 4317 and HTTP/protobuf 4318 exposed here
jaeger-query     | NodePort, ClusterIP 172.20.178.179, Ports: 16686:31686/TCP                      | Jaeger UI (query) typically on 16686
If you need details on any specific service or endpoint:
kubectl describe svc <service> -n jaeger
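Beyond reading the service definitions, a plain TCP connection attempt shows whether a port actually accepts connections. The sketch below is a hypothetical helper built on the standard library. It assumes it runs somewhere that can resolve and reach the service, such as a pod inside the cluster or a workstation with a port-forward in place. A successful connection only shows that something is listening, not that it speaks OTLP.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `port_open("jaeger-collector.jaeger.svc.cluster.local", 4317)` checks the OTLP gRPC port used in this lab, and the same call with port 16686 on the jaeger-query service checks the UI.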

Wrap-up and recommendations

  • This lesson demonstrated configuring Jaeger (all-in-one, in-memory) and KAgent to export OTEL traces to Jaeger.
  • Use the KAgent UI and Jaeger UI to inspect agent traces, including LLM prompts and token usage for debugging and cost analysis.
  • For production:
    • Replace in-memory Jaeger storage with a persistent backend (Cassandra, Elasticsearch) or a managed OTEL backend.
    • Deploy Jaeger (or OTEL collector) in a highly available configuration.
    • Secure OTLP endpoints with TLS and authentication where supported.
Next steps: run additional agents, increase load, or integrate with a persistent OTEL backend to observe trace retention and scale behavior. That’s it for this lesson — proceed to hands-on exercises to generate traces and explore span details in Jaeger.
