In this lesson we focus on observability for Kubernetes clusters using Cilium. We’ll cover:
  • What Hubble is and how it provides network-level observability for Cilium-enabled clusters.
  • How to inspect and trace traffic flows inside the cluster with Hubble (CLI and UI).
  • How to integrate Cilium with Prometheus to scrape metrics for visibility and alerting.
You will learn how to trace individual connections, visualize service-to-service flows, and configure Prometheus to collect and store Cilium telemetry for dashboards and alerts.
Before you begin, ensure you have cluster-admin access, the Cilium CLI (optional but recommended), kubectl configured for the target cluster, and Prometheus (or a Prometheus-compatible scraper) available to ingest metrics.

Why use Hubble + Prometheus?

  • Hubble provides packet- and flow-level visibility into L7 and L3/L4 traffic that Cilium enforces.
  • Prometheus collects long-term telemetry for dashboards, SLOs, and alerts.
  • Together they let you both explore live network flows and monitor trends or anomalies over time.
Component overview:
  • Hubble (CLI & UI): real-time packet/flow inspection and tracing. Example: hubble observe --since 1m
  • Hubble Relay & UI: aggregates flow data across nodes and provides the web UI. Example: cilium hubble enable --relay --ui
  • Prometheus: scrapes and stores metrics for querying and alerting. Example: custom scrape_configs to collect Cilium metrics
  • Grafana (optional): visualizes Cilium/Prometheus metrics. Example: pre-built dashboards or custom panels

Overview of the flow

  1. Enable Hubble with Cilium so flows are captured on each node.
  2. Use the Hubble CLI or UI to inspect flows and trace connections.
  3. Expose the Cilium metrics endpoint and add a Prometheus scrape job to collect metrics.
  4. Create alerts (for example, high packet drops, connection failures) and dashboards in Grafana.

Enable Hubble (quick options)

You can enable Hubble at install time or enable it on an existing Cilium deployment.

Option A: Using the Cilium CLI (recommended when available):
  • Install Cilium with Hubble enabled:
cilium install \
  --set hubble.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.relay.enabled=true
  • Or enable Hubble on an existing installation:
cilium hubble enable --relay --ui
Option B: Helm / YAML install

If you install via Helm or manifests, enable the equivalent values for hubble.enabled, hubble.ui.enabled, and hubble.relay.enabled in your values.yaml or manifests. Consult the Cilium docs for version-specific flags.

Verify Hubble status:
cilium hubble status
If you enable Hubble relay and UI on production clusters, ensure proper authentication and network access controls are in place (especially for the Hubble UI and relay ports) — these endpoints expose sensitive network telemetry.
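For Option B, a minimal values fragment might look like the following sketch (the key names match the Cilium Helm chart, but verify them against your chart version):

```yaml
# values.yaml fragment enabling Hubble, Relay, and the UI.
# Verify these keys against the Cilium Helm chart version you deploy.
hubble:
  enabled: true
  relay:
    enabled: true
  ui:
    enabled: true
```

You would then apply it with something like helm upgrade cilium cilium/cilium -n kube-system -f values.yaml, adjusting the release name and namespace to your installation.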

Using the Hubble CLI

Start a port-forward to access Hubble Relay (if you prefer not to expose the service externally):
  • Port-forward the relay (replace namespace or service name if different):
kubectl port-forward -n kube-system svc/hubble-relay 4245:4245
Common Hubble CLI commands:
  • Observe recent flows (human-readable):
hubble observe --since 5m
  • Observe flows in JSON (for automation or parsing):
hubble observe --since 1m -o json
  • Trace a connection between two pods (use the namespace/pod form if the pods are outside the default namespace):
hubble observe --from-pod <namespace>/<pod-name> --to-pod <namespace>/<pod-name> --since 1m
  • Get Hubble status and connection health:
hubble status
Tip: Use -o table or -o json to control the output format, and add --follow to stream live flows.
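The JSON output is convenient for ad-hoc analysis in a pipeline. As a sketch, the snippet below counts dropped flows in a captured sample; the sample lines are an illustrative, heavily abridged version of real flow objects, and in practice you would pipe hubble observe -o json straight into the filter:

```shell
# Illustrative, abridged sample of `hubble observe -o json` output
# (one JSON object per line; real objects carry many more fields).
flows='{"flow":{"verdict":"FORWARDED"}}
{"flow":{"verdict":"DROPPED"}}
{"flow":{"verdict":"DROPPED"}}'

# Count flows with a DROPPED verdict. In a live cluster you would run:
#   hubble observe --since 1m -o json | grep -c '"verdict":"DROPPED"'
drops=$(printf '%s\n' "$flows" | grep -c '"verdict":"DROPPED"')
echo "dropped flows: $drops"
```

The same pattern works for any field in the flow object (verdict, L7 protocol, drop reason), which makes quick "how many of X in the last minute" questions a one-liner.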

Using the Hubble UI

If you enabled hubble.ui, access it through the service or via a local port-forward:
  • Start a local port-forward to the UI:
kubectl port-forward -n kube-system svc/hubble-ui 8081:80
Then open http://localhost:8081 in your browser. The UI displays flows, allows filtering by namespace, pod, port, and L7 protocol, and shows trace paths across services.

Metrics: Exposing Cilium Prometheus metrics

Cilium exposes Prometheus metrics describing agent health, endpoint stats, BPF programs, packet drops, and more. To collect these metrics you must configure your Prometheus server to scrape the correct targets. Common places where metrics appear:
  • cilium-agent pods (per-node metrics).
  • A dedicated cilium-metrics service (if deployed).
  • Relay or exporter endpoints if using a metrics exporter.
Example: verify a metrics endpoint directly:
kubectl -n kube-system port-forward svc/cilium-metrics 9090:9090
curl http://localhost:9090/metrics | head
(Replace cilium-metrics with your service name; check your installation.)
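Once the endpoint responds, you can spot-check individual metric families before wiring up Prometheus. The snippet below filters a captured sample; the metric line and its labels are illustrative of the Prometheus text format rather than exact output from your cluster:

```shell
# Illustrative sample of Prometheus text-format output from /metrics
# (labels and values are made up for demonstration).
sample='# HELP cilium_drop_count_total Total dropped packets
# TYPE cilium_drop_count_total counter
cilium_drop_count_total{reason="Policy denied",direction="INGRESS"} 42'

# Pick out one metric family. Against a live endpoint you would run:
#   curl -s http://localhost:9090/metrics | grep '^cilium_drop_count_total'
printf '%s\n' "$sample" | grep '^cilium_drop_count_total'
```

Anchoring the pattern with ^ skips the HELP/TYPE comment lines, leaving only the sample values.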

Example Prometheus scrape configuration

Below is a template scrape_config you can adapt. Update role, namespace, and service name patterns according to your cluster and Cilium installation.
scrape_configs:
  - job_name: 'cilium'
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only endpoints in the namespace where Cilium exposes metrics
      - source_labels: [__meta_kubernetes_namespace]
        action: keep
        regex: kube-system|cilium
      # Keep only Cilium-related services (adjust regex to match your service names)
      - source_labels: [__meta_kubernetes_service_name]
        action: keep
        regex: cilium-metrics|cilium-agent
    metrics_path: /metrics
    scheme: http
    # Optional: add basic_auth or bearer token config if your metrics endpoint is secured
If your environment uses static service discovery or other SD mechanisms, replace kubernetes_sd_configs accordingly.
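For example, with static service discovery the same job could be written as the sketch below; the node IPs and metrics port are placeholders, so check which port your cilium-agent actually exposes metrics on:

```yaml
scrape_configs:
  - job_name: 'cilium-static'
    metrics_path: /metrics
    static_configs:
      # Placeholder node IPs and metrics port; adjust to your installation.
      - targets: ['10.0.0.1:9962', '10.0.0.2:9962']
```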

Useful Cilium metrics & alert ideas

Example metrics and alert ideas:
  • cilium_endpoint_regenerations_total: endpoint policy/program regenerations. Alert if regenerations spike across many endpoints.
  • cilium_drop_count_total: number of dropped packets. Alert if drops exceed a per-minute threshold.
  • cilium_policy_denied_count_total: policy-denied connections. Alert on sustained policy denies.
  • hubble_grpc_connections (Hubble-specific): active Hubble connections. Alert if the connection count drops to 0 (indicating connectivity loss).
Example alert rule (conceptual):
- alert: CiliumHighPacketDrops
  expr: increase(cilium_drop_count_total[5m]) > 100
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "High packet drops detected on Cilium"
    description: "More than 100 drops in the last 5 minutes. Check Cilium endpoints and BPF programs."
Adjust thresholds to your baseline.
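To complement the alert, the same counter can be precomputed as a per-node rate for dashboards. A sketch of a Prometheus recording rule follows; the rule name is illustrative, and the instance label assumes one scrape target per node:

```yaml
groups:
  - name: cilium-recording
    rules:
      # Per-node 5m drop rate, precomputed for dashboard panels.
      - record: node:cilium_drops:rate5m
        expr: sum by (instance) (rate(cilium_drop_count_total[5m]))
```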

Inspecting flows for debugging

Workflow for a typical troubleshooting session:
  1. Reproduce the problematic traffic (from a client pod to a server pod or service).
  2. Use hubble observe with filters:
    • Filter by source/destination pod, namespace, port, L7 protocol.
    • Use --since and --last to scope the timeframe.
  3. If you need a visual path, open the Hubble UI to follow flow graphs and traces.
  4. Cross-reference Hubble flows with Prometheus metrics (drops, policy denies) to determine if a policy, BPF program, or network issue is causing failures.
Example Hubble observe with filters:
hubble observe --from-namespace default --to-port 8080 --since 2m

Troubleshooting tips

  • If hubble observe shows no flows:
    • Ensure Hubble is enabled on Cilium and the relay is running.
    • Confirm that traffic actually traverses the datapath (e.g., check hostNetwork pods or egress rules).
  • If Prometheus doesn’t scrape Cilium metrics:
    • Verify service names and namespaces in scrape_configs.
    • Confirm the metrics endpoint responds (use curl via port-forward).
    • Check RBAC if Kubernetes SD is failing to discover endpoints.
  • For high cardinality metrics, use relabeling to reduce label explosion before retention.

This guide provides a practical starting point for using Hubble with Prometheus. For cluster-specific details (service names, namespaces, authentication), consult your Cilium installation manifest or Helm values and the official Cilium docs.
