In this lesson we focus on observability for Kubernetes clusters using Cilium. We’ll cover:
  • What Hubble is and how it provides network-level observability for Cilium-enabled clusters.
  • How to inspect and trace traffic flows inside the cluster with Hubble (CLI and UI).
  • How to integrate Cilium with Prometheus to scrape metrics for visibility and alerting.
You will learn how to trace individual connections, visualize service-to-service flows, and configure Prometheus to collect and store Cilium telemetry for dashboards and alerts.
Before you begin, ensure you have cluster-admin access, the Cilium CLI (optional but recommended), kubectl configured for the target cluster, and Prometheus (or a Prometheus-compatible scraper) available to ingest metrics.

Why use Hubble + Prometheus?

  • Hubble provides packet- and flow-level visibility into L7 and L3/L4 traffic that Cilium enforces.
  • Prometheus collects long-term telemetry for dashboards, SLOs, and alerts.
  • Together they let you both explore live network flows and monitor trends or anomalies over time.
Component overview:
  • Hubble (CLI & UI): real-time packet/flow inspection and tracing. Example: hubble observe --since 1m
  • Hubble Relay & UI: aggregates flow data across nodes and provides the web UI. Example: cilium hubble enable --relay --ui
  • Prometheus: scrapes and stores metrics for querying and alerting. Example: custom scrape_configs to collect Cilium metrics
  • Grafana (optional): visualizes Cilium/Prometheus metrics. Example: pre-built dashboards or custom panels

Overview of the flow

  1. Enable Hubble with Cilium so flows are captured on each node.
  2. Use the Hubble CLI or UI to inspect flows and trace connections.
  3. Expose the Cilium metrics endpoint and add a Prometheus scrape job to collect metrics.
  4. Create alerts (for example, high packet drops, connection failures) and dashboards in Grafana.

Enable Hubble (quick options)

You can enable Hubble at install time or enable it on an existing Cilium deployment.

Option A: Using the Cilium CLI (recommended when available):
  • Install Cilium with Hubble enabled:
cilium install \
  --set hubble.enabled=true \
  --set hubble.ui.enabled=true \
  --set hubble.relay.enabled=true
  • Or enable Hubble on an existing installation:
cilium hubble enable --relay --ui
Option B: Helm / YAML install

If you install via Helm or manifests, enable the equivalent values for hubble.enabled, hubble.ui.enabled, and hubble.relay.enabled in your values.yaml or manifests. Consult the Cilium docs for version-specific flags.

Verify Hubble status:
cilium hubble status
If you enable Hubble relay and UI on production clusters, ensure proper authentication and network access controls are in place (especially for the Hubble UI and relay ports) — these endpoints expose sensitive network telemetry.
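For Option B, a minimal values fragment might look like the following sketch (the key names match the Cilium Helm chart, but verify them against your chart version):

```yaml
# values.yaml fragment enabling Hubble, Relay, and the UI.
# Verify these keys against the Cilium Helm chart version you deploy.
hubble:
  enabled: true
  relay:
    enabled: true
  ui:
    enabled: true
```

You would then apply it with something like helm upgrade cilium cilium/cilium -n kube-system -f values.yaml, adjusting the release name and namespace to your installation.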

Using the Hubble CLI

Start a port-forward to access Hubble Relay (if you prefer not to expose the service externally):
  • Port-forward the relay (replace namespace or service name if different):
kubectl port-forward -n kube-system svc/hubble-relay 4245:4245
Common Hubble CLI commands:
  • Observe recent flows (human-readable):
hubble observe --since 5m
  • Observe flows in JSON (for automation or parsing):
hubble observe --since 1m -o json
  • Trace a connection between two pods (use the namespace/pod form if the pods are outside the default namespace):
hubble observe --from-pod <namespace>/<pod-name> --to-pod <namespace>/<pod-name> --since 1m
  • Get Hubble status and connection health:
hubble status
Tip: Use -o table or -o json to control the output format, and add --follow to stream live flows.
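The JSON output is convenient for ad-hoc analysis in a pipeline. As a sketch, the snippet below counts dropped flows in a captured sample; the sample lines are an illustrative, heavily abridged version of real flow objects, and in practice you would pipe hubble observe -o json straight into the filter:

```shell
# Illustrative, abridged sample of `hubble observe -o json` output
# (one JSON object per line; real objects carry many more fields).
flows='{"flow":{"verdict":"FORWARDED"}}
{"flow":{"verdict":"DROPPED"}}
{"flow":{"verdict":"DROPPED"}}'

# Count flows with a DROPPED verdict. In a live cluster you would run:
#   hubble observe --since 1m -o json | grep -c '"verdict":"DROPPED"'
drops=$(printf '%s\n' "$flows" | grep -c '"verdict":"DROPPED"')
echo "dropped flows: $drops"
```

The same pattern works for any field in the flow object (verdict, L7 protocol, drop reason), which makes quick "how many of X in the last minute" questions a one-liner.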

Using the Hubble UI

If you enabled hubble.ui, access it through the service or via a local port-forward:
  • Start a local port-forward to the UI:
kubectl port-forward -n kube-system svc/hubble-ui 8081:80
Then open http://localhost:8081 in your browser. The UI displays flows, allows filtering by namespace, pod, port, and L7 protocol, and shows trace paths across services.

Metrics: Exposing Cilium Prometheus metrics

Cilium exposes Prometheus metrics describing agent health, endpoint stats, BPF programs, packet drops, and more. To collect these metrics you must configure your Prometheus server to scrape the correct targets. Common places where metrics appear:
  • cilium-agent pods (per-node metrics).
  • A dedicated cilium-metrics service (if deployed).
  • Relay or exporter endpoints if using a metrics exporter.
Example: verify a metrics endpoint directly:
kubectl -n kube-system port-forward svc/cilium-metrics 9090:9090
curl http://localhost:9090/metrics | head
(Replace cilium-metrics with your service name; check your installation.)
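Once the endpoint responds, you can spot-check individual metric families before wiring up Prometheus. The snippet below filters a captured sample; the metric line and its labels are illustrative of the Prometheus text format rather than exact output from your cluster:

```shell
# Illustrative sample of Prometheus text-format output from /metrics
# (labels and values are made up for demonstration).
sample='# HELP cilium_drop_count_total Total dropped packets
# TYPE cilium_drop_count_total counter
cilium_drop_count_total{reason="Policy denied",direction="INGRESS"} 42'

# Pick out one metric family. Against a live endpoint you would run:
#   curl -s http://localhost:9090/metrics | grep '^cilium_drop_count_total'
printf '%s\n' "$sample" | grep '^cilium_drop_count_total'
```

Anchoring the pattern with ^ skips the HELP/TYPE comment lines, leaving only the sample values.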

Example Prometheus scrape configuration

Below is a template scrape_config you can adapt. Update role, namespace, and service name patterns according to your cluster and Cilium installation.
scrape_configs:
  - job_name: 'cilium'
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      # Keep only endpoints in the namespace where Cilium exposes metrics
      - source_labels: [__meta_kubernetes_namespace]
        action: keep
        regex: kube-system|cilium
      # Keep only Cilium-related services (adjust regex to match your service names)
      - source_labels: [__meta_kubernetes_service_name]
        action: keep
        regex: cilium-metrics|cilium-agent
    metrics_path: /metrics
    scheme: http
    # Optional: add basic_auth or bearer token config if your metrics endpoint is secured
If your environment uses static service discovery or other SD mechanisms, replace kubernetes_sd_configs accordingly.
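For example, with static service discovery the same job could be written as the sketch below; the node IPs and metrics port are placeholders, so check which port your cilium-agent actually exposes metrics on:

```yaml
scrape_configs:
  - job_name: 'cilium-static'
    metrics_path: /metrics
    static_configs:
      # Placeholder node IPs and metrics port; adjust to your installation.
      - targets: ['10.0.0.1:9962', '10.0.0.2:9962']
```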

Useful Cilium metrics & alert ideas

Example metrics and alert ideas:
  • cilium_endpoint_regenerations_total: endpoint policy/program regenerations. Alert if regenerations spike across many endpoints.
  • cilium_drop_count_total: number of dropped packets. Alert if drops exceed a per-minute threshold.
  • cilium_policy_denied_count_total: policy-denied connections. Alert on sustained policy denies.
  • hubble_grpc_connections (Hubble-specific): active Hubble connections. Alert if the connection count drops to 0 (indicating connectivity loss).
Example alert rule (conceptual):
- alert: CiliumHighPacketDrops
  expr: increase(cilium_drop_count_total[5m]) > 100
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "High packet drops detected on Cilium"
    description: "More than 100 drops in the last 5 minutes. Check Cilium endpoints and BPF programs."
Adjust thresholds to your baseline.
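To complement the alert, the same counter can be precomputed as a per-node rate for dashboards. A sketch of a Prometheus recording rule follows; the rule name is illustrative, and the instance label assumes one scrape target per node:

```yaml
groups:
  - name: cilium-recording
    rules:
      # Per-node 5m drop rate, precomputed for dashboard panels.
      - record: node:cilium_drops:rate5m
        expr: sum by (instance) (rate(cilium_drop_count_total[5m]))
```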

Inspecting flows for debugging

Workflow for a typical troubleshooting session:
  1. Reproduce the problematic traffic (from a client pod to a server pod or service).
  2. Use hubble observe with filters:
    • Filter by source/destination pod, namespace, port, L7 protocol.
    • Use --since and --last to scope the timeframe.
  3. If you need a visual path, open the Hubble UI to follow flow graphs and traces.
  4. Cross-reference Hubble flows with Prometheus metrics (drops, policy denies) to determine if a policy, BPF program, or network issue is causing failures.
Example Hubble observe with filters:
hubble observe --from-namespace default --to-port 8080 --since 2m

Troubleshooting tips

  • If hubble observe shows no flows:
    • Ensure Hubble is enabled on Cilium and the relay is running.
    • Confirm that traffic actually traverses the datapath (e.g., check hostNetwork pods or egress rules).
  • If Prometheus doesn’t scrape Cilium metrics:
    • Verify service names and namespaces in scrape_configs.
    • Confirm the metrics endpoint responds (use curl via port-forward).
    • Check RBAC if Kubernetes SD is failing to discover endpoints.
  • For high cardinality metrics, use relabeling to reduce label explosion before retention.

This guide provides a practical starting point for using Hubble with Prometheus. For cluster-specific details (service names, namespaces, authentication), consult your Cilium installation manifest or Helm values and the official Cilium docs.
