- What Hubble is and how it provides network-level observability for Cilium-enabled clusters.
- How to inspect and trace traffic flows inside the cluster with Hubble (CLI and UI).
- How to integrate Cilium with Prometheus to scrape metrics for visibility and alerting.
Before you begin, ensure you have cluster-admin access, the Cilium CLI (optional but recommended), kubectl configured for the target cluster, and Prometheus (or a Prometheus-compatible scraper) available to ingest metrics.
Why use Hubble + Prometheus?
- Hubble provides packet- and flow-level visibility into the L3/L4 and L7 traffic that Cilium processes and enforces policy on.
- Prometheus collects long-term telemetry for dashboards, SLOs, and alerts.
- Together they let you both explore live network flows and monitor trends or anomalies over time.
| Component | Purpose | Example |
|---|---|---|
| Hubble (CLI & UI) | Real-time packet/flow inspection and tracing | hubble observe --since 1m |
| Hubble Relay & UI | Aggregates flow data and provides web UI | cilium hubble enable --relay --ui |
| Prometheus | Scrapes and stores metrics for querying & alerting | Custom scrape_configs to collect Cilium metrics |
| Grafana (optional) | Visualize Cilium/Prometheus metrics | Pre-built dashboards or custom panels |
Overview of the flow
- Enable Hubble with Cilium so flows are captured on each node.
- Use the Hubble CLI or UI to inspect flows and trace connections.
- Expose the Cilium metrics endpoint and add a Prometheus scrape job to collect metrics.
- Create alerts (for example, high packet drops, connection failures) and dashboards in Grafana.
Enable Hubble (quick options)
You can enable Hubble at install time or enable it on an existing Cilium deployment.

Option A: Using the Cilium CLI (recommended when available):
- Install Cilium with Hubble enabled:
- Or enable Hubble on an existing installation:
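Both options can be sketched roughly as follows; exact flag names vary by Cilium CLI and chart version, so check `cilium install --help` for your release:

```shell
# Fresh install with Hubble, Relay, and UI enabled
# (Helm-value flags; names may differ between versions):
cilium install --set hubble.relay.enabled=true --set hubble.ui.enabled=true

# Or enable Hubble (with Relay and UI) on an existing installation:
cilium hubble enable --ui
```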
Option B: Using Helm or raw manifests, set `hubble.enabled`, `hubble.ui.enabled`, and `hubble.relay.enabled` in your values.yaml or manifests. Consult the Cilium docs for version-specific flags.
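For the Helm route, a minimal values.yaml fragment might look like this (key names match recent Cilium chart versions; verify against your chart's documentation):

```yaml
# values.yaml fragment for the Cilium Helm chart (illustrative)
hubble:
  enabled: true
  relay:
    enabled: true
  ui:
    enabled: true
```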
Verify Hubble status:
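A quick verification, assuming the Cilium and Hubble CLIs are installed:

```shell
# Overall Cilium health, including whether Hubble Relay and UI are running:
cilium status

# Hubble-specific health, once the relay is reachable (e.g. via a port-forward):
hubble status
```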
If you enable Hubble relay and UI on production clusters, ensure proper authentication and network access controls are in place (especially for the Hubble UI and relay ports) — these endpoints expose sensitive network telemetry.
Using the Hubble CLI
Start a port-forward to access Hubble Relay (if you prefer not to expose the service externally):
- Port-forward the relay (replace the namespace or service name if different):
- Observe recent flows (human-readable):
- Observe flows in JSON (for automation or parsing):
- Trace a connection between two endpoints (by IP/service):
- Get Hubble status and connection health:
Tip: use `-o table` or `-o json` to control the output format, and combine with `--follow` to stream live flows.
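The steps above can be sketched as the following session; pod names, namespaces, and IPs are placeholders, and the relay service name/port may differ in your install:

```shell
# Port-forward Hubble Relay to localhost:4245 (the default relay port):
cilium hubble port-forward &
# Equivalent with kubectl, if the relay service lives in kube-system:
# kubectl -n kube-system port-forward svc/hubble-relay 4245:80 &

# Recent flows, human-readable:
hubble observe --since 1m

# JSON output for automation or parsing:
hubble observe --since 1m -o json

# Trace traffic between two endpoints (pods here are hypothetical):
hubble observe --from-pod default/client --to-pod default/server
# ...or by IP:
hubble observe --from-ip 10.0.1.23 --to-ip 10.0.2.42

# Relay connection health:
hubble status
```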
Using the Hubble UI
If you enabled `hubble.ui`, access it through the service or via the relay port-forward from above:
- Start a local port-forward to the UI:
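Two common ways to reach the UI, assuming a default install in kube-system (service name and port may differ):

```shell
# Easiest: let the Cilium CLI set up the port-forward and open the UI:
cilium hubble ui

# Or port-forward the UI service manually:
kubectl -n kube-system port-forward svc/hubble-ui 12000:80
# Then browse to http://localhost:12000
```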
Metrics: Exposing Cilium Prometheus metrics
Cilium exposes Prometheus metrics describing agent health, endpoint stats, BPF programs, packet drops, and more. To collect these metrics you must configure your Prometheus server to scrape the correct targets. Common places where metrics appear:
- `cilium-agent` pods (per-node metrics).
- A dedicated `cilium-metrics` service (if deployed).
- Relay or exporter endpoints if using a metrics exporter.

(Replace `cilium-metrics` with your service name; check your installation.)
Example Prometheus scrape configuration
Below is a template `scrape_config` you can adapt. Update the role, namespace, and service name patterns according to your cluster and Cilium installation.
If your Prometheus server runs outside the cluster or discovers targets differently, adjust `kubernetes_sd_configs` accordingly.
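One possible template, assuming the agent pods carry the `k8s-app: cilium` label and expose a container port named `prometheus` (common in default installs with metrics enabled, but verify against your manifests):

```yaml
scrape_configs:
  - job_name: "cilium-agent"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only Cilium agent pods (label is an assumption; check your install):
      - source_labels: [__meta_kubernetes_pod_label_k8s_app]
        regex: cilium
        action: keep
      # Keep only the named metrics port:
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        regex: prometheus
        action: keep
      # Carry namespace and pod name through as labels:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```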
Useful Cilium metrics & alert ideas
| Metric name (example) | What it indicates | Alert idea |
|---|---|---|
| `cilium_endpoint_regenerations_total` | Endpoint policy/program changes | Alert if regenerations spike across many endpoints |
| `cilium_drop_count_total` | Number of dropped packets | Alert if drops exceed a per-minute threshold |
| `cilium_policy_denied_count_total` | Policy-denied connections | Alert on sustained policy denies |
| `hubble_grpc_connections` (Hubble-specific) | Active Hubble connections | Alert if the Hubble connection count drops to 0 (indicating connectivity loss) |
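One of the alert ideas above, sketched as a Prometheus alerting rule; the threshold, duration, and labels are placeholders to tune against your baseline:

```yaml
groups:
  - name: cilium-network
    rules:
      - alert: CiliumPacketDropsHigh
        # Threshold is illustrative; tune to your traffic baseline.
        expr: sum(rate(cilium_drop_count_total[5m])) by (instance) > 100
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Cilium is dropping packets on {{ $labels.instance }}"
```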
Inspecting flows for debugging
Workflow for a typical troubleshooting session:
- Reproduce the problematic traffic (from a client pod to a server pod or service).
- Use `hubble observe` with filters:
  - Filter by source/destination pod, namespace, port, or L7 protocol.
  - Use `--since` and `--last` to scope the timeframe.
- If you need a visual path, open the Hubble UI to follow flow graphs and traces.
- Cross-reference Hubble flows with Prometheus metrics (drops, policy denies) to determine if a policy, BPF program, or network issue is causing failures.
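A hedged example of such a session; the `payments` namespace and pod names are hypothetical:

```shell
# 1. Scope observation to the failing path:
hubble observe --namespace payments \
  --from-pod payments/client --to-pod payments/api \
  --port 8080 --last 200

# 2. Look specifically for dropped flows:
hubble observe --namespace payments --verdict DROPPED --since 5m

# 3. Cross-check drop counters in Prometheus (PromQL):
#    sum(rate(cilium_drop_count_total[5m])) by (reason)
```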
Troubleshooting tips
- If `hubble observe` shows no flows:
  - Ensure Hubble is enabled on Cilium and the relay is running.
  - Confirm that traffic actually traverses the datapath (e.g., check hostNetwork pods or egress rules).
- If Prometheus doesn’t scrape Cilium metrics:
  - Verify service names and namespaces in `scrape_configs`.
  - Confirm the metrics endpoint responds (use `curl` via a port-forward).
  - Check RBAC if Kubernetes SD is failing to discover endpoints.
- For high cardinality metrics, use relabeling to reduce label explosion before retention.
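To rule out the metrics endpoint itself when Prometheus isn't scraping, a quick check along these lines can help (port 9962 is the agent metrics port in recent releases; older versions used 9090, so check your configuration):

```shell
# Pick one Cilium agent pod and port-forward its metrics port:
POD=$(kubectl -n kube-system get pods -l k8s-app=cilium \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n kube-system port-forward "$POD" 9962:9962 &
sleep 2

# The endpoint should return Prometheus-format metrics:
curl -s http://127.0.0.1:9962/metrics | grep -m5 '^cilium_'

kill %1
```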
Links and references
- Cilium Documentation — Hubble
- Cilium GitHub / releases
- Prometheus Documentation — Configuration
- Kubernetes Documentation — Service discovery in Prometheus
This guide provides a practical starting point for using Hubble with Prometheus. For cluster-specific details (service names, namespaces, authentication), consult your Cilium installation manifest or Helm values and the official Cilium docs.