KodeKloud Notes

In this lesson, you’ll learn how to set up a complete monitoring stack with Prometheus, Grafana, and Alertmanager. We’ll cover:

Overview of Prometheus
Prometheus architecture
Data visualization with Grafana
Alerting with Prometheus and Alertmanager

1. Overview of Prometheus

Prometheus is an open-source, pull-based monitoring and alerting system designed for time series data. It excels at:

Multi-dimensional data model: metrics are labeled with key/value pairs.
PROMQL: a flexible query language for slicing, aggregating, and analyzing time series.
Native TSDB: optimized on-disk format for fast storage and retrieval.
White-box monitoring: collects internal metrics (CPU, memory, request rates, etc.) from instrumented code.

Note

Prometheus focuses strictly on metrics collection and alerting. It does not handle logs, distributed tracing, or advanced anomaly detection. Consider integrating with Grafana Loki for logs and Jaeger for tracing.

2. Prometheus Architecture

Prometheus consists of several core components that work together to discover, scrape, store, and alert on metrics.

2.1 Service Discovery

Prometheus automatically discovers scrape targets in dynamic environments:

Platform	Discovery Mechanism	Example
Kubernetes	Kubernetes API	`kubernetes_sd_configs`
AWS EC2	EC2 API	`ec2_sd_configs`
Consul	HTTP API	`consul_sd_configs`
Static	Manually defined targets	`static_configs`

2.2 Instrumentation and Exporters

Instrumentation happens in two ways:

Client libraries for languages like Go, Python, Java, and Ruby.
Exporters for services you cannot modify.

Exporter	Purpose	GitHub/Docs
Node Exporter	Host-level metrics (CPU, disk)	https://github.com/prometheus/node_exporter
Blackbox	Endpoint probing (HTTP, TCP, ICMP)	https://github.com/prometheus/blackbox_exporter
MySQL Exporter	Database server metrics	https://github.com/prometheus/mysqld_exporter
JMX Exporter	JVM application metrics	https://github.com/prometheus/jmx_exporter

2.3 Scraping and Local Storage

Prometheus periodically scrapes metrics from each target defined in prometheus.yml. Scrapes are HTTP GET requests that return metrics in the Prometheus exposition format. The time series data is then committed to the local TSDB.

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 30s

scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node1:9100', 'node2:9100']

Warning

Ensure your scrape intervals are tuned to your environment. Too frequent scrapes can overload the TSDB, while too infrequent scrapes may miss critical events.

3. Data Visualization with Grafana

Grafana is the leading open-source platform for time series analytics. It provides:

First-class support for Prometheus as a data source via PROMQL queries.
Interactive dashboards: graphs, heatmaps, tables, and more.
Alerting capabilities directly on dashboards.
A rich marketplace of community and official plugins.

Steps to connect Prometheus to Grafana:

Log in to Grafana and go to Configuration → Data Sources.
Add a new Prometheus data source and set the URL (e.g., http://prometheus:9090).
Save & Test, then import or build dashboards.

The image is a diagram illustrating the integration of Prometheus and Grafana for monitoring and alerting, showing how metrics are pulled from applications, stored, and visualized, with alerts managed and sent to various notification services.

4. Alerting with Prometheus and Alertmanager

Prometheus defines alerting rules in rules.yml using PROMQL expressions. When a condition is met for a specified duration, an alert is fired and sent to Alertmanager.

# rules.yml
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: node_cpu_seconds_total{mode!="idle"} > 0.85
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"

Alertmanager manages notifications by:

Grouping and deduplicating alerts
Applying silence, inhibition, and inhibition rules
Throttling notifications to avoid floods
Routing to receivers: email, Slack, PagerDuty, webhook, etc.

# alertmanager.yml
route:
  group_by: ['alertname']
  receiver: 'slack-notifications'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'

Hands-On Demo: Istio + Prometheus + Grafana

We’ll now deploy Prometheus and Grafana on an Istio-enabled Kubernetes cluster to visualize service mesh metrics. You’ll learn to:

Deploy Prometheus using the Prometheus Operator.
Configure Grafana with Istio dashboards.
Set up Alertmanager for service-level alerts.

Links and References

Watch Video

Watch video content