Prometheus Certified Associate (PCA)
PromQL
Recording Rules
In this lesson, we explore recording rules in Prometheus. A recording rule tells Prometheus to evaluate a PromQL expression periodically and store the result as a new time series in its database. Because the result is precomputed, dashboards (in Grafana or in Prometheus' built-in UI) can read the stored series instead of recalculating an expensive expression every time they refresh, which can make them load much faster.
Tip
Before you configure recording rules, make sure your Prometheus configuration file references your rules file through the rule_files property.
Configuring Recording Rules
To enable recording rules, define your rules in a separate file (commonly named rules.yml) and reference that file from your Prometheus configuration file under the rule_files key. Prometheus then evaluates each expression at a fixed interval (for example every 30 seconds or every minute) and stores the results.
Below is an example Prometheus configuration that includes a rules file:
global:
  scrape_interval: 5s
  evaluation_interval: 5s
rule_files:
  - rules.yml
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - localhost:9090
You can also use glob patterns to include multiple rule files from a directory. For instance, the path /etc/prometheus/rules/*.yaml tells Prometheus to load every file ending in .yaml in that directory. Changes to rule files are not picked up automatically: Prometheus reads them only when it restarts or reloads its configuration (for example after receiving a SIGHUP signal).
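To check which files a pattern such as /etc/prometheus/rules/*.yaml would pick up, the Python sketch below uses pathlib's PurePosixPath.match as a rough stand-in for the glob matching Prometheus performs. The candidate file names are hypothetical. It illustrates two common surprises: `*` does not reach into subdirectories, and a *.yaml pattern ignores files ending in .yml.

```python
from pathlib import PurePosixPath

# Glob used in the rule_files section (taken from the text above).
RULE_GLOB = "/etc/prometheus/rules/*.yaml"

def matches_rule_glob(path: str, pattern: str = RULE_GLOB) -> bool:
    """Return True if `path` would be selected by `pattern`.

    PurePosixPath.match compares path segments one by one, so `*`
    never matches across a '/' separator.
    """
    return PurePosixPath(path).match(pattern)

# Hypothetical files in the rules directory.
candidates = [
    "/etc/prometheus/rules/node.yaml",
    "/etc/prometheus/rules/docker.yaml",
    "/etc/prometheus/rules/api.yml",        # different extension
    "/etc/prometheus/rules/old/node.yaml",  # inside a subdirectory
]

for path in candidates:
    print(f"{path}: {'loaded' if matches_rule_glob(path) else 'ignored'}")
```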
Rule File Structure
A rule file consists of one or more rule groups defined under the groups key. Each group can set an interval property that overrides the global evaluation interval, and the rules inside a group are evaluated sequentially, in the order they are listed. Because of this ordering, a rule that depends on the output of an earlier rule in the same group sees the value computed in the current evaluation cycle. Different groups, however, are evaluated in parallel.
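Since rules in a group run in order, a rule should only reference recorded metrics produced by rules listed earlier in the same group; a reference to a later rule would read the value stored in the previous cycle. The hypothetical Python helper below flags such forward references. It finds metric names with a simple regular expression over the expression text, which only approximates real PromQL parsing.

```python
import re

# Rough token pattern for metric names (PromQL metric names may contain colons).
METRIC_TOKEN = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")

def forward_references(rules):
    """Return (rule, referenced_metric) pairs where a rule uses a metric
    that is recorded by a later rule in the same group.

    `rules` is a list of dicts with "record" and "expr" keys, mirroring
    the YAML structure of a single rule group.
    """
    recorded_names = [rule["record"] for rule in rules]
    problems = []
    for index, rule in enumerate(rules):
        tokens = set(METRIC_TOKEN.findall(rule["expr"]))
        later = set(recorded_names[index + 1:])
        for name in sorted(tokens & later):
            problems.append((rule["record"], name))
    return problems

ordered = [
    {"record": "node_filesystem_free_percent",
     "expr": "100 * node_filesystem_free_bytes / node_filesystem_size_bytes"},
    {"record": "node_filesystem_free_percent_avg",
     "expr": "avg by(instance)(node_filesystem_free_percent)"},
]
reversed_order = list(reversed(ordered))

print(forward_references(ordered))         # → []
print(forward_references(reversed_order))  # → [('node_filesystem_free_percent_avg', 'node_filesystem_free_percent')]
```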
Here is an example format for a rule file:
groups:
  - name: <group name 1>
    interval: <evaluation interval>
    rules:
      - record: <rule name 1>
        expr: <promql expression 1>
        labels:
          <label name>: <label value>
      - record: <rule name 2>
        expr: <promql expression 2>
  - name: <group name 2>
    rules:
      # Additional rules for group 2
For instance, consider recording two metrics: the percentage of free memory on a node and the percentage of free space on a filesystem. Instead of recalculating these expressions with every query, set up recording rules to compute these metrics periodically:
groups:
  - name: example1
    interval: 15s
    rules:
      - record: node_memory_memFree_percent
        expr: 100 * node_memory_MemFree_bytes / node_memory_MemTotal_bytes
      - record: node_filesystem_free_percent
        expr: 100 * node_filesystem_free_bytes / node_filesystem_size_bytes
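The arithmetic behind these expressions is easy to check by hand. The Python sketch below plugs hypothetical byte counts into the same 100 * part / total formula. It also shows why the direction matters: 100 * free / total is the free percentage, while 100 minus that value would be the used percentage.

```python
def percent(part_bytes: float, total_bytes: float) -> float:
    """100 * part / total, the shape of both recording rule expressions."""
    return 100 * part_bytes / total_bytes

# Hypothetical sample values, in bytes as node_exporter reports them.
mem_free_bytes = 2 * 1024**3     # 2 GiB free
mem_total_bytes = 8 * 1024**3    # 8 GiB total
fs_free_bytes = 48 * 1024**3     # 48 GiB free
fs_size_bytes = 120 * 1024**3    # 120 GiB filesystem

mem_free_percent = percent(mem_free_bytes, mem_total_bytes)
mem_used_percent = 100 - mem_free_percent   # the "used" view, for contrast
fs_free_percent = percent(fs_free_bytes, fs_size_bytes)

print(mem_free_percent)  # → 25.0
print(mem_used_percent)  # → 75.0
print(fs_free_percent)   # → 40.0
```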
After configuring your rules, check their status by navigating to the "Status" tab and selecting "Rules" in the Prometheus interface. Each rule displays its state (OK or error).
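The same health information is exposed by the Prometheus HTTP API at /api/v1/rules. The sketch below assumes Prometheus listens on localhost:9090 and uses only the Python standard library. Parsing is kept in its own function so it can be tried on a hand-written response without a running server.

```python
import json
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: default listen address

def fetch_rules(base_url: str = PROMETHEUS_URL) -> dict:
    """GET /api/v1/rules (recording rules only) and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/rules?type=record") as resp:
        return json.load(resp)

def rule_health(payload: dict) -> dict:
    """Map each recording rule name to its health ("ok", "err" or "unknown")."""
    health = {}
    for group in payload["data"]["groups"]:
        for rule in group["rules"]:
            if rule.get("type") == "recording":
                health[rule["name"]] = rule["health"]
    return health
```

Against a live server you would call rule_health(fetch_rules()) and look for any rule whose health is not "ok".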
Querying Recording Rules
When querying, simply use the rule's name (e.g., node_memory_memFree_percent). Prometheus returns the precomputed value rather than executing the original expression. This not only speeds up dashboard response times but also simplifies the queries you write.
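Outside a dashboard, a recorded metric can be queried through the HTTP API exactly like any other metric. This Python sketch assumes a server on localhost:9090 and uses the instant query endpoint /api/v1/query; the helper names are hypothetical. In API responses each sample value arrives as a string, so it is converted to float.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: default listen address

def instant_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build an /api/v1/query URL for an instant query."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})

def parse_vector(payload: dict) -> list:
    """Turn an instant-vector response into (labels, float value) pairs."""
    results = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # value is a string in the API
        results.append((series["metric"], float(value)))
    return results

def query(expr: str) -> list:
    """Run an instant query against a live server (needs network access)."""
    with urllib.request.urlopen(instant_query_url(expr)) as resp:
        return parse_vector(json.load(resp))
```

For example, query("node_memory_memFree_percent") returns one (labels, value) pair per node, read straight from the stored series.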
You can also reference a recorded metric within another rule. For example, to compute the average free percentage per instance, across all of that instance's filesystems:
groups:
  - name: example1
    interval: 15s
    rules:
      - record: node_filesystem_free_percent
        expr: 100 * node_filesystem_free_bytes{job="node"} / node_filesystem_size_bytes{job="node"}
      - record: node_filesystem_free_percent_avg
        expr: avg by(instance)(node_filesystem_free_percent)
In this example, the first rule calculates the free percentage for each filesystem, and the second rule computes the average value by instance using the initial result. Thanks to sequential execution within the group, the second rule can safely reference the result of the first.
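The ordering guarantee can be pictured with a toy Python model: a dict stands in for the time series database, and each function plays the role of one rule in group example1. The instance names and byte counts are hypothetical, and real rule evaluation (timestamps, staleness, full label sets) is far more involved.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scraped samples: (instance, mountpoint) -> bytes.
free_bytes = {("host-a:9100", "/"): 20, ("host-a:9100", "/boot"): 90,
              ("host-b:9100", "/"): 50}
size_bytes = {("host-a:9100", "/"): 100, ("host-a:9100", "/boot"): 100,
              ("host-b:9100", "/"): 200}

store = {}  # stands in for the TSDB: metric name -> {labels: value}

def rule_free_percent():
    # record: node_filesystem_free_percent
    store["node_filesystem_free_percent"] = {
        key: 100 * free_bytes[key] / size_bytes[key] for key in free_bytes
    }

def rule_free_percent_avg():
    # record: node_filesystem_free_percent_avg, i.e. avg by(instance)(...)
    by_instance = defaultdict(list)
    for (instance, _mount), value in store["node_filesystem_free_percent"].items():
        by_instance[instance].append(value)
    store["node_filesystem_free_percent_avg"] = {
        instance: mean(values) for instance, values in by_instance.items()
    }

# One evaluation cycle: rules run in list order, so the second rule
# reads what the first rule wrote moments earlier.
for rule in (rule_free_percent, rule_free_percent_avg):
    rule()

print(store["node_filesystem_free_percent_avg"])
# → {'host-a:9100': 55.0, 'host-b:9100': 25.0}
```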
Best Practices for Naming Recording Rules
Adopting a clear naming convention keeps recording rules consistent and easy to understand. A suggested format, level:metric:operations, breaks the name into three parts separated by colons:
- Level: Indicates the aggregation level based on labels (e.g., job, method, path).
- Metric Name: The name of the metric.
- Operations: Lists the functions or aggregators applied (e.g., rate, sum, avg).
[Image: the recording rule naming convention, level:metric:operations]
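A naming convention is easiest to enforce when it can be checked automatically. The hypothetical Python helper below splits a rule name into its level, metric, and operations parts and rejects names that do not fit. It relies on the Prometheus rule that metric names may contain letters, digits, underscores, and colons, with colons reserved by convention for recording rules.

```python
import re

# Valid Prometheus metric name characters.
METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")

def split_rule_name(name: str) -> tuple:
    """Split a recording rule name into (level, metric, operations).

    Raises ValueError if the name is not a valid metric name or does not
    follow the level:metric:operations pattern.
    """
    if not METRIC_NAME.fullmatch(name):
        raise ValueError(f"not a valid metric name: {name!r}")
    parts = name.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected level:metric:operations, got {name!r}")
    level, metric, operations = parts
    return level, metric, operations

print(split_rule_name("job_method_path:http_errors:rate5m"))
# → ('job_method_path', 'http_errors', 'rate5m')
```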
For example, consider an HTTP errors counter with labels for method and path. To compute a 5-minute error rate, you could define the rule as follows:
- record: job_method_path:http_errors:rate5m
  expr: sum without(instance)(rate(http_errors{job="api"}[5m]))
If you also aggregate away the path label, update the rule name so that its level reflects the labels that remain:
- record: job_method:http_errors:rate5m
  expr: sum without(instance, path)(rate(http_errors{job="api"}[5m]))
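The effect of without() on the remaining labels, and therefore on the level part of the name, can be mimicked in a few lines of Python. The per-series rates below are hypothetical values of what rate(http_errors{job="api"}[5m]) might return, and sum_without is an illustrative helper rather than how Prometheus implements aggregation.

```python
from collections import defaultdict

# Hypothetical per-second error rates for individual series.
series = [
    ({"job": "api", "instance": "a:8000", "method": "GET", "path": "/users"}, 0.5),
    ({"job": "api", "instance": "b:8000", "method": "GET", "path": "/users"}, 0.25),
    ({"job": "api", "instance": "a:8000", "method": "GET", "path": "/orders"}, 1.0),
    ({"job": "api", "instance": "a:8000", "method": "POST", "path": "/orders"}, 2.0),
]

def sum_without(samples, dropped):
    """Mimic PromQL `sum without(<dropped>)`: drop the listed labels and
    add up the values of series whose remaining labels are identical."""
    totals = defaultdict(float)
    for labels, value in samples:
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in dropped))
        totals[kept] += value
    return dict(totals)

# job_method_path level: job, method and path remain.
for labels, value in sum_without(series, {"instance"}).items():
    print(dict(labels), value)

# job_method level: only job and method remain.
for labels, value in sum_without(series, {"instance", "path"}).items():
    print(dict(labels), value)
```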
Best Practice
Group all recording rules for the same job together. For instance, if you have separate rules for jobs "node" and "docker", create distinct groups for each.
An example grouping by job:
groups:
  - name: node # All rules for job="node"
    interval: 15s
    rules:
      - record: node_memory_memFree_percent
        expr: 100 * node_memory_MemFree_bytes{job="node"} / node_memory_MemTotal_bytes{job="node"}
  - name: docker # All rules for job="docker"
    interval: 15s
    rules:
      # Docker-specific rules go here
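If rules are generated rather than written by hand, the one-group-per-job layout can be produced automatically. The hypothetical Python helper below renders a rules file from a dict mapping job names to (record, expr) pairs. It writes each expression as a double-quoted YAML string using json.dumps, relying on JSON string escapes also being valid YAML escapes, so quotes inside label matchers stay intact.

```python
import json

def render_rule_groups(rules_by_job: dict, interval: str = "15s") -> str:
    """Render a rules file with one group per job (hypothetical helper).

    `rules_by_job` maps a job name to a list of (record, expr) pairs.
    """
    lines = ["groups:"]
    for job, rules in rules_by_job.items():
        lines.append(f"  - name: {job}")
        lines.append(f"    interval: {interval}")
        lines.append("    rules:")
        for record, expr in rules:
            lines.append(f"      - record: {record}")
            # json.dumps quotes the expression and escapes embedded quotes.
            lines.append(f"        expr: {json.dumps(expr)}")
    return "\n".join(lines) + "\n"

rules_file = render_rule_groups({
    "node": [
        ("node_memory_memFree_percent",
         '100 * node_memory_MemFree_bytes{job="node"} / node_memory_MemTotal_bytes{job="node"}'),
    ],
    "api": [
        ("job_method:http_errors:rate5m",
         'sum without(instance, path)(rate(http_errors{job="api"}[5m]))'),
    ],
})
print(rules_file)
```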
Example: Setting Up Recording Rules
Follow these steps to create and activate recording rules:
Navigate to your Prometheus configuration directory (typically /etc/prometheus), and create or edit the rules.yaml file. Insert the following example configuration:
groups:
  - name: example1
    interval: 15s
    rules:
      - record: node_filesystem_free_percent
        expr: 100 * node_filesystem_free_bytes{job="node"} / node_filesystem_size_bytes{job="node"}
      - record: node_filesystem_free_percent_avg
        expr: avg by(instance)(node_filesystem_free_percent)
This configuration sets up a group called "example1" that runs every 15 seconds to:
- Calculate the free percentage for each filesystem.
- Compute the average free percentage by instance.
Update your Prometheus configuration file (prometheus.yml) to load the rules file. For example:
# Global configuration
global:
  scrape_interval: 15s # Scrape interval
  evaluation_interval: 15s # Rule evaluation interval
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rule files
rule_files:
  - rules.yaml
# Scrape configurations
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node"
    static_configs:
      - targets: ["192.168.1.168:9100", "192.168.1.168:9200"]
  - job_name: "api"
    static_configs:
      - targets: ["192.168.1.168:8000"]
  - job_name: "ec2"
    ec2_sd_configs:
      - region: us-east-1
        access_key: AKIAQS3OUAYMSDIJ77J
Save the changes and restart the Prometheus server:
cd /etc/prometheus/
sudo vi rules.yaml
sudo vi prometheus.yml
sudo systemctl restart prometheus
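Before restarting, it can be worth validating both files with promtool, the command-line tool shipped with Prometheus, because a syntax error in rules.yaml can keep the rules (or the server) from loading. The Python sketch below assumes promtool is on the PATH and that the files live in /etc/prometheus as in the steps above; the helper names are hypothetical. The commands themselves (promtool check rules and promtool check config) can also be typed directly in the shell.

```python
import subprocess

def promtool_checks(config_dir: str = "/etc/prometheus") -> list:
    """Commands that validate the rule file and the main configuration."""
    return [
        ["promtool", "check", "rules", f"{config_dir}/rules.yaml"],
        ["promtool", "check", "config", f"{config_dir}/prometheus.yml"],
    ]

def run_checks(config_dir: str = "/etc/prometheus") -> bool:
    """Run each promtool check; return True only if all of them pass."""
    ok = True
    for cmd in promtool_checks(config_dir):
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout or result.stderr)
        ok = ok and result.returncode == 0
    return ok
```

Calling run_checks() before sudo systemctl restart prometheus lets you fix mistakes while the old configuration is still running.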
After the restart, visit the Prometheus "Status" > "Rules" page to verify that your new rules are active and show a green OK status. The page lists each rule together with its expression:
record: node_filesystem_free_percent
expr: 100 * node_filesystem_free_bytes{job="node"} / node_filesystem_size_bytes{job="node"}
record: node_filesystem_free_percent_avg
expr: avg by(instance)(node_filesystem_free_percent)
To confirm that the rules produce data, open the expression browser and query one of the recorded metrics.
A sample query such as
node_filesystem_free_percent
might return:
node_filesystem_free_percent{device="/dev/sda3", fstype="ext4", instance="192.168.1.168:9100", job="node", mountpoint="/"} 16.005607045046514
node_filesystem_free_percent{device="/dev/sda3", fstype="ext4", instance="192.168.1.168:9100", job="node", mountpoint="/boot/efi"} 98.97682793770934
node_filesystem_free_percent{device="tmpfs", fstype="tmpfs", instance="192.168.1.168:9100", job="node", mountpoint="/run"} 99.79180265142136
node_filesystem_free_percent{device="tmpfs", fstype="tmpfs", instance="192.168.1.168:9100", job="node", mountpoint="/run/lock"} 99.921875
node_filesystem_free_percent{device="tmpfs", fstype="tmpfs", instance="192.168.1.168:9100", job="node", mountpoint="/run/snapd/ns"} 99.79180265142136
node_filesystem_free_percent{device="tmpfs", fstype="tmpfs", instance="192.168.1.168:9100", job="node", mountpoint="/run/user/1000"} 99.65394224493011
When multiple instances are present, separate data is returned for each instance.
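As a sanity check on the second rule, you can work out by hand what node_filesystem_free_percent_avg should report for instance 192.168.1.168:9100. The Python sketch below averages the six sample values from the query output, assuming they are all of that instance's filesystem series at that moment.

```python
from statistics import mean

# Values copied from the sample output for instance 192.168.1.168:9100.
free_percent_by_mountpoint = {
    "/": 16.005607045046514,
    "/boot/efi": 98.97682793770934,
    "/run": 99.79180265142136,
    "/run/lock": 99.921875,
    "/run/snapd/ns": 99.79180265142136,
    "/run/user/1000": 99.65394224493011,
}

# avg by(instance)(node_filesystem_free_percent) collapses these six
# series into a single value for the instance.
instance_avg = mean(free_percent_by_mountpoint.values())
print(round(instance_avg, 2))  # → 85.69
```

Note how the nearly full root filesystem pulls the average well below the roughly 99% shown by the tmpfs mounts, which is worth keeping in mind when alerting on an averaged value.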
This comprehensive guide on recording rules in Prometheus shows how you can precompute values and efficiently organize your rules to enhance dashboard performance and simplify your queries.
For more detailed information, refer to the Prometheus Documentation.