Subquery
Assume we have a gauge metric and want to determine its maximum value over the past 10 minutes. To achieve this, we can pass the metric with a 10-minute range selector to the max_over_time function. For example:
$ max_over_time(node_filesystem_avail_bytes[10m])
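To make the gauge case concrete, here is a minimal Python sketch of what max_over_time does with a range selector. It is an illustration rather than Prometheus code: the sample values are made up, and the window boundary (start excluded, end included) is an assumption that may differ between Prometheus versions.

```python
# Hypothetical gauge samples as (unix_seconds, value) pairs, one scrape per minute.
values = [50, 52, 51, 49, 55, 58, 57, 60, 59, 61, 63, 62, 58, 57, 56, 54]
samples = list(zip(range(0, 901, 60), values))

def max_over_time(samples, eval_time, window_seconds):
    """Largest value among samples in (eval_time - window, eval_time]."""
    in_window = [v for t, v in samples
                 if eval_time - window_seconds < t <= eval_time]
    return max(in_window) if in_window else None

# Roughly what max_over_time(metric[10m]) does when evaluated at t=900.
result = max_over_time(samples, eval_time=900, window_seconds=600)
print(result)
```

The range selector supplies many samples per series, and max_over_time reduces them to a single value.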
For a counter, the natural first attempt is to apply the rate function over a 10-minute range and then wrap the result in max_over_time to find the maximum rate. However, this approach does not work directly because the rate function returns an instant vector, while max_over_time expects a range vector. Compare the two queries below: the first is valid, while the second is rejected:
$ max_over_time(node_filesystem_avail_bytes[10m])
$ max_over_time(rate(http_requests_total[10m]))
Note
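Remember: the [10m] in the rate function is the window of samples from which a single per-second average rate is calculated. The result is still one value per series (an instant vector), not a series of rate values spread across those 10 minutes.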
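The point of the note can be sketched in Python. The simple_rate helper below is a hypothetical, simplified stand-in for rate: it divides the increase between the first and last sample in the window by the elapsed time, and it deliberately omits the extrapolation and counter-reset handling that Prometheus performs. The counter data is made up.

```python
# Hypothetical counter samples (unix_seconds, value), scraped every 15 seconds.
# The counter grows by exactly 2 per second.
counter = [(t, 100 + 2 * t) for t in range(0, 601, 15)]

def simple_rate(samples, eval_time, window_seconds):
    """Per-second increase between the first and last sample in the window.

    Simplified on purpose: no extrapolation, no counter-reset handling.
    """
    window = [(t, v) for t, v in samples
              if eval_time - window_seconds < t <= eval_time]
    if len(window) < 2:
        return None
    (t0, v0), (t1, v1) = window[0], window[-1]
    return (v1 - v0) / (t1 - t0)

# A 10-minute window still collapses to ONE number for the series.
value = simple_rate(counter, eval_time=600, window_seconds=600)
print(value)
```

Because the result is a single number per series, there is nothing for max_over_time to iterate over, which is why the direct combination is rejected.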
To address this issue, subqueries come into play.
Using Subqueries
Subqueries allow you to perform an instant query over a specified range and resolution, effectively converting an instant vector into a range vector. The general format for a subquery is:
$ <instant_query>[<range>:<resolution>] [offset <duration>]
For example, if you want to calculate the rate for http_requests_total using a 1-minute sample range and then look at the data over the past 5 minutes with a resolution of 30 seconds, you would write:
$ rate(http_requests_total[1m])[5m:30s]
In this query:
1m is the sample range used inside the rate function.
5m is the query range, representing how far back the query retrieves data.
30s is the query step, defining the interval between the returned data samples.
This subquery returns a range vector that can then be passed to functions like max_over_time. For example:
$ max_over_time(rate(http_requests_total[1m])[5m:30s])
This expression calculates the maximum rate of http_requests_total over the last five minutes, where each rate is computed from a one-minute window and evaluated every 30 seconds.
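The whole pipeline can be simulated in a few lines of Python. This is a sketch under stated assumptions, not how Prometheus is implemented: simple_rate is a hypothetical simplified rate (no extrapolation or reset handling), the counter data is invented with a one-minute burst, and the subquery helper assumes evaluation timestamps are aligned to multiples of the step.

```python
# Hypothetical counter scraped every 15 seconds: 1 request/s normally,
# 5 requests/s during a burst between t=300 and t=360.
counter, total = [], 0
for t in range(0, 601, 15):
    counter.append((t, total))
    total += 15 * (5 if 300 <= t < 360 else 1)

def simple_rate(samples, eval_time, window_seconds):
    """Simplified rate: (last - first) / elapsed over the window."""
    window = [(t, v) for t, v in samples
              if eval_time - window_seconds < t <= eval_time]
    if len(window) < 2:
        return None
    (t0, v0), (t1, v1) = window[0], window[-1]
    return (v1 - v0) / (t1 - t0)

def subquery(instant_fn, eval_time, range_seconds, step_seconds):
    """Evaluate instant_fn at step-aligned timestamps in (eval_time - range, eval_time]."""
    first = ((eval_time - range_seconds) // step_seconds + 1) * step_seconds
    return [(t, instant_fn(t)) for t in range(first, eval_time + 1, step_seconds)]

# rate(...[1m])[5m:30s] evaluated at t=600
points = subquery(lambda t: simple_rate(counter, t, 60), 600, 300, 30)
# max_over_time(...) over the resulting range vector
peak = max(v for _, v in points if v is not None)
# rate(...[1m]) evaluated only at t=600
current = simple_rate(counter, 600, 60)

print(len(points), peak, current)
```

In this made-up data the instant rate at the end is back to its normal level, while the maximum over the subquery still captures the earlier burst.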
Example: Grouping Data Without Aggregation
Consider a subquery without an aggregation function. The following query evaluates the rate of node_cpu_seconds_total every 10 seconds across a 2-minute range:
$ rate(node_cpu_seconds_total[1m])[2m:10s]
0.991777777804683 @1664226130
0.991777777804683 @1664226140
0.991333333304358 @1664226150
0.991111111131807 @1664226160
0.991111111131807 @1664226170
0.991555555556851 @1664226180
0.989555555561764 @1664226190
0.989555555561764 @1664226200
0.989333333334747 @1664226210
0.989333333334747 @1664226220
0.989333333334747 @1664226230
0.991111111131807 @1664226240
Here, the 2m range defines the period over which data is collected, while 10s sets the interval between each data point. The resulting range vector can then be passed to functions that aggregate over time.
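The timestamps in the output above can be reproduced with a short calculation. The sketch below assumes that subquery steps fall on multiples of the resolution and that the range excludes its start; the evaluation time 1664226245 is a hypothetical value chosen for illustration, since the original query time is not shown.

```python
def subquery_timestamps(eval_time, range_seconds, step_seconds):
    """Step-aligned timestamps in (eval_time - range, eval_time]."""
    first = ((eval_time - range_seconds) // step_seconds + 1) * step_seconds
    return list(range(first, eval_time + 1, step_seconds))

# [2m:10s] evaluated at an assumed query time of 1664226245
stamps = subquery_timestamps(eval_time=1664226245, range_seconds=120, step_seconds=10)
print(len(stamps), stamps[0], stamps[-1])
```

Under these assumptions the result is twelve timestamps from 1664226130 to 1664226240, matching the twelve rows listed above.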
Note
Subqueries convert an instant vector into a range vector, enabling aggregation functions such as max_over_time to operate over multiple data points.
Practical Example: Analyzing Network Traffic
Suppose we have a metric called node_network_transmit_bytes_total that tracks the total transmitted bytes on a network interface. Using the rate function with a 1-minute window gives the current transmission rate:
$ rate(node_network_transmit_bytes_total{instance="192.168.1.168:9100", job="node"}[1m])
{device="enp0s3", instance="192.168.1.168:9100", job="node"} 1541.444444444443
{device="enp0s8", instance="192.168.1.168:9100", job="node"} 1549.7555555555555
{device="lo", instance="192.168.1.168:9100", job="node"} 0
However, if you attempt to find the highest transmission rate over the past five hours by wrapping the rate function directly in max_over_time, you will encounter an error because max_over_time expects a range vector:
$ max_over_time(rate(node_network_transmit_bytes_total[1m]))
Error executing query: invalid parameter "query": 1:15: parse error: expected type range vector in call to function "max_over_time", got instant vector
To resolve this, you can use a subquery to produce a range vector. For example, specify a 5-hour query window with a 30-second step:
$ max_over_time(rate(node_network_transmit_bytes_total[1m])[5h:30s])
This query runs without error and returns the maximum rate for each series over the past five hours. To understand what the subquery returns without wrapping it in max_over_time, consider the following example:
$ rate(node_network_transmit_bytes_total[1m])[5h:30s]
{device="enp0s3", instance="192.168.1.168:9100", job="node"}
{device="enp0s3", instance="192.168.1.168:9200", job="node"}
{device="lo", instance="192.168.1.168:9200", job="node"}
11346.858393314682
1595.444444444443
1674.7700102217677
0
Without an aggregation function like max_over_time, the subquery returns multiple data points per series across the specified range. In contrast, a query that fetches only the most recent data point may yield results similar to:
$ rate(node_network_transmit_bytes_total[1m])
{device="enp0s3", instance="192.168.1.168:9100", job="node"} 1556.911111111111
{device="enp0s3", instance="192.168.1.168:9200", job="node"} 1567.6
{device="lo", instance="192.168.1.168:9000", job="node"} 0
{device="lo", instance="192.168.1.168:9200", job="node"} 0
This query retrieves current values instead of a historical range. By using a subquery with a 5-hour window and a 30-second step, you instead obtain a series of rate values for each time series, which functions such as max_over_time can then process.
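As a rough sense of scale, the number of evaluation steps per series is the range divided by the step. The helper names below are hypothetical, the parser only handles the simple single-unit durations used in this lesson, and the exact count can differ by one depending on how range boundaries are treated.

```python
# Seconds per unit for simple single-unit durations such as "5h" or "30s".
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def to_seconds(duration):
    """Convert a single-unit duration string to seconds."""
    return int(duration[:-1]) * UNIT_SECONDS[duration[-1]]

def points_per_series(range_duration, step_duration):
    """Approximate number of subquery evaluation steps per series."""
    return to_seconds(range_duration) // to_seconds(step_duration)

five_hours = points_per_series("5h", "30s")
five_minutes = points_per_series("5m", "30s")
two_minutes = points_per_series("2m", "10s")
print(five_hours, five_minutes, two_minutes)
```

So the [5h:30s] subquery evaluates the inner rate expression several hundred times per series, whereas the plain instant query evaluates it once.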
Note
Subqueries are a powerful feature for converting an instant vector into a range vector, making them suitable for comprehensive, time-based aggregation scenarios.