Skip to content

Aggregate functions

Aggregate functions process a set of values to derive a singular outcome.

However, count(), count_if(), max_by(), min_by(), and approx_distinct() differ in that they take null values into account. For all other aggregate functions, null values are disregarded, and when there are no input rows or all values are null, they return null. For instance, sum()yields null instead of zero, and avg() excludes null values from the count. To transform null into zero, the coalesce function can be applied.

General aggregate functions

count()

Function Description Return Type
count(*) Returns the number of input rows. bigint
count(x) Returns the number of non-null input values. bigint
count_if(x) Returns the number of TRUE input values. This function is equivalent to count(CASE WHEN x THEN 1 END). bigint

avg()

Function Description Return Type
avg(x) Computes the average (arithmetic mean) of all input values. same as input(numeric)
avg(time interval type) Calculates the average interval length of all input values. same as input(interval)

Example:

SELECT avg(time_column) as average_interval
FROM your_table;

bool()

Function Description Return Type
bool_and(boolean) Yields TRUE only if every input value is TRUE; otherwise, FALSE. boolean
bool_or(boolean) Yields TRUE if at least one input value is TRUE; otherwise, FALSE. boolean

every()

Function Description Return Type
every(boolean) Acts as an alias for bool_and(). boolean

array_agg()

Function Description Return Type
array_agg(x) Generates an array containing the input x elements. array<[same as input]>

checksum()

Function Description Return Type
checksum(x) Computes an order-insensitive checksum for the provided values. varbinary

geometric_mean()

Function Description Return Type
geometric_mean(x) Computes the geometric mean of all input values. double

listagg()

Function Description Return Type
listagg(x, separator) Concatenates input values, separated by the specified separator string. varchar

max()

Function Description Return Type
max(x) Computes the maximum value among all input values. <same as input>
max(x, n) Retrieves the n largest values from all input values of x. array<[same as input]>

max_by()

Function Description Return Type
max_by(x, y) Obtains the value of x associated with the maximum value of y. <same as x>
max_by(x, y, n) Retrieves n values of x associated with the n largest values of y. array<[same as x]>

min()

Function Description Return Type
min(x) Computes the minimum value among all input values. <same as input>
min(x, n) Retrieves the n smallest values from all input values of x. array<[same as input]>

min_by

Function Description Return Type
min_by(x, y) Obtains the value of x associated with the minimum value of y. <same as x>
min_by(x, y, n) Retrieves n values of x associated with the n smallest values of y in ascending order of y. array<[same as x]>

sum()

Function Description Return Type
sum(x) Computes the sum of all input values. <same as input>

Bitwise aggregate functions

Function Description Return Type
bitwise_and_agg(x) Computes the bitwise AND of all input values in 2’s complement representation. <same as input>
bitwise_or_agg(x) Calculates the bitwise OR of all input values in 2’s complement representation. <same as input>

Map aggregate functions

histogram()

Function Description Return Type
histogram(x) Generates a map containing the count of occurrences for each input value. map<K,bigint>

map_agg()

Function Description Return Type
map_agg(key, value) Generates a map created from the input key/value pairs. map<key, value>

map_union()

Function Description Return Type
map_union(x(K, V)) Obtains the union of all input maps. If a key is present in multiple input maps, the value in the resulting map is selected arbitrarily from one of the input maps. map<K, V>

multimap_agg()

Function Description Return Type
multimap_agg(key, value) Generates a multimap created from the input key/value pairs. Each key can be associated with multiple values. multimap<key, value>

Approximate aggregate functions

approximate_distinct()

Function Description Return Type
approx_distinct(x) Returns the approximate count of distinct non-null values in the input set x. Zero is returned if all input values are null. The standard error is approximately 2.3%, providing an approximation of count(DISTINCT x). bigint
approx_distinct(x, e) Returns the approximate count of distinct non-null values in the input set x. Zero is returned if all input values are null. The standard error is no more than e, with e in the range of [0.0040625, 0.26000]. bigint

approx_most_frequent()

Function Description Return Type
approx_most_frequent(buckets, value, capacity) Approximates the top frequent values up to a specified number, balancing memory efficiency and accuracy. map<[same as value], bigint>

approx_percentile()

Function                               Description Return Type
approx_percentile(x, percentage) Computes the approximate percentile for all input values of x at the given percentage. percentage must be between 0 and 1, constant for all rows. [same as x]
approx_percentile(x, percentages) Calculates the approximate percentile for all input values of x at each specified percentage in the array. Each element in percentages must be between 0 and 1, constant for all rows. array<[same as x]>
approx_percentile(x, w, percentage) Computes the approximate weighted percentile for all input values of x using per-item weight w at the given percentage. Weights >= 1, percentage between 0 and 1, constant for all rows. [same as x]
approx_percentile(x, w, percentages) Calculates the approximate weighted percentile for all input values of x using per-item weight w at each specified percentage in the array. Weights >= 1, each element in percentages between 0 and 1, constant for all rows. array<n [same as x]>

approx_set()

Function Description Return Type
approx_set(x) Generates a HyperLogLog structure for the given input values x. hll

merge()

Function Description Return Type
merge(x) Merges HyperLogLog structures. hll
merge(qdigest(T)) Merges Quantile digest structures. qdigest(T)
merge(tdigest) Merges T-Digest structures. tdigest

numeric_histogram()

Function Description Return Type
numeric_histogram(buckets, value) Computes an approximate histogram with up to buckets number of buckets for all values. map<double, double>
numeric_histogram(buckets, value, weight) Computes an approximate histogram with up to buckets number of buckets for all values with a per-item weight of weight. map<double, double>

qdigest_agg()

A quantile digest is a data sketch that stores approximate percentile information. It takes a parameter which must be one of bigint, double, or real which represent the set of numbers that may be ingested by the qdigest. They may be merged without losing precision, and for storage and retrieval, they may be cast to/from VARBINARY.

Function Description Return Type
qdigest_agg(x) Computes a quantile digest (qdigest) for values in x. qdigest([same as x])
qdigest_agg(x, w) Computes a quantile digest (qdigest) for values in x with per-item weights w. qdigest([same as x])
qdigest_agg(x, w, accuracy) Computes a quantile digest (qdigest) for values in x with per-item weights w and specified accuracy. qdigest([same as x])

tdigest_agg()

A T-digest is a data sketch specifically crafted to store approximate percentile information.

Function Description Return Type
tdigest_agg(x) Computes a T-Digest for values in x. tdigest
tdigest_agg(x, w) Computes a T-Digest for values in x with per-item weights w. tdigest

Statistical aggregate functions

corr()

Function Description Return Type
corr(y, x) Returns the correlation coefficient of input values y and x. double

covar_pop()

Function Description Return Type
covar_pop(y, x) Returns the population covariance of input values y and x. double

covar_samp()

Function Description Return Type
covar_samp(y, x) Returns the sample covariance of input values y and x. double

kurtosis()

Function Description Return Type
kurtosis(x) Returns the excess kurtosis of all input values. It provides an unbiased estimate using a specific mathematical expression. double

regr_intercept()

Function Description Return Type
regr_intercept(y, x) Returns the linear regression intercept of input values. y is the dependent value, and x is the independent value. double

regr_slope()

Function Description Return Type
regr_slope(y, x) Returns the linear regression slope of input values. y is the dependent value, and x is the independent value. double

skewness()

Function Description Return Type
skewness(x) Returns Fisher’s moment coefficient of skewness for all input values. double

stddev()

Function Description Return Type
stddev(x) Alias for stddev_samp(). double

stddev_pop()

Function Description Return Type
stddev_pop(x) Returns the population standard deviation of all input values. double

stddev_samp()

Function Description Return Type
stddev_samp(x) Returns the sample standard deviation of all input values. double

variance()

Function Description Return Type
variance(x) Alias for var_samp(). double

var_pop()

Function Description Return Type
var_pop(x) Returns the population variance of all input values. double

var_samp()

Function Description Return Type
var_samp(x) Returns the sample variance of all input values. double

Lambda aggregate functions

Function Description Return Type
reduce_agg(inputValue T, initialState S, inputFunction(S, T, S), combineFunction(S, S, S)) Reduces all input values into a single value. inputFunction is invoked for each non-null input value, taking the input value and the current state (initially initialState), and returning the new state. combineFunction is invoked to combine two states into a new state. The final state is returned. The state type (S) must be boolean, integer, floating-point, or date/time/interval. <Same as initialState>

Example:

-- Creating a sample table
CREATE TABLE sales (
    product_id INT,
    quantity_sold INT
);

-- Inserting sample data
INSERT INTO sales VALUES
    (1, 10),
    (1, 15),
    (2, 5),
    (2, 8),
    (2, 7);

-- Using reduce_agg to calculate the total quantity sold for each product
SELECT product_id, reduce_agg(quantity_sold, 0, (a, b) -> a + b, (a, b) -> a + b) AS total_quantity_sold
FROM sales
GROUP BY product_id;

The output will show the total quantity sold for each product:

 product_id | total_quantity_sold |
|------------|----------------------|
| 1          | 25                   |
| 2          | 20                   |

Ordering during aggregation

Some aggregate functions such as array_agg() produce different results depending on the order of input values. This ordering can be specified by writing an ORDER BY clause within the aggregate function:

Example Description
array_agg(x ORDER BY y DESC) Aggregates x values ordered by y in descending order.
array_agg(x ORDER BY x, y, z) Aggregates x values ordered by x, y, and z.

Filtering during aggregation

The FILTER keyword allows you to apply a condition using a WHERE clause to remove specific rows from aggregation processing. This condition is evaluated for each row before it is used in the aggregation and can be used with any aggregate function.

Example Description
aggregate_function(...) FILTER (WHERE <condition>) Applies a condition using WHERE clause to remove specific rows from aggregation processing.
SELECT array_agg(name) FILTER (WHERE name IS NOT NULL) Removes null values from consideration in array_agg.
SELECT count(*) FILTER (WHERE petal_length_cm > 4) AS count Adds a condition on the count for Iris flowers while retaining all information.