regr_avgx
The regr_avgx(y, x) aggregate function returns the average of the independent variable (x), computed over only those rows in which both the dependent variable (y) and the independent variable (x) are not NULL. It is commonly used in linear regression analysis, where the mean of x must be taken over the same set of complete (y, x) pairs as the other regression statistics.
Examples
The following example uses a simplified version of the film table from the Pagila database, containing only the title, length and rating columns. The complete schema for the film table can be found on the Pagila database website.
DROP TABLE IF EXISTS film;
CREATE TABLE film (
    title text NOT NULL,
    length int,
    rating int
);
INSERT INTO film(title, length, rating) VALUES
('ATTRACTION NEWTON', 83, 5),
('CHRISTMAS MOONSHINE', 150, 7),
('DANGEROUS UPTOWN', 121, 4),
('KILL BROTHERHOOD', 54, 3),
('HALLOWEEN NUTS', 47, 5),
('HOURS RAGE', 122, 7),
('PIANIST OUTFIELD', 136, 7),
('PICKUP DRIVING', 77, 3),
('INDEPENDENCE HOTEL', 157, 7),
('PRIVATE DROP', 106, 4),
('SAINTS BRIDE', 125, 3),
('FOREVER CANDIDATE', 131, 7),
('MILLION ACE', 142, 5),
('SLEEPY JAPANESE', 137, 4),
('WRATH MILE', 176, 7),
('YOUTH KICK', 179, 7),
('CLOCKWORK PARADISE', 143, 5);
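The following query uses the regr_avgx() function to calculate the average rating over the films where both length and rating are not NULL. Because rating is passed as the second argument, it is treated as the independent variable (x):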
SELECT
    REGR_AVGX(length, rating) AS AverageRating
FROM film;
The query returns:
   averagerating
-------------------
 5.294117647058823
(1 row)
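Every row in the sample data has both a length and a rating, so the result above is the same as a plain average of rating. The following sketch shows where the two differ. It assumes the film table created above and adds two hypothetical rows, one with a NULL length and one with a NULL rating, purely for illustration:

```sql
-- Hypothetical rows, not part of the sample data above:
-- one is missing the dependent variable (length), the other the independent variable (rating).
INSERT INTO film(title, length, rating) VALUES
    ('EXAMPLE NO LENGTH', NULL, 1),
    ('EXAMPLE NO RATING', 90, NULL);

SELECT
    regr_avgx(length, rating) AS regr_avg_rating,  -- ignores both hypothetical rows
    avg(rating) AS plain_avg_rating                -- ignores only the row whose rating is NULL
FROM film;
```

Under these assumptions, regr_avg_rating should be unchanged from the earlier result, because neither new row forms a complete (length, rating) pair. plain_avg_rating should come out lower, because avg() still counts the rating of 1 from the row that has no length.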