I'm doing som statistical analyzis and need to estimate various parameters, many similar to VARIANCE but not supported by SQL directly (but for simplicity let us stick with VARIANCE). Is there a way to calculate the VARIANCE estimate without using the VARIANCE function?
I have tried:
SELECT SUM(temp-AVG(TEMP))/(COUNT(*)-1) AS the_hard_way, VARIANCE(temp) AS the_easy_way FROM temperatures GROUP BY DATE(date), city;
But it results in:
Error Code: 1111
Invalid use of group function
I use version 5.1.51;
I'll provide some test data below, cut and paste:
DROP TABLE IF EXISTS temperatures;
CREATE TABLE temperatures (date DATETIME, city VARCHAR(64), temp INT);
INSERT INTO temperatures VALUES ('2010-10-01 08:00:00', 'Paris', '18');
INSERT INTO temperatures VALUES ('2010-10-02 08:00:00', 'Paris', '17');
INSERT INTO temperatures VALUES ('2010-10-03 08:00:00', 'Paris', '19');
INSERT INTO temperatures VALUES ('2010-10-01 08:00:00', 'Stockholm', '8');
INSERT INTO temperatures VALUES ('2010-10-02 08:00:00', 'Stockholm', '7');
INSERT INTO temperatures VALUES ('2010-10-03 08:00:00', 'Stockholm', '9');
INSERT INTO temperatures VALUES ('2010-10-01 20:00:00', 'Paris', '22');
INSERT INTO temperatures VALUES ('2010-10-02 20:00:00', 'Paris', '23');
INSERT INTO temperatures VALUES ('2010-10-03 20:00:00', 'Paris', '21');
INSERT INTO temperatures VALUES ('2010-10-01 20:00:00', 'Stockholm', '12');
INSERT INTO temperatures VALUES ('2010-10-02 20:00:00', 'Stockholm', '13');
INSERT INTO temperatures VALUES ('2010-10-03 20:00:00', 'Stockholm', '11');
SELECT * FROM temperatures;
# +---------------------+-----------+------+
# | date | city | temp |
# +---------------------+-----------+------+
# | 2010-10-01 08:00:00 | Paris | 18 |
# | 2010-10-02 08:00:00 | Paris | 17 |
# | 2010-10-03 08:00:00 | Paris | 19 |
# | 2010-10-01 08:00:00 | Stockholm | 8 |
# | 2010-10-02 08:00:00 | Stockholm | 7 |
# | 2010-10-03 08:00:00 | Stockholm | 9 |
# | 2010-10-01 20:00:00 | Paris | 22 |
# | 2010-10-02 20:00:00 | Paris | 23 |
# | 2010-10-03 20:00:00 | Paris | 21 |
# | 2010-10-01 20:00:00 | Stockholm | 12 |
# | 2010-10-02 20:00:00 | Stockholm | 13 |
# | 2010-10-03 20:00:00 | Stockholm | 11 |
# +---------------------+-----------+------+
SELECT DATE(date), city, AVG(temp), VARIANCE(temp), count(*) FROM temperatures GROUP BY DATE(date), city;
# +------------+-----------+-----------+-----------+
# | DATE(date) | city | AVG(temp) | STD(temp) |
# +------------+-----------+-----------+-----------+
# | 2010-10-01 | Paris | 20.0000 | 2.0000 |
# | 2010-10-01 | Stockholm | 10.0000 | 2.0000 |
# | 2010-10-02 | Paris | 20.0000 | 3.0000 |
# | 2010-10-02 | Stockholm | 10.0000 | 3.0000 |
# | 2010-10-03 | Paris | 20.0000 | 1.0000 |
# | 2010-10-03 | Stockholm | 10.0000 | 1.0000 |
# +------------+-----------+-----------+-----------+
# This does not work
#SELECT DATE(date), city, AVG(temp), SUM(POW(temp - AVG(temp), 2))/(count(*)-1) FROM temperatures GROUP BY DATE(date), city;
Edited 1 time(s). Last edit at 10/30/2010 04:03PM by Peter Follo.