MySQL Forums
Forum List  »  Newbie

What is the SQL equvalance to VARIANCE
Posted by: Peter Follo
Date: October 30, 2010 03:55PM

I'm doing som statistical analyzis and need to estimate various parameters, many similar to VARIANCE but not supported by SQL directly (but for simplicity let us stick with VARIANCE). Is there a way to calculate the VARIANCE estimate without using the VARIANCE function?

I have tried:
SELECT SUM(temp-AVG(TEMP))/(COUNT(*)-1) AS the_hard_way, VARIANCE(temp) AS the_easy_way FROM temperatures GROUP BY DATE(date), city;

But it results in:
Error Code: 1111
Invalid use of group function

I use version 5.1.51;

I'll provide some test data below, cut and paste:
DROP TABLE IF EXISTS temperatures;
CREATE TABLE temperatures (date DATETIME, city VARCHAR(64), temp INT);
INSERT INTO temperatures VALUES ('2010-10-01 08:00:00', 'Paris', '18');
INSERT INTO temperatures VALUES ('2010-10-02 08:00:00', 'Paris', '17');
INSERT INTO temperatures VALUES ('2010-10-03 08:00:00', 'Paris', '19');
INSERT INTO temperatures VALUES ('2010-10-01 08:00:00', 'Stockholm', '8');
INSERT INTO temperatures VALUES ('2010-10-02 08:00:00', 'Stockholm', '7');
INSERT INTO temperatures VALUES ('2010-10-03 08:00:00', 'Stockholm', '9');
INSERT INTO temperatures VALUES ('2010-10-01 20:00:00', 'Paris', '22');
INSERT INTO temperatures VALUES ('2010-10-02 20:00:00', 'Paris', '23');
INSERT INTO temperatures VALUES ('2010-10-03 20:00:00', 'Paris', '21');
INSERT INTO temperatures VALUES ('2010-10-01 20:00:00', 'Stockholm', '12');
INSERT INTO temperatures VALUES ('2010-10-02 20:00:00', 'Stockholm', '13');
INSERT INTO temperatures VALUES ('2010-10-03 20:00:00', 'Stockholm', '11');
SELECT * FROM temperatures;
# +---------------------+-----------+------+
# | date                | city      | temp |
# +---------------------+-----------+------+
# | 2010-10-01 08:00:00 | Paris     |   18 |
# | 2010-10-02 08:00:00 | Paris     |   17 |
# | 2010-10-03 08:00:00 | Paris     |   19 |
# | 2010-10-01 08:00:00 | Stockholm |    8 |
# | 2010-10-02 08:00:00 | Stockholm |    7 |
# | 2010-10-03 08:00:00 | Stockholm |    9 |
# | 2010-10-01 20:00:00 | Paris     |   22 |
# | 2010-10-02 20:00:00 | Paris     |   23 |
# | 2010-10-03 20:00:00 | Paris     |   21 |
# | 2010-10-01 20:00:00 | Stockholm |   12 |
# | 2010-10-02 20:00:00 | Stockholm |   13 |
# | 2010-10-03 20:00:00 | Stockholm |   11 |
# +---------------------+-----------+------+

SELECT DATE(date), city, AVG(temp), VARIANCE(temp), count(*) FROM temperatures GROUP BY DATE(date), city;
# +------------+-----------+-----------+-----------+
# | DATE(date) | city      | AVG(temp) | STD(temp) |
# +------------+-----------+-----------+-----------+
# | 2010-10-01 | Paris     |   20.0000 |    2.0000 |
# | 2010-10-01 | Stockholm |   10.0000 |    2.0000 |
# | 2010-10-02 | Paris     |   20.0000 |    3.0000 |
# | 2010-10-02 | Stockholm |   10.0000 |    3.0000 |
# | 2010-10-03 | Paris     |   20.0000 |    1.0000 |
# | 2010-10-03 | Stockholm |   10.0000 |    1.0000 |
# +------------+-----------+-----------+-----------+

# This does not work
#SELECT DATE(date), city, AVG(temp), SUM(POW(temp - AVG(temp), 2))/(count(*)-1) FROM temperatures GROUP BY DATE(date), city;



Edited 1 time(s). Last edit at 10/30/2010 04:03PM by Peter Follo.

Options: ReplyQuote


Subject
Written By
Posted
What is the SQL equvalance to VARIANCE
October 30, 2010 03:55PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.