Deep into building a nested IF I've been chasing for several months now I hit a strange result has sent me back to the drawing board to get straight (again) how the GROUP BY function works
Sample Table "test"
id year value
1 1996 7
2 1997 6
3 1998 5
4 1999 4
5 2000 3
6 2001 2
7 2002 1
8 2003 0
A query to find the 5 year average for each year in the test table
SELECT
x.year,
AVG(y.value) AS avg,
COUNT(*) AS n_years,
GROUP_CONCAT(x.id) AS id_used
FROM test x
JOIN test y
ON x.year BETWEEN y.year AND y.year + 5
GROUP BY x.year;
Query Result
id yr value avg n_years id_used
1 1996 7 7 1 1
2 1997 6 6.5 2 2,2
3 1998 5 6 3 3,3,3
4 1999 4 5.5 4 4,4,4,4
5 2000 3 5 5 5,5,5,5,5
6 2001 2 4 5 6,6,6,6,6
7 2002 1 3 5 7,7,7,7,7
8 2003 0 2 5 8,8,8,8,8
Spelling out the result 'old school' to show what I thought I was going to get:
x.year AVG=SUM(value)/COUNT(value) x.year BETWEEN y.year AND y.year + 5 id_used
1996 (7)/1 = 7 1996 1
1997 (7+6)/2 = 6.5 1996,1997 2
1998 (7+6+5)/3 = 6 1996,1997,1998 3
1999 (7+6+5+4)/4 = 5.5 1996,1997,1998,1999 4
2000 (7+6+5+4+3)/5 = 5 1996,1997,1998,1999,2000 5
2001 (6+5+4+3+2)/5 = 4 1997,1998,1999,2000,2001 6
2002 (5+4+3+2+1)/5 = 3 1998,1999,2000,2001,2002 7
2003 (4+3+2+1+0)/5 = 2 1999,2000,2001,2002,2003 8
I think n_years is:
- a count of the number of records returned by the SELECT statement (met the BETWEEN clause)
- the denominator in the average function
- the average function using the values associated with those same records to calculate its numerator.
I thought id_used was:
- the id of all the records found for x.year
But how to explain the duplicates in id_used column in the result.
There is only one record for each x.year in the sample data?