The only index I have is the primary key.
mysql> SHOW CREATE TABLE tbl\G;
*************************** 1. row ***************************
Table: tbl
Create Table: CREATE TABLE `tbl` (
`x` smallint(6) NOT NULL,
`y` smallint(6) NOT NULL,
`year` smallint(5) unsigned NOT NULL,
`month` tinyint(3) unsigned NOT NULL,
`value` float NOT NULL,
PRIMARY KEY (`x`,`y`,`year`,`month`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
mysql> SHOW TABLE STATUS LIKE 'tbl'\G;
*************************** 1. row ***************************
Name: tbl
Engine: MyISAM
Version: 10
Row_format: Fixed
Rows: 17786304
Avg_row_length: 12
Data_length: 213435648
Max_data_length: 3377699720527871
Index_length: 332545024
Data_free: 0
Auto_increment: NULL
Create_time: 2009-12-18 13:08:04
Update_time: 2009-12-20 18:29:33
Check_time: 2009-12-20 01:49:50
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
"BETWEEN"
mysql> EXPLAIN SELECT x, y, AVG(value) AS value FROM
(SELECT x, y, month, AVG(value) AS value FROM tbl
WHERE x BETWEEN 30 AND 38
AND y BETWEEN 47 AND 52
AND year BETWEEN 1961 AND 1990
AND month IN (1,2,4,5,7,9,11,12) GROUP BY x, y, month) AS DATA GROUP BY x, y\G;
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 648
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
id: 2
select_type: DERIVED
table: tbl
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 7
ref: NULL
rows: 1091881
Extra: Using where; Using temporary; Using filesort
2 rows in set (1 min 3.39 sec)
"IN()"
mysql> EXPLAIN SELECT x, y, AVG(value) AS value FROM
(SELECT x, y, month, AVG(value) AS value FROM tbl
WHERE x IN (30,31,32,33,34,35,36,37,38)
AND y IN (47,48,49,50,51,52)
AND year IN (1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990)
AND month IN (1,2,4,5,7,9,11,12) GROUP BY x, y, month) AS DATA GROUP BY x, y\G;
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 648
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
id: 2
select_type: DERIVED
table: tbl
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 7
ref: NULL
rows: 19440
Extra: Using where; Using temporary; Using filesort
2 rows in set (0.17 sec)
mysql> SHOW VARIABLES LIKE '%buffer%';
+-------------------------------+------------+
| Variable_name | Value |
+-------------------------------+------------+
| bdb_log_buffer_size | 262144 |
| bulk_insert_buffer_size | 8388608 |
| innodb_buffer_pool_awe_mem_mb | 0 |
| innodb_buffer_pool_size | 268435456 |
| innodb_log_buffer_size | 1048576 |
| join_buffer_size | 131072 |
| key_buffer_size | 4294967296 |
| myisam_sort_buffer_size | 67108864 |
| net_buffer_length | 16384 |
| preload_buffer_size | 32768 |
| read_buffer_size | 2097152 |
| read_rnd_buffer_size | 8388608 |
| sort_buffer_size | 2097152 |
+-------------------------------+------------+
In a perfect dataset, AVG(AVG) is equivalent to SUM(*)/COUNT(*). Unfortunately we don't have a perfect dataset. There may be missing data in some months of certain years. The values are highly correlated to their months, so we take an AVG grouping by months (to perserve the weight), then AVG again.
Using BETWEEN query takes roughly 60 seconds. Using IN() query takes roughly 2 seconds. IN() seems really good for short ranges but users can specify any range. In the case of a wide range, where the list contains 300+ items it is very slow.
Not sure if I can use summary tables. Users can specify any number of years and months. I forgot to mention these tables are read-only once it has been created. And each row is unique. One value per x, y, year, month.
So in EXPLAIN when primary key is being used. It is only used for the first field? Should I drop the primary key and create single field indexes?