Range Access for Multiple-Part Indexes
Posted by:
Derek Tsui
Date: December 30, 2009 03:52PM
I am looking for an efficient method for multiple range lookups. I have many tables in the following format with 1 ~ 50 million rows of data.
Version 5.0.86. MyISAM engine. Primary index (x, y, year, month)
+-------+----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+----------------------+------+-----+---------+-------+
| x | smallint(6) | NO | PRI | NULL | |
| y | smallint(6) | NO | PRI | NULL | |
| year | smallint(5) unsigned | NO | PRI | NULL | |
| month | tinyint(3) unsigned | NO | PRI | NULL | |
| value | float | NO | | NULL | |
SELECT x, y, AVG(value) AS value FROM (SELECT x, y, month, AVG(value) AS value FROM tbl WHERE x IN (12,13,...,45,46) AND y IN (46,47,...,61,62) AND year IN (1961,1962,...,1989,1990) AND month IN (1,4,7,9) GROUP BY x, y, month) AS DATA GROUP BY x, y;
IN() is fast for small ranges but performs poorly for large ranges? (seems to be spending a lot of time in statistics state). I've read somewhere that IN() is bad when the list is long. So I tried using BETWEEN.
SELECT x, y, AVG(value) AS value FROM (SELECT x, y, month, AVG(value) AS value FROM tbl WHERE x BETWEEN 12 AND 46 AND y BETWEEN 46 AND 62 AND year BETWEEN 1961 AND 1990 AND month IN (1,4,7,9) GROUP BY x, y, month) AS DATA GROUP BY x, y;
BETWEEN seems decent for large ranges (table scan?), but performs poorly for small ranges. Not as fast as IN()
Any advice would be appreciated.