MySQL Forums
Forum List  »  MyISAM

Re: Range Access for Multiple-Part Indexes
Posted by: Derek Tsui
Date: January 01, 2010 03:52PM

The only index I have is the primary key.

mysql> SHOW CREATE TABLE tbl\G;
*************************** 1. row ***************************
       Table: tbl
Create Table: CREATE TABLE `tbl` (
  `x` smallint(6) NOT NULL,
  `y` smallint(6) NOT NULL,
  `year` smallint(5) unsigned NOT NULL,
  `month` tinyint(3) unsigned NOT NULL,
  `value` float NOT NULL,
  PRIMARY KEY  (`x`,`y`,`year`,`month`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

mysql> SHOW TABLE STATUS LIKE 'tbl'\G;
*************************** 1. row ***************************
           Name: tbl
         Engine: MyISAM
        Version: 10
     Row_format: Fixed
           Rows: 17786304
 Avg_row_length: 12
    Data_length: 213435648
Max_data_length: 3377699720527871
   Index_length: 332545024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2009-12-18 13:08:04
    Update_time: 2009-12-20 18:29:33
     Check_time: 2009-12-20 01:49:50
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:

"BETWEEN"
mysql> EXPLAIN SELECT x, y, AVG(value) AS value FROM 
(SELECT x, y, month, AVG(value) AS value FROM tbl 
WHERE x BETWEEN 30 AND 38 
AND y BETWEEN 47 AND 52 
AND year BETWEEN 1961 AND 1990 
AND month IN (1,2,4,5,7,9,11,12) GROUP BY x, y, month) AS DATA GROUP BY x, y\G;
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 648
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: tbl
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 7
          ref: NULL
         rows: 1091881
        Extra: Using where; Using temporary; Using filesort
2 rows in set (1 min 3.39 sec)

"IN()"
mysql> EXPLAIN SELECT x, y, AVG(value) AS value FROM 
(SELECT x, y, month, AVG(value) AS value FROM tbl 
WHERE x IN (30,31,32,33,34,35,36,37,38) 
AND y IN (47,48,49,50,51,52) 
AND year IN (1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990) 
AND month IN (1,2,4,5,7,9,11,12) GROUP BY x, y, month) AS DATA GROUP BY x, y\G;
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 648
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: tbl
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 7
          ref: NULL
         rows: 19440
        Extra: Using where; Using temporary; Using filesort
2 rows in set (0.17 sec)

mysql> SHOW VARIABLES LIKE '%buffer%';
+-------------------------------+------------+
| Variable_name                 | Value      |
+-------------------------------+------------+
| bdb_log_buffer_size           | 262144     |
| bulk_insert_buffer_size       | 8388608    |
| innodb_buffer_pool_awe_mem_mb | 0          |
| innodb_buffer_pool_size       | 268435456  |
| innodb_log_buffer_size        | 1048576    |
| join_buffer_size              | 131072     |
| key_buffer_size               | 4294967296 |
| myisam_sort_buffer_size       | 67108864   |
| net_buffer_length             | 16384      |
| preload_buffer_size           | 32768      |
| read_buffer_size              | 2097152    |
| read_rnd_buffer_size          | 8388608    |
| sort_buffer_size              | 2097152    |
+-------------------------------+------------+

In a perfect dataset, AVG(AVG) is equivalent to SUM(*)/COUNT(*). Unfortunately we don't have a perfect dataset. There may be missing data in some months of certain years. The values are highly correlated to their months, so we take an AVG grouping by months (to perserve the weight), then AVG again.

Using BETWEEN query takes roughly 60 seconds. Using IN() query takes roughly 2 seconds. IN() seems really good for short ranges but users can specify any range. In the case of a wide range, where the list contains 300+ items it is very slow.

Not sure if I can use summary tables. Users can specify any number of years and months. I forgot to mention these tables are read-only once it has been created. And each row is unique. One value per x, y, year, month.

So in EXPLAIN when primary key is being used. It is only used for the first field? Should I drop the primary key and create single field indexes?

Options: ReplyQuote


Subject
Views
Written By
Posted
4288
December 30, 2009 03:52PM
Re: Range Access for Multiple-Part Indexes
2505
January 01, 2010 03:52PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.