Re: Help optimizing a SELECT WHERE GROUP BY query
Posted by:
Rob Blick
Date: August 26, 2005 12:19PM
Yeah, I already tried the ANALYZE table - no help. The SHOW INDEX looks like:
mysql> show indexes from gene2word;
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| gene2word | 0 | PRIMARY | 1 | locuslink_id | A | 16028 | NULL | NULL | | BTREE | |
| gene2word | 0 | PRIMARY | 2 | tax_id | A | 16028 | NULL | NULL | | BTREE | |
| gene2word | 0 | PRIMARY | 3 | word_id | A | 9505081 | NULL | NULL | | BTREE | |
| gene2word | 1 | idxTmp | 1 | word_id | A | 89670 | NULL | NULL | | BTREE | |
| gene2word | 1 | idxTmp | 2 | tax_id | A | 155821 | NULL | NULL | | BTREE | |
| gene2word | 1 | idxTmp | 3 | locuslink_id | A | 9505081 | NULL | NULL | | BTREE | |
| gene2word | 1 | idxTmp | 4 | num_genes | A | 9505081 | NULL | NULL | | BTREE | |
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
The problem seems to lurk in the huge IN clause... if I only have a dozen or so items in the IN, the PK index gets used to handle the WHERE, and then a temporary table is used for the GROUP BY:
EXPLAIN SELECT word_id,tax_id, COUNT(*), num_genes
-> FROM gene2word
-> WHERE locuslink_id in ("3091","7428",
-> "7422",
-> "112399",
-> "7157",
-> "317",
-> "1869",
-> "1871")
-> GROUP BY word_id, tax_id\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: gene2word
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 12
ref: NULL
rows: 8540
Extra: Using where; Using temporary; Using filesort
1 row in set (0.00 sec)
When 2000 or so elements are in the IN clause, the indexes are abandoned, and it takes >30s for the query to run.