Query with GROUP BY (Using index for group-by -> Using index)
Posted by:
mnf
Date: November 30, 2006 05:20AM
Hello!
I have a very simple table for testing with ~7.500000 rows and all needed keys:
+-------+----------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+----------------------+------+-----+---------+----------------+
| i | int(10) unsigned | NO | PRI | NULL | auto_increment |
| n | smallint(5) unsigned | NO | MUL | | |
| t | int(10) unsigned | NO | MUL | | |
| v | smallint(6) | NO | | | |
+-------+----------------------+------+-----+---------+----------------+
Typical query with GROUP BY is optimized great:
mysql> EXPLAIN SELECT MAX(i) FROM shn1 GROUP BY n;
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| 1 | SIMPLE | shn1 | range | NULL | ni | 2 | NULL | 216 | Using index for group-by |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
mysql> SELECT MAX(i) FROM shn1 GROUP BY n;
...
215 rows in set (0.00 sec)
Now I want all row (i, n, t, v) with MAX(i), I tried this:
1) subquery
mysql> EXPLAIN SELECT i, n, t, v FROM shn1 WHERE i IN (SELECT MAX(i) FROM shn1 GROUP BY n);
+----+--------------------+-------+-------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+------+---------+------+---------+-------------+
| 1 | PRIMARY | shn1 | ALL | NULL | NULL | NULL | NULL | 7699429 | Using where |
| 2 | DEPENDENT SUBQUERY | shn1 | index | NULL | ni | 6 | NULL | 7699429 | Using index |
+----+--------------------+-------+-------+---------------+------+---------+------+---------+-------------+
As you see it is very bad, it uses DEPENDENT QUERY (btw, how to override?) and ALL scan and of course it takes hours.
2) subquery
mysql> EXPLAIN SELECT i, n, t, v FROM shn1 AS T1, (SELECT MAX(i) AS ii FROM shn1 GROUP BY n) AS T2 WHERE T1.i = T2.ii;
+----+-------------+------------+--------+---------------+---------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+-------+---------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 215 | |
| 1 | PRIMARY | T1 | eq_ref | PRIMARY,in,it | PRIMARY | 4 | T2.ii | 1 | |
| 2 | DERIVED | shn1 | index | NULL | ni | 6 | NULL | 7699429 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+-------+---------+-------------+
mysql> SELECT i, n, t, v FROM shn1 AS T1, (SELECT MAX(i) AS ii FROM shn1 GROUP BY n) AS T2 WHERE T1.i = T2.ii;
...
215 rows in set (14.26 sec)
Also not good.
3) join
mysql> EXPLAIN SELECT MAX(T1.i), T2.n, T2.t, T2.v FROM shn1 AS T1 LEFT JOIN shn1 AS T2 ON T2.i = T1.i GROUP BY T1.n;
+----+-------------+-------+-------+---------------+------+---------+---------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+---------+---------+-------------+
| 1 | SIMPLE | T1 | index | NULL | ni | 6 | NULL | 7699429 | Using index |
| 1 | SIMPLE | T2 | ref | PRIMARY,in,it | it | 4 | br.T1.i | 1 | |
+----+-------------+-------+-------+---------------+------+---------+---------+---------+-------------+
mysql> SELECT MAX(T1.i), T2.n, T2.t, T2.v FROM shn1 AS T1 LEFT JOIN shn1 AS T2 ON T2.i = T1.i GROUP BY T1.n;
...
215 rows in set (6 min 21.77 sec)
Still slow, it uses index with 7699429 rows instead of original index for group-by with 216 rows.
4) temporary table
mysql> CREATE TEMPORARY TABLE T SELECT MAX(i) FROM shn1 GROUP BY n;
Query OK, 215 rows affected (14.02 sec)
This is very surprising, I'm almost sure it uses "index with 7699429 rows" again - time is very close to
mysql> EXPLAIN SELECT * FROM (SELECT MAX(i) FROM shn1 GROUP BY n) AS T1;
+----+-------------+------------+-------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+------+---------+------+---------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 215 | |
| 2 | DERIVED | shn1 | index | NULL | ni | 6 | NULL | 7699429 | Using index |
+----+-------------+------------+-------+---------------+------+---------+------+---------+-------------+
mysql> SELECT * FROM (SELECT MAX(i) FROM shn1 GROUP BY n) AS T1;
...
215 rows in set (13.86 sec)
All this is very strange, seems it doesnt want to use "index for group-by" in any combination, including CREATE TABLE, INSERT INTO etc. Please help me understand.
Thank you in advance,
Michael
---
P.S. I'm playing with MySQL 5.0.27-Debian_1-log (same thing with 5.0.22), analyzed, optimized tables.