Heidi,
I did som experimenting and came up with some very weird results, similar to your own. To test, I created a MyISAM table with a single INT auto_incrementing primary key and loaded the table up with ~6 million rows. I then ran a series of SQL statements that produced identical results (though the time and the EXPLAIN output varied substantially). I have displayed the EXPLAIN results below.
One of the most interesting pieces of the puzzle is the *time* it took to generate some of these EXPLAIN results. And, also, there was an *inverse* relationship between the amount of time used for the EXPLAIN and the amount of time actually taken to run the query...
Here is the EXPLAIN output:
mysql> EXPLAIN SELECT COUNT(DISTINCT int_pk) FROM savin_test;
+----+-------------+------------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+---------+---------+------+---------+-------------+
| 1 | SIMPLE | savin_test | index | NULL | PRIMARY | 4 | NULL | 6151296 | Using index |
+----+-------------+------------+-------+---------------+---------+---------+------+---------+-------------+
1 row in set (0.23 sec)
mysql> EXPLAIN SELECT COUNT(*) FROM savin_test;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.09 sec)
mysql> EXPLAIN SELECT COUNT(*) FROM (SELECT 1 FROM savin_test GROUP BY int_pk) as out;
+----+-------------+------------+-------+---------------+---------+---------+------+---------+--------------------------
----+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
|
+----+-------------+------------+-------+---------------+---------+---------+------+---------+--------------------------
----+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized a
way |
| 2 | DERIVED | savin_test | index | NULL | PRIMARY | 4 | NULL | 6151296 | Using index
|
+----+-------------+------------+-------+---------------+---------+---------+------+---------+--------------------------
----+
2 rows in set (46.78 sec)
mysql> EXPLAIN SELECT SUM(ONE) FROM (SELECT 1 as ONE FROM savin_test GROUP BY int_pk) as out;
+----+-------------+------------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+---------+---------+------+---------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 6151296 | |
| 2 | DERIVED | savin_test | index | NULL | PRIMARY | 4 | NULL | 6151296 | Using index |
+----+-------------+------------+-------+---------------+---------+---------+------+---------+-------------+
2 rows in set (11.20 sec)
mysql> SHOW INDEX FROM savin_test;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+
------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null |
Index_type | Comment |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+
------------+---------+
| savin_test | 0 | PRIMARY | 1 | int_pk | A | 6151296 | NULL | NULL | |
BTREE | |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+
------------+---------+
1 row in set (0.00 sec)
I included the SELECT COUNT(*) FROM savin_test just to show that (since the index is a unique key) that MyISAM optimizes away this result and retrieves the query result from the cardinality statistics on the table's primary key index.
Notice that the EXPLAIN SELECT COUNT(*) FROM (SELECT 1 FROM savin_test GROUP BY int_pk) as out; took almost 49 seconds *just to run the EXPLAIN*! Compare the EXPLAIN result to the next one, a slight variation, but nonetheless a very different result and time to EXPLAIN.
Very bizarre, IMHO. I might be tempted to report this as a bug, since all 4 of these queries should technically be able to use the same "optimized away" cardinality lookup since a unique key exists on the column in the group by/distinct clause.
Heidi, nice catch...
Jay Pipes
Community Relations Manager, North America, MySQL Inc.
Got Cluster?
http://www.mysql.com/cluster
Personal:
http://jpipes.com