MySQL Forums
Forum List  »  Performance

Re: select count(distinct)
Posted by: Jay Pipes
Date: July 07, 2005 10:48AM

Heidi,

I did som experimenting and came up with some very weird results, similar to your own. To test, I created a MyISAM table with a single INT auto_incrementing primary key and loaded the table up with ~6 million rows. I then ran a series of SQL statements that produced identical results (though the time and the EXPLAIN output varied substantially). I have displayed the EXPLAIN results below.

One of the most interesting pieces of the puzzle is the *time* it took to generate some of these EXPLAIN results. And, also, there was an *inverse* relationship between the amount of time used for the EXPLAIN and the amount of time actually taken to run the query...

Here is the EXPLAIN output:

mysql> EXPLAIN SELECT COUNT(DISTINCT int_pk) FROM savin_test;
+----+-------------+------------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+---------+---------+------+---------+-------------+
| 1 | SIMPLE | savin_test | index | NULL | PRIMARY | 4 | NULL | 6151296 | Using index |
+----+-------------+------------+-------+---------------+---------+---------+------+---------+-------------+
1 row in set (0.23 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM savin_test;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.09 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM (SELECT 1 FROM savin_test GROUP BY int_pk) as out;
+----+-------------+------------+-------+---------------+---------+---------+------+---------+--------------------------
----+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
|
+----+-------------+------------+-------+---------------+---------+---------+------+---------+--------------------------
----+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized a
way |
| 2 | DERIVED | savin_test | index | NULL | PRIMARY | 4 | NULL | 6151296 | Using index
|
+----+-------------+------------+-------+---------------+---------+---------+------+---------+--------------------------
----+
2 rows in set (46.78 sec)

mysql> EXPLAIN SELECT SUM(ONE) FROM (SELECT 1 as ONE FROM savin_test GROUP BY int_pk) as out;
+----+-------------+------------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+---------+---------+------+---------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 6151296 | |
| 2 | DERIVED | savin_test | index | NULL | PRIMARY | 4 | NULL | 6151296 | Using index |
+----+-------------+------------+-------+---------------+---------+---------+------+---------+-------------+
2 rows in set (11.20 sec)

mysql> SHOW INDEX FROM savin_test;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+
------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null |
Index_type | Comment |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+
------------+---------+
| savin_test | 0 | PRIMARY | 1 | int_pk | A | 6151296 | NULL | NULL | |
BTREE | |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+
------------+---------+
1 row in set (0.00 sec)

I included the SELECT COUNT(*) FROM savin_test just to show that (since the index is a unique key) that MyISAM optimizes away this result and retrieves the query result from the cardinality statistics on the table's primary key index.

Notice that the EXPLAIN SELECT COUNT(*) FROM (SELECT 1 FROM savin_test GROUP BY int_pk) as out; took almost 49 seconds *just to run the EXPLAIN*! Compare the EXPLAIN result to the next one, a slight variation, but nonetheless a very different result and time to EXPLAIN.

Very bizarre, IMHO. I might be tempted to report this as a bug, since all 4 of these queries should technically be able to use the same "optimized away" cardinality lookup since a unique key exists on the column in the group by/distinct clause.

Heidi, nice catch...

Jay Pipes
Community Relations Manager, North America, MySQL Inc.

Got Cluster? http://www.mysql.com/cluster
Personal: http://jpipes.com

Options: ReplyQuote


Subject
Views
Written By
Posted
97048
July 05, 2005 07:23PM
28702
July 05, 2005 07:37PM
22337
July 06, 2005 05:44AM
19228
July 07, 2005 06:13AM
17205
July 07, 2005 09:11AM
Re: select count(distinct)
17239
July 07, 2005 10:48AM
9946
July 07, 2005 10:51AM
8778
July 08, 2005 05:27AM
9888
July 08, 2005 05:48AM
8081
August 19, 2005 05:22AM
8938
July 06, 2005 02:11AM
8270
July 06, 2005 05:33AM
7285
July 06, 2005 06:13PM
7232
July 07, 2005 05:14AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.