When I use DISTINCT in an INNER JOIN query that returns more than 1 column in the SELECT a full table scan is performed rather than using an index. When 1 column is returned in the SELECT the index is used.
(1) Create the following tables:
CREATE TABLE tab1
(
col1 int NOT NULL,
col2 varchar(24) NOT NULL,
col3 int NOT NULL,
CONSTRAINT pk_tab1 PRIMARY KEY CLUSTERED (col1)
);
CREATE TABLE tab2
(
col1 int NOT NULL,
col2 varchar(24) NOT NULL,
col3 int NOT NULL,
CONSTRAINT pk_tab2 PRIMARY KEY CLUSTERED (col1, col2)
);
ALTER TABLE tab2 ADD CONSTRAINT fk_tab2 FOREIGN KEY (col1) REFERENCES tab1 (col1) ON DELETE CASCADE;
CREATE INDEX ix_tab2 ON tab2 (col2);
(2) Next run the following SELECTs using EXPLAIN:
EXPLAIN SELECT DISTINCT a.col1
FROM tab1 a INNER JOIN tab2 b
ON a.col1 = b.col1
WHERE b.col2 in ('value1', 'value2')
ORDER BY a.col1
;
+----+-------------+-------+-------+-----------------+---------+---------+----------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------+---------+---------+----------------+------+----------------------------------------------+
| 1 | SIMPLE | a | index | PRIMARY | PRIMARY | 4 | NULL | 1 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | b | ref | PRIMARY,ix_tab2 | PRIMARY | 4 | brmuser.a.col1 | 1 | Using where; Using index; Distinct |
+----+-------------+-------+-------+-----------------+---------+---------+----------------+------+----------------------------------------------+
EXPLAIN SELECT DISTINCT a.col1, a.col2
FROM tab1 a INNER JOIN tab2 b
ON a.col1 = b.col1
WHERE b.col2 in ('value1', 'value2')
ORDER BY a.col1
;
+----+-------------+-------+------+-----------------+---------+---------+----------------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------+---------+---------+----------------+------+------------------------------------+
| 1 | SIMPLE | a | ALL | PRIMARY | NULL | NULL | NULL | 1 | Using temporary; Using filesort |
| 1 | SIMPLE | b | ref | PRIMARY,ix_tab2 | PRIMARY | 4 | brmuser.a.col1 | 1 | Using where; Using index; Distinct |
+----+-------------+-------+------+-----------------+---------+---------+----------------+------+------------------------------------+
The addition of the second column, a.col2, in the SELECT causes a full table scan.
Any ideas why this is? Could it be related to an existing bug:
http://bugs.mysql.com/bug.php?id=41994
Showing the execution plan of the same queries in MSSQL shows both using the PRIMARY key.
Thanks.