MySQL Forums
Forum List  »  Optimizer & Parser

SELECT DISTINCT vs UNIQUE KEY
Posted by: Martin Feng
Date: July 19, 2006 06:11AM

This is a thread moved from MyISAM forum. I was suggested to post it here.

------ The original post ------

I have a query that return around 15000 VARCHAR(255). I tried 'SELECT DISTINCT' to remove duplicates from the result set, and 'show processlist' shows mysql is using more than 15 minutes to 'remove duplicates'. I killed it before it ends.

Then I tried another way:

CREATE TEMPORARY TABLE TBL_TMPNAMES
(
COL_NAME VARCHAR(255) NOT NULL,
PRIMARY KEY (COL_NAME)
) ENGINE = MyISAM;

and use 'INSERT INGORE TBL_TMPNAMES SELECT ...'

The whole INSERT statement ends in 51 seconds, in which the select itself takes about 40 seconds.

Why does this happen?

My environment:

MySQL 5.0.22, installed from MySQL-server-standard-5.0.22-0.rhel4.i386.rpm and MySQL-Max-5.0.22-0.i386.rpm,
OS: CentOS 4.3
DB: CHARACTER SET utf8 COLLATE utf8_general_ci
All tables involved are MyISAM.

------ Reply from Ingo Strüwing ------

For more exact answers on this I suggest to ask in the "Optimizer" forum again.

I can just guess that SELECT DISTINCT orders the result set to eliminate duplicates. The result set and/or the sort files might not fit into memory (tmp_table_size, sort_buffer_size).

The INSERT SELECT algorithm does not require to store the whole result set. Every row can immediately be checked against the unique index.

It would be nice if the optimizer used this algorithm for SELECT DISTINCT implicitly, but seemingly it does not.

------ Mre information from me ------

I set

tmp_table_size=512M
myisam_sort_buffer_size=512M

in my my.cnf. The machine has 2GB RAM is mainly used by one update thread so it seldom swap.

Options: ReplyQuote


Subject
Views
Written By
Posted
SELECT DISTINCT vs UNIQUE KEY
10278
July 19, 2006 06:11AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.