MySQL :: Re: Query Optimization

New Topic

Re: Query Optimization - order by

Posted by: Rick James
Date: January 26, 2012 01:22AM

OK, let's dig into
1. SELECT id, count(*) FROM table GROUP BY id ORDER BY count (*) DESC LIMIT 50
and
2. SELECT id, count(*) FROM table GROUP BY id LIMIT 50

Start with #2. Let's look at the index on id. Perhaps it is the PRIMARY KEY; if so, then remember that a PRIMARY KEY is an "index".

EXPLAIN SELECT id, count(*) FROM table GROUP BY id LIMIT 50;
It will probably say "Using index". This means that it can get the answer by looking only at the INDEX. EXPLAIN #1; it will say almost the same thing. But it will also say "using temporary; filesort". (OK, I'm guessing; shoot me if I'm wrong.) We'll get back to that difference in a minute.

Still focusing on #2, let's imagine what it has to do to get the first 50 rows of the resultset. It walks through the ids, counting how many rows there are for each id. It knows that all the rows for a given id are adjacent, so it can effectively dispense with each id as it goes. Once it has found 50, it is finished.

Now let's look at #1. It needs to
1. count the number of rows for each and every id.
2. as they are found, the id-count pairs are written to a temporary table
3. sort the temp table, highest count first.
4. deliver the first 50.

Perhaps the most costly part is step 1. That requires reading the entire index. Close behind that is step 2, which involves writing a large number of rows to either a MEMORY table or a MyISAM table. Step 3 is probably worse -- it has to sort that temp table. Finally is the relatively trivial step 4.

A question... Usually the name `id` is for the PRIMARY KEY column. Is that the case here? A PRIMARY KEY is, by MySQL's definition, UNIQUE. This implies that all your COUNT(*)s will be 1. This makes the original query rather strange.

> creates the count(*) field that is NOT indexed and therefore takes A LOT longer to sort then a usual indexed field would
Don't be fooled -- it takes effort to insert into a table that has a "clustered index" forcing the items to be inserted at the 'right' place. It can be slower to do that than to build the table unordered and then sort it.

Back to your main question -- how to make reading million rows take less than 3 seconds. You probably cannot.
OK, you say, how can I get my 50 values without having to read a million rows?
Now, I need to dig deeper into your application, the meaning of `id`, the way and frequency at which rows are added (or changed) to your million-row table, etc. And, how often do you need to run this SELECT?
Given answers to those, I may direct you toward a "summary table", updating counts on the fly, using a trigger, caching, or something totally other.

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Written By

Posted

Query Optimization - order by

matt erop

January 24, 2012 08:00AM

Re: Query Optimization - order by

Rick James

January 26, 2012 01:22AM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.