MySQL :: Re: Optimize two queries on same table with group by and order by

New Topic

Re: Optimize two queries on same table with group by and order by

Posted by: Rick James
Date: July 23, 2011 12:25PM

That's a strange PRIMARY KEY.

First, "use temporary, use filesort" is not the villain. The villain is likely to be a poorly indexed table and/or a poorly formulated query. As we will see below, I can see some index improvements.

Let's focus on the WHERE clauses, and deduce "good" indexes:
WHERE `G` like '48%' -- good, but only as the last part of an index
AND `E` = 'bla' -- good
AND `L` <> '' -- useless becuase it is likely to select most of the table
AND `F` like '%sth%' -- useless, because of leading wild card
where `E` like 'something%' -- good, but only as the last part
and `J` = '45KM' -- good
and `Date2` > date_sub( curdate(),interval 5 week) -- good, but only as the last part
and `A` like 'U%' -- good, but only as the last part

So, these compound indexes are most likely to be useful:
For the first query:
INDEX(E, G)
For the second, one of these (can't tell which is best; you could add all, then see which is picked in EXPLAIN):
INDEX(J, E)
INDEX(J, Date2)
INDEX(J, A)
Actually having all three may be beneficial. When 'something%' is rare, (J,E) might be used; when 'U%' is rare, ... etc.

Looking at the GROUP BY and ORDER BY, only one thing looks possible:
group by `G`,`E`, `F`
This might be useful:
INDEX(`G`,`E`, `F`)
Occasionally, the optimizer will use an index that lets it avoid the filesort. But don't count on it. Sometimes it judges wrong, and makes the query slower.

> GROUP BY `A`
> ORDER BY `Date1` DESC
There _will_ be a sort, maybe two. Maybe one for the GROUP BY, definitely one for the ORDER BY. The INDEX(J,A), below, may prevent the GROUP BY's sort.

It is unusual, but not 'wrong', to index NULLable fields:
> `F` VARCHAR(255) NULL DEFAULT NULL, (etc)
Are they deliberately NULLable, or is this an oversight. If an oversight, then consider adding
MODIFY COLUMN `F` VARCHAR(255) NOT NULL,
MODIFY COLUMN `G` VARCHAR(255) NOT NULL,
MODIFY COLUMN `J` VARCHAR(255) NOT NULL,
to the ALTER, below.

For efficiency, do all the changes at once:

ALTER TABLE t1
    DROP INDEX i6, -- will replace (G) by (G,E,F)
    DROP INDEX i5, -- will replace (E) by (E,G)
    ADD INDEX EG  (E, G),
    ADD INDEX GEF (G, E, F),
    ADD INDEX JE  (J, E),
    ADD INDEX JD2 (J, Date2),
    ADD INDEX JA  (J, A);

30M rows? About 3GB of data? Maybe a similar amount of index?
Is the system tuned for your size of RAM? See
http://mysql.rjweb.org/doc.php/memory
Make those changes before the ALTER; it may help the ALTER run faster. (Still, the ALTER will take quite a while; maybe an hour.)

Validity question...
> SELECT `A` , `G`, `I`, `Date1`, `Date2`
> ...
> GROUP BY `A`
Which values of G, I and the Dates are you expecting to get? You will get a random row for each A. Did you want a longer GROUP BY? Or MAX(Date1), etc? Or maybe GROUP_CONCAT(G)?

Trivia...
30M rows puts this table at the 92nd percentile of tables discussed in these forums.

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Views

Written By

Posted

Optimize two queries on same table with group by and order by

3238

Jan D

July 22, 2011 09:10AM

Re: Optimize two queries on same table with group by and order by

1387

Rick James

July 23, 2011 12:25PM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.