MySQL :: Re: Strange behavior with queries using "limit"

New Topic

Re: Strange behavior with queries using "limit"

Posted by: Jørgen Løland
Date: March 22, 2013 09:14AM

Hi Jagat,

In certain cases, MySQL will recognize that a query (subquery in your case) has a LIMIT that is lower than the estimated number of rows that must be sorted. In such cases it is usually beneficial to switch to an index that provides the necessary ordering because execution can be stopped immediately after finding the xth row. Example:

CREATE TABLE t1 (i INT, j INT, INDEX(i), INDEX(j));
INSERT INTO t1 ...
^ insert 1M rows with 10.000 distinct 'i' and 'j' values

SELECT * FROM t1 WHERE i=X AND j<Y ORDER BY j LIMIT 1;

- Assume 10.000 rows have 'i=X'
- Assume 100.000 rows have 'j<Y'

Now MySQL has to consider two execution plans:
a) 'ref' access using index i
b) 'range' access using index j

(well... more plans are possible, but for the sake of argument let's just consider these two)

It immediately looks like a) is the better choice because 10x more rows will be read using access method b). Let's see what MySQL actually has to do when using these methods:

a) read 10.000 rows through index, evaluate the "j<Y" predicate
(statistically, about 10% of the 10.000 rows will evaluate "j<Y"
to true), sort 1.000 rows on column 'j' and return the row with smallest
'j' value.
b) start reading rows through the 'j' index. For each row, evaluate the
"i=X" predicate (statistically, 1% of the rows evaluate this predicate to
true). When *the first* row that has "i=X" is found, MySQL can return it and
stop execution. That means on average ~100 rows read and filesort avoided.

Yes, that's much better. Normally...

But all this is just guesswork from the optimizer. If the statistics fits our data reasonably well it's a good choice. But MySQL has no knowledge about covariance of data, so if you're really unlucky there may not even be a single row that has "i=X". In this unlucky case 'ref' access on "i=X' would return almost instantly with "no records found". Or, more reasonably, there are 10.000 rows with "i=X", but none of these have "i<Y" (due to covariance of data). In this case a) would read 10k rows while b) would read 100k rows.

I *think* this is what happens to your query. I cannot tell for sure without a reproducible test case or, if you're using MySQL 5.6, the output of Optimizer tracing for the two queries.

Hope this helps,
Jørgen Løland
Software Engineer, MySQL, Oracle
jorgenloland.blogspot.com

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Views

Written By

Posted

Strange behavior with queries using "limit"

2098

Jagat P

March 21, 2013 03:18PM

Re: Strange behavior with queries using "limit"

1025

Jørgen Løland

March 22, 2013 09:14AM

Re: Strange behavior with queries using "limit"

754

Rick James

March 23, 2013 12:09AM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.