MySQL Forums
Forum List  »  Performance

Re: Using 'Order By' disables use of index
Posted by: Harrison Fisk
Date: January 10, 2005 06:47PM

Jason Winnebeck wrote:
> Harrison Fisk wrote:
> > This isn't a bug. There are times that it
> is
> > actually slower to use an index than to not
> use
> > the index. ... The reason is
> > that it is much faster to read and write
> continous
> > chunks of data from a table than to do many
> disk
> > seeks.
>
> Hmm that sounds a little unintuitive to me that
> reading the whole file then rewriting it again to
> sort it on disk is faster than all of the seeks,
> espically when you consider the number of seeks to
> sort the data on disk. It sounds plausible though
> if you consider a large sort_buffer_size, but as
> you say that is mostly irrelevant for my
> situation.

While I admit is does sound unintuitive, it is true.

The reason is that when it sorts data on disk it isn't sorting on disk. It sorts in memory and just uses the disk for a temporary scratch pad. It only does continous writes and reads to disk with filesort.

Even in the worst case scenario, the default sort_buffer_size will cut down on disk seeks a huge amount. Feel free to test it by using a FORCE INDEX if you want to see. Here is an example showing the difference with a sort_buffer of 2M (the default).

mysql> SELECT * FROM test2 ORDER BY vehicle_id;
1000000 rows in set (20.64 sec)

mysql> SELECT * FROM test2 force index (vehicle_id) ORDER BY vehicle_id;
1000000 rows in set (24.96 sec)

You can see the first one is faster (this is repeatable).

> I think that the manual should be updated then, as
> the manual clearly states in section 7.2.10 that
> the index is used for "SELECT * FROM t1 ORDER BY
> key_part1,key_part2,... ;" and that if explain
> shows "using filesort", then it implies that
> something is wrong. It would be nice if the
> manual mentions what you've told me.

Feel free to add a comment to anything you think should be changed in the manual. The docs team reviews the comments periodically and will add in anything they think is true. One point about the EXPLAIN point though is that I saw the line:

"With EXPLAIN SELECT ... ORDER BY, you can check whether MySQL can use indexes to resolve the query. It cannot if you see Using filesort in the Extra column. See section 7.2.1 EXPLAIN Syntax (Get Information About a SELECT)."

Also from 7.4.5:

"Sometimes MySQL will not use an index, even if one is available. One way this occurs is when the optimizer estimates that using the index would require MySQL to access a large percentage of the rows in the table. (In this case, a table scan is probably much faster, because it will require many fewer seeks.) However, if such a query uses LIMIT to only retrieve part of the rows, MySQL will use an index anyway, because it can much more quickly find the few rows to return in the result."

>
> Out of curiousity, if you use alter table to sort
> the data file on disk, does that dramatically
> decrease the time to do the filesort?

Generally yes. It will still have to do a single pass on the data, but it should be faster to do so. It also will actually increase the speed of using an index for the sorting as the disk seeks and reads will be more in order.

> > So this is the way it normally works, faster
> to
> > actually not use the index (feel free to
> benchmark
> > on a normal database server (not the embedded
> case
> > that you have) if you don't believe me).
> All
> > RDBMS will do the same thing by default.
>
> The exception to this is if you use a clustered
> index, as in SQL Server, correct, which can be
> roughly achieved in MySQL by running 'alter table
> x order by p' often?

It isn't always true in MySQL with ALTER TABLE. The reason is that the "cluster" isn't maintained automatically. So it can't be used by the optimizer to make decisions as it has no information if it is still roughly in order. It will still filesort, but it will be faster to do so.

> > However in your case it might be a bit
> different
> > for a few reasons.
> >
> > 1. You are using a flash drive, the seek
> time
> > most likely isn't that big of a factor here.
>
> Yes, our seek time is 0.04ms, compared to
> something like 8-12ms, which is typical for a
> physical magnetic drive.
>
> > 2. You are using mysql_use_result to stream
> so
> > you want the first row as fast as possible.
>
> Although streaming records linearly is beneficial
> so that the client can start processing the data
> set before it is completely downloaded, that is
> not required. What is required is that we use no
> extra memory or disk as we have none to use, and
> that is why we use streaming.
>
> > Because of this you will want to override
> the
> > behavior and use a FORCE INDEX (PRIMARY).
> That
> > will cause it always use the index as you
> have
> > seen. That should work in the most recent
> 4.0 and
> > 4.1. There was a bug in early 4.0 where it
> would
> > still use a USING FILESORT (as you noted).
> The
> > bug was fixed in 4.0.19 (see
> >
> http://dev.mysql.com/doc/mysql/en/News-4.0.19.html
>
> > for details).
>
> Thanks. This will unfortunately take some extra
> logic code on our part as we have queries that use
> another index on that table, so we have to make
> sure not to use FORCE INDEX when that index is not
> applicable.

It is possible you could change a few lines of code in the MySQL source base in order to change this and make disk seeks cheaper. I'm not sure where this is off hand though.

> We might have to do this anyway:
> we've also noticed that if we pull out records
> using "in": (SELECT * FROM table where key_part1
> IN ( a, b, c ) and key_part2 between y and z order
> by key_part1, key_part2) that it does a sort
> (which requires memory and time), and it is faster
> for us and does not require the extra memory if we
> sort the "IN" list, break it into one query for
> each key_part1 and append the results of the
> queries.

That is odd. MySQL internally does roughly the same thing as you are saying. An example explain shows me that too.

mysql> explain SELECT * FROM Country WHERE region IN ('Antarctica', 'Melanesia',' Western Africa') AND population BETWEEN 1000 and 1000000 ORDER BY region, population;
+---------+-------+---------------+--------+---------+------+------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+---------+-------+---------------+--------+---------+------+------+-------------+
| Country | range | Region | Region | 30 | NULL | 5 | Using where |
+---------+-------+---------------+--------+---------+------+------+-------------+
1 row in set (0.00 sec)

Can you post your query/table structure and explain here?

Harrison Fisk, Trainer and Consultant
MySQL AB, www.mysql.com

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: Using 'Order By' disables use of index
2518
January 10, 2005 06:47PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.