MySQL Forums
Forum List  »  Performance

Using 'Order By' disables use of index
Posted by: Jason Winnebeck
Date: January 07, 2005 04:17PM

I am developing a database repository in an emedded environment with a small amount (1GB flash) of space, of which most of it is dedicated to the database.

I need to dump database data, but I am having trouble. It seems that MySQL wants to do filesort when using order by, even though I'm trying to order by ascending primary key! It creates a temporary table, but there is not enough space to sort the result set (as I'm dumping the whole table) -- even if there was the query takes some hours to execute and we need query times around 5-30 minutes or faster.

So the key requirement is that the server must be able to stream results (we use mysql_use_result), and to do this the server must not need to sort the data -- this is why we have created the indicies as we have.

SELECT * FROM Table;

Does successfully stream the data, but it is not sorted. I want to return the data in order of primary key:

SELECT * FROM Table ORDER BY pkey_part1, pkey_part2;

Then filesort is used, and the query will take hours and eventually die due to out-of-disk condition. This is shown in the EXPLAIN statement. What is really, really dumb is that if we have 7,600,000 rows in the database, then the following query:

SELECT * FROM Table ORDER BY pkey_part1, pkey_part2 LIMIT 7599999;

The query starts streaming results immediately and quickly. As long as we pick a limit less than count(*), it works. Limit with two bounaries (such as LIMIT 100, 200) uses filesort.

The MySQL Manual explicitly states in "How MySQL Optimizes Order By" that the above statement is supported, but I've tried this on two MySQL 4.0.15 servers installed at our site, a 4.0.22 server off-site that I use for my personal purposes, and a 4.1 (beta, slightly old version) server. I've even tried different table schemas to see if it is an issue with multi-column keys, and it is not.

In these cases, explain shows the use of filesort unless you have a where clause. What is really outrageous is that if you pick a range in the primary key that extends over all of the primary key, the explain says that it pulls it out using the index and then SORTS the results it got out of that. It appears that any time the DB thinks it will pull all rows it should do a table scan, even though a table scan doesn't even work.

explain select * from DMCalcNumData order by portId, `time`;
+---------------+------+---------------+------+---------+------+---------+----------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+---------------+------+---------------+------+---------+------+---------+----------------+
| DMCalcNumData | ALL | NULL | NULL | NULL | NULL | 7669090 | Using filesort |
+---------------+------+---------------+------+---------+------+---------+----------------+

explain select * from DMCalcNumData order by portId, `time` limit 7669089;
+---------------+-------+---------------+---------+---------+------+---------+-------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+---------------+-------+---------------+---------+---------+------+---------+-------+
| DMCalcNumData | index | NULL | PRIMARY | 10 | NULL | 7669090 | |
+---------------+-------+---------------+---------+---------+------+---------+-------+

explain select * from DMCalcNumData where portId > 0 and portId < 10000 order by portId, time;
+---------------+-------+---------------+---------+---------+------+---------+-------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+---------------+-------+---------------+---------+---------+------+---------+-------------+
| DMCalcNumData | range | PRIMARY | PRIMARY | 2 | NULL | 1106799 | Using where |
+---------------+-------+---------------+---------+---------+------+---------+-------------+

explain select * from DMCalcNumData where portId > 0 and portId < 1000000 order by portId, time;
+---------------+------+---------------+------+---------+------+---------+-----------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+---------------+------+---------------+------+---------+------+---------+-----------------------------+
| DMCalcNumData | ALL | PRIMARY | NULL | NULL | NULL | 7669090 | Using where; Using filesort |
+---------------+------+---------------+------+---------+------+---------+-----------------------------+

(almost all portId is greater than 10000).

The first query works, but not the second.

I thought that possibly explain was broken and said that it would use filesort but wouldn't, but I confirmed by watching disk space that it is correct, and the versions with filesort never return (crash), while the versions using the index start returning results immediately w/o use of a temp file.

I tried to use FORCE INDEX:
explain select * from DMCalcNumData FORCE INDEX (PRIMARY) order by portId, `time`;
+---------------+------+---------------+------+---------+------+---------+----------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+---------------+------+---------------+------+---------+------+---------+----------------+
| DMCalcNumData | ALL | NULL | NULL | NULL | NULL | 7669090 | Using filesort |
+---------------+------+---------------+------+---------+------+---------+----------------+

this did not work in 4.0.15, but it _does_ work for 4.1 beta, and it forces use of the primary key.

To make sure my schema was not the issue I tried this same thing on other DB schemas, and even on other servers -- I tried this on a personal webserver that a professional company maintains where I have them host a PHPBB database (ie not subject to my stupidity) -- if I tried to return all records ordered by primary key, filesort is used.

I can't possibly believe that this is a real bug that I've found that is of this magnitude as it pretty much would impact everything. While this is a performance problem for others, in the embedded system we are developing, filesort is just not feasible as our available disk space is just not there. Can anyone please help? We could upgrade our embedded system to 4.1, but even in 4.1 you have to resort to hacks just to pull out records by primary key -- this should be the most basic operation possible in a DB server!

Options: ReplyQuote


Subject
Views
Written By
Posted
Using 'Order By' disables use of index
11333
January 07, 2005 04:17PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.