MySQL Forums
Forum List  »  Performance

Re: Using 'Order By' disables use of index
Posted by: Harrison Fisk
Date: January 11, 2005 07:37PM

Jason Winnebeck wrote:
> Harrison Fisk wrote:
> > The reason is that when it sorts data on disk
> it
> > isn't sorting on disk. It sorts in memory
> and
> > just uses the disk for a temporary scratch
> pad.
> > It only does continous writes and reads to
> disk
> > with filesort.
> >
> > Even in the worst case scenario, the default
> > sort_buffer_size will cut down on disk seeks
> a
> > huge amount. Feel free to test it by using
> a
>
> Yeah, I suppose I tried to infer too much about
> how MySQL works based on our embedded system where
> I've had to be a lot more "hands on" on what's
> going on. After making my post I thought more
> about it and realized that it makes sense for
> "real" servers like our "real" server, which has
> 2GB of memory -- if it has 512 to 1024MB of sort
> buffer, that can actually fit most tables so that
> no writing of temporary storage is required, and
> -- you are right -- this removes seeks entirely.
> Even if it needs temporary space, it can sort in
> chunks of 512 to 1024MB which is very efficient.
>
> The sort_buffer_size on our embedded server...
> well let's just say it is a lot smaller than our
> table: 64k :(. We can actually expand this quite
> greatly if we need to in the future though as
> MySQL is only use 5MB of total memory.

The default sort_buffer_size is actually only 2M. So a sort_buffer of 512M of ram would be _extremely_ large to say the least. The largest I have seen it in production is about 32M, though it is fairly common to set it larger on a session basis with SET SESSION sort_buffer_size if you know you are going to be sorting a huge amount of data.

> > Also from 7.4.5:
> >
> > "Sometimes MySQL will not use an index, even
> if
> > one is available. One way this occurs is when
> the
>
> Oops, yeah I see that, you are right.

The indexes chapter is quite large as there is a lot to it, I don't blame you for missing it.

> > It is possible you could change a few lines
> of
> > code in the MySQL source base in order to
> change
> > this and make disk seeks cheaper. I'm not
> sure
> > where this is off hand though.
>
> Our query interface is quite limited and our goal
> to external clients really isn't much more than
> streaming out data by primary key with only limits
> on the primary key parts. To give a little more
> context we are sampling data from a wide range of
> sources, and data always comes in order (since we
> are sampling and time only goes forward), and our
> goal is only to download these samples. We could
> have implemented a circular file system (ie a
> circular array on disk) but we wanted to try MySQL
> first because we'd have to have a separate file
> for each data source, and it would be difficult to
> grow and shrink dynamically the file for each
> source -- MySQL does this automatically and also
> allows us SQL syntax. I've been surprised how
> well MySQL has handled a 1GB database on an
> embedded system without too much trouble (this
> problem with the index is about the worst we've
> faced, and this isn't all that bad -- kudos to
> MySQL).

It is hard to adapt a piece of software to all possible situations which is why you ran into the ORDER BY thing. MySQL doesn't expect you to have such a low seek time so it didn't work out so well in this case. Glad to hear that it works for most other cases though! Do you think you are going to stick with using MySQL in this case?

> > > We might have to do this anyway:
> > > we've also noticed that if we pull out
> > records
> > > using "in": (SELECT * FROM table where
> > key_part1
> > > IN ( a, b, c ) and key_part2 between y
> and z
> > order
> > > by key_part1, key_part2) that it does a
> sort
> > > (which requires memory and time), and it
> is
> > faster
> > > for us and does not require the extra
> memory
> > if we
> > > sort the "IN" list, break it into one
> query
> > for
> > > each key_part1 and append the results of
> the
> > > queries.
> >
> > That is odd. MySQL internally does roughly
> the
> > same thing as you are saying. An example
> explain
> > shows me that too.
>
> I apologize. I believe my initial explain came
> out the same way, but I was trying to determine
> experimentally if MySQL was using disk as it is so
> important to us. The way I did this was to run
> the query in the command line client and see if
> results returned immediately and if disk space was
> decreasing. When I did this test, I must have
> forgotten to start the mysql command client with
> the -q command.
>
> I re-ran my tests, and according to explain,
> "mysql -q" and disk usage, using queries of the
> form
> SELECT * FROM Table WHERE key_part1 IN ( a, b, c,
> ... ) ORDER BY key_part1, key_part2;
> was able to stream from the server. The [] part
> is optional (it has no effect on streaming)
>
> So your statement that it should work fine is
> correct.

Glad to hear!

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: Using 'Order By' disables use of index
2024
January 11, 2005 07:37PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.