MySQL Forums
Forum List  »  Optimizer & Parser

why can character set of string literal cause different optimazation?
Posted by: Jerry Roletter
Date: April 15, 2010 04:27PM

I posted this under the performance forum but maybe it is better suited for the optimizer forum since I think the optimizer is where I am seeing the issue. I'll paste my original posting below.

The basic issue I'm seeing is that if I change the character set on a string literal I can cause a query to take 90+ seconds as opposed to milliseconds because a filesort is done in one case and not the other.

I'll spare the entire query. The basic difference is (bad performance):

WHERE action = 'mystring'

versus (good performance)

WHERE action = _latin1 'mystring'

The database and columns are UTF-8. "action" above is a varchar(255).

What might cause the optimizer to do a filesort in one case but not the other? Am I on the hairy edge of some system level setting where maybe if I increased sort_buffer_size or maybe max_length_for_sort_data (the select is quite wide - thousands of bytes with over 8 million rows) or some other variable I could avoid the "_latin1" above?

I found this because my character_set_connection was latin1 when on the server debugging a cgi whereas the cgi was using UTF-8.

Any help would be greatly appreciated.

Original post in "Performance" forum is below.


******* Original post **************

I was running a cgi and saw that I was getting poor performance (90+ seconds). I then ran the same query from within the mysql client on the server and got great performance (milliseconds).

I couldn't figure out what was different about the connection the cgi was making and the one I had and then I realized that the character set was different for the connection. The cgi was using UTF-8 (what the database and all columns in the table are using) for character_set_connection but my setting was 'latin1'.

Could someone please give me some guidance as to what may be going on here? It appears to be using the same index in both cases. The database has over 8 million rows and the select is quite wide (lots of varchar and could potentially be close to 10,000 bytes). This query is doing an ORDER BY with a limit of 25 but the result set is over 8 million.

When I look at "explain" for both settings the one with UTF-8 is causing a filesort and the one with latin1 is not. I can't figure out what exactly would cause that. The query essentially is doing:

WHERE action='some string'

and action is varchar(255) with utf8_bin collation.

Any insight would be greatly appreciated.


Options: ReplyQuote

Written By
why can character set of string literal cause different optimazation?
April 15, 2010 04:27PM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.