Problem with unicode character comparison
Posted by: Brooks Brown
Date: April 12, 2005 12:02PM

I was going to reply to Dimitry Libertas's post, as I believe my problem is similar to his, but I thought it would get more emphasis as a separate post.

I am using the utf8_unicode_ci collation and am very frustrated with its support for "expansions" (see http://dev.mysql.com/doc/mysql/en/charset-unicode-sets.html). Evidently 'a-acute' or 'a-umlaut' is treated as equal to 'a', which is generally NOT what is desired. For example, a customer in Sweden is complaining that searching on "man" yields matches for "Människan".
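To make the complaint concrete, here is a rough Python sketch of the kind of accent- and case-insensitive comparison I am describing. It is NOT MySQL's collation algorithm; the function name approx_primary_key is my own, and stripping combining marks after NFD decomposition is only an approximation of how utf8_unicode_ci appears to behave for these characters.

```python
import unicodedata

def approx_primary_key(text):
    """Hypothetical stand-in for an accent- and case-insensitive sort key."""
    # Decompose so that 'ä' becomes 'a' followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (accents, umlauts), then fold case.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Under this approximation 'ä' and 'a' compare equal ...
print(approx_primary_key("ä") == approx_primary_key("a"))  # True
# ... so a search on "man" also hits "Människan".
print(approx_primary_key("Människan").startswith(approx_primary_key("man")))  # True
```

For a Swedish user, 'ä' is a separate letter rather than an accented 'a', which is why the second result looks like a false match.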

Using the binary collation (utf8_bin) is not a good option, as the sorting that this would produce is not desirable.

We could programmatically weed out false matches, but this would be disruptive and would unnecessarily complicate our application, which supports multiple database servers.
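For reference, the kind of post-filtering I mean might look roughly like the Python sketch below. The function names and the sample rows are made up for illustration; it assumes the database has already returned candidate strings and that the desired match is case-insensitive but accent-sensitive.

```python
import unicodedata

def accent_sensitive_contains(haystack, needle):
    """Case-insensitive substring test that keeps 'ä' distinct from 'a'."""
    # NFC keeps precomposed letters such as 'ä' as single code points.
    h = unicodedata.normalize("NFC", haystack).casefold()
    n = unicodedata.normalize("NFC", needle).casefold()
    return n in h

def weed_out_false_matches(rows, term):
    """Keep only the candidate rows that really contain the search term."""
    return [row for row in rows if accent_sensitive_contains(row, term)]

# Hypothetical candidates as an accent-insensitive search might return them.
candidates = ["Människan", "Mannen", "man"]
print(weed_out_false_matches(candidates, "man"))  # ['Mannen', 'man']
```

Every query path in the application would need a second pass like this, which is exactly the complication we would like to avoid.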

Also, as Mr. Libertas observes, the LIKE operator behaves differently from '='.
