MySQL :: Re: utf8_unicode_ci vs utf8_general_ci

Re: utf8_unicode_ci vs utf8_general_ci

Posted by: Alexander Barkov
Date: December 18, 2007 04:03PM

Hi,

You can check and compare the sort orders provided by these two collations here:

http://www.collation-charts.org/mysql60/mysql604.utf8_general_ci.european.html
http://www.collation-charts.org/mysql60/mysql604.utf8_unicode_ci.european.html

utf8_general_ci is a very simple collation. For each character, it:
- removes all accents
- then converts to upper case
and uses the code of the resulting "base letter" for the comparison.

For example, these Latin letters: ÀÁÅåāă (and all other forms of the Latin
letter "a", with any accent and in either case) all compare as equal to "A".
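
As a rough illustration of that idea, the "remove accents, then upper-case" step can be sketched in Python. This is not MySQL's actual implementation: the function name general_ci_key is made up for this example, and Unicode decomposition only approximates the mapping the collation uses.

```python
import unicodedata

def general_ci_key(text):
    """Approximate a 'strip accents, then upper-case' comparison key.

    Hypothetical helper for illustration only; it is not MySQL's algorithm.
    """
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (category Mn), keeping only the base letters.
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return base.upper()

# Collect the distinct keys produced by the six example letters.
print({general_ci_key(ch) for ch in "ÀÁÅåāă"})
```

All six letters end up with the same key, "A", so a comparison based on this key treats them as equal.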

utf8_unicode_ci uses the default Unicode collation element table (DUCET).

The main differences are:

1. utf8_unicode_ci supports so-called expansions and ligatures, for example:
the German letter ß (U+00DF LATIN SMALL LETTER SHARP S) is sorted near "ss";
the letter Œ (U+0152 LATIN CAPITAL LIGATURE OE) is sorted near "OE".

utf8_general_ci does not support expansions or ligatures; it sorts
all these letters as single characters, and sometimes in the wrong order.
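
To make the difference concrete, here is a toy sketch in Python. The two tables are tiny, hypothetical stand-ins for real collation weights, and both key functions are invented names; they only imitate the two behaviours described above.

```python
# Toy tables: hypothetical stand-ins, far smaller than real collation data.
EXPANSIONS = {"ß": "ss", "Œ": "OE", "œ": "oe"}  # one character -> several letters
SINGLE = {"ß": "S"}                             # one character -> one letter

def expanding_key(text):
    # Expand first, then fold case, so "ß" compares like "ss".
    return "".join(EXPANSIONS.get(ch, ch) for ch in text).upper()

def single_char_key(text):
    # Each character contributes exactly one weight: the code of its
    # upper-case form, or of its stand-in from SINGLE.
    weights = []
    for ch in text:
        upper = SINGLE.get(ch, ch.upper())
        # Fall back to the character itself if upper-casing yields several letters.
        weights.append(ord(upper if len(upper) == 1 else ch))
    return weights

words = ["Oz", "Œuvre", "Of"]
print(sorted(words, key=expanding_key))
print(sorted(words, key=single_char_key))
```

With the expanding key, "Œuvre" is ordered as if it were spelled "OEUVRE" and lands before "Of". With the single-character key it falls after "Oz", because the ligature's own code is higher than that of any plain Latin letter.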

2. utf8_unicode_ci is *generally* more accurate for all scripts.
For example, in the Cyrillic block, utf8_unicode_ci works well for all of
these languages: Russian, Bulgarian, Belarusian, Macedonian, Serbian, and
Ukrainian, while utf8_general_ci works well only for the Russian and
Bulgarian subset of Cyrillic. The extra letters used in Belarusian,
Macedonian, Serbian, and Ukrainian are not sorted well.
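
The effect on those extra letters can be sketched in Python, assuming an ordering driven purely by character codes. That is a simplification of what utf8_general_ci does, and the names below are made up for the example.

```python
# Ukrainian alphabet in dictionary order (upper-case letters only).
UKRAINIAN_ALPHABET = "АБВГҐДЕЄЖЗИІЇЙКЛМНОПРСТУФХЦЧШЩЬЮЯ"
RANK = {letter: i for i, letter in enumerate(UKRAINIAN_ALPHABET)}

def code_point_key(word):
    # Order by the code of each upper-cased letter (simplified assumption).
    return [ord(ch) for ch in word.upper()]

def alphabet_key(word):
    # Order by position in the Ukrainian alphabet; unknown letters go last.
    return [RANK.get(ch, len(RANK)) for ch in word.upper()]

letters = ["Я", "Ґ", "А", "Є", "Г", "І"]
print(sorted(letters, key=code_point_key))
print(sorted(letters, key=alphabet_key))
```

Under the code-based key, Є and І come out before А, and Ґ comes after Я, because of where those letters sit in the Cyrillic block. The alphabet-based key returns the order a Ukrainian dictionary would use.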

The disadvantage of utf8_unicode_ci is that it is slightly
slower than utf8_general_ci.

So when you need a better sort order, use utf8_unicode_ci,
and when performance is your overriding concern, use utf8_general_ci.

Subject                                   Views    Written By         Posted
utf8_unicode_ci vs utf8_general_ci        59538    daniel achim       December 07, 2007 12:47AM
Re: utf8_unicode_ci vs utf8_general_ci    137862   Alexander Barkov   December 18, 2007 04:03PM
