
MySQL Forums :: Character Sets, Collation, Unicode :: utf8_unicode_ci vs utf8_general_ci



Re: utf8_unicode_ci vs utf8_general_ci
Posted by: Alexander Barkov ()
Date: December 18, 2007 04:03PM

Hi,

You can check and compare sort orders provided by these two collations here:

http://www.collation-charts.org/mysql60/mysql604.utf8_general_ci.european.html
http://www.collation-charts.org/mysql60/mysql604.utf8_unicode_ci.european.html

utf8_general_ci is a very simple collation. For each character, it just
- removes all accents,
- then converts the result to upper case,
and uses the code of the resulting "base letter" for comparison.

For example, these Latin letters: ÀÁÅåāă (and all other Latin letters "a"
with any accent, in either case) all compare as equal to "A".
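
To make the two steps above concrete, here is a minimal Python sketch. It is only an approximation: MySQL uses its own built-in weight tables, while this sketch borrows Unicode canonical decomposition to find the base letter. The function name base_letter_key is hypothetical and is not part of MySQL.

```python
import unicodedata


def base_letter_key(text: str) -> str:
    """Approximate a 'strip accents, then upper-case' comparison key.

    Assumption: Unicode NFD decomposition stands in for MySQL's internal
    weight table, so results can differ from the server for some letters.
    """
    out = []
    for ch in text:
        # Split the character into its base letter plus combining accents.
        decomposed = unicodedata.normalize("NFD", ch)
        # Keep the first non-combining code point (the base letter).
        base = next((c for c in decomposed if not unicodedata.combining(c)), ch)
        upper = base.upper()
        # Stay strictly one-to-one: if upper-casing would expand the
        # character (for example "ß" -> "SS"), keep it unchanged instead.
        out.append(upper if len(upper) == 1 else base)
    return "".join(out)


print(base_letter_key("ÀÁÅåāă"))                                # AAAAAA
print(base_letter_key("résumé") == base_letter_key("RESUME"))   # True
```

Every character maps to exactly one key character here, which is the property that makes this kind of collation cheap to evaluate.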


utf8_unicode_ci uses the default Unicode collation element table (DUCET).


The main differences are:

1. utf8_unicode_ci supports so-called expansions and ligatures, for example:
German letter ß (U+00DF LATIN SMALL LETTER SHARP S) is sorted near "ss".
Letter Œ (U+0152 LATIN CAPITAL LIGATURE OE) is sorted near "OE".

utf8_general_ci does not support expansions or ligatures. It sorts
all these letters as single characters, and sometimes in the wrong order.
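
As a rough illustration of why expansions matter, the Python sketch below sorts three German words with two toy comparison keys. The tables EXPANSIONS and SINGLE_CHAR and both function names are hypothetical and cover only the letters mentioned here. The single-character mapping of ß to "s" follows the MySQL manual's note that ß compares equal to "s" under utf8_general_ci. Neither key reproduces the real collation tables.

```python
# Toy expansion table: one character may expand to several key characters.
EXPANSIONS = {"ß": "ss", "Œ": "OE", "œ": "oe"}

# Toy single-character table: every character maps to exactly one character.
SINGLE_CHAR = {"ß": "s"}


def expansion_key(word: str) -> str:
    """Key in the spirit of utf8_unicode_ci: expand first, then fold case."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in word).upper()


def single_char_key(word: str) -> str:
    """Key in the spirit of utf8_general_ci: strictly one-to-one mapping."""
    return "".join(SINGLE_CHAR.get(ch, ch) for ch in word).upper()


words = ["Mast", "Maße", "Masern"]

print(sorted(words, key=expansion_key))    # ['Masern', 'Maße', 'Mast']
print(sorted(words, key=single_char_key))  # ['Maße', 'Masern', 'Mast']

# With expansions, "Maße" and "Masse" get the same key; without, they differ.
print(expansion_key("Maße") == expansion_key("Masse"))      # True
print(single_char_key("Maße") == single_char_key("Masse"))  # False
```

With the expansion key, "Maße" sorts where "Masse" would, after "Masern". With the one-to-one key it is treated like "Mase" and moves ahead of "Masern".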

2. utf8_unicode_ci is *generally* more accurate for all scripts.
For example, in the Cyrillic block,
utf8_unicode_ci works well for all of these languages:
Russian, Bulgarian, Belarusian, Macedonian, Serbian, and Ukrainian,
while utf8_general_ci works well only for the Russian and Bulgarian subset of Cyrillic.
The extra letters used in Belarusian, Macedonian, Serbian, and Ukrainian
are not sorted well.


The disadvantage of utf8_unicode_ci is that it is a little
slower than utf8_general_ci.

So when you need a better sort order, use utf8_unicode_ci,
and when performance is your overriding concern, use utf8_general_ci.




