Re: collation in mysql
Posted by: Rick James
Date: May 14, 2011 01:05PM

"Character set" and "collation" are related, but different.

For storing International languages, the utf8 character set is recommended.

Once you have picked a character set, then you pick a collation that works with that characterset. See below. Note that latin1 does not handle all the Polish characters.

mysql> SHOW COLLATION WHERE Charset = 'latin1';
+-------------------+---------+----+---------+----------+---------+
| Collation         | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1  |  5 |         | Yes      |       1 |
| latin1_swedish_ci | latin1  |  8 | Yes     | Yes      |       1 |
| latin1_danish_ci  | latin1  | 15 |         | Yes      |       1 |
| latin1_german2_ci | latin1  | 31 |         | Yes      |       2 |
| latin1_bin        | latin1  | 47 |         | Yes      |       1 |
| latin1_general_ci | latin1  | 48 |         | Yes      |       1 |
| latin1_general_cs | latin1  | 49 |         | Yes      |       1 |
| latin1_spanish_ci | latin1  | 94 |         | Yes      |       1 |
+-------------------+---------+----+---------+----------+---------+
8 rows in set (0.00 sec)

mysql> SHOW COLLATION WHERE Collation LIKE '%polish%';
+------------------+---------+-----+---------+----------+---------+
| Collation        | Charset | Id  | Default | Compiled | Sortlen |
+------------------+---------+-----+---------+----------+---------+
| cp1250_polish_ci | cp1250  |  99 |         | Yes      |       1 |
| utf8_polish_ci   | utf8    | 197 |         | Yes      |       8 |
| ucs2_polish_ci   | ucs2    | 133 |         | Yes      |       8 |
+------------------+---------+-----+---------+----------+---------+
3 rows in set (0.00 sec)

mysql> SHOW COLLATION WHERE Charset = 'utf8';
+--------------------+---------+-----+---------+----------+---------+
| Collation          | Charset | Id  | Default | Compiled | Sortlen |
+--------------------+---------+-----+---------+----------+---------+
| utf8_general_ci    | utf8    |  33 | Yes     | Yes      |       1 |
| utf8_bin           | utf8    |  83 |         | Yes      |       1 |
| utf8_unicode_ci    | utf8    | 192 |         | Yes      |       8 |
| utf8_icelandic_ci  | utf8    | 193 |         | Yes      |       8 |
| utf8_latvian_ci    | utf8    | 194 |         | Yes      |       8 |
| utf8_romanian_ci   | utf8    | 195 |         | Yes      |       8 |
| utf8_slovenian_ci  | utf8    | 196 |         | Yes      |       8 |
| utf8_polish_ci     | utf8    | 197 |         | Yes      |       8 |
| utf8_estonian_ci   | utf8    | 198 |         | Yes      |       8 |
| utf8_spanish_ci    | utf8    | 199 |         | Yes      |       8 |
| utf8_swedish_ci    | utf8    | 200 |         | Yes      |       8 |
| utf8_turkish_ci    | utf8    | 201 |         | Yes      |       8 |
| utf8_czech_ci      | utf8    | 202 |         | Yes      |       8 |
| utf8_danish_ci     | utf8    | 203 |         | Yes      |       8 |
| utf8_lithuanian_ci | utf8    | 204 |         | Yes      |       8 |
| utf8_slovak_ci     | utf8    | 205 |         | Yes      |       8 |
| utf8_spanish2_ci   | utf8    | 206 |         | Yes      |       8 |
| utf8_roman_ci      | utf8    | 207 |         | Yes      |       8 |
| utf8_persian_ci    | utf8    | 208 |         | Yes      |       8 |
| utf8_esperanto_ci  | utf8    | 209 |         | Yes      |       8 |
| utf8_hungarian_ci  | utf8    | 210 |         | Yes      |       8 |
+--------------------+---------+-----+---------+----------+---------+
21 rows in set (0.00 sec)

No collation will work perfectly for multiple languages; there are too many inconsistencies. utf8_unicode_ci may be the best for a mixture. utf8_polish_ci (etc) may be 'better' if you know that all your text is Polish (etc).

Note that "collation" is not relevant until you compare or sort strings -- PRIMARY KEY compares for catching duplicates; WHERE with '>' compares; ORDER BY sorts; etc. For simple storage, collation may not matter.

For further discussion, especially if you need to convert existing latin1 tables to utf8:
http://mysql.rjweb.org/doc.php/charcoll

Options: ReplyQuote


Subject
Views
Written By
Posted
4583
May 13, 2011 04:56AM
Re: collation in mysql
3966
May 14, 2011 01:05PM
5621
July 04, 2011 10:04AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.