Re: searches on utf8_unicode_ci only work for ASCII
Posted by: Rick James
Date: July 13, 2010 08:57AM

ñ is treated as an accented letter in some collations. However, utf8_spanish_ci seems to treat it "correctly":

mysql> SELECT hex_utf8, name FROM unicode WHERE unhex(hex_utf8) = _utf8'n' collate utf8_general_ci;
+----------+----------------------------------------------+
| hex_utf8 | name                                         |
+----------+----------------------------------------------+
| 4E       | LATIN CAPITAL LETTER N                       |
| 6E       | LATIN SMALL LETTER N                         |
| C391     | LATIN CAPITAL LETTER N WITH TILDE            |
| C3B1     | LATIN SMALL LETTER N WITH TILDE              |
| C583     | LATIN CAPITAL LETTER N WITH ACUTE            |
| C584     | LATIN SMALL LETTER N WITH ACUTE              |
| C585     | LATIN CAPITAL LETTER N WITH CEDILLA          |
| C586     | LATIN SMALL LETTER N WITH CEDILLA            |
| C587     | LATIN CAPITAL LETTER N WITH CARON            |
| C588     | LATIN SMALL LETTER N WITH CARON              |
| C7B8     | LATIN CAPITAL LETTER N WITH GRAVE            |
| C7B9     | LATIN SMALL LETTER N WITH GRAVE              |
| E1B984   | LATIN CAPITAL LETTER N WITH DOT ABOVE        |
| E1B985   | LATIN SMALL LETTER N WITH DOT ABOVE          |
| E1B986   | LATIN CAPITAL LETTER N WITH DOT BELOW        |
| E1B987   | LATIN SMALL LETTER N WITH DOT BELOW          |
| E1B988   | LATIN CAPITAL LETTER N WITH LINE BELOW       |
| E1B989   | LATIN SMALL LETTER N WITH LINE BELOW         |
| E1B98A   | LATIN CAPITAL LETTER N WITH CIRCUMFLEX BELOW |
| E1B98B   | LATIN SMALL LETTER N WITH CIRCUMFLEX BELOW   |
+----------+----------------------------------------------+
20 rows in set (0.02 sec)

mysql> SELECT hex_utf8, name FROM unicode WHERE unhex(hex_utf8) = _utf8'n' collate utf8_unicode_ci;
+----------+----------------------------------------------+
| hex_utf8 | name                                         |
+----------+----------------------------------------------+
| 4E       | LATIN CAPITAL LETTER N                       |
| 6E       | LATIN SMALL LETTER N                         |
| C391     | LATIN CAPITAL LETTER N WITH TILDE            |
| C3B1     | LATIN SMALL LETTER N WITH TILDE              |
| C583     | LATIN CAPITAL LETTER N WITH ACUTE            |
| C584     | LATIN SMALL LETTER N WITH ACUTE              |
| C585     | LATIN CAPITAL LETTER N WITH CEDILLA          |
| C586     | LATIN SMALL LETTER N WITH CEDILLA            |
| C587     | LATIN CAPITAL LETTER N WITH CARON            |
| C588     | LATIN SMALL LETTER N WITH CARON              |
| C7B8     | LATIN CAPITAL LETTER N WITH GRAVE            |
| C7B9     | LATIN SMALL LETTER N WITH GRAVE              |
| E1B4BA   | MODIFIER LETTER CAPITAL N                    |
| E1B984   | LATIN CAPITAL LETTER N WITH DOT ABOVE        |
| E1B985   | LATIN SMALL LETTER N WITH DOT ABOVE          |
| E1B986   | LATIN CAPITAL LETTER N WITH DOT BELOW        |
| E1B987   | LATIN SMALL LETTER N WITH DOT BELOW          |
| E1B988   | LATIN CAPITAL LETTER N WITH LINE BELOW       |
| E1B989   | LATIN SMALL LETTER N WITH LINE BELOW         |
| E1B98A   | LATIN CAPITAL LETTER N WITH CIRCUMFLEX BELOW |
| E1B98B   | LATIN SMALL LETTER N WITH CIRCUMFLEX BELOW   |
| E281BF   | SUPERSCRIPT LATIN SMALL LETTER N             |
| E28495   | DOUBLE-STRUCK CAPITAL N                      |
| E29383   | CIRCLED LATIN CAPITAL LETTER N               |
| E2939D   | CIRCLED LATIN SMALL LETTER N                 |
| EFBCAE   | FULLWIDTH LATIN CAPITAL LETTER N             |
| EFBD8E   | FULLWIDTH LATIN SMALL LETTER N               |
+----------+----------------------------------------------+
27 rows in set (0.02 sec)

mysql> SELECT hex_utf8, name FROM unicode WHERE unhex(hex_utf8) = _utf8'n' collate utf8_bin;
+----------+----------------------+
| hex_utf8 | name                 |
+----------+----------------------+
| 6E       | LATIN SMALL LETTER N |
+----------+----------------------+
1 row in set (0.04 sec)

mysql> SELECT hex_utf8, name FROM unicode WHERE unhex(hex_utf8) = _utf8'n' collate utf8_spanish_ci;
+----------+----------------------------------------------+
| hex_utf8 | name                                         |
+----------+----------------------------------------------+
| 4E       | LATIN CAPITAL LETTER N                       |
| 6E       | LATIN SMALL LETTER N                         |
| C583     | LATIN CAPITAL LETTER N WITH ACUTE            |
| C584     | LATIN SMALL LETTER N WITH ACUTE              |
| C585     | LATIN CAPITAL LETTER N WITH CEDILLA          |
| C586     | LATIN SMALL LETTER N WITH CEDILLA            |
| C587     | LATIN CAPITAL LETTER N WITH CARON            |
| C588     | LATIN SMALL LETTER N WITH CARON              |
| C7B8     | LATIN CAPITAL LETTER N WITH GRAVE            |
| C7B9     | LATIN SMALL LETTER N WITH GRAVE              |
| E1B4BA   | MODIFIER LETTER CAPITAL N                    |
| E1B984   | LATIN CAPITAL LETTER N WITH DOT ABOVE        |
| E1B985   | LATIN SMALL LETTER N WITH DOT ABOVE          |
| E1B986   | LATIN CAPITAL LETTER N WITH DOT BELOW        |
| E1B987   | LATIN SMALL LETTER N WITH DOT BELOW          |
| E1B988   | LATIN CAPITAL LETTER N WITH LINE BELOW       |
| E1B989   | LATIN SMALL LETTER N WITH LINE BELOW         |
| E1B98A   | LATIN CAPITAL LETTER N WITH CIRCUMFLEX BELOW |
| E1B98B   | LATIN SMALL LETTER N WITH CIRCUMFLEX BELOW   |
| E281BF   | SUPERSCRIPT LATIN SMALL LETTER N             |
| E28495   | DOUBLE-STRUCK CAPITAL N                      |
| E29383   | CIRCLED LATIN CAPITAL LETTER N               |
| E2939D   | CIRCLED LATIN SMALL LETTER N                 |
| EFBCAE   | FULLWIDTH LATIN CAPITAL LETTER N             |
| EFBD8E   | FULLWIDTH LATIN SMALL LETTER N               |
+----------+----------------------------------------------+
25 rows in set (0.10 sec)

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: searches on utf8_unicode_ci only work for ASCII
2334
July 13, 2010 08:57AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.