Re: UTF8 characters substituted in queries?
Posted by:
Dave Varon
Date: September 15, 2005 06:37PM
Hi again,
Thanks for your suggestions. I've tried a number of different utf8-based collations with only moderate progress: utf_unicode_ci, utf8_bin, utf8_roman_ci, etc.
These are my current settings:
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-standard-4.1.9-apple-darwin7.7.0-powerpc/share/mysql/charsets/ |
| collation_connection | utf8_bin |
| collation_database | utf8_bin |
| collation_server | utf8_bin
...
| init_connect | SET NAMES utf8 |
init_connect doesn't seem to have an effect:
mysql> select * from party where name_human_first = 'Rub\303\251n' collate utf8_unicode_ci;
ERROR 1253 (42000): COLLATION 'utf8_unicode_ci' is not valid for CHARACTER SET 'latin1'
mysql>
After 'SET NAMES utf8', the charset settings are consistent:
mysql> select name_human_first from party where name_human_first = 'Rub\303\251n' collate utf8_unicode_ci;
+------------------+
| name_human_first |
+------------------+
| Rubén |
| Ruben |
+------------------+
2 rows in set (0.12 sec)
mysql> select name_human_first from party where name_human_first = 'Ruben' collate utf8_unicode_ci;
+------------------+
| name_human_first |
+------------------+
| Rubén |
| Ruben |
+------------------+
2 rows in set (0.09 sec)
However, only the utf8_bin collation provides the expected results:
mysql> select name_human_first from party where name_human_first = 'Rub\303\251n' collate utf8_bin;
+------------------+
| name_human_first |
+------------------+
| Rubén |
+------------------+
1 row in set (0.01 sec)
mysql> select name_human_first from party where name_human_first = 'Ruben' collate utf8_bin;
+------------------+
| name_human_first |
+------------------+
| Ruben |
+------------------+
1 row in set (0.00 sec)
...but only when specified on the CLI as above. Without it, as indicated below, the characters are still treated as equivalent:
mysql> select name_human_first from party where name_human_first = 'Rub\303\251n';
+------------------+
| name_human_first |
+------------------+
| Rubén |
| Ruben |
+------------------+
2 rows in set (0.00 sec)
So I'm guessing there's another way to set the client or connection collation that I've overlooked. Right now the database, connection, server, client, table, column, and results are all set to utf8/utf8_bin.
What am I missing, and will it apply to the webapp too?