Re: UTF8 characters substituted in queries?
Posted by: Dave Varon
Date: September 15, 2005 06:37PM

Hi again,

Thanks for your suggestions. I've tried a number of different utf8-based collations with only moderate progress: utf_unicode_ci, utf8_bin, utf8_roman_ci, etc.

These are my current settings:
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-standard-4.1.9-apple-darwin7.7.0-powerpc/share/mysql/charsets/ |
| collation_connection | utf8_bin |
| collation_database | utf8_bin |
| collation_server | utf8_bin
...
| init_connect | SET NAMES utf8 |

init_connect doesn't seem to have an effect:

mysql> select * from party where name_human_first = 'Rub\303\251n' collate utf8_unicode_ci;
ERROR 1253 (42000): COLLATION 'utf8_unicode_ci' is not valid for CHARACTER SET 'latin1'
mysql>



After 'SET NAMES utf8', the charset settings are consistent:

mysql> select name_human_first from party where name_human_first = 'Rub\303\251n' collate utf8_unicode_ci;
+------------------+
| name_human_first |
+------------------+
| Rubén |
| Ruben |
+------------------+
2 rows in set (0.12 sec)

mysql> select name_human_first from party where name_human_first = 'Ruben' collate utf8_unicode_ci;
+------------------+
| name_human_first |
+------------------+
| Rubén |
| Ruben |
+------------------+
2 rows in set (0.09 sec)



However, only the utf8_bin collation provides the expected results:

mysql> select name_human_first from party where name_human_first = 'Rub\303\251n' collate utf8_bin;
+------------------+
| name_human_first |
+------------------+
| Rubén |
+------------------+
1 row in set (0.01 sec)

mysql> select name_human_first from party where name_human_first = 'Ruben' collate utf8_bin;
+------------------+
| name_human_first |
+------------------+
| Ruben |
+------------------+
1 row in set (0.00 sec)

...but only when specified on the CLI as above. Without it, as indicated below, the characters are still treated as equivalent:

mysql> select name_human_first from party where name_human_first = 'Rub\303\251n';
+------------------+
| name_human_first |
+------------------+
| Rubén |
| Ruben |
+------------------+
2 rows in set (0.00 sec)

So I'm guessing there's another way to set the client or connection collation that I've overlooked. Right now the database, connection, server, client, table, column, and results are all set to utf8/utf8_bin.

What am I missing, and will it apply to the webapp too?

Options: ReplyQuote


Subject
Written By
Posted
Re: UTF8 characters substituted in queries?
September 15, 2005 06:37PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.