Re: Proper display of special latin1 characters in Powershell / MySql client ?
Posted by: Rick James
Date: March 31, 2011 10:47PM

It is always confusing as to who is converting what bytes in what way.

mysql> SET NAMES latin1;
mysql> select _utf8 x'C3A9' as col1, char_length(_utf8 x'C3A9') as col2 ;
+------+------+
| col1 | col2 |
+------+------+
| é | 1 |
+------+------+
mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+-----------------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | c:\wamp\bin\mysql\mysql5.1.30\share\charsets\ |
+--------------------------+-----------------------------------------------+
#######
mysql> SET NAMES utf8;
mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+-----------------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | c:\wamp\bin\mysql\mysql5.1.30\share\charsets\ |
+--------------------------+-----------------------------------------------+
mysql> select _utf8 x'C3A9' as col1, char_length(_utf8 x'C3A9') as col2 ;
+------+------+
| col1 | col2 |
+------+------+
| Ac | 1 |
+------+------+
[/code]

Col2: Notice how char_length() was 1 in both cases. That is because the server saw the utf8 'character', and the client never touched the character.

Col1: _utf8 x'C3A9' as col1 -- This creates a utf8 character entirely in the server. Again the client does not touch the character. That is, not until you try to read it back. If I say "I am using latin1 in my client":
character_set_results | latin1
then the 2-byte utf8 char is converted to latin1 as I receive it.
OTOH, with
character_set_results | utf8
the SELECT trusts that my client can properly display utf8. But it cannot, so the two bytes are treated as 2 characters "Ac".

If my client could understand and display utf8, then the first case (falsely claiming to be in latin1) would display nothing because the conversion to latin1 would create an invalid utf8 code. The second case (correctly claiming utf8) would correct show the e-acute.

But that does not help diagnose what is in your table. For that, _utf8'...' is not useful, instead the column name is useful.

I am using the mysql cli on Vista. It has effectively the latin1 codepage, and I cannot figure how to get the utf8 page.
As you noted, different shells, and different UIs lead to different results.

Options: ReplyQuote




Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.