UTF-8 database, Windows mysql.exe and INSERT/SELECT troubles. SET NAMES latin1 doesn't translate to latin1
Posted by: J Maian
Date: March 16, 2010 06:00PM

Edit: I fixed this by setting 'names' to cp850, the code page being used by cmd.exe. I thought cp850 and latin1 were the same character encoding: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/chcp.mspx?mfr=true shows cp850 as being latin1 but http://www.jwz.org/doc/charsets.html says cp850 (which is equivalent to cp1252 as far as I can tell) is infact a superset of latin1. Basically doing the following solved my problems:
SET NAMES cp850;

I've been trying to wrap my head around UTF-8 character encoding under Windows with MySQL 5.1 for the last few days. After lots and lots of searching I've come to the conclusion that:

* cmd.exe doesn't support UTF-8
* mysql.exe, therefore, doesn't either
* setting the client's 'names' to latin1 so that the server translates input to UTF-8 and output to something cmd.exe can represent correctly doesn't work

This last point is what I understand the 'names' variables should do, after reading lots of different pages. Especially MySQL's docs: http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html

My experience with inserting data with 'SET NAMES utf8' is the following.
mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO clients VALUES("Núñez");
Query OK, 1 row affected, 2 warnings (0.00 sec)

mysql> SELECT lastname FROM clients;
+---------------------+
| lastname            |
+---------------------+
| N                   |
+---------------------+
1 row in set (0.00 sec)

mysql> show warnings;
+---------+------+-----------------------------------------------------------------------+
| Level   | Code | Message                                                               |
+---------+------+-----------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xA3\xA4ez' for column 'lastname' at row 1   |
+---------+------+-----------------------------------------------------------------------+

1 row in set (0.00 sec)

The warnings show that what the server received for characters 'ú' and 'ñ' were in fact extended ASCII values 163 and 164, i.e. no conversion to UTF-8.

With SET NAMES latin1 what gets inserted in place of the 'ñ' is \xC2\xA4. C2 is decimal 194 and a random character in ascii. A4 is decimal 164 and 'ñ' in extended ASCII. No conversion to UTF-8 either.

When trying to query a value that has been inserted as UTF-8 correctly (via MySQL Workbench) with names set to utf this is what I receive:
mysql> SELECT lastname FROM clients;
+---------------------+
| lastname            |
+---------------------+
| N├║├▒ez             |
+---------------------+
1 row in set (0.00 sec)
I later found out that cmd.exe doesn't support UTF-8 and that is why it's garbled. The characters that appear are just the escape sequence values of UTF-8 (\xC3\xB1 for 'ñ') interpreted as ASCII.

When names are set to latin1 and querying the same correct UTF-8 value this is the result:
mysql> select lastname from clients;
+----------+
| lastname |
+----------+
| N·±ez    |
+----------+
1 row in set (0.00 sec)
The server has truncated each multibyte character to the first byte, from what I was told in the IRC channel.

From what I've understood reading the docs, though, setting mysql.exe's names to latin1 should result in the server converting INSERTs from the client to UTF-8. Conversely, any query results should be converted from UTF-8 to latin1 for correct display on the client. In Python I can convert an ASCII string to UTF-8 and do the opposite too: UTF-8 to ASCII.

Have I misunderstood the docs? What is the purpose of the 'names' variables if I have misunderstood?



Edited 1 time(s). Last edit at 03/17/2010 06:19AM by J Maian.

Options: ReplyQuote


Subject
Views
Written By
Posted
UTF-8 database, Windows mysql.exe and INSERT/SELECT troubles. SET NAMES latin1 doesn't translate to latin1
11989
March 16, 2010 06:00PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.