MySQL Forums
Forum List  »  Ruby

MySql+UTF8 woes
Posted by: Ronald Fischer
Date: July 27, 2007 01:48AM

(this article was also posted on Ruby-Talk)

I am experiencing a strange problem when retrieving Unicode values from
my MySql5 database on Windows. Here is what I have:

* All programs/modules in my application require in the beginning a file containing

$KCODE='u'
require 'jcode'

in the beginning, ensuring that "everything is done in Unicode".

* I checked that the MySql database is "as UTF-8 as possible", in
particular
- in the table editor of the MySql Administrator, the character set
for the rows is listed as utf8 (with sorting order utf8_general_ci)
- The "start variables" tab of the Administrator says utf8 for the default
character set
- mysql.ini says
default-character-set=utf8
- On the mysql command line, I see

mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+


Indeed, the following works well:

When I store (from Ruby, via 'mysql' gem) a UTF-8 string
into the database, and later retrieve it, I get back exactly
what I had put in. This works equally well, whether I type
(using my UTF-8 enabled text editor) the character (for
example, a German umlaut) directly into the Ruby string,
which I then store, i.e. (table X having only one column)

sqlstatement="INSERT INTO X VALUES ('ö')"
dbhandle.query(sqlstatement)

or whether I write the code point in hexadecimal:

uml_o="\xc3\xb6" # ö, i.e. German umlaut-o in UTF-8 encoding
sqlstatement="INSERT INTO X VALUES ('#{uml_o}')"
dbhandle.query(sqlstatement)

In both cases, when I retrieve the rows with
dbhandle.query("SELECT * FROM X")
I eventually get back the row with my "ö" as a UTF-8
2-Byte-sequence C3 B6.

So far OK. But there are several things which do not work:

(1) When I look at the database table using the SQL Administrator,
I do not see the letter 'ö', but instead the characters corresponding
to the Windows ANSI code sequence for C3 and B6 characters.

(2) When I manually enter data into the database using MySql
Administrator, by submitting the SQL statement
INSERT INTO X VALUES ('ö')
the ö is displayed fine when looking at the table with a SELECT
using the Administrator, but when I read the table from Ruby, I do
not get back a 2 byte UTF-8 sequence for the umlaut, but only one
byte with hex code F6. Note that this F6 corresponds to the ANSI
code for ö in Windows.

(3) When I put the unicode data into a file, and use a Java application
to read the file and write the data into the database (using JDBC), and
I look at the database then with MySql Administrator, I can see the
German characters; and when I then read them back with my Ruby
application, I get it encoded in Windows ANSI, not Unicode.

Observations (1) and (2) could be explained in the way that maybe the
MySql administrator interface was not written with UTF-8 in minde. What
surprises me more is that when writing data using Java (which is inherently
Unicode aware) and reading it back with Ruby, causes somewhere a conversion
from UTF-8 into Windows ANSI.

Does someone have experienced a similar problem and can point out a possible
reason for it?

--
Ronald Fischer <ronaldf@eml.cc>
(Do not send email with content-type text/html to this address)

Options: ReplyQuote


Subject
Written By
Posted
MySql+UTF8 woes
July 27, 2007 01:48AM
October 30, 2007 09:28AM
November 17, 2008 05:16AM
November 17, 2008 06:45AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.