Struggling to understand the character sets
Posted by: Clay Morgan
Date: October 15, 2009 02:31AM

I've recently started adding support for multiple languages to a legacy web application. It's a servlet application running in Tomcat, though I don't think that matters yet. What does matter is that the XML and other output must be UTF-8.
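Because the XML output has to be UTF-8, one place worth checking is the point where the XML is turned into bytes. The encoding has to be named explicitly there, and the XML declaration has to agree with it. This is a minimal sketch that assumes the XML is written with the JDK's StAX API. The post doesn't say which XML components are in use, and the `description` element name is hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class Utf8XmlSketch {

    // Serializes a single element to bytes, choosing the encoding at the
    // point where characters become bytes. "description" is a hypothetical name.
    static byte[] toUtf8Xml(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // Naming "UTF-8" here makes the writer encode with UTF-8 instead
            // of the JVM's platform default charset.
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            // The declaration should name the same encoding the bytes use.
            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeStartElement("description");
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.flush();
            writer.close(); // frees the writer; the ByteArrayOutputStream stays usable
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf8Xml("fran\u00e7ais");
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

The returned byte array encodes ç as the two UTF-8 bytes `C3 A7`, whatever the JVM's default charset happens to be.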

From everything I've read about character sets, I should be using utf8 everywhere in MySQL. I have tried to do that, without success. The only way I have managed to store French characters is with latin1, but past that point Connector/J and some XML components mangle the encoding somewhere along the line. So I'm trying to get everything in MySQL into utf8 first, so that I know my environment is utf8 before going any further.
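On the Connector/J side, the driver's documentation advises against issuing SET NAMES. The driver does not notice the change and keeps using the character set it negotiated when the connection was opened. Instead, the client encoding is set through the connection URL with the `useUnicode` and `characterEncoding` properties. This is a minimal sketch in which the host, port, database name and credentials are hypothetical placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ConnectorJUtf8Sketch {

    // Builds a Connector/J URL that asks the driver to exchange UTF-8 with
    // the server. Host, port and database name are hypothetical placeholders.
    static String utf8Url(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Opens a connection with that URL. No SET NAMES is issued afterwards;
    // the URL properties are what select the connection character set.
    static Connection open(String user, String password) throws SQLException {
        return DriverManager.getConnection(utf8Url("localhost", 3306, "appdb"), user, password);
    }

    public static void main(String[] args) {
        System.out.println(utf8Url("localhost", 3306, "appdb"));
    }
}
```

`open` needs the Connector/J jar on the classpath and a running server. `main` only prints the URL.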

Can someone please explain what I am missing that keeps my utf8 approach from working? As the output below shows, only the latin1 approach gives me any success, while the utf8 approach gives me the infamous "Incorrect string value" error. I believe that once the utf8 approach works, I can turn to my problem with Connector/J. I seem to remember reading that it does not support "SET NAMES", but I plan to research that and cross that bridge once MySQL itself is in utf8, as many people seem to suggest. I would really appreciate some insight into my problem.

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> set character set utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like '%character%';
+--------------------------+-------------------------------------------+
| Variable_name            | Value                                     |
+--------------------------+-------------------------------------------+
| character_set_client     | utf8                                      |
| character_set_connection | utf8                                      |
| character_set_database   | utf8                                      |
| character_set_filesystem | binary                                    |
| character_set_results    | utf8                                      |
| character_set_server     | utf8                                      |
| character_set_system     | utf8                                      |
| character_sets_dir       | C:\MySQL\MySQL Server 5.1\share\charsets\ |
+--------------------------+-------------------------------------------+
8 rows in set (0.00 sec)

mysql> create table test ( test_desc varchar(50) ) default charset=utf8;
Query OK, 0 rows affected (0.06 sec)

mysql> insert into test( test_desc ) values ('français');
ERROR 1366 (HY000): Incorrect string value: '\x87ais' for column 'test_desc' at row 1
mysql> set character set latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like '%character%';
+--------------------------+-------------------------------------------+
| Variable_name            | Value                                     |
+--------------------------+-------------------------------------------+
| character_set_client     | latin1                                    |
| character_set_connection | latin1                                    |
| character_set_database   | utf8                                      |
| character_set_filesystem | binary                                    |
| character_set_results    | latin1                                    |
| character_set_server     | utf8                                      |
| character_set_system     | utf8                                      |
| character_sets_dir       | C:\MySQL\MySQL Server 5.1\share\charsets\ |
+--------------------------+-------------------------------------------+
8 rows in set (0.00 sec)

mysql> create table test1 (test_desc varchar(50) ) default charset=latin1;
Query OK, 0 rows affected (0.05 sec)

mysql> insert into test1(test_desc) values ('français');
Query OK, 1 row affected (0.02 sec)

mysql> select test_desc from test1;
+-----------+
| test_desc |
+-----------+
| français  |
+-----------+
1 row in set (0.00 sec)
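The error message itself hints at the cause. The client sent the byte 0x87 for ç. In the DOS code pages 437 and 850 used by Windows consoles, 0x87 is ç, but it is not ç in UTF-8 or latin1.

With SET NAMES utf8, the server rejects the byte, because in UTF-8 0x87 can only be a continuation byte. With SET NAMES latin1, every byte is accepted. However, MySQL's latin1 is Windows-1252, where 0x87 is the double dagger ‡. SELECT sends the same byte back and the console draws it as ç, so the round trip looks correct. The column actually holds fran‡ais, and that is what Connector/J would later read.

The sketch below reproduces these byte-level facts in Java. The byte array is reconstructed from the error message. The post does not say that the console uses code page 437 or 850; that is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ConsoleBytesSketch {

    // Bytes the console client sent for the word, reconstructed from the
    // error message: 0x87 is the c-cedilla in DOS code pages 437/850 (assumed).
    static final byte[] CONSOLE_BYTES = {
        'f', 'r', 'a', 'n', (byte) 0x87, 'a', 'i', 's'
    };

    // Strict UTF-8 check, analogous to the server's validation under SET NAMES utf8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Under SET NAMES latin1 each byte is read as MySQL's latin1, which is
    // Windows-1252, so 0x87 becomes the double dagger character.
    static String asMysqlLatin1(byte[] bytes) {
        return new String(bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        System.out.println("valid UTF-8? " + isValidUtf8(CONSOLE_BYTES));
        System.out.println("stored as:   " + asMysqlLatin1(CONSOLE_BYTES));
        // Proper UTF-8 for the word: the c-cedilla takes two bytes (C3 A7).
        byte[] utf8 = "fran\u00e7ais".getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 length: " + utf8.length);
    }
}
```

If this diagnosis holds, one way to test it in the console is to declare the console's real encoding, for example SET NAMES cp850 (MySQL includes a cp850 character set). The server should then convert the input to utf8 before storing it in the utf8 table.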


Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.