character_set problem, cannot use utf8 properly to store Chinese character
Posted by: Roger Ye
Date: December 13, 2004 12:35PM

Hi,

I'm using mysql-4.1.7-win via JDBC driver mysql-connector-java-3.0.16-ga.zip,
os win2k en sp4, locale/language is chinese(PRC), now I've encountered an urgent character_set problem.

This is rather long, thanks for your patience!

Let me show the scenario:


In MySQL configuration file C:\WINNT\my.ini, I've set

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[mysqld]
basedir=C:/mysql-4.1.7-win
bind-address=localhost
#
character_set_server=utf8
datadir=C:/mysql-4.1.7-win/data
#language=C:/mysql-4.1.7-win/share/your language directory
#slow query log#=
#tmpdir#=
port=3306
#set-variable=key_buffer=16M
[WinMySQLadmin]
Server=C:/mysql-4.1.7-win/bin/mysqld-nt.exe
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

also from inside WinMySQLadmin, inside the Variables tab
I can see value of the character_set_server is 'utf8',
and the encoding of my database/schema(vp) is also 'utf8'.


Here is my Java code

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
public class TestMySQL {

public static void main(String[] args) throws Exception {
Class.forName("com.mysql.jdbc.Driver");
Connection conn = DriverManager.getConnection(
"jdbc:mysql://localhost:3306/vp?useUnicode=true&characterEncoding=utf8",
"root", "mypwd");

Statement stmt = conn.createStatement();

stmt.executeUpdate("SET NAMES 'utf8'"); // just to make sure
stmt.executeUpdate("SET character_set_results = NULL");

String input = "\u5317\u4eac"; // the 2 Chinese character for "Beijing",
// capital of China
// Character "bei", refer to
// http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=5317
// Character "jing", refer to
// http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=4eac

String delSql = "DELETE FROM vp.ica_dcc WHERE branch_no > 3";
String insSql = "INSERT INTO vp.ica_dcc ( branch_no, city ) VALUES ('9', '"
+ input + "')";

stmt.executeUpdate(delSql);
stmt.executeUpdate(insSql);

String qrySql = "select * from vp.ica_dcc";

ResultSet rset = stmt.executeQuery(qrySql);
while (rset.next()) {
String city = rset.getString("city");
System.out.println(city);
}

conn.close();
}
}
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

What's the result? it's "??", yes, just two question marks!

Iin the connection uri, I have set useUnicode=true & characterEncoding=utf8,
and later, I explicitly call "SET NAMES 'utf8'" again,
and also "SET character_set_results = NULL".


And here is the table schema (I got from SQLyag).

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
/*DDL Information For - vp.ica_dcc*/
------------------------------------

Table Create Table
------- ------------------------------------------------------------------
ica_dcc CREATE TABLE `ica_dcc` (
`branch_no` char(9) character set latin1 NOT NULL default '',
`ica_no` char(6) character set latin1 default NULL,
`dcc_id` char(1) character set latin1 default NULL,
`member_name` char(50) character set latin1 default NULL,
`city` char(50) character set latin1 default NULL,
`state_code` char(10) character set latin1 default NULL,
`state` char(50) character set latin1 default NULL,
`bank` char(50) character set latin1 default NULL,
`fax` char(20) character set latin1 default NULL,
`telephone` char(20) character set latin1 default NULL,
`region` char(20) character set latin1 default NULL,
`update_time` date default NULL,
PRIMARY KEY (`branch_no`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Here the DEFAULT CHARSET of table vp.ica_dcc is also 'utf8'


I've read the manual with my best carefulness, and got these info

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
In section: 11.3.6 Connection Character Sets and Collations

....

SET NAMES indicates what is in the SQL statements that the client sends.
Thus, SET NAMES 'cp1251' tells the server ``future incoming messages from
this client will be in character set cp1251.'' It also specifies the
character set for results that the server sends back to the client.
(For example, it indicates what character set column values will have
if you use a SELECT statement.)

A SET NAMES 'x' statement is equivalent to these three statements:

mysql> SET character_set_client = x;
mysql> SET character_set_results = x;
mysql> SET character_set_connection = x;

....

When a client connects, it sends to the server the name of the character
set that it wants to use. The server sets the character_set_client,
character_set_results, and character_set_connection variables to that
character set. (In effect, the server performs a SET NAMES operation using
the character set.)

....

If you do not want the server to perform any conversion, set character_set_results to NULL:

mysql> SET character_set_results = NULL;

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


In fact, after debug that program, I find
stmt.charEncoding = "utf8"
conn.characterSetMetadata = "UTF-8"
conn.characterSetResults = null
conn.characterSetResultsOnServer= = null
conn.charsetConverterMap = {utf8=null, UTF-8=null, US-ASCII=com.mysql.jdbc.SingleByteCharsetConverter@15a0305}
conn.encoding = "utf8"


these values never change during the program executing,


I've tried to solve the character set problem with the info from the manual
but still can not get the correct result, so, two questions:

1. Can I know the character_set_client/connection/results of the current connection? And how?
are these settings stick to per connection?

2. How to solve my problem? is it a bug of the driver or mysql itself?

Regards
Roger

Options: ReplyQuote


Subject
Views
Written By
Posted
character_set problem, cannot use utf8 properly to store Chinese character
13456
December 13, 2004 12:35PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.