UTF8 Chinese String Comparison
Posted by:
CL Chuah
Date: September 25, 2009 01:38PM
Hi all,
I have this table:
create table test_utf (
a varchar (255) COLLATE utf8_general_ci
);
Then I insert these two records:
insert into test_utf (a) values ('飞');
insert into test_utf (a) values ('裎');
But when I retrieve the records with:
select * from test_utf where a='裎';
> The server returns both '飞' and '裎'.
If I change it to:
select * from test_utf where a=_utf8'裎';
> The server returns no rows.
If I force the collation:
select * from test_utf where a='飞' COLLATE utf8_general_ci;
> Error Code : 1253
> COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'
select * from test_utf where a='飞' COLLATE latin1_swedish_ci;
> Error Code : 1267
> Illegal mix of collations (utf8_general_ci,IMPLICIT) and (latin1_swedish_ci,EXPLICIT) for operation '='
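One explanation that would fit these results is offered here only as a hypothesis and is not confirmed in this thread. Error 1253 says the literal '飞' is latin1, so the connection character set is apparently latin1. If the client nevertheless sends UTF-8 bytes, MySQL would read each byte as a separate latin1 character. MySQL's latin1 is essentially Windows-1252, which Python calls "cp1252". The sketch below models that misreading. `misread_as_latin1` is a hypothetical helper, not a MySQL API.

```python
# Hypothesis only: the client sends UTF-8 bytes, but the connection character
# set is latin1, so MySQL reads each byte as one latin1 character.
# MySQL's latin1 is essentially Windows-1252, which Python calls "cp1252".


def misread_as_latin1(text: str) -> str:
    """Hypothetical helper: decode the UTF-8 bytes of `text` as cp1252."""
    return text.encode("utf-8").decode("cp1252")


for ch in ("飞", "裎"):
    utf8_bytes = ch.encode("utf-8").hex(" ").upper()  # bytes.hex(sep) needs Python 3.8+
    print(f"{ch}: UTF-8 bytes {utf8_bytes} -> read as latin1: {misread_as_latin1(ch)}")

# What a latin1 connection would store for '裎' is not '裎' itself,
# so a genuine UTF-8 '裎' would not match it.
print(misread_as_latin1("裎") == "裎")

# 飞: UTF-8 bytes E9 A3 9E -> read as latin1: é£ž
# 裎: UTF-8 bytes E8 A3 8E -> read as latin1: è£Ž
# False
```

Under this assumption, the utf8 column would hold 'è£Ž' rather than '裎'. That would explain why the `_utf8'裎'` query finds no rows.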
This puzzles me, so I did a simple test:
select CONVERT('飞' USING latin1)=CONVERT('裎' USING latin1); -- got 0
select CONVERT('飞' USING utf8)=CONVERT('裎' USING utf8); -- got 1
select CONVERT('飞' USING gb2312)=CONVERT('裎' USING gb2312); -- got 0
select CONVERT('é' USING utf8)=CONVERT('è' USING utf8); -- got 1
So it seems I have to do this to get the correct result:
select * from test_utf where CONVERT(a USING gb2312)=CONVERT('飞' USING gb2312);
And the UTF8 comparison also treats é == è.
I suspect it has something to do with collation / case sensitivity.
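On the collation suspicion: utf8_general_ci is case-insensitive and also ignores accents on Latin letters, which is why é = è. Below is a rough Python model of that behaviour. It is an approximation based on Unicode decomposition, not MySQL's actual weight table, and `general_ci_key`, `eq_general_ci` and `eq_bin` are hypothetical helpers. It compares three pairs:

* é and è;
* the two Chinese characters themselves;
* 'é£ž' and 'è£Ž', which is what the UTF-8 bytes of 飞 and 裎 would become if read as latin1 (Windows-1252). This third pair is an assumption about this setup.

```python
import unicodedata


def general_ci_key(text: str) -> str:
    """Hypothetical stand-in for utf8_general_ci: strip accents, then fold case.

    This only approximates MySQL's real weight table; it is meant for the
    Latin characters discussed in this post.
    """
    decomposed = unicodedata.normalize("NFD", text)  # e.g. é -> e + U+0301
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return no_marks.casefold()


def eq_general_ci(a: str, b: str) -> bool:
    return general_ci_key(a) == general_ci_key(b)


def eq_bin(a: str, b: str) -> bool:
    # Like a binary collation such as utf8_bin: code points must match exactly.
    return a == b


pairs = [
    ("é", "è"),      # accented Latin letters
    ("飞", "裎"),    # the two Chinese characters, correctly decoded
    ("é£ž", "è£Ž"),  # the same two characters if misread as latin1
]
for a, b in pairs:
    print(f"{a} vs {b}: general_ci={eq_general_ci(a, b)}, bin={eq_bin(a, b)}")

# é vs è: general_ci=True, bin=False
# 飞 vs 裎: general_ci=False, bin=False
# é£ž vs è£Ž: general_ci=True, bin=False
```

In this model, correctly decoded 飞 and 裎 never compare equal. Their misread forms, however, collapse to the same key ('e£z'). That would match the symptom of the first query returning both rows.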
Can someone explain:
1) Is anything wrong with my understanding of UTF8?
2) What is the right way to store Chinese characters in MySQL?
P/S: FYI, 飞 and 裎 are two different words in Chinese.
Thanks!!
Content reproduced on this site is the property of the respective copyright holders.
It is not reviewed in advance by Oracle and does not necessarily represent the opinion
of Oracle or any other party.