MySQL Forums :: Character Sets, Collation, Unicode :: UTF8 Chinese String Comparison


UTF8 Chinese String Comparison
Posted by: CL Chuah ()
Date: September 25, 2009 01:38PM

Hi all,
I have this table:

create table test_utf (
a varchar (255) COLLATE utf8_general_ci
);

Then I insert these two records:
insert into test_utf (a) values ('飞');
insert into test_utf (a) values ('裎');

When I retrieve the records with
select * from test_utf where a='裎';
> The server returns both '飞' and '裎'.

If I change it to
select * from test_utf where a=_utf8'裎';
> The server returns no rows.

If I force a collation:
select * from test_utf where a='飞' COLLATE utf8_general_ci;
> Error Code : 1253
> COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'

select * from test_utf where a='飞' COLLATE latin1_swedish_ci;
> Error Code : 1267
> Illegal mix of collations (utf8_general_ci,IMPLICIT) and (latin1_swedish_ci,EXPLICIT) for operation '='
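
Error 1253 above says the string literal is being treated as latin1, which hints that the connection character set may be latin1 while the client sends UTF-8 bytes. As a rough sketch of what that would mean (an assumption about this setup, not something confirmed in the thread), the Python snippet below reinterprets each character's UTF-8 bytes as cp1252, the Windows code page that MySQL's latin1 is based on. The helper name `as_seen_via_latin1` is made up for illustration.

```python
def as_seen_via_latin1(text):
    """Hypothetical helper: reinterpret the UTF-8 bytes of `text` as cp1252."""
    return text.encode("utf-8").decode("cp1252")


# UTF-8 bytes of each character, as hex.
utf8_hex = {ch: ch.encode("utf-8").hex() for ch in ("飞", "裎")}
# utf8_hex == {"飞": "e9a39e", "裎": "e8a38e"}

# What those bytes look like if they are read as single-byte characters.
seen = {ch: as_seen_via_latin1(ch) for ch in ("飞", "裎")}
# seen == {"飞": "é£ž", "裎": "è£Ž"}
```

If this is what is happening, each Chinese character reaches the server as three Latin characters, and the two strings differ only by accent (é vs è) and by case (ž vs Ž).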

This puzzled me, so I ran a simple test:
select CONVERT('飞' USING latin1)=CONVERT('裎' USING latin1); -- got 0
select CONVERT('飞' USING utf8)=CONVERT('裎' USING utf8); -- got 1
select CONVERT('飞' USING gb2312)=CONVERT('裎' USING gb2312); -- got 0
select CONVERT('é' USING utf8)=CONVERT('è' USING utf8); -- got 1
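
The `é` = `è` result suggests that utf8_general_ci ignores accents as well as case. The Python sketch below imitates that behaviour very loosely by stripping combining marks and case-folding. It is not MySQL's actual weight table, and `rough_ci_key` is a hypothetical name. It also assumes the literals arrived over a latin1 connection, so that 飞 and 裎 were read as `é£ž` and `è£Ž` before being converted to utf8.

```python
import unicodedata


def rough_ci_key(text):
    """Hypothetical stand-in for an accent- and case-insensitive comparison key."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (accents), then fold case.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Accented Latin letters collapse to the same key.
accents_equal = rough_ci_key("é") == rough_ci_key("è")              # True

# The two Chinese characters as they would look after a latin1 misreading.
misread_fei = "飞".encode("utf-8").decode("cp1252")                  # "é£ž"
misread_cheng = "裎".encode("utf-8").decode("cp1252")                # "è£Ž"
misread_equal = rough_ci_key(misread_fei) == rough_ci_key(misread_cheng)  # True

# The real characters stay distinct under the same key.
real_equal = rough_ci_key("飞") == rough_ci_key("裎")                # False
```

Under these assumptions the comparison is confusing Latin letters that differ only by accent and case, not the Chinese characters themselves.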

So it seems I have to do this to get the correct result:
select * from test_utf where CONVERT(a USING gb2312)=CONVERT('飞' USING gb2312);
> Meanwhile, the UTF8 comparison treats é as equal to è.

I suspect it has something to do with collation or case sensitivity.
Can someone explain to me:
1) Is anything wrong with my understanding of UTF8?
2) What is the right way to store Chinese characters in MySQL?

P.S. FYI, 飞 and 裎 are two different words in Chinese.

Thanks!!
