UTF8 Chinese String Comparison
Posted by: CL Chuah
Date: September 25, 2009 01:38PM

Hi all,
I have this table:

create table test_utf (
a varchar (255) COLLATE utf8_general_ci
);

Then I insert these two records:
insert into test_utf (a) values ('飞');
insert into test_utf (a) values ('裎');

When I retrieve the records:
select * from test_utf where a='裎';
> The server returns both '飞' and '裎'.

If I change it to:
select * from test_utf where a=_utf8'裎';
> The server returns no rows.

If I force the collation:
select * from test_utf where a='飞' COLLATE utf8_general_ci;
> Error Code : 1253
> COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'

select * from test_utf where a='飞' COLLATE latin1_swedish_ci;
> Error Code : 1267
> Illegal mix of collations (utf8_general_ci,IMPLICIT) and (latin1_swedish_ci,EXPLICIT) for operation '='
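
The 1253 error suggests that the server is treating the quoted literal as latin1, which would happen if the connection character set is latin1. A quick way to check, as a sketch only (test_utf is the table from this post; the hex values in the comments are the expected encodings, not captured output):

```sql
-- Show which character sets the client, connection and results use.
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

-- Inspect the bytes actually stored in the column.
-- Correctly stored UTF-8 for 飞 is E9A39E and for 裎 is E8A38E.
-- C3A9C2A3C5BE or C3A8C2A3C5BD instead would point to double encoding
-- (UTF-8 bytes read as latin1 and then converted to utf8 again).
SELECT a, HEX(a), CHAR_LENGTH(a) FROM test_utf;
```

If character_set_client or character_set_connection shows latin1 while the client is sending UTF-8 bytes, that mismatch would explain both error messages.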

This puzzled me, so I ran a simple test:
select CONVERT('飞' USING latin1)=CONVERT('裎' USING latin1); -- got 0
select CONVERT('飞' USING utf8)=CONVERT('裎' USING utf8); -- got 1
select CONVERT('飞' USING gb2312)=CONVERT('裎' USING gb2312); -- got 0
select CONVERT('é' USING utf8)=CONVERT('è' USING utf8); -- got 1
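
One possible explanation for the utf8 result, assuming the literals reach the server as latin1: the UTF-8 bytes of 飞 (E9 A3 9E) and 裎 (E8 A3 8E), read as MySQL's latin1, are 'é£ž' and 'è£Ž', which differ only by accent and letter case. The sketch below spells the bytes out as hex literals so that it does not depend on the client character set; the column aliases are made up for illustration:

```sql
-- UTF-8 bytes misread as latin1, then converted to utf8:
-- E9 A3 9E -> é £ ž, and E8 A3 8E -> è £ Ž.
SELECT CONVERT(_latin1 X'E9A39E' USING utf8) AS misread_fei,
       CONVERT(_latin1 X'E8A38E' USING utf8) AS misread_cheng;

-- Compare the misread strings under the default utf8 collation.
SELECT CONVERT(_latin1 X'E9A39E' USING utf8)
     = CONVERT(_latin1 X'E8A38E' USING utf8) AS misread_equal;

-- Compare the same bytes tagged as real UTF-8.
SELECT _utf8 X'E9A39E' = _utf8 X'E8A38E' AS real_equal;
```

If misread_equal is 1 and real_equal is 0, that would be consistent with utf8_general_ci being accent-insensitive and case-insensitive, rather than with it confusing the two Chinese characters.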

So it seems I have to do this to get the correct result:
select * from test_utf where CONVERT(a USING gb2312)=CONVERT('飞' USING gb2312);
> This returns the correct row. Also note that the UTF8 comparison treats é as equal to è.
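
As an alternative to converting to gb2312, a binary collation compares code points exactly, so accented letters are not folded together. A minimal sketch, again using hex literals (C3A9 is é and C3A8 is è in UTF-8; the aliases are made up for illustration):

```sql
-- Same two characters compared under the default collation and under utf8_bin.
SELECT _utf8 X'C3A9' = _utf8 X'C3A8'                  AS general_ci_equal,
       _utf8 X'C3A9' = _utf8 X'C3A8' COLLATE utf8_bin AS bin_equal;

-- The same idea applied to the table from this post.
-- E9A39E is the UTF-8 encoding of 飞.
SELECT * FROM test_utf WHERE a = _utf8 X'E9A39E' COLLATE utf8_bin;
```

The table query can only match rows whose stored bytes are valid UTF-8 for 飞, so it will not help if the rows were stored through a mismatched connection character set.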

I suspect it has something to do with collation or case sensitivity.
Can someone explain to me:
1) Is anything wrong with my understanding of UTF8?
2) What is the right way to store Chinese characters in MySQL?
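
For the second question, a commonly recommended setup is to make the connection character set match the encoding the client actually sends. A sketch, assuming the client sends UTF-8 and that rows inserted earlier over a latin1 connection can simply be re-inserted:

```sql
-- Tell the server that this client sends and expects UTF-8.
SET NAMES utf8;

-- Re-create the test rows over the UTF-8 connection
-- (this empties the table, so only do it on test data).
DELETE FROM test_utf;
INSERT INTO test_utf (a) VALUES ('飞');
INSERT INTO test_utf (a) VALUES ('裎');

-- With matching character sets the comparison sees the real characters.
SELECT * FROM test_utf WHERE a = '裎';
```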

P.S. FYI, 飞 and 裎 are two different characters in Chinese.

Thanks!!
