Re: Character encodings issues with php, mysql, apache
Posted by: Rick James
Date: March 31, 2009 07:46PM

"El 31 de enero revelarán ganadores de 49º Premio Casa de las Américas"
 000000000111111111122222222223333333333444444444455555555556666666666
 123456789-123456789-123456789-123456789-123456789-123456789-123456789
title : El 31 de enero revelarán ganadores de 49º Premio Casa de las Américas
LENGTH(title) : 78
CHAR_LENGTH(title) : 72
HEX(title) : 456C20  El
333120  31
646520  de
656E65726F20  enero
726576656C6172C383C2A16E20  reve...
 r e v e l a r - - - - n
67616E61646F72657320  ganadores
646520  de
3439C382C2BA20  49...
 4 9 - - - -
5072656D696F20  Premio
4361736120   Casa
646520  de
6C617320  las
416DC383C2A97269636173  Américas
 A m - - - - r i c a s
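Note how each accented letter became 4 bytes? The ruler shows the title should be 69 characters, and 78-69 = (4-1)*3.
This tells me that each of the 3 accented characters turned into 4 bytes instead of 1. CHAR_LENGTH is 72 rather than 69 because each of them is now stored as 2 characters. Not good.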
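
Those counts can be re-derived from the HEX output itself. Here is a minimal Python sketch, assuming the hex groups above are simply concatenated in order and that the column is utf8. The variable names are mine, for illustration only.

```python
# Hex groups copied from the HEX(title) output above, in order.
hex_groups = [
    "456C20", "333120", "646520", "656E65726F20",
    "726576656C6172C383C2A16E20", "67616E61646F72657320",
    "646520", "3439C382C2BA20", "5072656D696F20",
    "4361736120", "646520", "6C617320",
    "416DC383C2A97269636173",
]
stored = bytes.fromhex("".join(hex_groups))

# The title as it was meant to be stored.
intended = "El 31 de enero revelarán ganadores de 49º Premio Casa de las Américas"

byte_length = len(stored)                  # bytes, as LENGTH(title) counts them
char_length = len(stored.decode("utf-8"))  # characters, as CHAR_LENGTH(title) counts them (utf8 column assumed)
intended_length = len(intended)            # characters in the intended title

print(byte_length, char_length, intended_length)  # 78 72 69
print(byte_length - intended_length)              # 9 extra bytes = 3 accented characters * (4-1)
```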

In latin1, each of the accented characters takes 1 byte; in utf8 each takes 2 bytes.

This is what "double encoding" produces. That is, latin1 got converted to utf8, so each accented character went from 1 byte to 2. Then each pair of utf8 bytes was treated as 2 latin1 characters (wrong!) and converted again to utf8, now yielding 4 bytes.
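
To see the byte patterns appear, here is a small Python sketch of that sequence. It assumes Python's latin-1 codec is a fair stand-in for MySQL's latin1 (the two agree for the bytes involved here); `double_encode` is a hypothetical helper name, not a MySQL or PHP function.

```python
def double_encode(text: str) -> bytes:
    """Encode to UTF-8, misread those bytes as latin1, then encode to UTF-8 again."""
    once = text.encode("utf-8")       # correct encoding: 2 bytes per accented character
    misread = once.decode("latin-1")  # the mistake: each byte becomes its own character
    return misread.encode("utf-8")    # second conversion: 4 bytes per accented character

# Columns: character, latin1 bytes, utf8 bytes, double-encoded bytes
for ch in "áºé":
    print(
        ch,
        ch.encode("latin-1").hex().upper(),
        ch.encode("utf-8").hex().upper(),
        double_encode(ch).hex().upper(),
    )
# á E1 C3A1 C383C2A1
# º BA C2BA C382C2BA
# é E9 C3A9 C383C2A9
```

The last column matches the 4-byte runs in the HEX(title) output for "revelarán", "49º" and "Américas".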

I'm sorry, but the original loading of the data was done incorrectly. Perhaps SET NAMES was not used, or other character set settings were not applied correctly. Let's go back and try to figure that out.
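
As a sanity check on the diagnosis, one round of the damage can be undone outside the database. This is only a sketch for confirming what happened, not a repair procedure for the table; `undo_double_encoding` is a hypothetical helper name, and Python's latin-1 codec is assumed to match MySQL's latin1 for these bytes.

```python
def undo_double_encoding(stored: bytes) -> str:
    """Reverse one round of utf8 -> (misread as latin1) -> utf8."""
    as_text = stored.decode("utf-8")   # 4 bytes -> 2 characters per accented letter
    inner = as_text.encode("latin-1")  # those 2 characters -> the original 2 utf8 bytes
    return inner.decode("utf-8")       # 2 bytes -> the 1 intended character

# Hex for the last word, copied from the HEX(title) output.
print(undo_double_encoding(bytes.fromhex("416DC383C2A97269636173")))  # Américas
```

If the stored bytes were not double encoded, the latin-1 or utf-8 step will typically raise a UnicodeError instead of returning clean text.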
