MySQL Forums :: Character Sets, Collation, Unicode :: Character encodings issues with php, mysql, apache
Re: Character encodings issues with php, mysql, apache
Posted by: Rick James
Date: March 31, 2009 07:46PM
"El 31 de enero revelarán ganadores de 49º Premio Casa de las Américas"
 000000000111111111122222222223333333333444444444455555555556666666666
 123456789-123456789-123456789-123456789-123456789-123456789-123456789

title              : El 31 de enero revelarÃ¡n ganadores de 49Âº Premio Casa de las AmÃ©ricas
LENGTH(title)      : 78
CHAR_LENGTH(title) : 72
HEX(title)         :
456C20                       El
333120                       31
646520                       de
656E65726F20                 enero
726576656C6172C383C2A16E20   reve...
 r e v e l a r - - - - n
67616E61646F72657320         ganadores
646520                       de
3439C382C2BA20               49...
 4 9 - - - -
5072656D696F20               Premio
4361736120                   Casa
646520                       de
6C617320                     las
416DC383C2A97269636173       Américas
 A m - - - - r i c a s

Note how each accented letter became 4 bytes? The intended title is 69 characters long (see the ruler), and (78-69) = (4-1)*3.
This tells me that each accented character turned into 4 bytes. Not good.
In latin1, each of the accented characters takes 1 byte; in utf8 each takes 2 bytes.
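As a quick check of those byte counts, here is a minimal Python sketch. It assumes Python's "latin-1" and "utf-8" codecs as stand-ins for MySQL's latin1 and utf8 (they agree for these three characters); the name `sizes` is just for illustration.

```python
# Byte cost of each accented character from the title in both encodings.
# Assumption: Python's "latin-1" codec stands in for MySQL's latin1.
sizes = {
    ch: (len(ch.encode("latin-1")), len(ch.encode("utf-8")))
    for ch in "áºé"
}

for ch, (n_latin1, n_utf8) in sizes.items():
    print(f"{ch}: latin1 = {n_latin1} byte, utf8 = {n_utf8} bytes")
```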
This is what "double encoding" produces. That is, latin1 got converted to utf8, giving 2 bytes per accented character. Then each of those 2 bytes was treated as a latin1 character (wrong!) and converted to utf8 again, now yielding 4 bytes.
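The double conversion can be reproduced outside MySQL. The following is only a sketch: it starts from a Python string rather than latin1 bytes, uses Python's codecs in place of MySQL's conversions, and `double_encode` is a hypothetical helper name, not anything MySQL provides.

```python
title = "El 31 de enero revelarán ganadores de 49º Premio Casa de las Américas"

def double_encode(text: str) -> bytes:
    """Encode to utf8, misread those bytes as latin1, then encode to utf8 again."""
    utf8_bytes = text.encode("utf-8")        # correct utf8: 2 bytes per accented char
    misread = utf8_bytes.decode("latin-1")   # wrong: every byte becomes its own character
    return misread.encode("utf-8")           # second conversion: 4 bytes per accented char

stored = double_encode(title)

print(len(title))                        # characters in the intended title
print(len(stored))                       # bytes stored, comparable to LENGTH(title)
print(len(stored.decode("utf-8")))       # characters stored, comparable to CHAR_LENGTH(title)
print(double_encode("á").hex().upper())  # the 4 bytes that replace one accented letter
```

Under these assumptions the script prints 69, 78, 72 and C383C2A1, matching the LENGTH, CHAR_LENGTH and HEX values shown for the title.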
I'm sorry, but the original loading of the data was done wrongly. Perhaps SET NAMES was not used, or other settings were not applied correctly. Let's go back and try to figure that out.