MySQL Forums

Re: Can not get proper utf8 text
Posted by: Rick James
Date: June 08, 2011 09:22PM

Two Kanji _characters_ take 6 _bytes_ in utf8. If each of those 6 bytes were misinterpreted as a latin1 character and transcoded from latin1 into utf8, each would turn into 2 or 3 bytes.

Sounds like "double encoding". The settings were incorrect when the data was stored into the table. It came from the user as utf8, but the input program had SET NAMES latin1 (or something else incorrect), so the 6 bytes were treated as latin1 and converted to utf8 as they were sent to the server for insertion.
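To make the mechanism concrete, here is a minimal Python sketch of the double encoding, outside MySQL. It assumes the two characters are U+5185 and U+9593 (taken from the hex result further down in this post), and it uses Python's cp1252 codec as a stand-in for MySQL's latin1; the two agree for the bytes involved here, but that is an assumption for other data.

```python
# Two CJK characters (assumed for illustration); each is 3 bytes in UTF-8.
text = "\u5185\u9593"
utf8_bytes = text.encode("utf-8")
print(len(text), "characters,", len(utf8_bytes), "bytes")

# Misread each byte as one cp1252 character, then encode that character
# as UTF-8. Each original byte grows to 2 or 3 bytes.
for b in utf8_bytes:
    ch = bytes([b]).decode("cp1252")
    print(f"{b:02X} -> U+{ord(ch):04X} -> {ch.encode('utf-8').hex().upper()}")

# The whole string at once: this is what ends up stored in the table.
double_encoded = utf8_bytes.decode("cp1252").encode("utf-8")
print(double_encoded.hex().upper())
```

The last line should match the hex string being discussed in this thread, which is what points to double encoding rather than some other kind of corruption.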

You have a messy situation. Alas, I do not have a "do this" section of that document that clearly states what to do. It does say several things that might work. Please try them and report back; I'll update the document with your findings. Others will love to know what worked for you.

This is probably the breakdown of that string:
C3A5 E280A0 E280A6 C3A9 E28093 E2809C

+------------------------------------------------------------------------------------------+
| HEX(CONVERT(CONVERT(UNHEX('C3A5E280A0E280A6C3A9E28093E2809C') USING utf8) USING latin1)) |
+------------------------------------------------------------------------------------------+
| E58685E99693                                                                             |
+------------------------------------------------------------------------------------------+

That looks more like two CJK characters: E58685 and E99693 are each a valid 3-byte utf8 sequence.
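The same reversal can be checked outside MySQL. The sketch below is only an illustration, not a tested fix for your table: undo_double_encoding is a hypothetical helper name, and Python's cp1252 codec is assumed to match MySQL's latin1 for these bytes (it does not define a few byte values, such as 0x81, that MySQL's latin1 does).

```python
def undo_double_encoding(stored: bytes) -> bytes:
    """Reverse one round of utf8 -> (misread as latin1) -> utf8.

    Hypothetical helper: decode the stored bytes as UTF-8, then map each
    resulting character back to the single cp1252 byte it came from.
    """
    return stored.decode("utf-8").encode("cp1252")


stored = bytes.fromhex("C3A5E280A0E280A6C3A9E28093E2809C")
repaired = undo_double_encoding(stored)

print(repaired.hex().upper())              # the original utf8 bytes, as hex
print(len(repaired.decode("utf-8")))       # number of characters recovered
```

If the repaired bytes fail to decode as UTF-8, or the encode step raises an error, the data was probably damaged in some other way and this reversal should not be applied.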
