Re: How to correct corrupted UTF-8 characters
Posted by: Rick James
Date: October 26, 2010 11:20PM

"Despite the references to "latin1" the varchar and text fields all contain valid UTF-8" -- that, in itself, will cause lots of trouble.

Not all of them follow this pattern: "0xhh to 0xC2hh".

20 C2BD 20
is the correct utf8 encoding for ½ (U+00BD, surrounded by spaces). But since LENGTH() = CHAR_LENGTH() (both 51), the column is not declared as utf8; in a utf8 column the two bytes C2BD would count as one character, so CHAR_LENGTH would be smaller than LENGTH. In fact:
`Location` varchar(255) CHARACTER SET latin1
The fact that the output shows ½ means that the two latin1 bytes are being treated by the browser as utf8. This is a case of "two wrongs make a right".
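To make the byte counting concrete, here is a small Python sketch (not MySQL itself) of how the one-half character looks in each encoding. The variable names are only for illustration, and Python's "latin-1" codec is used as a stand-in for MySQL's latin1, which is an assumption that holds for the byte BD but not for every byte in the 0x80-0x9F range.

```python
# Sketch: the one-half sign (U+00BD) surrounded by spaces, as bytes.
half = " \u00bd "

def hex_dump(data: bytes) -> str:
    """Return the bytes as an uppercase hex string, like MySQL's HEX()."""
    return data.hex().upper()

utf8_bytes = half.encode("utf-8")      # the bytes 20 C2 BD 20
latin1_bytes = half.encode("latin-1")  # the bytes 20 BD 20

# A latin1 column counts one character per byte, so UTF-8 bytes stored
# in it report the same byte length and character length.
byte_length = len(utf8_bytes)
chars_if_latin1 = len(utf8_bytes.decode("latin-1"))
chars_if_utf8 = len(utf8_bytes.decode("utf-8"))

# "Two wrongs make a right": the bytes are labelled latin1 in the table,
# but a client that decodes them as UTF-8 still displays the right text.
shown_by_browser = utf8_bytes.decode("utf-8")

print(hex_dump(utf8_bytes), hex_dump(latin1_bytes))
print(byte_length, chars_if_latin1, chars_if_utf8)
```

When the two lengths agree on a string that contains multi-byte utf8 sequences, that is a hint the column is being counted as a single-byte charset.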

I go into more detail here: http://mysql.rjweb.org/doc.php/charcoll
Alas, I do not have enough discussion of ways to fix data.

20 BD 20
looks like the correct latin1 encoding of ½ (a single byte, BD). This row is 'correct' for a latin1 column.

20 C382C2BD 20
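Yikes -- the dreaded "double encoding". (See my web page.) C382 is the utf8 encoding of Â and C2BD is the utf8 encoding of ½, so the utf8 bytes C2 BD were each treated as a latin1 character and then encoded to utf8 a second time. The browser was being extra smart in double decoding it.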
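A rough Python sketch of how double encoding arises, and one way to detect and undo a single round of it outside the database. The helper name undo_double_encoding is hypothetical, and Python's "latin-1" codec (ISO-8859-1) is assumed as an approximation of MySQL's latin1, which differs for some bytes in the 0x80-0x9F range. Treat it as an illustration of the idea, not a ready-made repair tool.

```python
# Sketch: reproduce the double encoding of the one-half sign (U+00BD).
half = "\u00bd"

once = half.encode("utf-8")                     # the bytes C2 BD
# The UTF-8 bytes are misread as latin1 text, then encoded to UTF-8 again.
twice = once.decode("latin-1").encode("utf-8")  # the bytes C3 82 C2 BD

def undo_double_encoding(data: bytes) -> bytes:
    """Reverse one round of latin1-to-utf8 mis-encoding when it applies.

    Returns the input unchanged if the bytes are not valid UTF-8, or if
    undoing the step would not leave valid UTF-8 behind.
    """
    try:
        candidate = data.decode("utf-8").encode("latin-1")
        candidate.decode("utf-8")  # the result must itself be valid UTF-8
    except (UnicodeDecodeError, UnicodeEncodeError):
        return data
    return candidate

print(twice.hex().upper())
print(undo_double_encoding(twice).hex().upper())
```

Correctly encoded utf8 (C2BD) and plain latin1 (BD) both pass through unchanged, which matters when a table mixes all three kinds of rows. The check is a heuristic: text that legitimately contains a sequence such as Â½ would be "repaired" by mistake, so results should be reviewed before writing anything back.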

The 4th hex listing has the right stuff in the table, but still confused the browser.

See also ALTER TABLE ...CONVERT...
