Re: Clean up Character Set mess in the same field of one table
Posted by: Rick James
Date: April 08, 2014 11:48AM

Smells like "double encoding". Here is a long discussion of how that happens, and some tips on how to deal with it.
http://mysql.rjweb.org/doc.php/charcoll

> I have a problem with a table that contains using different character sets (latin1 and utf8) in the same field.

Disaster. Do you have any way to know which rows are which? If so, a suitable WHERE clause on an UPDATE may help you fix it.

Caution: Don't do UPDATE; you could make things worse. Instead, plan on creating another table and copying rows into it, fixing them as you do so. This may take many attempts, since character set problems are both hard to understand and hard to know how to fix.

OK, to dig further, and after you have read that link I gave you, do this using the mysql commandline interface:
SELECT id, groupname, stringfield,
LENGTH(stringfield), CHAR_LENGTH(stringfield),
HEX(stringfield)
FROM tablename
ORDER BY LENGTH(stringfield) LIMIT 10;
(The ORDER BY and LIMIT are an attempt to find just some of the short ones; change if you have a better way.)
Please annotate with what the strings should be (if it is not obvious).

With that output, it should be easy to see what happened for each row, and develop a plan of action.

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: Clean up Character Set mess in the same field of one table
1464
April 08, 2014 11:48AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.