Fixing Double Encoding With mysqldump
Posted by: Peter Berry
Date: October 20, 2011 04:59PM

Hi, I've just diagnosed my problem as "Double Encoding" as described by Rick James' incredibly helpful page at http://mysql.rjweb.org/doc.php/charcoll.

Indeed I have UTF8 data going in, a connection (erroneously) set to latin1, and a database and tables set to UTF8. Each 2-byte UTF8 character I insert ends up stored as 4 bytes in my tables.
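The 2-to-4 byte growth can be reproduced outside MySQL. The following is a minimal Python sketch of the mechanism, not MySQL's actual code path. It uses Python's "latin-1" codec as a simplified stand-in for MySQL's latin1, which actually follows cp1252. The name double_encode is hypothetical.

```python
# Minimal sketch of double encoding. Assumption: Python's "latin-1" codec is a
# simplified stand-in for MySQL's latin1 (which actually follows cp1252).

def double_encode(text: str) -> bytes:
    """Simulate UTF8 text sent over a latin1 connection into a utf8 column."""
    sent = text.encode("utf-8")        # bytes the application really sends
    misread = sent.decode("latin-1")   # server reads each byte as one latin1 character
    return misread.encode("utf-8")     # and stores those characters as utf8


print("é".encode("utf-8").hex())  # c3a9 (2 bytes)
print(double_encode("é").hex())   # c383c2a9 (4 bytes)
```

ASCII text passes through unchanged, which is why the problem typically only shows up on accented and other non-ASCII characters.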

My question is about fixing it. It seems like by far the easiest thing to do would be:

1. mysqldump --default-character-set=latin1 .... my_database > my_database_latin1.sql
2. Edit my_database_latin1.sql and change SET NAMES latin1 near the top to SET NAMES utf8.
3. mysql ... < my_database_latin1.sql
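Step 2 can be scripted instead of done by hand in an editor. This is a minimal Python sketch that assumes the dump contains a header line such as /*!40101 SET NAMES latin1 */; (the form mysqldump usually writes; check your own file first). The names switch_set_names and fix_dump are hypothetical. The file is processed as raw bytes so the data, which is really UTF8, is never decoded or re-encoded.

```python
import re
from typing import Iterable, Iterator

# Assumed header format: mysqldump usually writes a line such as
#   /*!40101 SET NAMES latin1 */;
# near the top of the dump. Check your own file before relying on this.
SET_NAMES_LATIN1 = re.compile(rb"SET NAMES latin1\b")


def switch_set_names(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the dump lines, changing only the first SET NAMES latin1 to utf8.

    Lines are handled as raw bytes so the data (really UTF8) is never re-decoded.
    """
    replaced = False
    for line in lines:
        if not replaced and SET_NAMES_LATIN1.search(line):
            line = SET_NAMES_LATIN1.sub(b"SET NAMES utf8", line, count=1)
            replaced = True
        yield line


def fix_dump(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: write an edited copy of the dump, streaming line by line."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(switch_set_names(src))
```

For example, fix_dump("my_database_latin1.sql", "my_database_utf8.sql") writes an edited copy, which would then be the file imported in step 3. Only the first match is changed, so any later row data that happens to contain the same text is left alone.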

The dump should convert the 4 bytes, which the server reads as 2 utf8 characters, back down to 2 bytes (2 latin1 characters). Those 2 bytes are the original UTF8 encoding, so the file really contains UTF8 even though it is labelled latin1. Changing the connection character set at the top of the file to UTF8 and importing the file back into the same database should then store each character correctly.
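The round trip described above can be checked at the byte level with another minimal Python sketch. It assumes every non-ASCII value in the table is double encoded, and again uses Python's "latin-1" codec as a simplified stand-in for MySQL's latin1. The function name undo_double_encoding is hypothetical.

```python
# Minimal sketch of the repair round trip. Assumptions: every non-ASCII value is
# double encoded, and Python's "latin-1" codec stands in for MySQL's latin1.

def undo_double_encoding(stored: bytes) -> str:
    # Step 1, dump over a latin1 connection: the stored utf8 bytes are read as
    # characters and converted to latin1, one byte per character.
    dumped = stored.decode("utf-8").encode("latin-1")
    # Step 3, import over a utf8 connection: those bytes are now read as what
    # they really are, UTF8.
    return dumped.decode("utf-8")


stored = b"\xc3\x83\xc2\xa9"        # the 4 bytes found in the table for "é"
print(undo_double_encoding(stored))  # é
```

In this sketch a value that was already stored correctly would not survive: undo_double_encoding(b"\xc3\xa9") raises UnicodeDecodeError, because the single byte b"\xe9" is not valid UTF8. So the approach only holds if the whole table is double encoded.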

Since Rick doesn't quite mention this fix, I'm assuming I'm missing something. A quick test does look ok so far.

Am I missing something?

Thanks for the help - Peter

