Migration encoding issues: same bytes, different text
Posted by: Jason Heeris
Date: July 22, 2009 10:52PM

Hi,

I apologise if this question is all over the place, because I cannot make sense of this myself.

I was running a 4.1 server with tables in "latin1" encoding. I used the admin GUI (5.0) to pull a backup and transfer it to a 5.1 server, using the default settings for the transfer. It seemed to work. The 5.1 tables are in the Windows 1252 encoding.

After using the new database for a while, we noticed that one character (double backquote) was not displaying correctly wherever it occurred. I could do a search-and-replace, but I worry that there are other problems in the other databases that I don't yet know about.

What I would like to do is somehow repair all of these encoding problems, in case there are other, rarer characters we haven't seen yet. Here's the problem:

If I use mysqldump --default-character-set="latin1" from the 4.1 server, the data comes out fine. I wish I'd done this in the first place, but it's too late now.

I can't just dump the data from the 5.1 server, load it into the 4.1 server and dump it again as above; it doesn't work. Here's the thing: I have two dump files. One is from the 4.1 server, made with "mysqldump --default-character-set=latin1 ..." (happy.sql), and one is from the 5.1 server, made with "mysqldump ..." (sad.sql).

I can "mysql <..." them into the 4.1 server and look at a certain string containing this character. At the byte level, the strings are exactly the same no matter which file they came from. The table and column encodings are the same. When I re-dump using "--default-character-set=latin1", the dump that came from "happy.sql" has the character in there just as it should be; I can open it in a UTF-8 capable editor and it looks fine. But if I load and dump "sad.sql", the character is replaced by a jumble of other characters.

So, same bytes, same encoding, different text when dumped.
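To illustrate what "same bytes, different text" can mean in general, here is a minimal Python sketch. It is only an illustration: the choice of a left double quotation mark (U+201C) as the example character is an assumption, not necessarily the character affected in my tables. It shows one byte sequence decoding to two different strings depending on which character set the reader assumes.

```python
# Assumption for illustration: the example character is U+201C
# (left double quotation mark), not necessarily the one in my data.
raw = "\u201c".encode("utf-8")    # three bytes: E2 80 9C

as_utf8 = raw.decode("utf-8")     # one character: U+201C
as_cp1252 = raw.decode("cp1252")  # three characters: U+00E2, U+20AC, U+0153

# ascii() keeps the output printable on any console encoding.
print(raw)
print(ascii(as_utf8))
print(ascii(as_cp1252))
```

The bytes never change; only the character set used to interpret them does.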

Is there an extra layer of en/de-coding somewhere that I can control to re-encode the data sitting in the 5.1 database? Better yet, is there a way to repair this somehow in place?
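As a sketch of what I mean by an extra layer, assume (this is a guess, not something I have confirmed) that the text was encoded as UTF-8 and those bytes were then read as Windows-1252. Under that assumption, one round of the mix-up could be reversed in Python like this. The function name undo_double_encoding is hypothetical, and Python's "cp1252" codec leaves a few bytes (for example 0x9D) undefined, so its mapping may not match the server's "latin1" exactly.

```python
def undo_double_encoding(text, wrong_charset="cp1252"):
    """Reverse one round of 'UTF-8 bytes read as wrong_charset'.

    Hypothetical helper: returns the text unchanged when it does
    not fit that pattern.
    """
    try:
        return text.encode(wrong_charset).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Build a garbled sample the same way the assumed mix-up would.
original = "\u201ccaf\u00e9"
garbled = original.encode("utf-8").decode("cp1252")

repaired = undo_double_encoding(garbled)
print(repaired == original)
```

Text that was never garbled, such as a correctly stored "café", fails the UTF-8 decode step and is returned as-is, so the helper should be safe to run over mixed data under the stated assumption.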

Any advice whatsoever would be appreciated.

Thanks,
Jason
