Re: replication from 4.1 to 5.5 and character set change
Posted by: Rick James
Date: February 11, 2012 01:40PM

BINARY and BLOBs contain _bytes_ -- no encoding.
CHAR and TEXT contain _characters_ -- there is an encoding, even in 4.1.

If you have non-text stuff (eg images) in TEXT encoded latin1, it will work, but it is asking for trouble.

If you have text in a BLOB field, and you continue to declare it BLOB, it will work, but you are not taking advantage of character set checking.

I would take a mysqldump of 4.1 (with utf8 specified on the command line), then load it into 5.5. No steps in between.

MySQL (including mysqldump) looks at the declared encoding source and destination. If they are not the same (and neither is BINARY/BLOB), it will convert as the transfer is being done.

Text columns (since 4.1) have a CHARACTER SET. This establishes the source encoding when doing mysqldump and the target encoding when reloading. Mysqldump can specify the encoding to use for the dump (target of the 4.1->dump). When reloading (dump->5.5), the dump will self-declare; the targets are declared in the CREATE TABLE.

An issue... You probably need to do the dump+reload in two steps.
1. Dump the schema (--no-data). Then manually change the dump file to utf8. Then load onto 5.5
2. Dump just the data (--no-create? or something). This needs the --encode=utf8(?). That can go directly into 5.5.

Check the results!! I do not have 100% confidence in this process. This document can give you hints of how to use HEX() to check the encoding:
http://mysql.rjweb.org/doc.php/charcoll

> from an obsolete version 4.1 to a slightly newer version (5.5)
"slightly newer" Haha -- 2 major releases over ~7 years.

There are many other issues you could encounter in the big jump. Some you won't know until you start running code. One is that there are lots of new keywords that could conflict with your column names.

I have seen a worse upgrade: 4.0->5.1. (Worse becaues 4.0 had no concept of CHARACTER SET, and the schemas were a mess.) Three thing were known to cause trouble: keywords, change in precedence of comma-join an JOIN, handling of INSERTing into NULL fields. These were easily fixed in a compatible way in the application code. In this migration, we did the following, which I further suggest:

After the dump and reload, make the new 5.5 server a Slave to your 4.1 Master. Let that run for several days. This should discover bad keywords, and any incompatibilities relating to _writes_. Then test your reads against this Slave. Eventually switch writes from the 4.1 machine to the 5.5 machine and abandon the old box.

> Obviously replication between these two versions is not feasible.
As far as I know, replicating from an old version to a newer one is always possible.

> How can i replicate the data having the blob content converted into utf8?
I don't understand. If it was happy as a BLOB in 4.1, it should be happy as a BLOB in 5.5.

> 4.1 (iso)
Which iso? SHOW CREATE TABLE.

Do you have any German "sharp-s" characters? There was a change in some collation relating to that. It has burned a few people by creating "duplicate key" where there had not be before.

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: replication from 4.1 to 5.5 and character set change
2255
February 11, 2012 01:40PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.