MySQL :: Re: what is the step to convert a db from latin1 to UTF8

New Topic

Re: what is the step to convert a db from latin1 to UTF8

Posted by: Rick James
Date: October 17, 2009 07:01PM

Here's the model, as I understand it...

* The bytes in a char or text field are assumed to be encoded according to the character set associated with the column. (This association may be inherited from the system/database/table; use SHOW CREATE TABLE to find out the encoding.)

* You declare the bytes coming from (and going back to) your application by using SET NAMES or setting these three VARIABLES:

| character_set_client     | utf8
| character_set_connection | utf8
| character_set_results    | utf8

* When data flows from point A to point B (eg, client to server), MySQL looks at the encodings of A and B (see bullet items above) to decide whether a conversion is needed. If it is needed, it applies the appropriate conversion (without you needing to call iconv).

The typical problem is that people fail to do SET NAMES, and the bytes get converted (or fail to get converted) as they are inserted into the table.

Sometimes, you are lucky -- bytes go in incorrectly, then come out incorrectly, but the "two wrongs make a right". This really confuses diagnosing things.

Sometimes, you can get a "double conversion". Example: The column is declared utf8. The client is latin1, but the bytes in the client are really utf8. Inserting a 2-byte e-grave (for example) turns into 4 bytes. As long as the reading client is also latin1 (according to character_set_results), but treats the bytes as utf8, nothing seems broken.

I would like to get my head wrapped around your problem and be able to say "do X and Y", but I am not there. I spend the last day trying to unravel some Hebrew characters; they seem to be taking 5 bytes in the table, when they should take only 2.

Could you give me some actual data to experiment with...
SHOW CREATE TABLE tbl \G
Find some data with accents, and dump it thus:
SELECT x, HEX(x), LENGTH(x), CHAR_LENGTH(x) FROM tbl WHERE ... LIMIT 10;
If possible, tell me what you are expecting 'x' to have in those cases. The declared datatype and the HEX give me something to play with.

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Views

Written By

Posted

what is the step to convert a db from latin1 to UTF8

7815

housy hou

October 11, 2009 10:04PM

Re: what is the step to convert a db from latin1 to UTF8

3406

housy hou

October 12, 2009 09:44PM

Re: what is the step to convert a db from latin1 to UTF8

5144

Rick James

October 14, 2009 10:34PM

Re: what is the step to convert a db from latin1 to UTF8

3170

housy hou

October 15, 2009 02:12PM

Re: what is the step to convert a db from latin1 to UTF8

3005

Rick James

October 15, 2009 07:53PM

Re: what is the step to convert a db from latin1 to UTF8

4006

housy hou

October 16, 2009 10:39PM

Re: what is the step to convert a db from latin1 to UTF8

2857

housy hou

October 16, 2009 10:40PM

Re: what is the step to convert a db from latin1 to UTF8

3508

housy hou

October 16, 2009 10:38PM

Re: what is the step to convert a db from latin1 to UTF8

3224

Rick James

October 17, 2009 07:01PM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.