MySQL Forums

Re: Northwind DB and character encoding
Posted by: Rick James
Date: May 09, 2014 12:34PM

> Company Name: Berglunds snabbköp 4265726C756E6473206E6162626BF6
> Address: Berguvsvägen 4265726775737676E467656E
> City: Luleå 4C756C65E5
http://mysql.rjweb.org/doc.php/charcoll#8_bit_encodings
The F6, E4, and E5 match o-umlaut, a-umlaut, and a-ring -- but for "latin1", not utf8.

This implies (I think) that the bytes being used for INSERTing were latin1, but the SET NAMES was utf8. (Or maybe some other combination.)

In any case, F6 is not a valid byte in a utf8 or utf8mb4 field. That is, the table is garbled. o-umlaut takes two bytes in utf8 (or utf8mb4): C3B6.
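
To check this outside MySQL, here is a small Python sketch that decodes the three hex strings quoted above, first as latin1 and then as utf8. The helper name try_decode is made up for illustration. Python's "latin-1" and "utf-8" codecs are assumed to stand in for MySQL's latin1 and utf8/utf8mb4 (MySQL's latin1 is really closer to cp1252, but the two agree on F6, E4 and E5).

```python
# Hex strings copied verbatim from the quoted SELECT HEX() output.
samples = {
    "Company Name": "4265726C756E6473206E6162626BF6",
    "Address": "4265726775737676E467656E",
    "City": "4C756C65E5",
}


def try_decode(hex_string, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    raw = bytes.fromhex(hex_string)
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


for label, hex_string in samples.items():
    as_latin1 = try_decode(hex_string, "latin-1")
    as_utf8 = try_decode(hex_string, "utf-8")
    print(f"{label}: latin1={as_latin1!r} utf8={as_utf8!r}")

# The same character in the two encodings:
print("ö".encode("latin-1").hex().upper())  # F6
print("ö".encode("utf-8").hex().upper())    # C3B6
```

All three samples decode under latin1 and fail under utf8, which is consistent with latin1 bytes sitting in the column.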

The first and last examples -- are the F6 and E5 the end of the value in the column? When inserting latin1 into utf8, the string is truncated.
(I can't explain the second example, since it continues after the E4.)

Bottom line... You may need to reload the data with the appropriate SET NAMES (or equivalent). Be sure to do the SELECT HEX() to verify that the data is stored correctly.
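
To know what SELECT HEX() should show after a correct reload, you can compute the expected hex yourself. A minimal Python sketch, assuming the column is utf8 or utf8mb4 (Python's "utf-8" codec matches both for these characters); expected_hex is a hypothetical helper, not anything built into MySQL:

```python
def expected_hex(text, column_charset="utf-8"):
    """Hex of the bytes that should be stored for `text` in the given charset."""
    return text.encode(column_charset).hex().upper()


# a-ring should appear as C3A5, not as a lone E5.
print(expected_hex("Luleå"))  # 4C756C65C3A5

# The other two values, for comparison against SELECT HEX(...) on the table.
print(expected_hex("Berglunds snabbköp"))
print(expected_hex("Berguvsvägen"))
```

If the hex coming back from the table does not match, the reload used the wrong SET NAMES (or equivalent).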

(For Western European characters, as you seem to have, utf8 and utf8mb4 are the same.)

> Is the character encoding used in the text file a factor, and, if so, how can I determine what it is?

If you are feeding that directly into MySQL, the encoding is critical. You need to tell MySQL what the encoding is (SET NAMES) so that it can convert (if necessary) to the encoding of the column in the table (utf8mb4).
How to determine? A raw hex dump of the text file is perhaps the best way.
o-umlaut = F6 --> latin1
o-umlaut = C3B6 --> utf8 or utf8mb4
Spotting a single Western European character is sufficient to make this distinction. For Eastern Europe, Asia, Microsoft variants, etc., this page has some clues:
http://mysql.rjweb.org/doc.php/charcoll#diagnosing_charset_issues
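
If you would rather not read a hex dump by eye, the same test can be roughly automated. This is only a sketch: guess_encoding and hex_dump are made-up names, and the check cannot tell latin1 apart from cp1252 or other 8-bit encodings. It only separates "valid utf8" from "not utf8".

```python
def hex_dump(raw):
    """Space-separated uppercase hex of the raw bytes, for eyeballing."""
    return " ".join(f"{b:02X}" for b in raw)


def guess_encoding(raw):
    """Classify raw file bytes as 'ascii', 'utf8', or 'not utf8'."""
    if all(b < 0x80 for b in raw):
        # Pure ASCII is identical in latin1 and utf8; nothing to decide.
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Some high byte is not part of a valid utf8 sequence:
        # likely latin1 or another 8-bit encoding.
        return "not utf8"
    return "utf8"


print(hex_dump(b"k\xf6p"), guess_encoding(b"k\xf6p"))
print(hex_dump(b"k\xc3\xb6p"), guess_encoding(b"k\xc3\xb6p"))
```

For a real file, read it in binary mode, e.g. open(path, "rb").read(), where path is wherever your text file lives, and pass the bytes to guess_encoding.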

