UTF8 encoding problem for U+00E9
Posted by: Bill Bolek
Date: November 10, 2009 02:50PM

I have a MySQL database whose database-level collation is utf8_unicode_ci. Every table and every character-based column is also set to collation = utf8_unicode_ci.

I am loading a table from a tab-delimited text file in which one string contains an accented lower-case "e":

Unicode code point = U+00E9 (LATIN SMALL LETTER E WITH ACUTE)
Extended ASCII character = 130 (in code page 437; the same character is 233, hex E9, in ISO-8859-1)

When I run a hex dump (hexdump on Linux) on the raw text file, the character in question has the hex value E9, which is what I expect.
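A lone E9 byte indicates that the file is in a single-byte encoding such as ISO-8859-1 (Latin-1) rather than UTF-8. The following is a minimal Python sketch of that check; the sample line is made up for illustration and stands in for the real file, and `is_valid_utf8` is a hypothetical helper name.

```python
# Hypothetical sample line standing in for one row of the tab-delimited file.
raw = b"1\tcaf\xe9\n"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A bare 0xE9 is a UTF-8 lead byte with no continuation bytes, so this fails.
print(is_valid_utf8(raw))          # False

# The same bytes decode without error as Latin-1, giving U+00E9.
print(repr(raw.decode("latin-1")))
```

If the check fails on the real file, the loader has to be told the file's actual encoding; otherwise it has to guess.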

When I load the file into the MySQL table, the character is stored as the UTF-8 byte sequence (hex) c2 9d. It should be c3 a9.
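To make the expected and observed bytes concrete, here is a small Python check. It only illustrates the byte values quoted above and does not reproduce what Kettle or MySQL does internally.

```python
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

latin1_bytes = e_acute.encode("latin-1")  # single byte, as in the raw file
utf8_bytes = e_acute.encode("utf-8")      # the two bytes expected in the table

# Decode the bytes that actually ended up in the table.
observed = bytes.fromhex("c29d").decode("utf-8")

print(latin1_bytes.hex())          # e9
print(utf8_bytes.hex())            # c3a9
print(f"U+{ord(observed):04X}")    # U+009D
```

The stored value is therefore valid UTF-8 for U+009D, a C1 control character, not for U+00E9. That suggests the E9 byte was reinterpreted through some other character set between the file and the table, rather than being corrupted by the collation.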

I am using an ETL tool called Kettle, from the open source vendor Pentaho, to load the raw tab-delimited text file into the table.


I have triple-checked all the collation settings and they are all utf8_unicode_ci. I have also stopped and restarted MySQL to try to reset things.

The strange thing is that the server having this problem was cloned one month ago from another server, so it has the same MySQL version, etc. The server used as the source of the clone can accept the exact same tab-delimited text file and store the character properly as c3 a9.

I can't figure out what else could be wrong.

Any ideas?

Thank You.

