Re: Croatian characters not surviving LOAD DATA INFILE from UTF-8 .txt file
Posted by: Rick James
Date: August 23, 2011 09:04AM

GIGO

Kupreškić <-- This has the letters encoded according to some character set (probably utf8 on this forum)
Kupre%C5%A1ki%C4%87 <-- This is URL encoding (percent-encoding) of the utf8 bytes, not a character set. Browsers decode it; MySQL does not.
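
To see how the two forms relate, here is a small Python sketch (Python is my choice for illustration; it is not part of the original question). It assumes the percent-escapes carry UTF-8 bytes, which is what urllib's unquote() assumes by default.

```python
from urllib.parse import quote, unquote

encoded = "Kupre%C5%A1ki%C4%87"

# unquote() turns each %XX escape back into a byte and decodes the
# resulting byte string as UTF-8 (its default encoding).
decoded = unquote(encoded)
print(decoded)

# quote() goes the other way: it percent-encodes the UTF-8 bytes of
# every non-ASCII character.
round_trip = quote("Kupreškić")
print(round_trip)
```

If the text has to be decoded like this before it reads correctly, the percent-encoded form is what was put into the file or the table, and that is a separate problem from the character set settings.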

You have CHARACTER SET latin2 on the table.
What arguments did you use when doing the LOAD DATA? In particular, did you include a CHARACTER SET clause? That makes a big difference.
Are the values in the table correct? That is hard to test, but SELECT HEX(...) on the column may help. See this for further discussion:
http://mysql.rjweb.org/doc.php/charcoll
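
As a rough guide to what the HEX output should look like, this Python sketch prints the bytes of the name in both encodings. It assumes Python's "iso8859_2" codec matches what MySQL calls latin2 (both are ISO 8859-2); the variable names are just for illustration.

```python
name = "Kupreškić"

# ISO 8859-2 is the encoding MySQL calls latin2: one byte per character.
hex_latin2 = name.encode("iso8859_2").hex().upper()

# UTF-8 uses two bytes each for the letters with diacritics here.
hex_utf8 = name.encode("utf-8").hex().upper()

print("latin2:", hex_latin2)
print("utf8:  ", hex_utf8)
```

In a latin2 column that was loaded correctly, I would expect the hex to contain B9 and E6 for the two accented letters. If it shows C5A1 and C487 instead, utf8 bytes were stored without conversion.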

Kupre??kiÄ? <-- This is very likely due to incorrect settings, either while loading the data or while fetching it. Again, see my web page.
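
One hypothetical chain of mismatched settings that produces this kind of output is sketched below in Python. Both steps are assumptions of mine for illustration; I do not know that this is what happened in your case.

```python
utf8_bytes = "Kupreškić".encode("utf-8")

# Step 1 (assumed): the UTF-8 bytes are misread as latin2 (ISO 8859-2),
# so each 2-byte character turns into two unrelated characters.
misread = utf8_bytes.decode("iso8859_2")

# Step 2 (assumed): the misread text is converted to latin1 for display.
# Characters that do not exist in latin1 are replaced with "?".
shown = misread.encode("latin-1", errors="replace")
print(shown)
```

The bytes C5 A1 end up as two question marks. C4 stays as the latin1 letter Ä, and 87 is a control code with no glyph, which many clients would also render as "?".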

I think 5.5.12 has a utf8_croatian_ci; it might be better in the long run to use utf8 instead of latin2 -- this would let you store characters from all languages, and might avoid the need for different columns having different charsets. To check which collations your server has:
SHOW COLLATION;

"%C5%A1" -- This smells a lot like it came from a utf8 encoding (hex C5A1), not latin2. Perhaps your data is really utf8? Another argument for switching from latin1/2 to utf8.
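
A quick way to test that hunch on the raw file is a strict UTF-8 decode, sketched here in Python. The function name looks_like_utf8 and the sample bytes are made up for illustration; in practice you would pass in the bytes read from your .txt file.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "Kupreškić" as UTF-8 bytes and as latin2 (ISO 8859-2) bytes.
utf8_sample = bytes.fromhex("4B75707265C5A16B69C487")
latin2_sample = bytes.fromhex("4B75707265B96B69E6")

print(looks_like_utf8(utf8_sample))
print(looks_like_utf8(latin2_sample))
```

Pure ASCII passes this check trivially, so it only tells you something when the data contains accented letters. latin2 text with accented letters will usually fail it, because bytes such as B9 cannot start a UTF-8 sequence.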
