Croatian characters not surviving LOAD DATA INFILE from UTF-8 .txt file
Posted by: Paul Pikowsky
Date: August 21, 2011 11:51AM

I am trying to import some data from a .txt file saved as UTF-8 into a new table that supports Croatian text. But some characters are not surviving the import: they show up as question marks and other wrong characters.

Let me also add that this is MySQL Community Server 5.5.12 (GPL), the build by Remi, running on the latest CentOS distribution.
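Because the way the server converts text depends on its character set settings, it may help to record them next to the version. This is a sketch to run while connected to the same database as the target table; the variables and functions are standard MySQL ones:

```sql
-- Exact build string; the "(GPL) by Remi" text is typically the @@version_comment value.
SELECT VERSION(), @@version_comment;

-- Character set settings for this session. character_set_database reflects
-- the default character set of the currently selected database.
SHOW VARIABLES LIKE 'character_set%';
```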

Here is the table:

delimiter $$

CREATE TABLE `abcd` (
  `xxxx4` varchar(100) CHARACTER SET latin1 NOT NULL DEFAULT 'xxxx4 Here',
  `xxxx5` varchar(100) CHARACTER SET latin2 COLLATE latin2_croatian_ci NOT NULL DEFAULT 'xxxx5 Here',
  `xxxx3` varchar(100) CHARACTER SET latin1 DEFAULT NULL,
  `xxxx2` varchar(200) CHARACTER SET latin2 COLLATE latin2_croatian_ci DEFAULT 'xxx2 Here',
  `xxxx1` varchar(100) CHARACTER SET latin1 NOT NULL DEFAULT 'xxxx1 Here',
  PRIMARY KEY (`xxxx5`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci$$

And here is a data sample (fields are separated by **):

Acquitted**Mirjan Kupreškić++/w/index.php?title=Mirjan_Kupre%C5%A1ki%C4%87&action=edit&redlink=1**Bosnian Croat, HVO member**Lašva Valley massacres against Bosniak civilians**Acquitted on 23 October 2001.7++#cite_note-kupreskic-6
Acquitted**Vlatko Kupreškić++/w/index.php?title=Vlatko_Kupre%C5%A1ki%C4%87&action=edit&redlink=1**Bosnian Croat, HVO member**Lašva Valley massacres against Bosniak civilians**Acquitted on 23 October 2001.7++#cite_note-kupreskic-6

Here is how the same rows show up in the table after the import:

'Acquitted', 'Mirjan Kupre??kiÄ?++/w/index.php?title=Mirjan_Kupre%C5%A1ki%C4%87&action=edit&redlink=1', 'Bosnian Croat, HVO member', 'La??va Valley massacres against Bosniak civilians', 'Acquitted on 23 October 2001.7++#cite_note-kupreskic-6'
'Acquitted', 'Vlatko Kupre??kiÄ?++/w/index.php?title=Vlatko_Kupre%C5%A1ki%C4%87&action=edit&redlink=1', 'Bosnian Croat, HVO member', 'La??va Valley massacres against Bosniak civilians', 'Acquitted on 23 October 2001.7++#cite_note-kupreskic-6'
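The pattern above is consistent with the UTF-8 bytes of the file being read as latin1 and then converted into the latin2 column. That is an assumption, but it can be tested in the mysql client by feeding the UTF-8 bytes of "Kupreškić", written as a hex literal, through both interpretations:

```sql
-- 'Kupreškić' encoded as UTF-8: š = C5 A1, ć = C4 87.
-- Misread as latin1 (MySQL's latin1 is cp1252): C5 A1 -> 'Å¡', C4 87 -> 'Ä‡'.
-- latin2 has 'Ä' but no 'Å', '¡' or '‡', so those three become '?'.
SELECT CONVERT(_latin1 X'4B75707265C5A16B69C487' USING latin2) AS read_as_latin1;

-- Read as UTF-8 instead: every character exists in latin2 and survives.
SELECT CONVERT(_utf8 X'4B75707265C5A16B69C487' USING latin2) AS read_as_utf8;
```

If the assumption holds, a UTF-8 client should see Kupre??kiÄ? from the first query, matching the imported rows, and Kupreškić from the second.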


As far as I know, the LOAD DATA INFILE command does not deal with character sets or collations at all.
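For reference, the MySQL 5.5 manual says LOAD DATA INFILE interprets the file using the character set in the character_set_database system variable (SET NAMES does not affect it), and the statement accepts a CHARACTER SET clause that names the file's encoding explicitly. A minimal sketch, assuming a hypothetical file path and column order (neither is given in the post) and the ** separator seen in the sample:

```sql
LOAD DATA INFILE '/tmp/croatian_data.txt'   -- hypothetical path
INTO TABLE abcd
CHARACTER SET utf8                          -- declare the file as UTF-8
FIELDS TERMINATED BY '**'                   -- separator seen in the data sample
LINES TERMINATED BY '\n'                    -- assumes Unix line endings
(xxxx4, xxxx5, xxxx3, xxxx2, xxxx1);        -- hypothetical column order
```

With the file declared as utf8, each value should be converted into its target column's own character set, so š and ć can be stored in the latin2 columns. A latin1 column receiving Croatian letters would still lose them.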

I already have another table in the same database that displays these characters correctly, and this new table copies its attributes: the column character sets and collations, the table type (storage engine), and the table's default collation.
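One way to confirm that the two tables really match column by column is to compare them in information_schema. The name of the working table is not given, so other_table below is a placeholder:

```sql
-- Character set and collation of every column in both tables,
-- listed in table order so differing column names still line up.
SELECT TABLE_NAME, ORDINAL_POSITION, COLUMN_NAME,
       CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('abcd', 'other_table')   -- 'other_table' is hypothetical
ORDER BY TABLE_NAME, ORDINAL_POSITION;
```

If the definitions match, the difference may lie in how each table was filled rather than in the table definitions themselves.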

Note that the Croatian characters show up fine in this post.

What is going on? What do I change to make sure these characters survive the import?
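After a re-import, checking the stored bytes rather than the rendered text shows whether the characters survived, regardless of how the client displays them. A sketch, assuming the names land in xxxx5 (the actual column mapping is not shown): in latin2 (ISO-8859-2), š is B9 and ć is E6, so a correctly stored Kupreškić should contain those bytes, while a lost character appears as 3F ('?').

```sql
-- Show each stored name next to its raw bytes.
SELECT xxxx5, HEX(xxxx5)
FROM abcd
WHERE xxxx5 LIKE 'Mirjan%' OR xxxx5 LIKE 'Vlatko%';
```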



Edited 1 time(s). Last edit at 08/21/2011 02:18PM by Paul Pikowsky.

