We have a CSV file that contains an accented character, É, in the first_name column:
user_name,password,sis_id,first_name,middle_name,last_name,grade
902117,902117,902117,LÉA,E,GILFRICHE,K
Our Ant script creates our database with CHARSET utf8. When we run the query below:
LOAD DATA INFILE 'StudentImport_StudentNameSpecChar.csv'
INTO TABLE slms_temp FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES (COL1,COL2,COL3,COL4,COL5,COL6,COL7);
it fails with this error:
ERROR 1366 (HY000): Incorrect string value: '\xC9A' for column 'COL4' at row 1
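The error makes sense at the byte level: in latin1, É is the single byte 0xC9, which UTF-8 treats as the start of a two-byte sequence; the plain ASCII 'A' that follows is not a valid continuation byte, so a utf8 column rejects the value. A small Python sketch of the same failure (assuming the CSV was saved as latin1, which the error suggests):

```python
# 'É' is one byte (0xC9) in latin1 but two bytes (0xC3 0x89) in UTF-8.
name = "LÉA"
latin1_bytes = name.encode("latin-1")   # b'L\xc9A'
utf8_bytes = name.encode("utf-8")       # b'L\xc3\x89A'

# Decoding the latin1 bytes as UTF-8 fails on \xC9 followed by 'A',
# mirroring MySQL's "Incorrect string value: '\xC9A'" complaint.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as e:
    print(e)
```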
We have come up with two possible solutions:
1. When we alter the database's default character set:
ALTER DATABASE lycea DEFAULT CHARACTER SET latin1 DEFAULT COLLATE latin1_swedish_ci;
and run the query again, IT WORKS.
2. When we augment the LOAD DATA INFILE query to include the CHARACTER SET latin1 clause:
LOAD DATA INFILE 'StudentImport_StudentNameSpecChar.csv'
INTO TABLE slms_temp CHARACTER SET latin1 FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES (COL1,COL2,COL3,COL4,COL5,COL6,COL7);
and run the query again, IT WORKS.
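A third option we considered, rather than telling MySQL the file is latin1, would be to convert the file itself to UTF-8 before loading, so the file matches the utf8 table. A minimal sketch, assuming the source file really is latin1 (the helper name and paths here are mine, not from our import code):

```python
# Re-encode a latin1 CSV as UTF-8 so LOAD DATA INFILE needs no
# CHARACTER SET override against a utf8 table. Paths are examples.
def convert_latin1_to_utf8(src_path: str, dst_path: str) -> None:
    with open(src_path, "r", encoding="latin-1") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(src.read())
```

This keeps the database and table consistently utf8, at the cost of an extra preprocessing step in the import pipeline.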
slms_temp is a staging table made of COL1 through COL50:
CREATE TABLE `slms_temp` (
  `COL1` varchar(100) DEFAULT NULL,
  ...
  `COL50` varchar(100) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Also, we run our applications on Windows XP and Mac OS X.
My question is: is specifying latin1 the correct solution, and if not, what am I missing?
Thank you.
Noga
MySQL DBA
Follett Digital Resource - Medford, MA
nworonoff@fdr.follett.com