Re: load data infile ... with UTF-8 characters
And the problem is back! I really can't understand this.
HISTORY:
Tables are declared UTF-8.
The LOAD DATA INFILE code originally declared CHARACTER SET 'utf8'
UTF8 chars loaded fine ... for a while.
Then it started to fail. Lots of perfectly proper UTF8 chars were loading as junk.
Probably converted (wrongly!) to 2-byte UTF16?
I changed the code to CHARACTER SET 'latin', based on ten-year-old advice on a forum. It worked properly again!
That lasted a week. Failure once again. Changed back to CHARACTER SET 'utf8' as before.
But I prefixed the load with SET NAMES 'utf8';
Like this:
SET NAMES 'utf8';
LOAD DATA INFILE 'C:\\Users\\joe\\Desktop\\test\\suffix.mod.tab'
IGNORE
INTO TABLE catalog.s1_suffix
CHARACTER SET 'utf8'
FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
LINES TERMINATED BY '\n' STARTING BY ''
(sg_ID, id_stamp, @var_descrip, @var_prices, @var_suffix, @var_suffix_ind)
set
description = trim(@var_descrip),
prices = trim(@var_prices),
suffix = trim(@var_suffix),
suffix_ind = trim(@var_suffix_ind);
That worked for about a week. Now it fails.
One specific case (of many) has left- and right-quotes, as in this text that appears in a one-line/one-row load file:
With "Belgica 2001" Interational
The text above shows plain/dumb ASCII quotes because I can't do them right in this forum text-box.
But the actual load file has proper UTF8 left/right quotes. Here's its hex dump:
hex: 57 69 74 68 20 e2 80 9c 42 65 6c 67 69 63 61 20
hex: 32 30 30 31 e2 80 9d 20 49 6e 74 65 72 6e
char: With e2 80 9c Belgica
char: 2001 e2 80 9d Intern...
The e2 80 9c and e2 80 9d codes are the correct left- and right-quote encodings.
Why does this work ... and then not?
Why does it work when the offending quotes are NOT in the first row of the flat-file?
(I added another row into the load file. That row had only pure ASCII. The one with the quotes became row 2. And then it loaded properly!)
And why is it that the load runs to completion when faced with other "offending" UTF8 characters (other than dbl-quotes) ... but nevertheless the load is "false" because those UTF8 chars get converted to wrong (UFT16?) chars.
Subject
Views
Written By
Posted
8579
September 05, 2015 05:42PM
2075
September 13, 2015 08:18PM
2004
September 14, 2015 02:05PM
1668
September 16, 2015 01:16PM
1902
September 16, 2015 04:01PM
1560
September 17, 2015 01:21PM
1514
September 18, 2015 09:24PM
Re: load data infile ... with UTF-8 characters
1812
September 23, 2015 11:49AM
1396
September 29, 2015 01:12AM
1613
October 02, 2015 03:51PM
1503
October 02, 2015 05:28PM
1655
October 02, 2015 09:17PM
1797
October 02, 2015 09:54PM
Sorry, you can't reply to this topic. It has been closed.
Content reproduced on this site is the property of the respective copyright holders.
It is not reviewed in advance by Oracle and does not necessarily represent the opinion
of Oracle or any other party.