Characters not getting stored
Posted by: Andrea Croci
Date: April 02, 2013 01:19PM

Hello, I have yet another problem with character sets. I have read lots of threads here and the very informative page at http://mysql.rjweb.org/doc.php/charcoll, but I found no answer.

Here is the issue: I'm designing a new database from scratch on MySQL version 5.5.27-log and I would like to use utf8 with collation utf8_unicode_ci throughout, because it will have to store names in German, Italian and English.

I defined the database to use DEFAULT CHARACTER SET = utf8 COLLATE utf8_unicode_ci, declared each table with DEFAULT CHARACTER SET = utf8, and even repeated the same thing in most columns of the tables.

The only two tables I double checked with SHOW CREATE TABLE tbl show everything as utf8 with collate utf8_unicode_ci. The others don't matter because I have the problem already with one of these.

I tried different things and basically tried to store 'Forlì' and (just to mess with it) 'Förlìßen'.

With the first one I get a warning "Incorrect string value '\x8D' for column 'birthplace' at row 1" and when I select it, it shows just 'Forl' without the final 'ì'. The second one too gets truncated at the first "invalid" character that it encounters: ö, with an "Incorrect string value" error message. That means it stores only 'F'.

The test with 'birthplace', hex(birthplace), length(birthplace) and char_length(birthplace) shows the following:

for 'Forlì': 'Forl', 466F726C, 4, 4.
for 'Förlìßen': 'F', 46, 1, 1.
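
One possible reading of these numbers, sketched in Python rather than MySQL: assume the Windows console sends the text as cp850 bytes while the connection is declared as utf8. In cp850 the letter 'ì' is the single byte 8D, which matches the '\x8D' in the warning. The helper name longest_valid_utf8_prefix is hypothetical, and "keep everything before the first invalid byte" is only an assumption about what the server does, chosen because it reproduces the stored values above.

```python
def longest_valid_utf8_prefix(raw: bytes) -> str:
    """Return the text that decodes as UTF-8 before the first invalid byte."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that is not valid UTF-8.
        return raw[:exc.start].decode("utf-8")


# The two test strings, written with escapes so the source stays ASCII.
samples = ["Forl\u00ec", "F\u00f6rl\u00ec\u00dfen"]  # 'Forlì', 'Förlìßen'

for text in samples:
    raw = text.encode("cp850")             # assumed console encoding
    kept = longest_valid_utf8_prefix(raw)  # what survives when read as utf8
    print(raw.hex().upper(), kept, kept.encode("utf-8").hex().upper(), len(kept))
```

Under these assumptions the first string keeps 'Forl' (466F726C, length 4) and the second keeps only 'F' (46, length 1), the same values the table shows.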

All of this happens when I set the "character_set_*" variables to utf8 (other than the filesystem, which I didn't touch).

With other combinations of the "character_set_*" variables I get different symbols on the screen and different hex values in the table, but never the right ones. If I set names to latin1 or cp850 (cp850 appears to be the default in my installation; it is what I get when I start the console) and leave character_set_database at utf8, the strings are no longer truncated at the first non-English character, but the results are displayed correctly only if I enter the data from the console.
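
To illustrate why the hex values change with the settings, here is a small Python sketch (not MySQL) that encodes the same letter 'ì' in the three character sets mentioned in this post. The helper name hex_in is hypothetical; the point is only that one character has different byte values in each encoding, so the declared connection character set has to match the bytes actually sent.

```python
def hex_in(text: str, encoding: str) -> str:
    """Hex dump of text encoded with the given character set."""
    return text.encode(encoding).hex().upper()


sample = "\u00ec"  # the letter 'ì' from 'Forlì'

for encoding in ("latin1", "cp850", "utf-8"):
    print(encoding, hex_in(sample, encoding))

# UTF-8 bytes interpreted as latin1 turn one letter into two characters,
# which is the typical "wrong symbols on the screen" effect.
misread = sample.encode("utf-8").decode("latin1")
print(len(misread), ascii(misread))
```

The letter comes out as EC in latin1, 8D in cp850 and C3AC in utf-8.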

I tried importing the data from a file, saved both as latin1 and as utf8, and it does store something, but not the right thing, regardless of the variable settings.
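
For the file import, one thing that could be checked is whether the bytes in the file really match the character set declared for the connection. The following is a minimal Python sketch, assuming the source encoding of the file is known; the function name transcode_to_utf8 and the file names are hypothetical. It rewrites a file as UTF-8 so that it could be imported over a connection declared as utf8.

```python
from pathlib import Path


def transcode_to_utf8(src: str, dst: str, src_encoding: str) -> None:
    """Rewrite the file src (in src_encoding) as UTF-8 into dst.

    Works on raw bytes so that line endings are carried over unchanged.
    Raises UnicodeDecodeError if src is not valid in src_encoding.
    """
    text = Path(src).read_bytes().decode(src_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


# Hypothetical usage: a file saved as latin1 on Windows.
# transcode_to_utf8("names_latin1.txt", "names_utf8.txt", "latin1")
```

If the declared source encoding is wrong, the conversion either fails or silently produces different characters, so it is worth checking a hex dump of a known name such as 'Forlì' afterwards.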

I dropped the database and recreated it several times with different settings of the variables at the beginning, but I don't get it right.

In all these situations I'm entering the letters from a German keyboard under Windows 7, if that makes any difference.

I'm completely confused: it seems to work only if I enter the data from the keyboard, and then only if I don't set names to utf8 (which I thought would actually be the right thing to do). I would also like to import the data from a file, because sometimes I don't have access to the server and it would be more convenient to prepare a file and then import it all at once.

Any help would be appreciated.

Thank you.

