Re: How to FIX ODD CHARACTERS in a DB table??
Posted by: Peter Gulutzan
Date: November 11, 2007 01:06PM

This is a character encoding problem. So these things matter:
* The character set of the column. People sometimes assume this
must be the value of one of the 'character_set_...' variables
or the character set of the table. Not necessarily.
To find out the column character set, say:
SELECT character_set_name
FROM information_schema.columns
WHERE column_name = 'name' and table_name = 'locations';
* The values of the character variables. To find out, say:
SHOW VARIABLES LIKE 'char%';
* The hexadecimal values of the bytes in the column. To find out, say:
SELECT HEX(name) FROM locations;
The collation is irrelevant, it has nothing to do with encoding.

Let us take an example. You say that one of the characters in
your original 'lost' database was "small e with acute".
In the 'latin1' character set, the encoding for small e with
acute is: E9. You can look up latin1 encoding in many places
on the Internet, for example Wikipedia:
http://en.wikipedia.org/wiki/Latin1
In the 'utf8' character set, the encoding for small e with
acute is: C3 A9. Again there are many places on the Internet
where you could look this up for confirmation, for example
find "LATIN SMALL LETTER E WITH ACUTE" on this site:
http://www.utf8-chartable.de/
Now let's go the other way. What does C3 A9 mean in latin1?
That's easy: look at Wikipedia again:
C3 is A WITH TILDE, A9 is COPYRIGHT SIGN.
And -- mirabile dictu -- that is exactly what you say you see.

So now we know that your column contains one two-byte UTF8
character, but somewhere along the way an assumption was that it's
two one-byte latin1 characters. Why? Many reasons are possible,
I won't explain them all because (a) I'm not familiar with all
tools and interfaces and clients and connection methods (when
testing, I recommend mysql client with a modern version of Linux
with the client and server on the same computer), (b) I don't
know what your values are for the defined column character set
and character set variables as described above. The manual
is quite thorough so for further questions I recommend it,
including the "East Asian FAQ" which has some hints that are
relevant for all encodings.

Peter Gulutzan
MySQL AB

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: How to FIX ODD CHARACTERS in a DB table??
10559
November 11, 2007 01:06PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.