This is a character encoding problem. So these things matter:
* The character set of the column. People sometimes assume this
must be the value of one of the 'character_set_...' variables
or the character set of the table. Not necessarily.
To find out the column character set, say:
SELECT character_set_name
FROM information_schema.columns
WHERE column_name = 'name' and table_name = 'locations';
* The values of the character variables. To find out, say:
SHOW VARIABLES LIKE 'char%';
* The hexadecimal values of the bytes in the column. To find out, say:
SELECT HEX(name) FROM locations;
The collation is irrelevant, it has nothing to do with encoding.
Let us take an example. You say that one of the characters in
your original 'lost' database was "small e with acute".
In the 'latin1' character set, the encoding for small e with
acute is: E9. You can look up latin1 encoding in many places
on the Internet, for example Wikipedia:
http://en.wikipedia.org/wiki/Latin1
In the 'utf8' character set, the encoding for small e with
acute is: C3 A9. Again there are many places on the Internet
where you could look this up for confirmation, for example
find "LATIN SMALL LETTER E WITH ACUTE" on this site:
http://www.utf8-chartable.de/
Now let's go the other way. What does C3 A9 mean in latin1?
That's easy: look at Wikipedia again:
C3 is A WITH TILDE, A9 is COPYRIGHT SIGN.
And -- mirabile dictu -- that is exactly what you say you see.
So now we know that your column contains one two-byte UTF8
character, but somewhere along the way an assumption was that it's
two one-byte latin1 characters. Why? Many reasons are possible,
I won't explain them all because (a) I'm not familiar with all
tools and interfaces and clients and connection methods (when
testing, I recommend mysql client with a modern version of Linux
with the client and server on the same computer), (b) I don't
know what your values are for the defined column character set
and character set variables as described above. The manual
is quite thorough so for further questions I recommend it,
including the "East Asian FAQ" which has some hints that are
relevant for all encodings.
Peter Gulutzan
MySQL AB