Here's the model, as I understand it...
* The bytes in a char or text field are assumed to be encoded according to the character set associated with the column. (This association may be inherited from the system/database/table; use SHOW CREATE TABLE to find out the encoding.)
* You declare the encoding of the bytes coming from (and going back to) your application by using SET NAMES or by setting these three variables:
| character_set_client     | utf8 |
| character_set_connection | utf8 |
| character_set_results    | utf8 |
* When data flows from point A to point B (e.g., client to server), MySQL compares the encodings of A and B (see the bullets above) to decide whether a conversion is needed. If so, it applies the appropriate conversion (without you needing to call iconv).
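As a rough sketch of that decision, here is a toy model in Python -- not MySQL's actual internals, and the function name is mine -- showing "transcode only when the two sides disagree":

```python
def transfer(data: bytes, src_charset: str, dst_charset: str) -> bytes:
    """Toy model of MySQL's charset handling on one hop (e.g., client
    to server): if the two sides agree, bytes pass through untouched;
    otherwise MySQL transcodes, much like an implicit iconv call."""
    if src_charset == dst_charset:
        return data  # no conversion needed
    # Decode with the sender's charset, re-encode with the receiver's.
    return data.decode(src_charset).encode(dst_charset)

# A latin1 client sending e-grave (0xE8) to a utf8 column: 1 byte -> 2 bytes.
print(transfer(b"\xe8", "latin-1", "utf-8").hex())  # c3a8
```

When the declared charset matches the actual bytes, this conversion is exactly what you want; the trouble described below starts when the declaration is wrong.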
The typical problem is that people fail to issue SET NAMES, so the bytes get converted (or fail to be converted) as they are inserted into the table.
Sometimes you are lucky -- the bytes go in incorrectly, then come out incorrectly, and the two wrongs make a right. This really confuses diagnosis.
Sometimes you get a "double conversion". Example: the column is declared utf8; the client is declared latin1, but the bytes in the client are really utf8. Inserting a 2-byte e-grave, for example, turns it into 4 bytes. As long as the reading client is also latin1 (according to character_set_results) but treats the bytes as utf8, nothing seems broken.
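Here is that double conversion played out in Python (a sketch of the byte-level effect; the variable names are mine, but the byte values are what the latin1/utf8 mismatch actually produces):

```python
e_grave = "\u00e8"                        # e-grave, U+00E8
utf8_bytes = e_grave.encode("utf-8")      # 2 bytes: c3 a8
# The client claims latin1, so each utf8 byte is treated as its own
# latin1 character, then re-encoded into the utf8 column.
stored = utf8_bytes.decode("latin-1").encode("utf-8")
print(stored.hex())                       # c383c2a8 -- 4 bytes

# Reading back through a latin1 connection reverses the damage, so a
# client that then treats the bytes as utf8 sees e-grave again.
read_back = stored.decode("utf-8").encode("latin-1").decode("utf-8")
print(read_back == e_grave)               # True
```

The round trip is why nothing *seems* broken -- until some other client, or an ORDER BY, or CHAR_LENGTH(), looks at the stored bytes directly.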
I would like to wrap my head around your problem and be able to say "do X and Y", but I am not there yet. I spent the last day trying to unravel some Hebrew characters; they seem to be taking 5 bytes in the table when they should take only 2.
Could you give me some actual data to experiment with...
SHOW CREATE TABLE tbl \G
Find some data with accents, and dump it thus:
SELECT x, HEX(x), LENGTH(x), CHAR_LENGTH(x) FROM tbl WHERE ... LIMIT 10;
If possible, tell me what you are expecting 'x' to have in those cases. The declared datatype and the HEX give me something to play with.
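Once you have that HEX output, a quick heuristic can flag rows that look double-encoded. This is a hypothetical helper for illustration only -- the name looks_double_encoded and the heuristic are mine, not anything built into MySQL:

```python
def looks_double_encoded(hex_str: str) -> bool:
    """Heuristic check on the output of HEX(x): the bytes are valid
    UTF-8, and undoing one latin1->utf8 round yields shorter bytes
    that are still valid UTF-8 (the classic double-conversion shape)."""
    raw = bytes.fromhex(hex_str)
    try:
        text = raw.decode("utf-8")          # must already be valid utf8
        undone = text.encode("latin-1")     # fails if any char > U+00FF
        undone.decode("utf-8")              # the undone bytes must be utf8 too
        return len(undone) < len(raw)
    except (UnicodeDecodeError, UnicodeEncodeError):
        return False
```

For the double-converted e-grave described above, HEX(x) would be C383C2A8 and the helper flags it; a correctly stored e-grave (C3A8) and plain ASCII both pass. It is a heuristic, so treat a True as "worth a closer look", not proof.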