Hi, I've just diagnosed my problem as "Double Encoding" as described by Rick James' incredibly helpful page at
http://mysql.rjweb.org/doc.php/charcoll.
Indeed, I have UTF-8 data going in, a connection (erroneously) set to latin1, and a database and tables set to utf8. As a result, a single 2-byte UTF-8 character ends up stored as 4 bytes in my tables.
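To make the byte counts concrete, here is a small Python sketch of what I believe is happening. It is only an illustration: the sample character é and the variable names are my own, and Python's latin-1 codec is standing in for MySQL's latin1 (which, as far as I know, is closer to cp1252), so it may not match the server byte for byte on every character.

```python
# Illustration only: 'é' is an arbitrary sample character, not taken from my data.
original = "é"

# What the client actually sends: UTF-8, 2 bytes for this character.
utf8_bytes = original.encode("utf-8")

# The connection claims latin1, so each byte is read as one latin1 character.
# (Assumption: Python's latin-1 approximates MySQL's latin1 here.)
misread = utf8_bytes.decode("latin-1")   # 2 characters: 'Ã' and '©'

# The column is utf8, so those 2 characters are converted to UTF-8 for storage.
stored = misread.encode("utf-8")

print(utf8_bytes.hex(), len(utf8_bytes))  # c3a9 2
print(stored.hex(), len(stored))          # c383c2a9 4
```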
My question is about fixing it. It seems like by far the easiest thing to do would be:
1. mysqldump --default-character-set=latin1 .... my_database > my_database_latin1.sql
2. Edit my_database_latin1.sql to change the SET NAMES latin1 statement at the top to SET NAMES utf8.
3. mysql ... < my_database_latin1.sql
The dump should convert the 4 bytes, which the server thinks are 2 utf8 characters, back down to 2 bytes (2 latin1 characters). The file then actually contains UTF-8, even though it is labelled latin1. So I change the connection encoding declared at the top of the file to utf8 and import the data back into the same database.
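The round trip I have in mind can be sketched in Python like this. Again, this is just my mental model rather than what mysqldump literally does: the function name undo_double_encoding is made up for illustration, the sample bytes are the double-encoded form of é, and Python's latin-1 codec is assumed to behave like MySQL's latin1 for this character.

```python
def undo_double_encoding(stored: bytes) -> str:
    """Hypothetical helper mimicking the dump/edit/reload steps for one value."""
    # Step 1 (dump with latin1): the server reads the stored bytes as utf8
    # characters and converts them to latin1 for the client, one byte each.
    dumped = stored.decode("utf-8").encode("latin-1")
    # Steps 2 and 3 (relabel as utf8, reload): the same bytes are now
    # interpreted as UTF-8, which is what they really were all along.
    return dumped.decode("utf-8")

# Sample value: the 4-byte double-encoded form of 'é'.
stored = b"\xc3\x83\xc2\xa9"
repaired = undo_double_encoding(stored)

print(repaired.encode("utf-8").hex())  # c3a9
```

If a stored value was never double encoded, or contains characters that latin-1 cannot represent, the encode step in this sketch raises UnicodeEncodeError, which is the kind of case I'm worried I might be overlooking.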
Since Rick doesn't quite mention this fix, I'm assuming I'm missing something. A quick test does look ok so far.
Am I missing something?
Thanks for the help - Peter