"El 31 de enero revelarán ganadores de 49º Premio Casa de las Américas"
000000000111111111122222222223333333333444444444455555555556666666666
123456789-123456789-123456789-123456789-123456789-123456789-123456789
title : El 31 de enero revelarán ganadores de 49º Premio Casa de las Américas
LENGTH(title) : 78
CHAR_LENGTH(title) : 72
HEX(title) : 456C20 El
333120 31
646520 de
656E65726F20 enero
726576656C6172C383C2A16E20 reve...
r e v e l a r - - - - n
67616E61646F72657320 ganadores
646520 de
3439C382C2BA20 49...
4 9 - - - -
5072656D696F20 Premio
4361736120 Casa
646520 de
6C617320 las
416DC383C2A97269636173 Américas
A m - - - - r i c a s
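To check the listing, the stored bytes can be regenerated outside MySQL. This is a minimal Python sketch, assuming (as argued below) that the value was double-encoded. `double_encode` is a hypothetical helper name, and Python's `latin-1` codec stands in for MySQL's latin1.

```python
# Reproduce the HEX(title) listing: one row per word, each word's bytes
# (including its trailing space) shown as uppercase hex.

def double_encode(text: str) -> bytes:
    # Assumed mechanism: utf8 bytes misread as latin1, then encoded to utf8 again
    return text.encode("utf-8").decode("latin-1").encode("utf-8")

title = "El 31 de enero revelarán ganadores de 49º Premio Casa de las Américas"

words = title.split(" ")
rows = []
for i, word in enumerate(words):
    # Every word except the last carries its trailing space (hex 20), as in the listing
    chunk = word + (" " if i < len(words) - 1 else "")
    rows.append((double_encode(chunk).hex().upper(), word))

for hex_bytes, word in rows:
    print(f"{hex_bytes:<28}{word}")
```

Every row matches the HEX(title) output above, including C383C2A1 for á, C382C2BA for º, and C383C2A9 for é.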
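Notice how each non-ASCII character (á, º, é) became 4 bytes? The lengths agree: each one is now stored as 2 characters (á shows up as Ã¡, º as Âº, é as Ã©) taking 4 bytes, so each adds 4-2 = 2 to LENGTH beyond CHAR_LENGTH, and (78-72) = (4-2)*3. Not good.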
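The length arithmetic can be checked the same way. In this sketch, LENGTH() is modeled as the byte count and CHAR_LENGTH() as the number of characters after decoding the bytes as utf8 (assuming the column is utf8); `mysql_lengths` is a hypothetical name.

```python
title = "El 31 de enero revelarán ganadores de 49º Premio Casa de las Américas"

correct = title.encode("utf-8")                                   # what a utf8 column should hold
double = title.encode("utf-8").decode("latin-1").encode("utf-8")  # double-encoded value

def mysql_lengths(stored: bytes):
    # Model of (LENGTH(), CHAR_LENGTH()) for a utf8 column: bytes vs. decoded characters
    return len(stored), len(stored.decode("utf-8"))

print("correct:", mysql_lengths(correct))  # (72, 69)
print("double: ", mysql_lengths(double))   # (78, 72)

# Each of the 3 non-ASCII characters became 2 characters taking 4 bytes,
# so each adds 4 - 2 = 2 bytes beyond its character count.
assert mysql_lengths(double)[0] - mysql_lengths(double)[1] == (4 - 2) * 3
```

So a correctly stored title would report LENGTH 72 and CHAR_LENGTH 69; the 78/72 pair only fits the double-encoded bytes.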
In latin1, each of these characters takes 1 byte; in utf8, each takes 2 bytes.
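A quick way to see the per-character sizes is with Python's `latin-1` and `utf-8` codecs. This is a sketch: MySQL's latin1 is actually cp1252, which agrees with latin-1 for these three characters.

```python
sizes = {}
for ch in "áºé":
    latin1 = ch.encode("latin-1")  # single byte: E1, BA, E9
    utf8 = ch.encode("utf-8")      # two bytes: C3A1, C2BA, C3A9
    sizes[ch] = (len(latin1), len(utf8))
    print(ch, latin1.hex().upper(), utf8.hex().upper())
```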
This is what "double encoding" produces: the latin1 text was converted to utf8; then each 2-byte utf8 sequence was treated as two latin1 characters (wrong!) and converted to utf8 again, now yielding 4 bytes per character.
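The two conversions can be replayed step by step. This Python sketch uses Python's `latin-1` codec as a stand-in for MySQL's latin1 (really cp1252, which differs only in bytes 0x80 to 0x9F, none of which occur in this title's first-pass utf8 bytes); `double_encode` and `undo_double_encoding` are hypothetical names. Reversing one layer recovers the original text, which is a handy check that double encoding is the right diagnosis.

```python
def double_encode(text: str) -> bytes:
    once = text.encode("utf-8")       # step 1: text correctly converted to utf8 (2 bytes per accented char)
    misread = once.decode("latin-1")  # step 2: each utf8 byte wrongly taken as one latin1 character
    return misread.encode("utf-8")    # step 3: converted to utf8 again, now 4 bytes per accented char

def undo_double_encoding(stored: bytes) -> str:
    # Reverse the steps: decode utf8, map the characters back to single latin1 bytes,
    # then decode those bytes as the utf8 they originally were.
    return stored.decode("utf-8").encode("latin-1").decode("utf-8")

title = "El 31 de enero revelarán ganadores de 49º Premio Casa de las Américas"

for ch in "áºé":
    print(ch, ch.encode("utf-8").hex().upper(), "->", double_encode(ch).hex().upper())

print(undo_double_encoding(double_encode(title)) == title)  # True
```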
I'm sorry, but the original loading of the data was done incorrectly. Perhaps SET NAMES was not used, or other character-set settings were not applied correctly. Let's go back and try to figure out what happened.
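One way this can happen is a mismatch between the bytes the client sends and the character set the connection declares. This is a deliberately simplified, hypothetical model of an INSERT into a utf8 column, not MySQL's actual code path: the server decodes incoming bytes using the connection character set (what SET NAMES declares) and re-encodes them for the column. `server_store` is an invented name.

```python
title = "El 31 de enero revelarán ganadores de 49º Premio Casa de las Américas"

def server_store(client_bytes: bytes, connection_charset: str) -> bytes:
    # Hypothetical model: interpret the incoming bytes in the declared connection
    # character set, then convert them to utf8 for storage in a utf8 column.
    return client_bytes.decode(connection_charset).encode("utf-8")

client_bytes = title.encode("utf-8")  # the client really sends utf8 bytes

ok = server_store(client_bytes, "utf-8")     # connection declared utf8 (e.g. SET NAMES utf8mb4)
bad = server_store(client_bytes, "latin-1")  # connection declared latin1 instead

print(len(ok), len(bad))  # 72 78
```

With a matching declaration the bytes pass through unchanged (72 bytes); with the latin1 declaration every utf8 pair is converted again, giving the 78 bytes seen in LENGTH(title).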