UTF8, latin1 and performance impact
Posted by: Christophe Schermesser
Date: March 10, 2014 04:55AM

Hi MySQL people,

Here's the story. I have 2 tables, in latin1, and a copy of these in UTF8 (since data itself is 100% English language strings for now).
Let's name the tables table_a_latin1, table_b_latin1, table_a_utf8, table_b_utf8 (utf8_unicode_ci).
These tables use MyISAM, they have about 6M row each, rows are rather long (as in 30 attributes, all strings for table_a, and 20 for table_b). Both have their PRIMARY INDEX on a int 'id', and a UNIQUE index on the first column after that: 'name' VARCHAR(10).
I have also checked table sizes:
table_a: 857M
table_a indexes: 148M for latin1 and 115M for UTF8 (how is this possible?)
table_b: 1.4G
table_b indexes: 356M for latin1 and 304M for UTF8 (...?)

I am comparing time spent on INNER JOIN.
Everything looking equal to me, I want to compare time spent on a query like:
SELECT * FROM 'table_a' tablea INNER JOIN 'table_b' tableb ON tablea.'name' = tableb.'name';

When joining the 2 latin1 tables, query is like 3 times faster than the join on their utf8 equivalent.

For instance, a SELECT COUNT(*) on the latin1 tables takes 35s, and 1min50s on the UTF8 tables.

When using EXPLAIN, the index is correctly identified, difference being that key_len is 12 for latin1 and 32 for UTF8.

I'd like to stress that the data is from English language only (at least for now). If I got it right, UTF8 doesn't use more than the one byte to encode a latin1 character. So a comparison between 2 utf8 strings should not take much longer than the one on latin1, or should it?

Considering the impact on performance (it's a somewhat frequent join), it's really important to me to understand what is happening, and if I'm doing something wrong.
Let me know if some info is missing.

Options: ReplyQuote


Subject
Views
Written By
Posted
UTF8, latin1 and performance impact
8214
March 10, 2014 04:55AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.