MySQL Forums

Re: utf8 and utf8mb4 performance difference
Posted by: Xing Zhang
Date: September 20, 2016 08:38PM

If you only want to store characters in a table, there is no performance difference between utf8 and utf8mb4 as long as the characters are in the BMP (Basic Multilingual Plane, U+0000 to U+FFFF). If you also need to store SMP (Supplementary Multilingual Plane) characters, such as most emoji, you have no choice: only utf8mb4 can store them. Even then, I believe the performance difference is small.
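A small Python sketch makes the BMP/SMP distinction concrete. It relies on the fact that MySQL's utf8 (an alias for utf8mb3 in 5.7) stores at most 3 bytes per character, which covers exactly the BMP, while utf8mb4 allows the 4 bytes that characters above U+FFFF need in UTF-8. The helper names are hypothetical, for illustration only.

```python
def utf8_byte_lengths(text: str) -> list[int]:
    """Return the UTF-8 encoded length, in bytes, of each character in text."""
    return [len(ch.encode("utf-8")) for ch in text]


def fits_in_mysql_utf8(text: str) -> bool:
    """True if every character is in the BMP (U+0000..U+FFFF).

    MySQL's 3-byte utf8 (utf8mb3) can only store such characters;
    anything above U+FFFF takes 4 bytes in UTF-8 and therefore needs utf8mb4.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    for sample in ["abc", "héllo", "中文", "I ❤ 😀"]:
        print(sample, utf8_byte_lengths(sample), fits_in_mysql_utf8(sample))
```

Only the last sample fails the check, because "😀" (U+1F600) is outside the BMP and encodes to 4 bytes.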

But you will most likely also want to use collations for sorting, so first we need to figure out which kind of collation you are using. Both utf8 and utf8mb4 have two kinds of collations. One is the "general way", where the sort order of characters is predefined in a number of character tables. The other is the "UCA way", where the sort order is defined by the DUCET (Default Unicode Collation Element Table), which is provided by the Unicode Consortium.

As of MySQL 5.7, the default collation for utf8 is utf8_general_ci and the default collation for utf8mb4 is utf8mb4_general_ci. Both of these collations can ONLY sort characters in [U+0000, U+FFFF]. (Yes, even utf8mb4's default collation can only sort characters in the BMP.)
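The BMP-only limitation can be modeled roughly as follows. This is a simplified sketch, not MySQL's actual weight tables: it assumes, as the MySQL 5.7 manual describes for general collations, that every supplementary character gets the same weight as U+FFFD REPLACEMENT CHARACTER, and it uses Python's `str.upper()` as a hypothetical stand-in for the real case-insensitive weight tables.

```python
REPLACEMENT_WEIGHT = 0xFFFD  # weight shared by every character above U+FFFF


def general_ci_weight(ch: str) -> int:
    """One weight per character, as in the "general way" (1:1 mapping)."""
    if ord(ch) > 0xFFFF:
        # Supplementary (non-BMP) characters are not distinguished at all.
        return REPLACEMENT_WEIGHT
    upper = ch.upper()
    # Hypothetical stand-in for MySQL's per-character weight tables:
    # fold case when the uppercase form is a single character.
    return ord(upper) if len(upper) == 1 else ord(ch)


def general_ci_key(text: str) -> list[int]:
    """Sort key made of one weight per character."""
    return [general_ci_weight(ch) for ch in text]


if __name__ == "__main__":
    print(general_ci_key("a") == general_ci_key("A"))    # case-insensitive match
    print(general_ci_key("😀") == general_ci_key("🐍"))  # two different emoji tie
```

Under this model two different emoji compare as equal, which is what "can only sort characters in the BMP" means in practice: they are stored correctly, but the collation cannot order them relative to each other.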

Both of these collations use the "general way", so there is no performance difference between them. But if you compare utf8mb4_unicode_520_ci with utf8mb4_general_ci (or utf8_unicode_520_ci with utf8_general_ci), I think there should be some performance difference. I don't have data, but sorting with the UCA collation might be 10-15% slower. The difference arises because the "general way" maps each character to exactly one sorting weight (1:1), whereas UCA can map a character to several weights (N:1) to produce more accurate results.
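The 1:1 versus N:1 difference can be illustrated with two toy sort-key builders. The tables below are invented for illustration and are far smaller than MySQL's real tables or the DUCET; they are modeled on the often-cited "ß" example, where a general collation treats "ß" like "s" while a UCA collation treats it like "ss". Both builders ignore case and accents, as `_ci` collations do at the primary level.

```python
import unicodedata

# Toy tables modeled on the "ß" example; NOT MySQL's real weight data.
GENERAL_MAP = {"ß": "s"}      # general way: exactly one weight per character
UCA_EXPANSIONS = {"ß": "ss"}  # UCA-style: one character may yield several weights


def base_letter(ch: str) -> str:
    """Drop accents: first code point of the canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", ch)[0]


def general_key(text: str) -> list[int]:
    """1:1 mapping: every character contributes exactly one weight."""
    weights = []
    for ch in text:
        mapped = GENERAL_MAP.get(ch.lower(), base_letter(ch))
        weights.append(ord(mapped.lower()))
    return weights


def uca_like_key(text: str) -> list[int]:
    """N:1 mapping: a character may contribute several weights (expansion)."""
    weights = []
    for ch in text:
        for part in UCA_EXPANSIONS.get(ch.lower(), ch):
            weights.append(ord(base_letter(part).lower()))
    return weights
```

For "Straße", `general_key` produces 6 weights while `uca_like_key` produces 7, and the two builders disagree on whether "Straße" sorts with "Strase" or with "strasse". The extra lookups and longer keys per character are the kind of work that could account for a UCA collation sorting somewhat slower, though this toy model says nothing about actual MySQL timings.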

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.