Re: utf8 and utf8mb4 performance difference
Posted by:
Xing Zhang
Date: September 20, 2016 08:38PM
If you only need to store characters in a table, there is no performance difference between utf8 and utf8mb4 as long as the characters are in the BMP (Basic Multilingual Plane). If you also need to store SMP (Supplementary Multilingual Plane) characters, you have no choice: only utf8mb4 can do that. Even then, I believe the performance difference is small.
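To make the BMP/SMP distinction concrete, here is a minimal Python sketch (not MySQL code) that checks whether a character lies in the BMP and how many bytes its UTF-8 encoding needs. The helper names `is_bmp`, `utf8_len` and `fits_in_3_byte_utf8` are made up for this illustration. The point is that BMP characters need at most 3 bytes, which is all MySQL's utf8 can hold, while characters above U+FFFF need 4 bytes and therefore utf8mb4.

```python
def is_bmp(ch: str) -> bool:
    """True if the code point is in the Basic Multilingual Plane (U+0000..U+FFFF)."""
    return ord(ch) <= 0xFFFF


def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_3_byte_utf8(text: str) -> bool:
    """Hypothetical check: can this text be stored with at most 3 bytes per character?"""
    return all(is_bmp(ch) for ch in text)


# "A", e-acute, a CJK ideograph (all BMP), and an emoji (above U+FFFF).
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X} bmp={is_bmp(ch)} utf8_bytes={utf8_len(ch)}")
```

Running a check like `fits_in_3_byte_utf8` over your data is one way to find out whether you actually have any characters that require utf8mb4.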
But you will probably also want to use collations for sorting, so we first need to figure out what kind of collation you are using. For both utf8 and utf8mb4, there are 2 kinds of collations. One is the "general way", where the sort order of characters is predefined in a number of character tables. The other is the "UCA way", where the sort order is defined by DUCET (Default Unicode Collation Element Table), which is provided by the Unicode Consortium.
As of MySQL 5.7, the default collation of utf8 is utf8_general_ci and the default collation of utf8mb4 is utf8mb4_general_ci. Both of these collations can ONLY sort characters in [U+0000, U+FFFF]. (Yes, utf8mb4's default collation can only sort characters in the BMP.)
Both of these collations use the "general way", so there is no performance difference between them. But if you compare utf8mb4_unicode_520_ci with utf8mb4_general_ci (or utf8_unicode_520_ci with utf8_general_ci), I think there should be some performance difference. I don't have data, but sorting might be 10-15% slower. The difference arises because the "general way" maps each character to a single sorting weight (1:1), whereas UCA can assign several weights to one character (N:1) to get a more accurate result.
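A toy Python sketch of the 1:1 versus N:1 idea follows. It is not how MySQL implements collations: the weight values and the names `GENERAL_WEIGHTS`, `UCA_ELEMENTS`, `general_key` and `uca_key` are invented for this illustration. It only mirrors the documented behavior that utf8_general_ci treats "ß" like "s", while utf8_unicode_ci treats it like "ss".

```python
# Toy weight tables; the numbers are made up, not MySQL's or DUCET's real weights.
# "General way": every character maps to exactly one weight.
GENERAL_WEIGHTS = {"a": 0x41, "A": 0x41, "s": 0x53, "S": 0x53, "\u00df": 0x53}

# "UCA way": a character may expand to several weights (here sharp s -> s, s).
UCA_ELEMENTS = {
    "a": [0x41], "A": [0x41],
    "s": [0x53], "S": [0x53],
    "\u00df": [0x53, 0x53],
}

UNKNOWN = 0xFFFD  # assumption for this sketch: one fallback weight for unmapped characters


def general_key(text: str) -> list:
    """Sort key with one weight per character (1:1 lookup)."""
    return [GENERAL_WEIGHTS.get(ch, UNKNOWN) for ch in text]


def uca_key(text: str) -> list:
    """Sort key where each character can contribute several weights (N:1)."""
    key = []
    for ch in text:
        key.extend(UCA_ELEMENTS.get(ch, [UNKNOWN]))
    return key


words = ["Sa", "\u00dfa", "ssa"]
print(sorted(words, key=general_key))
print(sorted(words, key=uca_key))
```

The second key function does more work per character (a list lookup and an extend instead of a single lookup), which gives a rough intuition for why the UCA collations can be slower when sorting.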