Re: utf8 and utf8mb4 performance difference
Posted by: Xing Zhang
Date: September 20, 2016 08:38PM
If you only need to store characters in a table, there is no performance difference between utf8 and utf8mb4 as long as the characters are in the BMP (Basic Multilingual Plane). If you also need to store SMP (Supplementary Multilingual Plane) characters, you have no choice: only utf8mb4 can do that. Even then, I believe the performance difference is small.
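To make the BMP/SMP distinction concrete, here is a small Python sketch (not MySQL code; the function names are made up for illustration). It checks whether a string stays within the BMP, and therefore within what MySQL's 3-byte utf8 can hold, and shows how many bytes each character takes in UTF-8.

```python
def fits_in_bmp(text):
    """Return True if every code point is in the BMP (U+0000..U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf8_byte_lengths(text):
    """Return the number of UTF-8 bytes used by each character."""
    return [len(ch.encode("utf-8")) for ch in text]


if __name__ == "__main__":
    for sample in ["hello", "中文", "😀"]:
        # BMP characters need at most 3 bytes; supplementary ones need 4.
        print(sample, fits_in_bmp(sample), utf8_byte_lengths(sample))
```

A string for which `fits_in_bmp` returns False contains at least one 4-byte character, so it needs utf8mb4.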
But I expect you will also want to use collations for sorting, so we first need to figure out which kind of collation you are using. For both utf8 and utf8mb4 there are two kinds of collations. One is the "general way", where the sort order of characters is predefined in a number of character tables. The other is the "UCA way", where the sort order is defined by DUCET (Default Unicode Collation Element Table), which is provided by the Unicode Consortium.
As of MySQL 5.7, the default collation of utf8 is utf8_general_ci and the default collation of utf8mb4 is utf8mb4_general_ci. Both of these collations can ONLY sort characters in [U+0000, U+FFFF]. (Yes, utf8mb4's default collation can only sort characters in the BMP.)
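A rough Python sketch of what "can only sort BMP characters" means in practice. It assumes, as the MySQL manual describes for the general collations, that every supplementary character gets the same weight as U+FFFD REPLACEMENT CHARACTER. The BMP branch is a toy stand-in (uppercasing) for MySQL's real weight tables, and the function names are hypothetical.

```python
REPLACEMENT_WEIGHT = 0xFFFD  # weight assumed for all supplementary characters


def general_weight(ch):
    """Toy weight for one character in a 'general'-style collation."""
    cp = ord(ch)
    if cp > 0xFFFF:
        # Outside the BMP: all such characters collapse to one weight.
        return REPLACEMENT_WEIGHT
    upper = ch.upper()
    # Toy case folding; keep the code point if uppercasing expands to
    # more than one character (for example 'ß' -> 'SS').
    return ord(upper) if len(upper) == 1 else cp


def general_sort_key(text):
    """One weight per character, in order."""
    return [general_weight(ch) for ch in text]


if __name__ == "__main__":
    # Two different emoji get identical keys, so they cannot be ordered.
    print(general_sort_key("😀") == general_sort_key("😁"))
```

Under this model two different emoji compare as equal, which is why these collations cannot order them meaningfully.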
Both of these collations use the "general way", so there is no performance difference between them. But if you compare utf8mb4_unicode_520_ci with utf8mb4_general_ci (or utf8_unicode_520_ci with utf8_general_ci), I think there should be some performance difference. I don't have data, but sorting might be 10-15% slower. The difference arises because the "general way" maps each character to exactly one sorting weight (1:1), whereas UCA may map a single character to several weights (N:1) to get a more accurate result.
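The 1:1 versus N:1 mapping can be illustrated with a toy Python sketch. The weight tables below are invented for illustration and are not MySQL's or DUCET's actual values; they only mirror the documented behavior that "ß" compares equal to "s" under utf8_general_ci but equal to "ss" under the UCA-based utf8_unicode_ci.

```python
# Hypothetical toy tables, not real collation data.
GENERAL = {"a": 1, "A": 1, "s": 2, "S": 2, "ß": 2}            # 1 weight per char
UCA_LIKE = {"a": [1], "A": [1], "s": [2], "S": [2], "ß": [2, 2]}  # N weights


def general_key(text):
    """'General way': exactly one weight per character."""
    return [GENERAL[ch] for ch in text]


def uca_like_key(text):
    """'UCA way': a character may expand to several weights."""
    key = []
    for ch in text:
        key.extend(UCA_LIKE[ch])
    return key


if __name__ == "__main__":
    print(general_key("ß") == general_key("s"))     # 1:1 mapping
    print(uca_like_key("ß") == uca_like_key("ss"))  # expansion to two weights
```

The variable-length lookup and key building in the second function hint at why the UCA collations do more work per character when sorting.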