MySQL Forums
Forum List  »  MyISAM

Re: why bigger index of ascii than utf8?
Posted by: Rick James
Date: March 23, 2013 06:33PM

Interesting!

Here's a guess...

The Ascii test case was done in the straightforward way...
Each 'row' in the index contained a 4-byte INT + 11-byte CHAR(11) + 6(?)-byte row number pointing into the .MYD file. Then there is overhead for the BTree. The math works out to be about 2350080.

The utf8 case worked differently (more efficiently) (No, I don't know why)...
Each 'row' in the index contained INT + CHAR + Offset + BTree overhead.
The char part was either a 33-byte fixed length CHAR(11) utf8 or a variable length string which (for your examples) would be 1 byte length + 11 bytes of characters.
The offset would still be a row number, not a byte offset. (I deduce this because the .MYD is exactly right for that.)
However... If the index is 'compressed' by the fairly trivial trick of getting rid of common prefixes, then the small size would be explained. That is, since the first 9 rows start with 1+A000000000, then that part can be saved only once; the other rows would be annotated with the fact that the first several bytes are the same. (There are other flavors of how to do that, but maybe you get the gist.)

MyISAM is, more and more, being sidelined. So, I would not expect this optimization to be propagated to the Ascii case, even though it would be just as useful there.

You did not mention whether the pcode-only cases were both large or both small.

The performance difference between the two will usually be imperceptible.

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: why bigger index of ascii than utf8?
1643
March 23, 2013 06:33PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.