MySQL Forums
Forum List  »  Newbie

Re: mysql search performance
Posted by: Rick James
Date: November 01, 2009 11:18AM

Even if INT is faster to search at the CPU level, it makes essentially no difference if you have to hit the disk. At that point, the size of the field is what matters (smaller key -> more keys per block -> fewer blocsk to read -> faster). MEDIUMINT is (I think) going to take 3 bytes (4 if NULLable) in the index; VARCHAR will take a variable amount --
* The number of actual bytes,
* minus any "prefix optimizations",
* plus 1-2 bytes for the length,
* plus (maybe, maybe not) overhead for utf8.

A CPU inefficiency with VARCHAR is the collation. It is wasteful to call a zip code utf8; none of the very complex collation stuff in utf8 is needed. Hence, I recommend (if you want a string, not a number):
CHAR(5) BINARY, or
BINARY(5)

CHAR(5) CHARACTER SET ASCII
still wastes times on case folding, which won't be needed. OTOH, In Canada and the UK (at least), there are letters in their 6-character postalcodes. Then you possibly do want
VARCHAR(6) CHARACTER SET ASCII
(This allows zip-5 also.)

Personally, I would plan ahead and use something like
VARBINARY(10)
This
* handles the cases I know of (including zip-9), including International cases for the future.
* avoids collation costs, but requires me to fold all input and queries to uppercase.
* should be addequately efficient.
* lets me settle on something and forget it. (Hence programmer efficiency)

As to whether VARCHAR(150) is "large" and "slow" --
* If the average length is much less than 150, it won't be that large.
* If there are long runs with similar prefixes (unlikely for 'address'), there is further less "largeness".
* For SELECTing one row by such an index -- the number of steps is proportional the logarithm of the number of rows and proportional to the cost of doing a compare. The BTree is usually under 4 levels deep; you need 100 times as many rows to increase the depth by 1. If adequately cached, there are no disk I/Os; for huge tables, the disk hit(s) become the most costly part of the SELECT.

Personally, if I need to index a VARCHAR(150), I simply do it.

Note: If you set the table to be utf8, that is merely setting the default charset for any new columns. Existing columns retain their settings, and they do not have to be the same as each other. So, making the zip 'ascii', while having the address 'utf8', is perfectly fine.

Options: ReplyQuote


Subject
Written By
Posted
October 29, 2009 08:09PM
October 30, 2009 03:00AM
November 01, 2009 12:57AM
Re: mysql search performance
November 01, 2009 11:18AM
October 30, 2009 08:10PM
October 31, 2009 09:22PM
October 31, 2009 10:55PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.