MySQL Forums
Forum List  »  MyISAM

Re: Primary key performance int vs. char
Posted by: Jay Pipes
Date: July 27, 2005 12:16AM

It would be kind of pointless to talk specific numbers, because such numbers would be based on a specific architecture/MySQL version, etc.

However, what is clear is that an integer primary key will *always* be faster for sorts/SELECTs than an alphanumeric primary key that is more than 4 characters long (and the chances are that for any significant amount of rows, you'll need an alphanumeric key greater than 4 characters).

Why can I make such a bold statement?

Well, think about it this way. The speed at which MyISAM (and for that matter any other engine) can retrieve key data from its indexes depends primarily on the number of keys that can fit in a single 1024-byte MyISAM index block. MyISAM reads index keys by index block, loading all the keys into the key cache from a single block. The number of keys which can fit on a single index block depends entirely on the data size of the key column(s). For each index block, there is a small amount of space reserved by MyISAM to identify the index block in the b-tree organization. Other than that, it's all about packing in as many keys as possible, given the 1024-byte block size.

With a primary key of type INT, which is 4 bytes, it turns out that a single MyISAM index block can fit around 126-127 keys in it, depending on the number of index records. This number is based on the size of the index block minus the small space needed for the identification of the index block divided by the size of the key column plus a 4 byte record pointer (assuming a fixed format record width). Consequently, each time MyISAM's key cache reads in a single index block, it is gaining access to a possible 127 key values plus record pointers (or index block pointers if the index block is a non-leaf page of the b-tree).

Contrast this with a CHAR(10) primary key. The number of key values which can fit in a single index block drops substantially: to around 72.

Now, let's assume a range query, where the number of key values found by the range query is around 10K, out of say 1M total rows. To perform the operation on the INT primary key index, MyISAM would have to read a maximum of 79 index blocks into memory. By contrast, for the CHAR(10) primary key index, MyISAM would have to read a maximum of 139 index blocks, just to retrieve the same set of record pointers.

For an index seek, the difference is also substantial. Because there are fewer key values per index block in the CHAR(10) primary key index, the chances that MyISAM will need to request more index blocks in order to find a specific key is greater. This, of course, slows seeks down.

Hope this clarifies a few general points about integer versus natural alphanumeric keys...

Jay Pipes
Community Relations Manager, North America, MySQL Inc.

Got Cluster? http://www.mysql.com/cluster
Personal: http://jpipes.com

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: Primary key performance int vs. char
26816
July 27, 2005 12:16AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.