Re: Best primary key or index
Posted by:
Rick James
Date: December 14, 2011 12:46PM
I have a different vote...
INT vs CHAR usually does not matter. There are more important things to worry about.
First, the CPU time for handling INT, MEDIUMINT, CHAR(55) utf8, etc is insignificant compared to the other operations of the SQL queries. Disk I/O is much more significant -- if your table is big. So, let me harp on "bulk" issues...
CHAR should be used only for fixed length strings (country code, SSN, zipcode, md5, guid). VARCHAR should be used otherwise.
These days, CHARACTER SET should usually be utf8. Notable exceptions: GUID, UUID, MD5, zipcode, country_code, any hex string. That should be ascii or latin1, or even binary.
utf8 consumes 1-3 bytes per "character". CHAR(n) in utf8 _always_ takes 3n bytes. Comparisons, especially utf8_unicode_ci, require complex CPU code.
When the only index is the PRIMARY KEY, use whatever makes sense for that table. There are only disadvantages to artificially adding an INT.
In InnoDB, when there are secondary keys, then... Every secondary key includes a copy of the PK. A bulky PK gets magnified by this fact. So, it _may_ be better to have an artificial AUTO_INCREMENT id.
InnoDB tables should always (almost always) have an _explicit_ PRIMARY KEY.
MyISAM keys are implemented quite differently, so the InnoDB notes do not apply.
MEMORY always turns VARCHAR into CHAR. In the extreme, VARCHAR(255) utf8 _always_ takes 767 bytes in a MEMORY table or index.
If everything is cached in RAM, all my talk about bulk is rather inconsequential. But, if you are I/O bound, the shrinking bulk is quite important.
Some tables that don't really need an INT PK:
* Many-to-many mapping table: Two columns, foo and bar, two indexes: PRIMARY KEY(foo, bar), INDEX(bar, foo).
* Table of countries: PK is CHAR(2) Ascii. (May want secondary index of country_name.) Note: In this case, the PK is 2 bytes, which is the size of a SMALLINT.