VARCHAR vs. TEXT - some performance numbers
A question I hear once in a while is "why not make all your textual columns TEXT instead of VARCHAR?" I've never heard a solid answer, so I decided to run a simple benchmark - the results are somewhat surprising.
I created two tables - identical except that one uses VARCHAR(255) and the other uses TEXT. In order to render the two indexes similar, I indexed the entire VARCHAR(255) column but only the first 255 characters of the TEXT column:
CREATE TABLE V (a varchar(255) NOT NULL, PRIMARY KEY (a)) ENGINE=MyISAM;
CREATE TABLE T (a text NOT NULL, PRIMARY KEY (a(255))) ENGINE=MyISAM;
I then created 200000 random strings from the alphabet [A-Z,a-z,0-9]. Each string was between 50 and 255 characters long. I loaded that set of strings into both tables (same data set in each table).
Datafile and index file sizes were nearly the same for both tables (the TEXT table was only a few bytes larger). However, queries against these two tables were remarkably different in terms of speed:
SELECT SQL_NO_CACHE count(*) from TABLE_NAME where a LIKE "n%";
I ran 16 queries like this against both tables, using random characters for n. Again, the same 16 queries were run against both tables. Queries against the TEXT table were always 3 times slower than those against the VARCHAR table (averages: 0.10 seconds for the VARCHAR table, 0.29 seconds for the TEXT table). The difference is 100% repeatable.
Anyone have any ideas why? The only note I can find is this one on MySQL's site: "Each BLOB or TEXT value is represented internally by a separately allocated object. This is in contrast to all other data types, for which storage is allocated once per column when the table is opened." (bottom of
http://dev.mysql.com/doc/refman/5.0/en/blob.html). Very vague, but offers a hint. The EXPLAIN output is slightly different, as well:
mysql> explain SELECT SQL_NO_CACHE count(*) from V where a LIKE "n%";
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | V | range | PRIMARY | PRIMARY | 257 | NULL | 5882 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.00 sec)
mysql> explain SELECT SQL_NO_CACHE count(*) from T where a LIKE "n%";
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | T | range | PRIMARY | PRIMARY | 257 | NULL | 5882 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
It looks like the index is being used for the VARCHAR table, but not for the TEXT table (in the Extra column). But then the key column shows that the PK is being used in both cases, so that's ambiguous.
Any thoughts?