MySQL Forums
Forum List  »  General

Re: How should I make text column search faster?
Posted by: Rick James
Date: September 15, 2011 09:17AM

Depending on the Engine you are using, you may be able to simply add
INDEX(address)

As for adding the hash... Yes, that would work. MD5() should suffice. Use BINARY, not VARCHAR, since you don't need any case folding, etc.

For all practical purposes, MD5() (128 bits) or Sha-1 (160 bits) can be considered to have no false hits. A 32-bit hash would have a lot of duplicates over 29M rows. I would not consider any hash shorter than 64 bits.

The binary format of MD5() could be inserted via UNHEX('...') into BINARY(16). The hex version could go into BINARY(32). When using Perl, I prefer the base-64 version, and BINARY(22); this is no readily available in PHP.

A hash is very random. So, if the index on the hash is bigger than you can cache in RAM, you will have a lot of disk I/O. It will probably be bigger than a gigabyte.

See
http://mysql.rjweb.org/doc.php/memory
for tuning MySQL to effectively use your RAM.

A hash will work for _only_ and _exact_ match. Is that OK?

INDEX(address), due to the average length being 70, would be only about 3GB. (20GB by December)

After growing by 6x, you should plan on either type of index being a disk hit essentially every time you do a lookup. This will limit you to about 100 lookups per second; is that OK?

Since either the hash or the direct index will be too big to be cached, why not keep it simple and not do the hash.

There is no such thing as a 'virtual index', nor 'materialized view'. MySQL is simpleminded; if you want to add the hash; you have to do all the work.

On the other hand, mostly MySQL uses BTrees for indexing. In a couple of cases it actually implements Hash indexing. One is MEMORY -- that that requires the entire table -- data and index -- to live in RAM, and there is no persistence. Not an option. The other is NDB Cluster, which involves multiple machines, and requires the entire index to be in the collective RAM of the 'cluster' of servers. You need a big budget to take that route.

You want the "search to be fast". What search? Exact match? The Hash and the plain index will each be approximately 1 disk hit, therefore about 10ms (plus other overhead). FULLTEXT search? That's another discussion. Find the street, but leave off the house number? The table does not seem to be structured for that. Etc.

(This discussion pretty much applies for 29M rows, or 6*29M, or a billion rows.)

InnoDB or MyISAM? Some of the numbers above are 'vague' because of significant differences between the Engines. SHOW TABLE STATUS.

Options: ReplyQuote


Subject
Written By
Posted
Re: How should I make text column search faster?
September 15, 2011 09:17AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.