Re: What will be the best Data-type to store md5\sha1 hashes in MySQL?
Posted by: Rick James
Date: February 05, 2015 06:57PM

Once the index is bigger than your cache, you will be in deep weeds, performance-wise. This blog discusses the performance problem; uuids have the same issue as md5 and sha1, but some uuids have a solution. md5 and sha1 do not have such a workaround.
http://mysql.rjweb.org/doc.php/uuid

Datatype: The best is to use BINARY(16) and BINARY(20) (or BINARY(36) if you concatenate them. The conversion to/from binary is done differently depending on what app language you are doing.

VAR is not needed, since they are fixed length. (VAR adds a 1- or 2-byte length overhead.)

Fixed length BINARY is the best, since it is the smallest. Smaller --> more cacheable --> less I/O --> faster.

I would argue for just MD5. Here are the odds for a 128-bit hash like MD5: If you have 9 trillion items, there is only one chance in 9 trillion that you have a false collision. Compare that to the odds of being hit by a meteor while winning the mega-lottery. And, since you do not have 9T files, your odds are even less than 1-in-9T.

How many rows? How big (GB) will the table be? How much RAM? How big is the cache? Spinning disks or SSDs? Which ENGINE? RAID striping and multiple threads?

Performance exercise... If you have 20 times as much data as can be cached, only 1/20th of the time will the desired row be in cache. That is, 95% of reads (keyed off an MD5) will have to hit the disk. Normal disks can do only about 100 reads/second, so that would be about 105 reads/second when including the cached ones. RAID striping and/or SSDs can increase that. More RAM can improve the "20" I was using.

Question... Are the 'items' somewhat chronologically ordered? Do you hit mostly "recent" items. If so, there may be some tricks to play.

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: What will be the best Data-type to store md5\sha1 hashes in MySQL?
3924
February 05, 2015 06:57PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.