Re: Fastest way to check for an image in a table
Posted by: Rick James
Date: September 25, 2010 08:15PM
String searches are not slow.
Make a parallel table. OK, so you are already doing that. The table contains meta information about the image, including the hash.
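To make the parallel-table idea concrete, here is a minimal sketch of a meta table keyed by hash, with an existence check before insert. It uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere. The table name image_meta, its columns, and the helper functions are all made up for illustration and are not from the original poster's schema.

```python
import hashlib
import sqlite3

def open_meta_db() -> sqlite3.Connection:
    # In-memory SQLite stands in for the real MySQL server (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image_meta ("          # hypothetical table name
        " id INTEGER PRIMARY KEY,"
        " md5 BLOB NOT NULL UNIQUE,"         # UNIQUE creates the hash index
        " byte_len INTEGER NOT NULL)"
    )
    return conn

def image_exists(conn: sqlite3.Connection, data: bytes) -> bool:
    # Look the image up by its hash, never by comparing the image itself.
    digest = hashlib.md5(data).digest()
    row = conn.execute(
        "SELECT 1 FROM image_meta WHERE md5 = ? LIMIT 1", (digest,)
    ).fetchone()
    return row is not None

def add_image(conn: sqlite3.Connection, data: bytes) -> bool:
    # Returns True if the image was new and its meta row was inserted.
    if image_exists(conn, data):
        return False
    conn.execute(
        "INSERT INTO image_meta (md5, byte_len) VALUES (?, ?)",
        (hashlib.md5(data).digest(), len(data)),
    )
    return True

conn = open_meta_db()
print(add_image(conn, b"first image"))   # new image
print(add_image(conn, b"first image"))   # duplicate
```

The lookup touches only the small meta table and its index, which is the point of keeping it separate from the image data.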
The problem comes when the hash index on the meta table is too big to be cached in RAM. Hashes are 'random', so consecutive lookups land in unrelated parts of the index and caching only a part of it does not help much. A check to see if the hash is already in the table will then usually need a disk access. That is, with under, say, 50M images, your checks are essentially CPU-bound, and you can probably check more than 1000 per second. With a billion images, the checks will slow down to something like 100 per second, which is about what one spinning disk can deliver in random reads.
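A rough back-of-envelope estimate shows why the row count matters. The sketch below multiplies the row count by the key size plus a per-entry overhead. The 10-byte overhead is purely an assumed figure for illustration, not a measured MySQL number, and real index sizes depend on the storage engine, the fill factor, and the primary key width.

```python
def estimated_index_bytes(rows: int, key_bytes: int, overhead_bytes: int = 10) -> int:
    # overhead_bytes is an assumed per-entry cost (row pointer, bookkeeping),
    # chosen only to make the arithmetic concrete.
    return rows * (key_bytes + overhead_bytes)

for rows in (50_000_000, 1_000_000_000):
    for key_bytes in (16, 32):
        gib = estimated_index_bytes(rows, key_bytes) / 2**30
        print(f"{rows:>13,} rows, {key_bytes}-byte key: ~{gib:.1f} GiB")
```

Compare the figures it prints with the RAM you can give to the index cache. Once the index is several times larger than that, most checks will miss the cache.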
MD5 is a 128-bit hash, good enough for anything short of identifying each distinct atom in the universe. The hex version could be put into a BINARY(32) field, not VARCHAR(32). Note I say BINARY since you don't need any collation. And you don't need VAR since it is a constant length. Or the raw 16 bytes, for example UNHEX(MD5(...)), could be put into BINARY(16). This would halve the size of the hash column and shrink the index accordingly.
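The two storage forms are easy to compare on the application side. This short sketch uses Python's hashlib. The helper names md5_hex and md5_raw and the sample bytes are placeholders for illustration.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    # 32 hex characters: the form that would go into BINARY(32).
    return hashlib.md5(data).hexdigest()

def md5_raw(data: bytes) -> bytes:
    # 16 raw bytes: the form that would go into BINARY(16).
    return hashlib.md5(data).digest()

image_bytes = b"placeholder image bytes"   # stand-in for a real image file
hex_form = md5_hex(image_bytes)
raw_form = md5_raw(image_bytes)

print(len(hex_form), len(raw_form))
# Both forms carry the same 128 bits; the hex string is just twice as long.
assert bytes.fromhex(hex_form) == raw_form
```

If the hash is computed in the application rather than in SQL, sending the raw digest as a binary parameter avoids the hex round trip altogether.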
How many images do you have?