Re: Slow search on one table, many where, order by and limit
Posted by:
Rick James
Date: February 26, 2010 12:40PM
Is GENDER nullable? Recommend using NOT NULL wherever it makes sense.
GENDER seems like a poor field to have by itself. But, since the rest of the WHERE clauses are expressions, they may not be useful in the index.
COUNTRY_ID =254 -- What percentage of rows match this? Are the queries always restricting on GENDER? Perhaps
INDEX(COUNTRY_ID, GENDER, LAST_VISITED_DATE)
would further help the first query (and even the second).
RAND() is not well optimized; such is hard to do. It is effectively finding all the rows that match the WHERE clause, adding (for the query) a random number to each row, sorting that tmp table, then delivering the first 320.
Why 320? That seems might high. Will you also try to do LIMIT 320,320 to page through the entries? It won't work unless you use RAND(1234); that is, use some number to assure that you get the same random sequence between the two LIMIT calls. But, if you always use the same number, then it is not random. Can't win.
Here's another approach for RAND:
1. Make an INDEX that lets you find 320 ids.
2. Fetch the data.
If the data is bulky, and if you are I/O bound, this could speed things up.
Please provide
* SHOW CREATE TABLE tbl\G -- engine, indexes
* SHOW TABLE STATUS LIKE 'tbl'\G -- sizes
* SHOW VARIABLES LIKE '%buffer%'; -- cache size
and surround them with [ code ] and [ / code ]
How much RAM do you have?
Another RAND approach is to add an extra column with RAND() in it. And index that column. Then ORDER BY that column, instead of ORDER BY RAND(). This will get tricky, because the optimizer will probably avoid using the index.
Another... Is CARD_ID consecutive? That is, do you have few gaps?
1. $max = SELECT MAX(CARD_ID)
2. Calculate 1000 random numbers between 1 and $max
3. Add AND CARD_ID IN (list of 1000 rand nums)
This might work (or might not) -- the trick is for the 1000 to be enough to include 320 cases of MAN looking for WOMAN, etc.
Another optimization -- Is it the case that a query is always including "X looking for Y" (eg MAN looking for WOMAN)? Then that should be a separate ENUM with only 4(?) possible values. That field should be the first in all (?) secondary indexes.
Another... Building on the X looking for Y -- Use PARTITION to segregate the data based on that ENUM. (I don't expect more than 2x speed up for an otherwise-unoptimized RAND().)
If you go with the ENUM, and a request comes in for "MAN looking for either", turn that into a UNION.
Is LANGUAGES a SET or an INT on which you do boolean operations? If so, this is probably useless:
KEY `LANGUAGES_DATE` (`LANGUAGES`,`LAST_VISITED_DATE`)
I may have more comments after seeing SHOW TABLE STATUS, etc.