Re: 500 mil. rows table partitioning
> no need to search through the whole table
But you are currently not going through the whole table.
Are there really about 591679 rows matching
t0.time = t1.time AND t1.siteid = 47 AND t1.variableid = 1 AND t1.repeat = 0 AND t1.extensionid = 1 AND t1.offsetid = 3 AND t1.flag = 0
??
I would expect only 1. If it is only one, then the lookup will be quite efficient. At worst it will be about 1 disk hit.
> 420 tables. Isn't that too many?
No. But PARTITION might be cleaner. A million tables is very bad.
> 99% of my queries extract data for only one site
If you do not do partitioning, then this index has a similar effect (of localizing the disk blocks being accessed):
KEY (`siteid`,`time`, `variableid`,`repeat`,`extensionid`,`offsetid`,`flag`,`value`)
(I moved sited id to the start)
How many rows for the popular sideid? Perhaps about 591679?
I do not understand
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t0
type: index
possible_keys: indx
key: indx
key_len: 23
ref: NULL
rows: 1
Extra: Using where; Using index
It looks like it should scan through all the times for
t0.siteid = 47
AND t0.variableid = 1
AND t0.repeat = 0
AND t0.extensionid = 3
AND t0.offsetid = 3
AND t0.flag = 0
I would expect
key_len and rows to be the same as for the other tables.
I would expect the need for
1. scan 591679 rows for t0;
2. 5*591679 probes (Using index) into t1, etc.
That could take 12 minutes!
Is the output 591679 rows long? If not, please explain. (Maybe some times don't have all the requested values of repeat, extensionid, etc.)
I see repeat and flag being always zero in your example. Is this typical? If so, then rearranging to this might help further localize the data:
KEY (`siteid`,`repeat`,`flag`, `time`, `variableid`,`extensionid`,`offsetid`,`value`)
> key_buffer_size 1.6GB
> Index_length: 13GB
> 99% of my queries extract data for only one site
Putting those together, I would predict that nearly all the time, the desired index blocks will be in the key_buffer. Of course this assumes the index can be constructed to allow such. I feel that the ones we are trying can achieve that. So could PARTITIONing. So, I still have not found a performance reason for PARTITION.