Re: 500 mil. rows table partitioning
> Variable_name: key_buffer_size
> Value: 8384512
Change that to be about 20% of available RAM. It's used for caching indexes -- you need this caching a lot.
> rows: 1183396
> Extra: Using where; Using index
Good and bad.
The good part is "Using index" -- this means that it got what it was looking for entirely from the index (as I expected). The bad part is really a puzzle -- It is estimating that there are 1.1M rows needed from this table. I was expecting "1".
(for reference)
SELECT 'US-IB2' AS site, t0.time,
t0.value AS APAR_0,
t1.value AS APAR_fill_0,
t2.value AS APARpct_0,
t3.value AS APARpct_fill_0,
t4.value AS FH2O_0,
t5.value AS FH2O_fill_0
FROM L2data t0
INNER JOIN L2data_part_286m t1 ON t0.time = t1.time
INNER JOIN L2data t2 ON t0.time = t2.time
INNER JOIN L2data t3 ON t0.time = t3.time
INNER JOIN L2data_part_286m t4 ON t0.time = t4.time
INNER JOIN L2data t5 ON t0.time = t5.time
WHERE t0.siteid = 47 AND t0.variableid = 1 AND t0.repeat = 0 AND t0.extensionid = 3 AND t0.offsetid = 3 AND t0.flag = 0
AND t1.siteid = 47 AND t1.variableid = 1 AND t1.repeat = 0 AND t1.extensionid = 1 AND t1.offsetid = 3 AND t1.flag = 0
AND t2.siteid = 47 AND t2.variableid = 6 AND t2.repeat = 0 AND t2.extensionid = 3 AND t2.offsetid = 3 AND t2.flag = 0
AND t3.siteid = 47 AND t3.variableid = 6 AND t3.repeat = 0 AND t3.extensionid = 1 AND t3.offsetid = 3 AND t3.flag = 0
AND t4.siteid = 47 AND t4.variableid = 5 AND t4.repeat = 0 AND t4.extensionid = 3 AND t4.offsetid = 3 AND t4.flag = 0
AND t5.siteid = 47 AND t5.variableid = 5 AND t5.repeat = 0 AND t5.extensionid = 1 AND t5.offsetid = 3 AND t5.flag = 0
ORDER BY t0.time \G
Hmmm...
> AND t0.flag = 0
> `flag` varchar(5) NOT NULL,
Different datatypes. I don't know if this matters. I got here by noticing
> KEY `indx` (`time`,`siteid`,`variableid`,`repeat`,`extensionid`,`offsetid`,`flag`,`value`)
> ref: Global.t0.time,const,const,const,const,const
See how the refs stop at offsetid and fail to include `flag`? I'm guessing it is throwing up its hands when trying to compare the number 0 to a VARCHAR.
Which should it be? always numeric? Or always strings? Either change the VARCHAR, or add quotes around the '0'.
> Time is represented in the form of decimal year, for example: 2010.234567
So, your precision is about 32 seconds. That is good enough for
> measurements that are performed every 30 min.
> CHARSET=utf8
> Collation: utf8_general_ci
What kind of values go into the only CHAR (flag)? Perhaps utf8 is overkill?
> it will have at least 16GB. How much would you recommend to put in?
It depends.
* You have (currently) table is "10.7GB and index size is 14.7GB"
* So, not everything could be cached in 16GB; 32GB would be iffy.
* If your "working set" is much smaller, then 8GB might suffice -- it depends a lot on how much jumping around you do in the data.
> No data will be dropped.
That means that (MyISAM or InnoDB) will be appending to the table. So, if you usual queries are "find data since xx", there will be a hot spot near the end of the table.
InnoDB really needs a PRIMARY KEY. What combination of these would be unique?
> (`time`,`siteid`,`variableid`,`repeat`,`extensionid`,`offsetid`,`flag`,`value`)
Note: Since the PK in InnoDB is stored with the data, it might be more like 15GB+0GB instead of 10.7+14.7. That is, InnoDB might be smaller.
I have not seen any advantage of PARTITION (yet).
* You are not deleting data (which is very efficient with DROP PARTITION)
* You seem to need only one index.
* The one index is reasonably well ordered. (Sometimes PARTITION helps you play tricks with indexes that are sub-optimal.)