MySQL Forums
Forum List  »  Partitioning

Re: 500 mil. rows table partitioning
Posted by: Rick James
Date: July 14, 2011 12:51AM

> Variable_name: key_buffer_size
> Value: 8384512
Change that to be about 20% of available RAM. It's used for caching indexes -- you need this caching a lot.

> rows: 1183396
> Extra: Using where; Using index
Good and bad.
The good part is "Using index" -- this means that it got what it was looking for entirely from the index (as I expected). The bad part is really a puzzle -- It is estimating that there are 1.1M rows needed from this table. I was expecting "1".

(for reference)
SELECT  'US-IB2' AS site, t0.time,
        t0.value AS APAR_0,
        t1.value AS APAR_fill_0,
        t2.value AS APARpct_0,
        t3.value AS APARpct_fill_0,
        t4.value AS FH2O_0,
        t5.value AS FH2O_fill_0
    FROM        L2data t0
    INNER JOIN  L2data_part_286m t1 ON t0.time = t1.time
    INNER JOIN  L2data t2           ON t0.time = t2.time
    INNER JOIN  L2data t3           ON t0.time = t3.time
    INNER JOIN  L2data_part_286m t4 ON t0.time = t4.time
    INNER JOIN  L2data t5           ON t0.time = t5.time
WHERE t0.siteid = 47  AND t0.variableid = 1 AND t0.repeat = 0 AND t0.extensionid = 3  AND t0.offsetid = 3 AND t0.flag = 0
  AND t1.siteid = 47  AND t1.variableid = 1 AND t1.repeat = 0 AND t1.extensionid = 1  AND t1.offsetid = 3 AND t1.flag = 0
  AND t2.siteid = 47  AND t2.variableid = 6 AND t2.repeat = 0 AND t2.extensionid = 3  AND t2.offsetid = 3 AND t2.flag = 0
  AND t3.siteid = 47  AND t3.variableid = 6 AND t3.repeat = 0 AND t3.extensionid = 1  AND t3.offsetid = 3 AND t3.flag = 0
  AND t4.siteid = 47  AND t4.variableid = 5 AND t4.repeat = 0 AND t4.extensionid = 3  AND t4.offsetid = 3 AND t4.flag = 0
  AND t5.siteid = 47  AND t5.variableid = 5 AND t5.repeat = 0 AND t5.extensionid = 1  AND t5.offsetid = 3 AND t5.flag = 0
    ORDER BY t0.time \G

Hmmm...
> AND t0.flag = 0
> `flag` varchar(5) NOT NULL,
Different datatypes. I don't know if this matters. I got here by noticing
> KEY `indx` (`time`,`siteid`,`variableid`,`repeat`,`extensionid`,`offsetid`,`flag`,`value`)
> ref: Global.t0.time,const,const,const,const,const
See how the refs stop at offsetid and fail to include `flag`? I'm guessing it is throwing up its hands when trying to compare the number 0 to a VARCHAR.
Which should it be? always numeric? Or always strings? Either change the VARCHAR, or add quotes around the '0'.

> Time is represented in the form of decimal year, for example: 2010.234567
So, your precision is about 32 seconds. That is good enough for
> measurements that are performed every 30 min.

> CHARSET=utf8
> Collation: utf8_general_ci
What kind of values go into the only CHAR (flag)? Perhaps utf8 is overkill?

> it will have at least 16GB. How much would you recommend to put in?
It depends.
* You have (currently) table is "10.7GB and index size is 14.7GB"
* So, not everything could be cached in 16GB; 32GB would be iffy.
* If your "working set" is much smaller, then 8GB might suffice -- it depends a lot on how much jumping around you do in the data.

> No data will be dropped.
That means that (MyISAM or InnoDB) will be appending to the table. So, if you usual queries are "find data since xx", there will be a hot spot near the end of the table.

InnoDB really needs a PRIMARY KEY. What combination of these would be unique?
> (`time`,`siteid`,`variableid`,`repeat`,`extensionid`,`offsetid`,`flag`,`value`)
Note: Since the PK in InnoDB is stored with the data, it might be more like 15GB+0GB instead of 10.7+14.7. That is, InnoDB might be smaller.

I have not seen any advantage of PARTITION (yet).
* You are not deleting data (which is very efficient with DROP PARTITION)
* You seem to need only one index.
* The one index is reasonably well ordered. (Sometimes PARTITION helps you play tricks with indexes that are sub-optimal.)

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: 500 mil. rows table partitioning
1874
July 14, 2011 12:51AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.