MySQL Forums
Forum List  »  Partitioning

Re: 500 mil. rows table partitioning
Posted by: Misha Krassovski
Date: July 21, 2011 09:40AM

for the reference:
 
SELECT  'US-IB2' AS site, t0.time,
        t0.value AS APAR_0,
        t1.value AS APAR_fill_0,
        t2.value AS APARpct_0,
        t3.value AS APARpct_fill_0,
        t4.value AS FH2O_0,
        t5.value AS FH2O_fill_0
    FROM L2data t0
    INNER JOIN  L2data t1 ON t0.time = t1.time
    INNER JOIN  L2data t2 ON t0.time = t2.time
    INNER JOIN  L2data t3 ON t0.time = t3.time
    INNER JOIN  L2data t4 ON t0.time = t4.time
    INNER JOIN  L2data t5 ON t0.time = t5.time
WHERE t0.siteid = 47  AND t0.variableid = 1 AND t0.repeat = 0 AND t0.extensionid = 3  AND t0.offsetid = 3 AND t0.flag = 0
  AND t1.siteid = 47  AND t1.variableid = 1 AND t1.repeat = 0 AND t1.extensionid = 1  AND t1.offsetid = 3 AND t1.flag = 0
  AND t2.siteid = 47  AND t2.variableid = 6 AND t2.repeat = 0 AND t2.extensionid = 3  AND t2.offsetid = 3 AND t2.flag = 0
  AND t3.siteid = 47  AND t3.variableid = 6 AND t3.repeat = 0 AND t3.extensionid = 1  AND t3.offsetid = 3 AND t3.flag = 0
  AND t4.siteid = 47  AND t4.variableid = 5 AND t4.repeat = 0 AND t4.extensionid = 3  AND t4.offsetid = 3 AND t4.flag = 0
  AND t5.siteid = 47  AND t5.variableid = 5 AND t5.repeat = 0 AND t5.extensionid = 1  AND t5.offsetid = 3 AND t5.flag = 0
ORDER BY t0.time

>Are there really about 591679 rows matching
>t0.time = t1.time AND t1.siteid = 47 AND t1.variableid = 1 AND t1.repeat = 0 AND t1.extensionid = 1 AND t1.offsetid = 3 AND t1.flag = 0 ??
>I would expect only 1. If it is only one, then the lookup will be quite efficient. At worst it will be about 1 disk hit.
It can't be 1 because it is measured many times (every 30 min), but I don't know why it shows 591679. If I run:
SELECT count(*) FROM L2data as t1 WHERE t1.siteid = 47 AND t1.variableid = 1 AND t1.repeat = 0 AND t1.extensionid = 1 AND t1.offsetid = 3;
it gives 88742. And total number of rows for siteid=47 is 5413262.


>> 99% of my queries extract data for only one site
>If you do not do partitioning, then this index has a similar effect (of localizing the disk blocks being accessed):
>KEY (`siteid`,`time`, `variableid`,`repeat`,`extensionid`,`offsetid`,`flag`,`value`)
>(I moved sited id to the start)
>I see repeat and flag being always zero in your example. Is this typical? If so, then rearranging to this might help further localize the data:
>KEY (`siteid`,`repeat`,`flag`, `time`, `variableid`,`extensionid`,`offsetid`,`value`)
Actually this gave me an idea to ask our specialists more about flags, and I've been told that for this particular type of data flag is not going to be used (it was intended to). So I deleted that column.
Repeat is in use by many sites and it just happened to be 0 in the query above.


>How many rows for the popular sideid? Perhaps about 591679?
There is no such thing as a popular site, all of them are equal. The average amount of rows is 4,183,528, min is 147,696 and max is 16,724,256.

>Is the output 591679 rows long? If not, please explain. (Maybe some times don't have all the requested values of repeat, extensionid, etc.)
No it isn't a long one. According to the numbers I put above it is small output.

This is what I have now after I got rid off 'flag' and set indexes like (`siteid`,`time`, `variableid`,`repeat`,`extensionid`,`offsetid`, `value`)

SHOW CREATE TABLE L2data;
  Table: L2dataCreate Table: CREATE TABLE `L2data`
 (  `extensionid` smallint(6) NOT NULL,
 `offsetid` int(11) NOT NULL,
 `repeat` tinyint(4) NOT NULL,
 `siteid` int(11) NOT NULL,
  `time` decimal(10,6) NOT NULL,
  `value` float NOT NULL,
  `variableid` smallint(6) NOT NULL,
  KEY `indx` (`siteid`,`time`,`variableid`,`repeat`,`extensionid`,`offsetid`,`value`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.03 sec)

SHOW TABLE STATUS LIKE 'L2data'
*************************** 1. row ***************************
           Name: L2data
         Engine: MyISAM
        Version: 10
     Row_format: Fixed
           Rows: 414169342
 Avg_row_length: 23
    Data_length: 9525894866
Max_data_length: 6473924464345087
   Index_length: 15322727424
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2011-07-19 14:40:31
    Update_time: 2011-07-19 14:45:48
     Check_time: 2011-07-19 16:32:49
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

SHOW VARIABLES LIKE '%buffer%';
*************************** 1. row ***************************
Variable_name: bulk_insert_buffer_size
        Value: 8388608
*************************** 2. row ***************************
Variable_name: innodb_buffer_pool_size
        Value: 8388608
*************************** 3. row ***************************
Variable_name: innodb_log_buffer_size
        Value: 1048576
*************************** 4. row ***************************
Variable_name: join_buffer_size
        Value: 131072
*************************** 5. row ***************************
Variable_name: key_buffer_size
        Value: 1677721600
*************************** 6. row ***************************
Variable_name: myisam_sort_buffer_size
        Value: 8388608
*************************** 7. row ***************************
Variable_name: net_buffer_length
        Value: 16384
*************************** 8. row ***************************
Variable_name: preload_buffer_size
        Value: 32768
*************************** 9. row ***************************
Variable_name: read_buffer_size
        Value: 131072
*************************** 10. row ***************************
Variable_name: read_rnd_buffer_size
        Value: 262144
*************************** 11. row ***************************
Variable_name: sort_buffer_size
        Value: 2097144
*************************** 12. row ***************************
Variable_name: sql_buffer_result
        Value: OFF
12 rows in set (0.00 sec)

EXPLAIN SELECT 'US-IB2' AS site, t0.time,
        t0.value AS APAR_0,
        t1.value AS APAR_fill_0,
        t2.value AS APARpct_0,
        t3.value AS APARpct_fill_0,
        t4.value AS FH2O_0,
        t5.value AS FH2O_fill_0
FROM L2data t0
    INNER JOIN  L2data t1 ON t0.time = t1.time
    INNER JOIN  L2data t2 ON t0.time = t2.time
    INNER JOIN  L2data t3 ON t0.time = t3.time
    INNER JOIN  L2data t4 ON t0.time = t4.time
    INNER JOIN  L2data t5 ON t0.time = t5.time
WHERE t0.siteid = 47  AND t0.variableid = 1 AND t0.repeat = 0 AND t0.extensionid = 3  AND t0.offsetid = 3
  AND t1.siteid = 47  AND t1.variableid = 1 AND t1.repeat = 0 AND t1.extensionid = 1  AND t1.offsetid = 3
  AND t2.siteid = 47  AND t2.variableid = 6 AND t2.repeat = 0 AND t2.extensionid = 3  AND t2.offsetid = 3
  AND t3.siteid = 47  AND t3.variableid = 6 AND t3.repeat = 0 AND t3.extensionid = 1  AND t3.offsetid = 3
  AND t4.siteid = 47  AND t4.variableid = 5 AND t4.repeat = 0 AND t4.extensionid = 3  AND t4.offsetid = 3
  AND t5.siteid = 47  AND t5.variableid = 5 AND t5.repeat = 0 AND t5.extensionid = 1  AND t5.offsetid = 3
  ORDER BY t0.time \G;

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t0
         type: ref
possible_keys: indx
          key: indx
      key_len: 4
          ref: const
         rows: 5999565
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ref
possible_keys: indx
          key: indx
      key_len: 18
          ref: const,Global.t0.time,const,const,const,const
         rows: 690291
        Extra: Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: indx
          key: indx
      key_len: 18
          ref: const,Global.t0.time,const,const,const,const
         rows: 690291
        Extra: Using index
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: t3
         type: ref
possible_keys: indx
          key: indx
      key_len: 18
          ref: const,Global.t0.time,const,const,const,const
         rows: 690291
        Extra: Using index
*************************** 5. row ***************************
           id: 1
  select_type: SIMPLE
        table: t4
         type: ref
possible_keys: indx
          key: indx
      key_len: 18
          ref: const,Global.t1.time,const,const,const,const
         rows: 690291
        Extra: Using where; Using index
*************************** 6. row ***************************
           id: 1
  select_type: SIMPLE
        table: t5
         type: ref
possible_keys: indx
          key: indx
      key_len: 18
          ref: const,Global.t2.time,const,const,const,const
         rows: 690291
        Extra: Using where; Using index
6 rows in set (0.00 sec)
I don't understand where 5999565 comes from. If I run:
SELECT 'US-IB2' AS site, t0.time, t0.value AS APAR_0
FROM L2data t0
WHERE t0.siteid = 47  AND t0.variableid = 1 AND t0.repeat = 0 AND t0.extensionid = 3  AND t0.offsetid = 3
ORDER BY t0.time;
it returns only 88742 rows.

I did some performance check as well. I picked site that have max amount of data (16,724,256) and run queries with different number of variables.
Here is what I got:
3 vars -> 11.62 sec
4 vars -> 13.52 sec
6 vars -> 15.17 sec
9 vars -> 22.50 sec
13 vars -> 16371 sec
It is faster for sites with smaller amount of data. For example the query above is against site with 5,431,262 rows, it used to be 11 min and now it is 4.48 seconds. It is very big improvement, but still not how it suppose to be.
Do you see what else can bump the speed up? Thank you very much for helping me!

Misha

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: 500 mil. rows table partitioning
1849
July 21, 2011 09:40AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.