MySQL Forums
Forum List  »  Partitioning

Re: 500 mil. rows table partitioning
Posted by: Misha Krassovski
Date: July 13, 2011 02:43PM

Rick, thank you for the reply!
Sorry I was on a trip and couldn't answer to you promptly.
I think, I need to provide a little bit more background information. It will address the following questions:
>Does that mean that the data could be a single table with about 83 columns? Looks like it would have `time` as the PRIMARY KEY. Is there some reason not to do that? It would be soooo much faster.
>Could one of those be derived from the other?
>Converting to a single table with 83+ columns will shrink the data significantly because you can get rid of variableid, etc. And you can make use TINYINT UNSIGNED (1 byte) instead of FLOAT (4 bytes) for some of the `value`s.

Data comes from ~140 different sites (siteid). Measurements on each site occurs every 30 min, but may span different time range. For example, we have sites that were active in the past and don't do measurements now. Sites may have gaps in their measurements for up to several month, and off cause sites can have measurements with the same time stamp. Each site measures its own set of variables, some of them measure only 5-8, some of them up to 20-25. Each variable is unique and can't be derived from others. Although there are only 83 core variables (variableid) it can be virtually countless number of them because they can be measured on different height (offsetid), measured several times for redundancy (repeat), might be corrected in several ways (exstensionid) and combinations of all these can be different from site to site. That are the reasons why it can't be organized in to a singe table with 'time' as the PRIMARY KEY and 83 columns.
No data will be dropped. For sites that are still active I will have and new measurements, and some times data for old sites has to be updated. Old values must be archived. I assume it will be easier just to copy old data to an archive DB, delete all data for a site and reload it with new/corrected values.

Now back to the rest of the questions. I used your advise and add index to the table:
CREATE INDEX indx ON L2data (time, siteid, variableid, `repeat`, extensionid, offsetid, flag, value);

So far :

>SHOW CREATE TABLE tbl\G -- engine, indexes;
*************************** 1. row ***************************
       Table: L2data
Create Table: CREATE TABLE `L2data` (
  `extensionid` smallint(6) NOT NULL,
  `flag` varchar(5) NOT NULL,
  `offsetid` int(11) NOT NULL,
  `repeat` tinyint(4) NOT NULL,
  `siteid` int(11) NOT NULL,
  `time` decimal(10,6) NOT NULL,
  `value` float NOT NULL,
  `variableid` smallint(6) NOT NULL,
  KEY `indx` (`time`,`siteid`,`variableid`,`repeat`,`extensionid`,`offsetid`,`flag`,`value`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

>SHOW TABLE STATUS LIKE 'tbl'\G -- sizes
************************** 1. row ***************************
           Name: L2data
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 414169342
 Avg_row_length: 27
    Data_length: 11510227132
Max_data_length: 281474976710655
   Index_length: 15804361728
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2011-07-13 09:05:29
    Update_time: 2011-07-13 09:13:01
     Check_time: 2011-07-13 15:59:28
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

>EXPLAIN SELECT
 
    EXPLAIN SELECT 'US-IB2' AS site, t0.time, t0.value AS APAR_0, t1.value AS APAR_fill_0, t2.value AS APARpct_0, t3.value AS APARpct_fill_0, t4.value AS FH2O_0, t5.value AS FH2O_fill_0  FROM L2data t0 INNER JOIN L2data_part_286m t1 ON t0.time = t1.time  INNER JOIN L2data t2 ON t0.time = t2.time  INNER JOIN L2data t3 ON t0.time = t3.time  INNER JOIN L2data_part_286m t4 ON t0.time = t4.time  INNER JOIN L2data t5 ON t0.time = t5.time  WHERE t0.siteid = 47 AND t0.variableid = 1 AND t0.repeat = 0  AND t0.extensionid = 3 AND t0.offsetid = 3 AND t0.flag = 0 AND t1.siteid = 47  AND t1.variableid = 1 AND t1.repeat = 0 AND t1.extensionid = 1  AND t1.offsetid = 3 AND t1.flag = 0 AND t2.siteid = 47  AND t2.variableid = 6 AND t2.repeat = 0 AND t2.extensionid = 3  AND t2.offsetid = 3 AND t2.flag = 0 AND t3.siteid = 47  AND t3.variableid = 6 AND t3.repeat = 0 AND t3.extensionid = 1  AND t3.offsetid = 3 AND t3.flag = 0 AND t4.siteid = 47  AND t4.variableid = 5 AND t4.repeat = 0 AND t4.extensionid = 3  AND t4.offsetid = 3 AND t4.flag = 0 AND t5.siteid = 47  AND t5.variableid = 5 AND t5.repeat = 0 AND t5.extensionid = 1  AND t5.offsetid = 3 AND t5.flag = 0 ORDER BY t0.time \G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
         type: ALL
possible_keys: siteid_indx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5413262
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: t0
         type: ref
possible_keys: indx
          key: indx
      key_len: 18
          ref: Global.t1.time,const,const,const,const,const
         rows: 1183396
        Extra: Using where; Using index
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: t2
         type: ref
possible_keys: indx
          key: indx
      key_len: 18
          ref: Global.t0.time,const,const,const,const,const
         rows: 1183396
        Extra: Using where; Using index
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: t3
         type: ref
possible_keys: indx
          key: indx
      key_len: 18
          ref: Global.t0.time,const,const,const,const,const
         rows: 1183396
        Extra: Using where; Using index
*************************** 5. row ***************************
           id: 1
  select_type: SIMPLE
        table: t5
         type: ref
possible_keys: indx
          key: indx
      key_len: 18
          ref: Global.t1.time,const,const,const,const,const
         rows: 1183396
        Extra: Using where; Using index
*************************** 6. row ***************************
           id: 1
  select_type: SIMPLE
        table: t4
         type: ALL
possible_keys: siteid_indx
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5413262
        Extra: Using where; Using join buffer
6 rows in set (0.00 sec)

>SHOW VARIABLES LIKE '%buffer%'\G -- cache size;
*************************** 1. row ***************************
Variable_name: bulk_insert_buffer_size
        Value: 8388608
*************************** 2. row ***************************
Variable_name: innodb_buffer_pool_size
        Value: 8388608
*************************** 3. row ***************************
Variable_name: innodb_log_buffer_size
        Value: 1048576
*************************** 4. row ***************************
Variable_name: join_buffer_size
        Value: 131072
*************************** 5. row ***************************
Variable_name: key_buffer_size
        Value: 8384512
*************************** 6. row ***************************
Variable_name: myisam_sort_buffer_size
        Value: 8388608
*************************** 7. row ***************************
Variable_name: net_buffer_length
        Value: 16384
*************************** 8. row ***************************
Variable_name: preload_buffer_size
        Value: 32768
*************************** 9. row ***************************
Variable_name: read_buffer_size
        Value: 131072
*************************** 10. row ***************************
Variable_name: read_rnd_buffer_size
        Value: 262144
*************************** 11. row ***************************
Variable_name: sort_buffer_size
        Value: 2097144
*************************** 12. row ***************************
Variable_name: sql_buffer_result
        Value: OFF
12 rows in set (0.00 sec)

>how much RAM do you have?
currently I'm using a test machine with just 8GB, but we are in the process of ordering a new server specially for this project and it will have at least 16GB. How much would you recommend to put in?

>How time is represented?
Time is represented in the form of decimal year, for example: 2010.234567

>How much disk do you have? How big is the table now? Allow 2x-4x the disk space after adding an index and/or changing to InnoDB.
Disk space is not a problem, I will have at least 2TB. Right now table is 10.7GB and index size is 14.7GB

Thank you very much,
Misha

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: 500 mil. rows table partitioning
1663
July 13, 2011 02:43PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.