Re: 500 mil. rows table partitioning
First, here is what I did according to your recommendations:
>Change that to be about 20% of available RAM. It's used for caching indexes -- you need this caching a lot.
I set it to exactly 20% of what I have (8 GB): 1.6 GB.
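For reference, here is how the change can be applied (a sketch; the 1600M value matches the 1677721600 bytes shown below):

```sql
-- Apply at runtime: MyISAM index cache = 1.6 GB (20% of 8 GB RAM)
SET GLOBAL key_buffer_size = 1677721600;
```

To persist it across restarts, the equivalent line goes under `[mysqld]` in my.cnf: `key_buffer_size = 1600M`.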
SHOW VARIABLES LIKE '%buffer%'; -- cache size
*************************** 1. row ***************************
Variable_name: bulk_insert_buffer_size
Value: 8388608
*************************** 2. row ***************************
Variable_name: innodb_buffer_pool_size
Value: 8388608
*************************** 3. row ***************************
Variable_name: innodb_log_buffer_size
Value: 1048576
*************************** 4. row ***************************
Variable_name: join_buffer_size
Value: 131072
*************************** 5. row ***************************
Variable_name: key_buffer_size
Value: 1677721600
*************************** 6. row ***************************
Variable_name: myisam_sort_buffer_size
Value: 8388608
*************************** 7. row ***************************
Variable_name: net_buffer_length
Value: 16384
*************************** 8. row ***************************
Variable_name: preload_buffer_size
Value: 32768
*************************** 9. row ***************************
Variable_name: read_buffer_size
Value: 131072
*************************** 10. row ***************************
Variable_name: read_rnd_buffer_size
Value: 262144
*************************** 11. row ***************************
Variable_name: sort_buffer_size
Value: 2097144
*************************** 12. row ***************************
Variable_name: sql_buffer_result
Value: OFF
>See how the refs stop at offsetid and fail to include `flag`? I'm guessing it is throwing up its hands when trying to compare the number 0 to a VARCHAR.
>Which should it be? always numeric? Or always strings? Either change the VARCHAR, or add quotes around the '0'.
A clear mistake! I don't know how I missed it. Thank you very much for pointing that out! `flag` can only hold numeric values from 1 to 9, so I changed it to TINYINT(1).
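The change itself was along these lines (a sketch of the DDL; note that on a 500-million-row MyISAM table an ALTER like this rebuilds the whole table, so it takes a while):

```sql
-- Replace the VARCHAR flag with a 1-byte integer so the
-- comparison against the numeric 0 in WHERE clauses can use the index
ALTER TABLE L2data
  MODIFY `flag` TINYINT(1) NOT NULL;
```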
>What kind of values go into the only CHAR (flag)? Perhaps utf8 is overkill?
Following the previous correction, I changed it to latin1.
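Since `flag` is now a TINYINT, the only remaining effect is on the table's default character set; the conversion can be done like this (a sketch):

```sql
-- utf8 was overkill for this table; latin1 is enough
ALTER TABLE L2data CONVERT TO CHARACTER SET latin1;
```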
Here is what I have now:
SHOW CREATE TABLE L2data\G -- engine, indexes;
*************************** 1. row ***************************
Table: L2data
Create Table: CREATE TABLE `L2data` (
`extensionid` smallint(6) NOT NULL,
`flag` tinyint(1) NOT NULL,
`offsetid` int(11) NOT NULL,
`repeat` tinyint(4) NOT NULL,
`siteid` int(11) NOT NULL,
`time` decimal(10,6) NOT NULL,
`value` float NOT NULL,
`variableid` smallint(6) NOT NULL,
KEY `indx` (`time`,`siteid`,`variableid`,`repeat`,`extensionid`,`offsetid`,`flag`,`value`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
SHOW TABLE STATUS LIKE 'L2data'\G -- sizes;
*************************** 1. row ***************************
Name: L2data
Engine: MyISAM
Version: 10
Row_format: Fixed
Rows: 414169342
Avg_row_length: 24
Data_length: 9940064208
Max_data_length: 6755399441055743
Index_length: 13761551360
Data_free: 0
Auto_increment: NULL
Create_time: 2011-07-14 15:13:00
Update_time: 2011-07-14 15:17:53
Check_time: 2011-07-14 21:06:55
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
EXPLAIN SELECT 'US-IB2' AS site, t0.time,
       t0.value AS APAR_0,    t1.value AS APAR_fill_0,
       t2.value AS APARpct_0, t3.value AS APARpct_fill_0,
       t4.value AS FH2O_0,    t5.value AS FH2O_fill_0
FROM L2data t0
INNER JOIN L2data t1 ON t0.time = t1.time
INNER JOIN L2data t2 ON t0.time = t2.time
INNER JOIN L2data t3 ON t0.time = t3.time
INNER JOIN L2data t4 ON t0.time = t4.time
INNER JOIN L2data t5 ON t0.time = t5.time
WHERE t0.siteid = 47 AND t0.variableid = 1 AND t0.repeat = 0 AND t0.extensionid = 3 AND t0.offsetid = 3 AND t0.flag = 0
  AND t1.siteid = 47 AND t1.variableid = 1 AND t1.repeat = 0 AND t1.extensionid = 1 AND t1.offsetid = 3 AND t1.flag = 0
  AND t2.siteid = 47 AND t2.variableid = 6 AND t2.repeat = 0 AND t2.extensionid = 3 AND t2.offsetid = 3 AND t2.flag = 0
  AND t3.siteid = 47 AND t3.variableid = 6 AND t3.repeat = 0 AND t3.extensionid = 1 AND t3.offsetid = 3 AND t3.flag = 0
  AND t4.siteid = 47 AND t4.variableid = 5 AND t4.repeat = 0 AND t4.extensionid = 3 AND t4.offsetid = 3 AND t4.flag = 0
  AND t5.siteid = 47 AND t5.variableid = 5 AND t5.repeat = 0 AND t5.extensionid = 1 AND t5.offsetid = 3 AND t5.flag = 0
ORDER BY t0.time;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t0
type: index
possible_keys: indx
key: indx
key_len: 23
ref: NULL
rows: 1
Extra: Using where; Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: indx
key: indx
key_len: 19
ref: Global.t0.time,const,const,const,const,const,const
rows: 591679
Extra: Using index
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: t2
type: ref
possible_keys: indx
key: indx
key_len: 19
ref: Global.t0.time,const,const,const,const,const,const
rows: 591679
Extra: Using index
*************************** 4. row ***************************
id: 1
select_type: SIMPLE
table: t3
type: ref
possible_keys: indx
key: indx
key_len: 19
ref: Global.t2.time,const,const,const,const,const,const
rows: 591679
Extra: Using where; Using index
*************************** 5. row ***************************
id: 1
select_type: SIMPLE
table: t4
type: ref
possible_keys: indx
key: indx
key_len: 19
ref: Global.t1.time,const,const,const,const,const,const
rows: 591679
Extra: Using where; Using index
*************************** 6. row ***************************
id: 1
select_type: SIMPLE
table: t5
type: ref
possible_keys: indx
key: indx
key_len: 19
ref: Global.t0.time,const,const,const,const,const,const
rows: 591679
Extra: Using index
6 rows in set (0.00 sec)
Unfortunately, the query above takes about 12 minutes to run, and it isn't even the most complicated one: a query may include up to 25-30 variables plus a 'time from/to' condition.
My idea about partitioning was this: since 99% of my queries extract data for only one site, if I partition the table by siteid, only the queried partition would be involved and there would be no need to search the whole table. I could achieve basically the same effect by creating a separate table for each site, but I have 140 sites and will ultimately deal with 3 types of data -> 420 tables. Isn't that too many? I don't know whether it would provide any performance gain, considering all the overhead of dealing with so many tables.
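The partitioning I have in mind would look something like this (a sketch, assuming MySQL 5.1+ partitioning support; KEY partitioning is shown because LIST would require enumerating all 140 siteid values by hand):

```sql
-- One partition per site; pruning applies whenever siteid
-- appears in the WHERE clause, which covers 99% of my queries
ALTER TABLE L2data
  PARTITION BY KEY (siteid)
  PARTITIONS 140;
```

Since the table has no unique keys, MySQL's requirement that every unique key include the partitioning column is not an obstacle here.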
Thank you very much for helping me!
Misha