Re: 500 mil. rows table partitioning
for the reference:
SELECT 'US-IB2' AS site, t0.time,
t0.value AS APAR_0,
t1.value AS APAR_fill_0,
t2.value AS APARpct_0,
t3.value AS APARpct_fill_0,
t4.value AS FH2O_0,
t5.value AS FH2O_fill_0
FROM L2data t0
INNER JOIN L2data t1 ON t0.time = t1.time
INNER JOIN L2data t2 ON t0.time = t2.time
INNER JOIN L2data t3 ON t0.time = t3.time
INNER JOIN L2data t4 ON t0.time = t4.time
INNER JOIN L2data t5 ON t0.time = t5.time
WHERE t0.siteid = 47 AND t0.variableid = 1 AND t0.repeat = 0 AND t0.extensionid = 3 AND t0.offsetid = 3 AND t0.flag = 0
AND t1.siteid = 47 AND t1.variableid = 1 AND t1.repeat = 0 AND t1.extensionid = 1 AND t1.offsetid = 3 AND t1.flag = 0
AND t2.siteid = 47 AND t2.variableid = 6 AND t2.repeat = 0 AND t2.extensionid = 3 AND t2.offsetid = 3 AND t2.flag = 0
AND t3.siteid = 47 AND t3.variableid = 6 AND t3.repeat = 0 AND t3.extensionid = 1 AND t3.offsetid = 3 AND t3.flag = 0
AND t4.siteid = 47 AND t4.variableid = 5 AND t4.repeat = 0 AND t4.extensionid = 3 AND t4.offsetid = 3 AND t4.flag = 0
AND t5.siteid = 47 AND t5.variableid = 5 AND t5.repeat = 0 AND t5.extensionid = 1 AND t5.offsetid = 3 AND t5.flag = 0
ORDER BY t0.time
>Are there really about 591679 rows matching
>t0.time = t1.time AND t1.siteid = 47 AND t1.variableid = 1 AND t1.repeat = 0 AND t1.extensionid = 1 AND t1.offsetid = 3 AND t1.flag = 0 ??
>I would expect only 1. If it is only one, then the lookup will be quite efficient. At worst it will be about 1 disk hit.
It can't be 1 because it is measured many times (every 30 min), but I don't know why it shows 591679. If I run:
SELECT count(*) FROM L2data as t1 WHERE t1.siteid = 47 AND t1.variableid = 1 AND t1.repeat = 0 AND t1.extensionid = 1 AND t1.offsetid = 3;
it gives 88742. And total number of rows for siteid=47 is 5413262.
>> 99% of my queries extract data for only one site
>If you do not do partitioning, then this index has a similar effect (of localizing the disk blocks being accessed):
>KEY (`siteid`,`time`, `variableid`,`repeat`,`extensionid`,`offsetid`,`flag`,`value`)
>(I moved sited id to the start)
>I see repeat and flag being always zero in your example. Is this typical? If so, then rearranging to this might help further localize the data:
>KEY (`siteid`,`repeat`,`flag`, `time`, `variableid`,`extensionid`,`offsetid`,`value`)
Actually this gave me an idea to ask our specialists more about flags, and I've been told that for this particular type of data flag is not going to be used (it was intended to). So I deleted that column.
Repeat is in use by many sites and it just happened to be 0 in the query above.
>How many rows for the popular sideid? Perhaps about 591679?
There is no such thing as a popular site, all of them are equal. The average amount of rows is 4,183,528, min is 147,696 and max is 16,724,256.
>Is the output 591679 rows long? If not, please explain. (Maybe some times don't have all the requested values of repeat, extensionid, etc.)
No it isn't a long one. According to the numbers I put above it is small output.
This is what I have now after I got rid off 'flag' and set indexes like (`siteid`,`time`, `variableid`,`repeat`,`extensionid`,`offsetid`, `value`)
SHOW CREATE TABLE L2data;
Table: L2dataCreate Table: CREATE TABLE `L2data`
( `extensionid` smallint(6) NOT NULL,
`offsetid` int(11) NOT NULL,
`repeat` tinyint(4) NOT NULL,
`siteid` int(11) NOT NULL,
`time` decimal(10,6) NOT NULL,
`value` float NOT NULL,
`variableid` smallint(6) NOT NULL,
KEY `indx` (`siteid`,`time`,`variableid`,`repeat`,`extensionid`,`offsetid`,`value`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.03 sec)
SHOW TABLE STATUS LIKE 'L2data'
*************************** 1. row ***************************
Name: L2data
Engine: MyISAM
Version: 10
Row_format: Fixed
Rows: 414169342
Avg_row_length: 23
Data_length: 9525894866
Max_data_length: 6473924464345087
Index_length: 15322727424
Data_free: 0
Auto_increment: NULL
Create_time: 2011-07-19 14:40:31
Update_time: 2011-07-19 14:45:48
Check_time: 2011-07-19 16:32:49
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
SHOW VARIABLES LIKE '%buffer%';
*************************** 1. row ***************************
Variable_name: bulk_insert_buffer_size
Value: 8388608
*************************** 2. row ***************************
Variable_name: innodb_buffer_pool_size
Value: 8388608
*************************** 3. row ***************************
Variable_name: innodb_log_buffer_size
Value: 1048576
*************************** 4. row ***************************
Variable_name: join_buffer_size
Value: 131072
*************************** 5. row ***************************
Variable_name: key_buffer_size
Value: 1677721600
*************************** 6. row ***************************
Variable_name: myisam_sort_buffer_size
Value: 8388608
*************************** 7. row ***************************
Variable_name: net_buffer_length
Value: 16384
*************************** 8. row ***************************
Variable_name: preload_buffer_size
Value: 32768
*************************** 9. row ***************************
Variable_name: read_buffer_size
Value: 131072
*************************** 10. row ***************************
Variable_name: read_rnd_buffer_size
Value: 262144
*************************** 11. row ***************************
Variable_name: sort_buffer_size
Value: 2097144
*************************** 12. row ***************************
Variable_name: sql_buffer_result
Value: OFF
12 rows in set (0.00 sec)
EXPLAIN SELECT 'US-IB2' AS site, t0.time,
t0.value AS APAR_0,
t1.value AS APAR_fill_0,
t2.value AS APARpct_0,
t3.value AS APARpct_fill_0,
t4.value AS FH2O_0,
t5.value AS FH2O_fill_0
FROM L2data t0
INNER JOIN L2data t1 ON t0.time = t1.time
INNER JOIN L2data t2 ON t0.time = t2.time
INNER JOIN L2data t3 ON t0.time = t3.time
INNER JOIN L2data t4 ON t0.time = t4.time
INNER JOIN L2data t5 ON t0.time = t5.time
WHERE t0.siteid = 47 AND t0.variableid = 1 AND t0.repeat = 0 AND t0.extensionid = 3 AND t0.offsetid = 3
AND t1.siteid = 47 AND t1.variableid = 1 AND t1.repeat = 0 AND t1.extensionid = 1 AND t1.offsetid = 3
AND t2.siteid = 47 AND t2.variableid = 6 AND t2.repeat = 0 AND t2.extensionid = 3 AND t2.offsetid = 3
AND t3.siteid = 47 AND t3.variableid = 6 AND t3.repeat = 0 AND t3.extensionid = 1 AND t3.offsetid = 3
AND t4.siteid = 47 AND t4.variableid = 5 AND t4.repeat = 0 AND t4.extensionid = 3 AND t4.offsetid = 3
AND t5.siteid = 47 AND t5.variableid = 5 AND t5.repeat = 0 AND t5.extensionid = 1 AND t5.offsetid = 3
ORDER BY t0.time \G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t0
type: ref
possible_keys: indx
key: indx
key_len: 4
ref: const
rows: 5999565
Extra: Using where; Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: indx
key: indx
key_len: 18
ref: const,Global.t0.time,const,const,const,const
rows: 690291
Extra: Using index
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: t2
type: ref
possible_keys: indx
key: indx
key_len: 18
ref: const,Global.t0.time,const,const,const,const
rows: 690291
Extra: Using index
*************************** 4. row ***************************
id: 1
select_type: SIMPLE
table: t3
type: ref
possible_keys: indx
key: indx
key_len: 18
ref: const,Global.t0.time,const,const,const,const
rows: 690291
Extra: Using index
*************************** 5. row ***************************
id: 1
select_type: SIMPLE
table: t4
type: ref
possible_keys: indx
key: indx
key_len: 18
ref: const,Global.t1.time,const,const,const,const
rows: 690291
Extra: Using where; Using index
*************************** 6. row ***************************
id: 1
select_type: SIMPLE
table: t5
type: ref
possible_keys: indx
key: indx
key_len: 18
ref: const,Global.t2.time,const,const,const,const
rows: 690291
Extra: Using where; Using index
6 rows in set (0.00 sec)
I don't understand where 5999565 comes from. If I run:
SELECT 'US-IB2' AS site, t0.time, t0.value AS APAR_0
FROM L2data t0
WHERE t0.siteid = 47 AND t0.variableid = 1 AND t0.repeat = 0 AND t0.extensionid = 3 AND t0.offsetid = 3
ORDER BY t0.time;
it returns only 88742 rows.
I did some performance check as well. I picked site that have max amount of data (16,724,256) and run queries with different number of variables.
Here is what I got:
3 vars -> 11.62 sec
4 vars -> 13.52 sec
6 vars -> 15.17 sec
9 vars -> 22.50 sec
13 vars -> 16371 sec
It is faster for sites with smaller amount of data. For example the query above is against site with 5,431,262 rows, it used to be 11 min and now it is 4.48 seconds. It is very big improvement, but still not how it suppose to be.
Do you see what else can bump the speed up? Thank you very much for helping me!
Misha