Performances of Time series partitioned by range, testing subpartition by KEY
Posted by:
Larry Brid
Date: December 01, 2017 10:24AM
I have some time series archived in different tables according to their type (double, long, ...).
Tables are partitioned by range on the timestamp (montlhy)
The biggest one (double_meas) is around 4 billion rows, 200 GB, with 2 years of data and around 4K different meas_id.
CREATE TABLE double_meas
(
meas_id INT UNSIGNED NOT NULL,
meas_time DATETIME(6) NOT NULL,
meas_value DOUBLE DEFAULT NULL,
meas_err_id INT UNSIGNED NULL DEFAULT NULL,
PRIMARY KEY(meas_id, meas_time),
INDEX(meas_err_id)
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS(meas_time) (
PARTITION p000 VALUES LESS THAN ('2016-01-01'),
PARTITION p2016_01 VALUES LESS THAN ('2016-02-01'),
PARTITION p2016_02 VALUES LESS THAN ('2016-03-01'),
...
PARTITION p2017_12 VALUES LESS THAN ('2018-01-01'),
PARTITION p2018_01 VALUES LESS THAN ('2018-02-01'),
PARTITION future VALUES LESS THAN MAXVALUE
);
The second biggest table (long_meas) is around 400 million rows, 20 GB with 2 years of data and around 400 different meas_id
CREATE TABLE long_meas
(
meas_id INT UNSIGNED NOT NULL,
meas_time DATETIME(6) NOT NULL,
meas_value INT DEFAULT NULL,
meas_err_id INT UNSIGNED NULL DEFAULT NULL,
PRIMARY KEY(meas_id, meas_time),
INDEX(meas_err_id)
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS(meas_time) (
PARTITION p000 VALUES LESS THAN ('2016-01-01'),
PARTITION p2016_01 VALUES LESS THAN ('2016-02-01'),
PARTITION p2016_02 VALUES LESS THAN ('2016-03-01'),
...
PARTITION p2017_12 VALUES LESS THAN ('2018-01-01'),
PARTITION p2018_01 VALUES LESS THAN ('2018-02-01'),
PARTITION future VALUES LESS THAN MAXVALUE
);
Possible queries are:
SELECT meas_time, meas_value FROM xxx_meas WHERE meas_id=yyy AND meas_time BETWEEN zzz AND kkk;
SELECT m.meas_time, m.meas_value, e.meas_error FROM xxx_meas m LEFT JOIN meas_err e on m.meas_err_id=e.meas_err_id WHERE m.meas_id=yyy AND m.meas_time BETWEEN zzz AND kkk;
I am doing some benchmarks measuring select time twice after MySQL restart with SQL_NO_CACHE and in a test environment with 24 GB RAM and no insert during tests, and I noticed:
1) First select after restart is a little bit slower in the smaller table, up to 4x slower in the biggest table than second select
2) Second select after restart is pretty much the same between biggest and smaller table
3) Dropping the biggest table both first and second select after restart on smaller table become 2x faster
4) adding "SUBPARTITON BY KEY(meas_id) SUBPARTITONS 10 (..." on the biggest table improves import performances (2x faster) but no difference in select performances
Are these behaviors expected?
I am not sure if adding subpartitioning on the biggest table to at least reduce partition size (up to 20 GB now) and maybe increas insert performances.
Or maybe it would be better to change the partitiong range from montlhy to weekly or less instead?
What is the reason of 3) ?