How to improve query performance of a large table
Posted by:
Hong Yang
Date: September 05, 2013 11:32AM
There is a nightly batch that summarizes raw data and stores in a summary table. The raw_data table is partition by date, one partition per day based on the fct_ts (timestamp) column value.
The raw_data table is defined as follow:
device_key int(11)PK
component_key int(11) PK
fct_ts timestamp PK
fct_value double
The hourly_summary table contains the summary data for each hour which is also partitioned by date. Table is defined as follow:
device_key int(11) PK
component_key int(11) PK
date_key int(10) UN PK
fct_ts timestamp PK
fct_avg double
fct_min double
fct_max double
fct_count int(11)
Neither table has indexes besides the primary key. The each partition of raw_data table contains 500M rows which are convert into roughly 34M row in the hourly_summary table.
This batch process is accomplished by executing the following query which ran for 8 hours to process one day worth of data.
Is there optimization can be done to speed up the execution?
REPLACE INTO hourly_summary
( device_key, component_key, date_key, fct_ts, fct_avg, fct_min, fct_max, fct_count )
SELECT main.device_key, main.component_key, TO_DAYS(main.fct_ts), CONVERT(TRUNCATE(main.fct_ts,-4), DATETIME) AS tick,
AVG(main.fct_value) AS fct_avg,
MIN(main.fct_value) AS fct_min, MAX(main.fct_value) AS fct_max, COUNT(main.fct_value) AS fct_count
FROM raw_data main
WHERE fct_ts >= TIMESTAMP('2013-09-01 00:00:00')
AND fct_ts < TIMESTAMP('2013-09-02 00:00:00')
GROUP BY fct_ts, device_key, component_key, tick