Re: Query optimisation advice
Posted by:
Paul Feely
Date: January 11, 2015 05:50PM
Thanks
1 Index changed to a compound index
2 Year datatype changed
3 The combination of participant fields that are unique are sport_id_fk,participant_id,year, club and program (A participant could have been at more than 1 club in the same year)- This has been added a primary key
3 Age is not computed - Statisticians analyse the data first to correct invalid ages (The original data supplied by the clubs is messy, sports also have different rules eg: age at start of year, age at start of sport season)
5 Distinct values for year is dependent on sport
Some sports have supplied many years of data, some only have the most recent few years
This will grow over time as they supply new data for each year
Current years are from 2002 to 2013
6 Mysql Version 5.6.2
7 innodb_buffer_pool_size 134217728
RAM ~6GB
(Development server at the moment, live server may be different)
Testing those changes the following query took about ~17 - 20sec
SELECT lga_name,
lga_code,l
ga.postcode,
lga.ratio,
COUNT(DISTINCT s.participant_id),
COUNT(DISTINCT s.participant_id)*lga.ratio AS postcode_total,
COUNT(DISTINCT CASE WHEN s.`year` = 2010 THEN s.participant_id END)*lga.ratio as postcode_total_start_year,
COUNT(DISTINCT CASE WHEN s.`year` = 2013 THEN s.participant_id END)*lga.ratio as postcode_total_end_year
FROM ssa_participant AS s
JOIN abs_postcode_lga AS lga ON lga.postcode = s.postcode
WHERE s.sport_id_fk = 18
AND s.`year` >= 2010
AND s.`year` <= 2013
GROUP BY lga_name,lga_code,lga.postcode, lga.ratio
EXPLAIN
id select_type table type possible_keys key key_len ref rows extra
1 SIMPLE s range PRIMARY,sport_id_fk PRIMARY 2 1164507 Using where; Using temporary; Using filesort
1 SIMPLE lga ref PRIMARY PRIMARY 14 sports_spatial.s.postcode 1
Ther query can be reduced to about 7-8 seconds if the DISTINCT options are removed, but it is then incorrect