MySQL Forums
Forum List  »  Optimizer & Parser

Query optimisation advice
Posted by: Paul Feely
Date: January 08, 2015 11:02PM

I've having difficulty optimising a query
I have a table of 2.5 million participants with their particpant_id, postcode, age, gender etc
Participants can be included multiple times if they were at different clubs during the year
Each sport in the table can have between 400000 and 800000 rows (Not sure if worth separating?)

I need to convert postcode to LGA (Local Government Area), which is achieved by a table with weighted estimates of how much each postcode extends into each LGA

The join to the abs_postcode_lga table appears to be performing poorly.

The following query takes about 50 seconds


SELECT lga_name,lga_code,abs_postcode_lga.postcode,ratio,
COUNT(DISTINCT participant_id),
COUNT(DISTINCT participant_id)*ratio AS postcode_total,
COUNT(DISTINCT CASE WHEN `year` = 2010 THEN participant_id END)*ratio as postcode_total_start_year,
COUNT(DISTINCT CASE WHEN `year` = 2013 THEN participant_id END)*ratio as postcode_total_end_year
FROM ssa_participant
LEFT JOIN abs_postcode_lga ON abs_postcode_lga.postcode = ssa_participant.postcode
WHERE sport_id_fk = 18
AND `year` >= 2010
AND `year` <= 2013
AND abs_postcode_lga.postcode IS NOT NULL
GROUP BY lga_name,lga_code,abs_postcode_lga.postcode,ratio

EXPLAIN

1 SIMPLE abs_postcode_lga ALL PRIMARY 4049 Using where; Using filesort

1 SIMPLE ssa_participant ref year,sport_id_fk,postcode postcode 15 sports_spatial.abs_postcode_lga.postcode 195 Using where


I've tried using index hints and I've got it down to 11 seconds, but I thought in the past I had been able to get it to perform in 2-3 seconds

With index hints (~11-17 seconds)

SELECT lga_name,lga_code,abs_postcode_lga.postcode,ratio,
COUNT(DISTINCT participant_id),
COUNT(DISTINCT participant_id)*ratio AS postcode_total,
COUNT(DISTINCT CASE WHEN `year` = 2010 THEN participant_id END)*ratio as postcode_total_start_year,
COUNT(DISTINCT CASE WHEN `year` = 2013 THEN participant_id END)*ratio as postcode_total_end_year
FROM ssa_participant USE INDEX (`year`,`sport_id_fk`)
LEFT JOIN abs_postcode_lga ON abs_postcode_lga.postcode = ssa_participant.postcode
WHERE sport_id_fk = 18
AND `year` >= 2010
AND `year` <= 2013
AND abs_postcode_lga.postcode IS NOT NULL
GROUP BY lga_name,lga_code,abs_postcode_lga.postcode,ratio

EXPLAIN

1 SIMPLE ssa_participant ref year,sport_id_fk sport_id_fk 1 const 1202214 Using where; Using temporary; Using filesort

1 SIMPLE abs_postcode_lga ref PRIMARY PRIMARY 14 sports_spatial.ssa_participant.postcode 1 Using where



Table structures



Can anyone spot some efficiency gains?


Structure
abs_post_lga
CREATE TABLE `abs_postcode_lga` (
`postcode` varchar(4) NOT NULL,
`lga_code` varchar(5) CHARACTER SET latin1 NOT NULL,
`lga_name` varchar(100) CHARACTER SET latin1 NOT NULL,
`ratio` decimal(10,8) NOT NULL,
PRIMARY KEY (`postcode`,`lga_code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

4049 rows

ssa_participant

CREATE TABLE `ssa_participant` (
`sport_id_fk` tinyint(4) NOT NULL,
`year` varchar(4) NOT NULL DEFAULT '',
`participant_id` varchar(15) NOT NULL DEFAULT '',
`date_of_birth` date DEFAULT NULL,
`age` tinyint(4) DEFAULT NULL,
`sex` varchar(6) DEFAULT NULL,
`postcode` varchar(4) DEFAULT NULL,
`club` varchar(70) DEFAULT '',
`program` varchar(70) DEFAULT '',
KEY `year` (`year`) USING BTREE,
KEY `participant_id` (`participant_id`) USING BTREE,
KEY `sport_id_fk` (`sport_id_fk`) USING BTREE,
KEY `postcode` (`postcode`) USING BTREE,
KEY `club` (`club`) USING BTREE,
KEY `program` (`program`) USING BTREE,
KEY `age` (`age`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

~2.5 million rows

Options: ReplyQuote


Subject
Views
Written By
Posted
Query optimisation advice
2341
January 08, 2015 11:02PM
1211
January 10, 2015 01:22AM
1130
January 11, 2015 05:50PM
1204
January 12, 2015 10:35PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.