Re: Query optimisation advice
Posted by:
Rick James
Date: January 12, 2015 10:35PM
You now have this?
PRIMARY KEY (sport_id_fk, participant_id, year, club, program)
Changing it to this would improve _this_ query some:
PRIMARY KEY (sport_id_fk, year, participant_id, club, program)
Notice that with that reordered PK, it can scan a portion of the table, using all the rows. With your PK, it can only use sport_id_fk and has to skip over lots of rows looking for the desired year values.
How many rows have
s.sport_id_fk = 18
AND s.`year` >= 2010
AND s.`year` <= 2013
? Let's say it is 1 million out of the 2.5 million rows in the table.
It will still take several seconds to process all 1 million rows. What I have provided you is
* Not having to bounce between secondary key and primary key (by getting you to have a PK), and
* Ordering the PK so that the desired rows are all consecutive, hence no wasted effort skipping over other rows.
What I cannot provide you with
* A speed up for the JOIN -- not that it is too much overhead
* A speed up for the DISTINCT -- If you need it, you need it.
You already had eliminated I/O because innodb_buffer_pool_size is bigger than the tables.
A minor issue: Once you have a PK starting with sport_id_fk, there is no need for INDEX(sport_id_fk). (DROPping it will not affect this SELECT.)
But, here is another thought...
Is all the 'old' data static? That is, will anything change for the rows year<2015? Will new rows be added with year<2015? If not, you could summarize (that is, do the COUNT(DISTINCT...)) and store the results into a new "summary" table with
PRIMARY KEY (name, code, postcode, ratio, year)
Next January, add new rows for 2015. Etc.
Now the SELECT becomes more complicated since it is fetching from both the existing table (for the current year) plus the summary table (for older years), then UNIONing the results, etc.
But the SELECT will be much faster -- possibly 1 second.