Re: Subquery with range uses filesort
Following is an actual set of queries that we run. This is currently
done by a PHP webpage. First the webpage does
create temporary table Temp ( primary key ( PlayerID ) ) engine=memory
select
PlayerID, PlayerName, PlayerPrimaryClub, PlayerCountry, PlayerMean,
PlayerStDev, PlayerLastEvent, PlayerLastPlayed
from Player
where PlayerID in (...);
id,select_type,table,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,Player,ALL,PRIMARY,NULL,NULL,NULL,53985,19.23,"Using where"
The three dots are really a list of 10384 player IDs.
Then it does
select PlayerID from Temp;
Then it loops through the records. For each $PlayerID, it does
| select HistoryEvent, HistoryDate, HistoryFinalMean,
| HistoryFinalStDev from History
| where HistoryPlayer = $PlayerID and HistoryDate <= '2012-06-30'
| order by HistoryDate desc, HistoryDirector desc limit 0,1;
| id,select_type,table,type,possible_keys,key,key_len,ref,rows,filtered,Extra
| 1,SIMPLE,History,range,PlayerDateDirector,PlayerDateDirector,7,NULL,40,100.00,"Using where"
| if none, then
|| delete from Temp where PlayerID = $PlayerID;
| else
|| update Temp
|| set PlayerMean = HistoryFinalMean,
|| PlayerLastEvent = HistoryEvent,
|| PlayerLastPlayed = HistoryDate,
|| PlayerStDev = HistoryFinalStDev
|| where PlayerID = $PlayerID;
Finally, it does
select
PlayerID, PlayerName, PlayerPrimaryClub, PlayerCountry, PlayerMean,
PlayerStDev, PlayerLastEvent, PlayerLastPlayed
from Temp
order by PlayerMean desc
id,select_type,table,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,SIMPLE,Temp,ALL,NULL,NULL,NULL,NULL,10365,100.00,"Using filesort"
Here is a single query that does the same thing:
select
PlayerID, PlayerName, PlayerPrimaryClub, PlayerCountry,
HistoryEvent, HistoryDate, HistoryFinalMean, HistoryFinalStDev
from Player
join History on
HistoryPlayer = PlayerID
and HistoryEvent =
( select HistoryEvent
from History
where HistoryPlayer = PlayerID and HistoryDate <= '2012-06-30'
order by HistoryDate desc, HistoryDirector desc limit 0,1 )
where PlayerID in (...)
order by HistoryFinalMean desc
id,select_type,table,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,PRIMARY,Player,ALL,PRIMARY,NULL,NULL,NULL,53985,19.23,"Using where; Using temporary; Using filesort"
1,PRIMARY,History,eq_ref,"PRIMARY,EventReportID,PlayerDateDirector,EventInitialMean,EventFinalMean",PRIMARY,8,"func,ratingscentral.Player.PlayerID",1,100.00,"Using where"
2,"DEPENDENT SUBQUERY",History,ref,PlayerDateDirector,PlayerDateDirector,4,ratingscentral.Player.PlayerID,8808,100.00,"Using where; Using filesort"
I ran this several times, both on the webserver and on my development
PC. On my development PC, I stopped and started MySQL in between runs.
The query returns 9297 rows.
Webserver
Duration / Fetch
6.911 sec / 0.281 sec
3.120 sec / 0.202 sec
3.151 sec / 0.250 sec
Development
Duration / Fetch
6.474 sec / 0.016 sec
6.474 sec / 0.016 sec
This next query does the same thing, but does the join differently:
select
PlayerID, PlayerName, PlayerPrimaryClub, PlayerCountry,
HistoryEvent, HistoryDate, HistoryFinalMean, HistoryFinalStDev
from Player
join History on
HistoryPlayer = PlayerID
and ( HistoryDate, HistoryDirector ) =
( select HistoryDate, HistoryDirector
from History
where HistoryPlayer = PlayerID and HistoryDate <= '2012-06-30'
order by HistoryDate desc, HistoryDirector desc limit 0,1 )
where PlayerID in (...)
order by HistoryFinalMean desc
id,select_type,table,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,PRIMARY,History,ALL,PlayerDateDirector,NULL,NULL,NULL,880850,54.80,"Using where; Using filesort"
1,PRIMARY,Player,eq_ref,PRIMARY,PRIMARY,4,ratingscentral.History.HistoryPlayer,1,100.00,"Using where"
2,"DEPENDENT SUBQUERY",History,ref,PlayerDateDirector,PlayerDateDirector,4,ratingscentral.Player.PlayerID,8808,100.00,"Using where; Using index; Using filesort"
Webserver
Duration / Fetch
27.534 sec / 86.986 sec
27.425 sec / 87.064 sec
Development
Duration / Fetch
79.530 sec / 227.652 sec
80.528 sec / 229.461 sec
We don't really need the order-by HistoryFinalMean for this
application, so here is the first query without the order-by:
select
PlayerID, PlayerName, PlayerPrimaryClub, PlayerCountry,
HistoryEvent, HistoryDate, HistoryFinalMean, HistoryFinalStDev
from Player
join History on
HistoryPlayer = PlayerID
and HistoryEvent =
( select HistoryEvent
from History
where HistoryPlayer = PlayerID and HistoryDate <= '2012-06-30'
order by HistoryDate desc, HistoryDirector desc limit 0,1 )
where PlayerID in (...)
id,select_type,table,type,possible_keys,key,key_len,ref,rows,filtered,Extra
1,PRIMARY,Player,ALL,PRIMARY,NULL,NULL,NULL,54178,19.17,"Using where"
1,PRIMARY,History,eq_ref,"PRIMARY,EventReportID,PlayerDateDirector,EventInitialMean,EventFinalMean",PRIMARY,8,"func,davidmarcus_ratingscentral.Player.PlayerID",1,100.00,"Using where"
2,"DEPENDENT SUBQUERY",History,ref,PlayerDateDirector,PlayerDateDirector,4,davidmarcus_ratingscentral.Player.PlayerID,8900,100.00,"Using where; Using filesort"
Webserver
Duration / Fetch
0.764 sec / 2.387 sec
0.765 sec / 2.372 sec
Development
Duration / Fetch
1.045 sec / 5.413 sec
1.076 sec / 5.507 sec
The order-by seems to add 2 seconds to the duration on the webserver,
which seems like a lot. I don't understand why the fetch is so much.
I ran all queries using MySQL Workbench 6.0.7.11215. The webserver is
runnng MySQL 5.0.95 (although I expect them to upgrade since they were
running a newer version until they recently had to reinstall). My
development PC is running MySQL 5.5.33.
I created a PHP webpage that uses the same approach as the current
webpage (i.e., the temporary table) and writes out the result as text
that displays in the browser. Here are times with the order-by:
Webserver
2.70 sec
2.56 sec
2.71 sec
Development
5.77 sec
4.18 sec
4.21 sec
Removing the order-by from the webpage has an insignificant effect.
The webpage time is less than the duration plus fetch times of the
queries.