Re: Adding limit clause takes query from 2 seconds to 42 minutes
Posted by:
Josh H
Date: February 11, 2013 11:17AM
Note that this turned out to be an optimizer bug, but here are responses in case you are interested :-)
I used a different date since '2012-05-15' was the start of data capture, and I'm concerned with performance ongoing.
mysql> select count(*) from player_record pr inner join date_curr dc on pr.player_id = dc.player_id where pr.insert_date < '2012-11-30';
+----------+
| count(*) |
+----------+
| 11834755 |
+----------+
1 row in set (2.42 sec)
mysql> explain select pr.player_id, max(pr.insert_date) as insert_date from player_record pr where pr.insert_date < '2012-11-30' group by pr.player_id;
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------+
| 1 | SIMPLE | pr | range | insert_date | PRIMARY | 7 | NULL | 10 | Using where; Using index for group-by |
Doing your test with a limit of 20k takes 0.11 seconds, or 8.75 seconds with no limit. The problem isn't creating date prev with the dates, since that's all in the index anyway. It's fetching the actual data for those dates that takes time.
mysql> update date_prev inner join player_record pr using (player_id, insert_date) set date_prev.battles_count = pr.battles_count;
Query OK, 1308250 rows affected (9 min 27.41 sec)
Rows matched: 1308250 Changed: 1308250 Warnings: 0
I use a date_curr to limit the rows to those I actually care about (players active in the current time window). Once I have that date (which I get from an index scan) I can then fetch the actual data from the rows that matter. I did a fair amount of testing with this, and actually ended up with a different method that is pretty good using if statements in the select columns (since I need to scan all those dates anyway, might as well do it just once).
select if(insert_date between A and B,insert_date,null) as date_curr, if(insert_date < A,insert_date, null) as date_prev from player_record where insert_date < B having date_curr is not null;
Thanks.
Edited 1 time(s). Last edit at 02/11/2013 11:23AM by Josh H.