MySQL :: Re: Optimize sub query with large group by

New Topic

Re: Optimize sub query with large group by

Posted by: Øystein Grøvlen
Date: March 18, 2011 10:58AM

Alex K Wrote:
-------------------------------------------------------
> Oystein your query works however it takes longer
> and still includes the group by object_id which
> seemed to be what was taking so long.

I feared that. If you look at the plan for my query, it is as follows (MySQL 5.5):

mysql> EXPLAIN SELECT person_id, COUNT(id) FROM events WHERE (object_id, time) IN (SELECT object_id, MAX(time) FROM events GROUP BY object_id) GROUP BY person_id;
+----+--------------------+--------+-------+---------------+-----------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------+-------+---------------+-----------+---------+------+------+-------------+
| 1 | PRIMARY | events | index | NULL | person_id | 4 | NULL | 8 | Using where |
| 2 | DEPENDENT SUBQUERY | events | index | NULL | object_id | 4 | NULL | 4 | |
+----+--------------------+--------+-------+---------------+-----------+---------+------+------+-------------+
2 rows in set (0.00 sec)

This means that it will scan the events table using the person_id
index and for each row (pers, obj, t) it will execute something like the
following query:

SELECT object_id, MAX(time) FROM events
GROUP BY object_id
HAVING object_id = obj AND max(time) = t;

For this query it will use the object_id index. Hence, it will scan
all rows with the same object_id, looking for the maximum time. In
other words, there will be a lot of partial scans of the object_id
index. I tried adding a (object_id, time) index since in theory it
should be possible to find max(time) faster that way, but this index
did not seem to be used.

We are working on improving the execution of such queries by first
materializing the subquery. Then, for each row of events a direct
lookup on the materialized table using a hash index would give the
result. (See http://forge.mysql.com/worklog/task.php?id=1110.)

>
> I have been testing this on a laptop and hadn't
> increased the buffer_pool_size much. I checked the
> IO and most of it was fetching from disk.
> Thanks Rick.
>
> I tested the same query on the actual server and
> it only took 4 sec compared to 120 sec. This is
> much faster but I'm not sure what I should be
> expecting the speed to be (could it be faster?).
>
> Still wonder if there is a better way to get the
> same result with a different query.

If you look at the query plan for your query:

mysql> EXPLAIN SELECT person_id, COUNT(id) FROM (SELECT id, person_id FROM (SELECT id, person_id,object_id FROM events ORDER BY time DESC ) AS a GROUP BY object_id ) AS subquery GROUP BY person_id;
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | Using temporary; Using filesort |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 8 | Using temporary; Using filesort |
| 3 | DERIVED | events | ALL | NULL | NULL | NULL | NULL | 8 | Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+------+---------------------------------+
3 rows in set (0.00 sec)

You see there is a lot of filesort going on. First, the events table
is sorted on time and stored in a temporary table. This table is then
sorted on object_id to be able to do aggregation, and the result is
stored in a second temporary table. Finally, this temporary table is
sorted to group on person_id.

What impacts the cost of sorting most is the sort_buffer_size. If
there is not enough room in the sort buffer for all rows to be sorted
at once, the data needs to be sorted in several passes and temporary
results needs to be written to disk. Hence, a too small
sort_buffer_size will mean a lot of extra I/O.

When tuning the sort_buffer size, the following command is useful:

SHOW STATUS LIKE "sort%";

The reported number of "sort_merge_passes" tells you how many passes
are needed to sort the data. Usually, the lower the better.
(Remember to do "FLUSH STATUS;" between queries to reset the
statistics.)

Øystein Grøvlen,
Senior Principal Software Engineer,
MySQL Group, Oracle,
Trondheim, Norway

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Views

Written By

Posted

Optimize sub query with large group by

3854

Alex K

March 10, 2011 07:44PM

Re: Optimize sub query with large group by

2165

Rick James

March 13, 2011 03:52PM

Re: Optimize sub query with large group by

1870

Alex K

March 13, 2011 04:59PM

Re: Optimize sub query with large group by

2846

Rick James

March 13, 2011 11:32PM

Re: Optimize sub query with large group by

2157

Alex K

March 14, 2011 02:39AM

Re: Optimize sub query with large group by

1623

Rick James

March 14, 2011 06:31PM

Re: Optimize sub query with large group by

1510

Alex K

March 14, 2011 07:15PM

Re: Optimize sub query with large group by

1771

Øystein Grøvlen

March 15, 2011 04:32AM

Re: Optimize sub query with large group by

1579

Alex K

March 15, 2011 08:46PM

Re: Optimize sub query with large group by

2280

Øystein Grøvlen

March 16, 2011 02:02AM

Re: Optimize sub query with large group by

2235

Rick James

March 16, 2011 08:57AM

Re: Optimize sub query with large group by

1586

Alex K

March 17, 2011 12:40AM

Re: Optimize sub query with large group by

3579

Øystein Grøvlen

March 18, 2011 10:58AM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.