MySQL Forums
Forum List  »  Optimizer & Parser

Re: Optimize sub query with large group by
Posted by: Alex K
Date: March 14, 2011 02:39AM

I am using the 'group by trick' to select only the first event for each object.
I left out the time column and the order by as they were not my main concern.
The main problem is that the group by takes around 5 minutes to complete when the event table contains around 10 million rows. This is only one of the queries required in generating a report and some of the others require a similar group by taking another 5 minutes.

Here is the query containing the order by which unfortunately means another sub query.
SELECT person_id, COUNT(id)
    FROM (SELECT id, person_id
             FROM (SELECT id, person_id
                      FROM events
                      ORDER BY time DESC
             ) AS a
             GROUP BY object_id
         ) AS subquery
    GROUP BY person_id;

And here is the create statement including the time column:
CREATE TABLE events(
    id INT NOT NULL AUTO_INCREMENT,
    object_id INT NOT NULL,
    person_id INT NOT NULL,
    time DATETIME NOT NULL,
    PRIMARY KEY(id),
    INDEX(object_id),
    INDEX(person_id),
    INDEX(time)
) ENGINE=InnoDB;

The order by in the subquery does not seem to slow the query down much which is why I omitted it from the query I posted earlier.

I am thinking a major change is needed to improve the speed measurably, and wonder if using joins instead of sub queries would be faster. Although I think using joins would be much slower though due to mysql doing a 10M x 10M cartesian product.

I appreciate your suggestions Rick but still think I need some conceptual change to improve the speed.

Options: ReplyQuote


Subject
Views
Written By
Posted
3925
March 10, 2011 07:44PM
Re: Optimize sub query with large group by
2204
March 14, 2011 02:39AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.