MySQL Forums
Forum List  »  Optimizer & Parser

Re: Optimize sub query with large group by
Posted by: Rick James
Date: March 16, 2011 08:57AM

Another tack...
Let's "count the disk hits". How big is the table (in MB) -- SHOW TABLE STATUS. How long does it take to read that big a file?

Your task probably takes several passes over that much data.
1. read all the data to group by object_id
2. write out the grouped results
3. sort those results
4. read all that, in order to group by person_id
5. deliver the results

Possibly step 1 is I/O bound, or maybe it happens to be all cached. The other steps can probably be done mostly in RAM. How much RAM do you have? Even it it is not repeatedly hitting the disk, there is a lot of CPU crunching to do to handle a million rows.

Based on
"The aim of the query is to count the number of objects that were accessed per person, where only the most recent person to access the object gets counted.",
perhaps this is the tightest code:
select  person_id, count(*)
    from  
      ( SELECT  person_id
            from  events
            group by  object_id
      ) as subquery
    group by  person_id;
(Note: I dropped `id` as being unnecessary.)

The subquery will create a temp table with a million INTs, perhaps 5MB. This will either be a MEMORY table or MyISAM. In either case, it will probably be effectively RAM-resident. It will pick the _first_ person_id for each object_id. (This disagrees with the goal; more later.)

Then the outer query will either scan that tmp table and use a hashing technique, or it will sort the table and do a simple scan to get the final result.

Since you want the _last_ person, we need an extra step:
select  person_id, count(*)
    from  
      ( SELECT  person_id
            from 
              ( SELECT person_id, object_id FROM events ORDER BY id DESC ) x
            group by  object_id
      ) as subquery
    group by  person_id;
This assumes that ids were assigned in chronological order.
Alas, this means yet another pass over the data.

What is the value of innodb_buffer_pool_size? If it is too small, that could be part of the problem.

Options: ReplyQuote


Subject
Views
Written By
Posted
3926
March 10, 2011 07:44PM
Re: Optimize sub query with large group by
2281
March 16, 2011 08:57AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.