MySQL :: Re: Optimize sub query with large group by


Re: Optimize sub query with large group by

Posted by: Rick James
Date: March 16, 2011 08:57AM

Another tack...
Let's "count the disk hits." How big is the table, in MB? (SHOW TABLE STATUS will tell you.) How long does it take to read a file that big?
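The "count the disk hits" estimate is simple arithmetic: table size divided by sequential read speed, times the number of passes. Below is a minimal sketch, assuming you plug in the Data_length value from SHOW TABLE STATUS. The 100 MB/s read rate and the 200 MB table are hypothetical placeholders, not figures from this thread.

```python
def scan_seconds(table_bytes: int, mb_per_second: float, passes: int = 1) -> float:
    """Back-of-envelope time to read the whole table `passes` times.

    table_bytes: Data_length (in bytes) as reported by SHOW TABLE STATUS.
    mb_per_second: an ASSUMED sequential read rate; measure your own disk.
    """
    table_mb = table_bytes / (1024 * 1024)
    return passes * table_mb / mb_per_second


# Hypothetical numbers: a 200 MB table read at 100 MB/s.
print(scan_seconds(200 * 1024 * 1024, 100.0))            # 2.0
print(scan_seconds(200 * 1024 * 1024, 100.0, passes=3))  # 6.0
```

If the measured query time is far above this estimate, the time is probably going to CPU work or temp-table handling rather than raw reads.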

Your task probably takes several passes over that much data.
1. read all the data to group by object_id
2. write out the grouped results
3. sort those results
4. read all that, in order to group by person_id
5. deliver the results
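The passes listed above can be mimicked in plain Python to make each step concrete. This is only an illustration of the data flow, not what MySQL executes internally. It assumes a hypothetical in-memory list of (person_id, object_id) pairs. In step 1 it keeps the first person seen per object, to mirror a non-aggregated column in GROUP BY.

```python
from itertools import groupby


def count_per_person(events):
    """events: iterable of (person_id, object_id) pairs (hypothetical layout)."""
    # 1. Read all the data, grouping by object_id (one person per object;
    #    here, the first one seen).
    person_for_object = {}
    for person_id, object_id in events:
        person_for_object.setdefault(object_id, person_id)

    # 2. Write out the grouped results: one person_id per object.
    grouped = list(person_for_object.values())

    # 3. Sort those results so equal person_ids become adjacent.
    grouped.sort()

    # 4. and 5. Scan the sorted list once, grouping by person_id, and deliver counts.
    return [(person_id, sum(1 for _ in run)) for person_id, run in groupby(grouped)]


print(count_per_person([(1, 10), (2, 10), (2, 11), (3, 12), (1, 13)]))
```

Each numbered step touches every row (or every group) once. That is why the passes add up on a million-row table.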

Possibly step 1 is I/O-bound, or maybe the data happens to be entirely cached. The other steps can probably be done mostly in RAM. How much RAM do you have? Even if it is not repeatedly hitting the disk, there is a lot of CPU crunching needed to handle a million rows.

Based on
"The aim of the query is to count the number of objects that were accessed per person, where only the most recent person to access the object gets counted.",
perhaps this is the tightest code:

select  person_id, count(*)
    from  
      ( SELECT  person_id
            from  events
            group by  object_id
      ) as subquery
    group by  person_id;

(Note: I dropped `id` as being unnecessary.)

The subquery will create a temp table with a million INTs, perhaps 5MB. This will be either a MEMORY table or a MyISAM table; in either case, it will probably be effectively RAM-resident. In practice it will pick the _first_ person_id it encounters for each object_id, although MySQL does not guarantee which value a non-aggregated column gets. (This disagrees with the goal; more later.)
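The "perhaps 5MB" figure follows from quick arithmetic. A million 4-byte INTs is 4 MB raw, and some per-row overhead pushes it toward 5 MB. Here is a minimal sketch. The overhead parameter is a hypothetical knob, because the real per-row cost depends on the storage engine and row format.

```python
def temp_table_mb(rows: int, bytes_per_value: int = 4,
                  overhead_per_row: int = 0) -> float:
    """Rough size, in decimal MB, of a one-column INT temp table.

    bytes_per_value: a MySQL INT occupies 4 bytes.
    overhead_per_row: HYPOTHETICAL per-row engine overhead (MEMORY vs MyISAM
    and the row format determine the real figure).
    """
    return rows * (bytes_per_value + overhead_per_row) / 1_000_000


print(temp_table_mb(1_000_000))                      # 4.0
print(temp_table_mb(1_000_000, overhead_per_row=1))  # 5.0
```

Either way, the temp table is small compared with the RAM of a typical server, which is why it is likely to stay memory-resident.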

Then the outer query will either scan that temp table using a hashing technique, or sort it and do a simple scan to get the final result.
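The two strategies for the outer GROUP BY produce the same counts and differ only in how they get there. Below is a minimal sketch of both in Python, for illustration only. Which one MySQL actually uses is the optimizer's choice. The list of person_ids stands in for the temp table.

```python
from collections import Counter
from itertools import groupby


def group_count_by_hash(person_ids):
    # One pass: each person_id is hashed into a counter; no sorting needed.
    return dict(Counter(person_ids))


def group_count_by_sort(person_ids):
    # Sort first, then one scan counts each run of equal person_ids.
    return {pid: sum(1 for _ in run) for pid, run in groupby(sorted(person_ids))}


ids = [3, 1, 2, 1, 3, 3]
print(group_count_by_hash(ids) == group_count_by_sort(ids))  # True
```

Hashing avoids the sort but needs memory proportional to the number of distinct person_ids. Sorting needs only a sequential scan afterwards, at the cost of the sort itself.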

Since you want the _last_ person, we need an extra step:

select  person_id, count(*)
    from  
      ( SELECT  person_id
            from 
              ( SELECT person_id, object_id FROM events ORDER BY id DESC ) x
            group by  object_id
      ) as subquery
    group by  person_id;

This assumes that ids were assigned in chronological order.
Alas, this means yet another pass over the data.
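The nested ORDER BY above depends on how MySQL happens to execute the derived table. MySQL does not guarantee which row supplies a non-aggregated column under GROUP BY, and newer optimizers may discard an ORDER BY inside a derived table. A more deterministic alternative is the standard "groupwise maximum" pattern: find MAX(id) per object_id, join it back to events, then count per person_id. This is a swapped-in technique, not the query proposed above. It keeps the same assumption that ids are chronological, and only EXPLAIN and timing can tell whether it is faster on your data. The sketch runs the SQL against an in-memory SQLite database purely so it is self-contained; the query itself is plain SQL that MySQL also accepts. The sample rows are made up.

```python
import sqlite3

# Groupwise maximum: per object_id keep only the row with the highest id
# (assumed to be the most recent access), then count objects per person_id.
LAST_ACCESS_COUNTS = """
SELECT e.person_id, COUNT(*) AS objects
  FROM events AS e
  JOIN ( SELECT object_id, MAX(id) AS max_id
           FROM events
          GROUP BY object_id ) AS latest
    ON e.id = latest.max_id
 GROUP BY e.person_id
 ORDER BY e.person_id
"""


def last_access_counts(rows):
    """rows: (id, person_id, object_id) tuples; ids assumed chronological."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (id INTEGER PRIMARY KEY, person_id INT, object_id INT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        return conn.execute(LAST_ACCESS_COUNTS).fetchall()
    finally:
        conn.close()


sample = [(1, 1, 10), (2, 2, 10), (3, 2, 11), (4, 3, 12), (5, 1, 13), (6, 3, 11)]
print(last_access_counts(sample))
```

In the sample, object 11 was last accessed by person 3 (id 6), so person 2 is credited only with object 10.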

What is the value of innodb_buffer_pool_size? If it is too small, that could be part of the problem.


Subject | Views | Written By | Posted

Optimize sub query with large group by | 4082 | Alex K | March 10, 2011 07:44PM
Re: Optimize sub query with large group by | 2267 | Rick James | March 13, 2011 03:52PM
Re: Optimize sub query with large group by | 1983 | Alex K | March 13, 2011 04:59PM
Re: Optimize sub query with large group by | 2963 | Rick James | March 13, 2011 11:32PM
Re: Optimize sub query with large group by | 2263 | Alex K | March 14, 2011 02:39AM
Re: Optimize sub query with large group by | 1720 | Rick James | March 14, 2011 06:31PM
Re: Optimize sub query with large group by | 1618 | Alex K | March 14, 2011 07:15PM
Re: Optimize sub query with large group by | 1919 | Øystein Grøvlen | March 15, 2011 04:32AM
Re: Optimize sub query with large group by | 1679 | Alex K | March 15, 2011 08:46PM
Re: Optimize sub query with large group by | 2433 | Øystein Grøvlen | March 16, 2011 02:02AM
Re: Optimize sub query with large group by | 2343 | Rick James | March 16, 2011 08:57AM
Re: Optimize sub query with large group by | 1701 | Alex K | March 17, 2011 12:40AM
Re: Optimize sub query with large group by | 3692 | Øystein Grøvlen | March 18, 2011 10:58AM

