MySQL Forums
Forum List  »  Data Warehouse

Re: Optimizing it even more
Posted by: Pepe Pardo
Date: May 31, 2011 10:09AM

Is this the real query?
Yes, most of the queries are more or less the same. Usually we run the query in a daily basis; so one query may be:
SELECT DISTINCT a.* FROM d_agents a INNER JOIN f_sessions s ON a.agent_id = s.agent_fk AND s.begin_timestamp >= '2011-05-30 00:00:00' AND s.begin_timestamp <= '2011-05-30 23:59:59' AND s.bot_fk IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32);
for yesterday, and it works pretty quick, but if you select a large period, it generates a huge tmp table in HDD(+400Gbytes) and it takes a long time to return results.
The problem is that sometimes, we have to run this kind of queries (1 month period, 6 month period or 2 years period)


How many rows does it return?
The query for yesterday returns 7563 rows in 0,2519s(5,6252s)


d_agents has about 1.2M rows; how many DISTINCT rows?
SELECT DISTINCT * FROM d.agents;
All the rows are distinct because it is a dimension table. We have to use the word DISTINCT because the INNER JOIN returns
repeted agents per session since many sessions can have the same agent.


This starts at the beginning of a month, as do the partitions. Is that always the case?
AND s.begin_timestamp >= '2010-01-01 00:00:00' 
AND s.begin_timestamp <= '2011-05-25 23:59:59'
Not, it isn't always the case as I explained before.

Please provide the results of this:
SELECT  COUNT(*),
        COUNT(DISTINCT s.agent_fk)
    FROM  f_sessions s ON a.agent_id = s.agent_fk
      AND  s.begin_timestamp >=          '2011-04-01'
      AND  s.begin_timestamp <  DATE_ADD('2011-04-01', INTERVAL 1 MONTH)
      AND  s.bot_fk IN (1,2,3,4,5,6,7,8,9,
               10,11,12,13,14,15,16,17,18,19,20,21,
               22,23,24,25,26,27,28,29,30,31,32);
If you use a.agent_id I have to use a INNER JOIN with d_agents a
So the query:
SELECT  COUNT(*),
        COUNT(DISTINCT s.agent_fk)
    FROM  d_agents a INNER JOIN f_sessions s ON a.agent_id = s.agent_fk
      AND  s.begin_timestamp >= '2011-04-01'
      AND  s.begin_timestamp <  DATE_ADD('2011-04-01', INTERVAL 1 MONTH)
      AND  s.bot_fk IN (1,2,3,4,5,6,7,8,9,
               10,11,12,13,14,15,16,17,18,19,20,21,
               22,23,24,25,26,27,28,29,30,31,32);
returns 2353488, 107387 in 0.0009s(12.6052s)


Kind Regards

Options: ReplyQuote


Subject
Views
Written By
Posted
6205
May 27, 2011 06:14AM
3037
May 28, 2011 10:49PM
2566
May 30, 2011 08:57AM
2854
May 30, 2011 07:15PM
Re: Optimizing it even more
2898
May 31, 2011 10:09AM
2861
May 31, 2011 08:53PM
2924
June 01, 2011 08:42AM
3099
June 01, 2011 09:15AM
2955
June 02, 2011 02:13AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.