MySQL Forums
Forum List  »  Performance

Re: Left Join between two large tables too slow!!
Posted by: Rick James
Date: July 02, 2013 03:04AM

1. Always explicitly state a PRIMARY KEY for InnoDB tables. If there isn't a 'natural' key, then add an AUTO_INCREMENT INT.

2. Learn about "compound" indexes. In particular, the subqueries beg for
INDEX(create_date_time,site,exp)
There are limitations on how wide an index can be. At which point, I will ask if (255) is needed for those fields. And if utf8 is needed (as opposed to ascii -- 3:1 ratio of max size).

3. (This is the real problem.)
select  *
    from  
      ( SELECT  create_date_time,site,exp, sum(total) as imp_total
            from  Table2
            group by  create_date_time,site,exp) as imp
    inner join  
      ( SELECT  create_date_time,site,exp, sum(total) as req_total
            from  Table1
            group by  create_date_time,site,exp) as pr on imp.site=pr.site
      and  imp.create_date_time = pr.create_date_time
      and  imp.exp=pr.exp;
Notice that it has the pattern:
FROM ( subquery1 )
JOIN ( subquery2 ) ON ...
There are no indexes on either temp table, nor is there any way to specify indexes. This leads to the JOIN doing full table scans to connect the two temp tables.

At worst, each subquery is 18K rows, so the JOIN would be doing 18K*18K = 324M operations, which will take time. At best, each subquery would shrink (because of GROUP BY) to only one row; but then you would not be asking the question.

There are 3 solutions; the first two are generic:

* Put one of the subqueries in an explicit temp table and add an index.
CREATE TEMPORARY TABLE foo (INDEX(site, create_date_time, exp)) subquery1;
SELECT ... FROM foo
JOIN ( subquery2) ON ...;

* Upgrade to 5.6, which is smart enough to dynamically build indexes on subqueries.

* Here is a specific one for your query:
     ( SELECT   create_date_time,site,exp,
                sum(total) as imp_total,
                0 as reg_total
            from  Table2
            group by  create_date_time,site,exp )
     UNION  ALL
     ( SELECT   create_date_time,site,exp,
                0 as imp_total,
                sum(total) as req_total
            from  Table1
            group by  create_date_time,site,exp )
    GROUP BY  site, create_date_time, exp;
(It will format the results differently.)

Sorry for nit picking your query, but I have another question... The two tables look "identical"; why not have a single table? Something like
CREATE TABLE Both (
`proid` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`imps` bigint(20) NULL,
`regs` bigint(20) NULL,
`create_date_time` datetime NOT NULL,
`site` varchar(50) NOT NULL,
`exp` varchar(50) NOT NULL,
PRIMARY KEY `pr_id` (`proid`),
INDEX(`create_date_time`, `site`, `exp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

To INSERT just an imps amount:
INSERT INTO Both (imps, create_date_time, site, exp) VALUES (...);

Then the SELECT is merely:
SELECT  create_date_time,site,exp,
        SUM(imps) as imp_total,
        SUM(regs) as req_total
    from  Both
    group by  create_date_time,site,exp
(Perhaps your data, etc, does not lend itself to doing this suggestion, but maybe it gives you a clue of something worth trying.)

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: Left Join between two large tables too slow!!
18008
July 02, 2013 03:04AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.