MySQL Forums
Forum List  »  Performance

Performance problem with outer join
Posted by: RT Lee
Date: July 11, 2014 07:53AM

I am hoping to get some advice on how to optimize the performance of this query I have with an outer join. First I will explain what I am trying to do and then I'll show the code and results. I am using mySQL 5.6.13-2+debphp.org~raring+2-log.

I have an Accounts table that has a list of all customer accounts. And I have a datausage table which keeps track of how much data each customer is using. A backend process running on multiple servers inserts records into the datausage table each day to keep track of how much usage occurred that day for each customer on that server.

The backend process works like this - if there is no activity on that server for an account on that day, no records are written for that account. If there is activity, one record is written with a "LogDate" of that day. This is happening on multiple servers. So collectively the datausage table winds up with no rows (no activity at all for that customer each day), one row (activity was only on one server for that day), or multiple rows (activity was on multiple servers for that day).

We need to run a report that lists ALL customers, along with their usage for a specific date range. Some customers may have no usage at all (nothing whatsoever in the datausage table). Some customers may have no usage at all for the current period (but usage in other periods).

Regardless of whether there is any usage or not (ever, or for the selected period) we need EVERY customer in the Accounts table to be listed in the report, even if they show no usage. Therefore it seems this required an outer join.

Here is the query I am using:
SELECT
Accounts.accountID as AccountID,
IFNULL(Accounts.name,Accounts.accountID) as AccountName,
AccountPlans.plantype as AccountType,
Accounts.status as AccountStatus,
date(Accounts.created_at) as Created,
sum(IFNULL(datausage.Core,0) + (IFNULL(datausage.CoreDeluxe,0) * 3)) as 'CoreData'
FROM `Accounts`
LEFT JOIN `datausage` on `Accounts`.`accountID` = `datausage`.`accountID`
LEFT JOIN `AccountPlans` on `AccountPlans`.`PlanID` = `Accounts`.`PlanID`
WHERE
(
(`datausage`.`LogDate` >= '2014-06-01' and `datausage`.`LogDate` < '2014-07-01')
or `datausage`.`LogDate` is null
)
GROUP BY Accounts.accountID
ORDER BY `AccountName` asc

The challenge is that this query takes about 2 seconds to run. However it only takes 0.3 seconds to run if the "or datausage.LogDate is NULL" is removed. However, it seems I must have that clause in there, because accounts with no usage are excluded from the result set if that does not appear, and that would be a problem since all accounts are needed regardless of if there is any usage.

This is the EXPLAIN output:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+---------------------------------------------------------+---------+---------+----------------------+------- +----------------------------------------------------+
| 1 | SIMPLE | Accounts | ALL | PRIMARY,accounts_planid_foreign,accounts_cardid_foreign | NULL | NULL | NULL | 57 | Using temporary; Using filesort |
| 1 | SIMPLE | datausage | ALL | NULL | NULL | NULL | NULL | 96805 | Using where; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | AccountPlans | eq_ref | PRIMARY | PRIMARY | 4 | mydb.Accounts.planID | 1 | NULL |

Here are my indexes on Accounts table:
TABLE NON_UNIQUE KEY_NAME SEQ_IN_INDEX COLUMN_NAME COLLATION CARDINALITY SUB_PART PACKED NULL INDEX_TYPE COMMENT INDEX_COMMENT
accounts 0 PRIMARY 1 accountID A 5 (null) (null) BTREE
accounts 1 accounts_planid_foreign 1 planID A 5 (null) (null) BTREE
accounts 1 acctname_id_ndx 1 name A 5 (null) (null) YES BTREE
accounts 1 acctname_id_ndx 2 accountID A 5 (null) (null) BTREE

Here are my indexes on datausage table:
TABLE NON_UNIQUE KEY_NAME SEQ_IN_INDEX COLUMN_NAME COLLATION CARDINALITY SUB_PART PACKED NULL INDEX_TYPE COMMENT INDEX_COMMENT
datausage 0 PRIMARY 1 UsageID A 16 (null) (null) BTREE
datausage 1 acctusage 1 AccountID A 16 (null) (null) YES BTREE
datausage 1 acctusage 2 LogDate A 16 (null) (null) BTREE

Here are my indexes from AccountPlans:
TABLE NON_UNIQUE KEY_NAME SEQ_IN_INDEX COLUMN_NAME COLLATION CARDINALITY SUB_PART PACKED NULL INDEX_TYPE COMMENT INDEX_COMMENT
accountplans 0 PRIMARY 1 planID A 2 (null) (null) BTREE
accountplans 1 acctplans_id_type_ndx 1 planID A 2 (null) (null) BTREE
accountplans 1 acctplans_id_type_ndx 2 plantype A 2 (null) (null) BTREE

Others from online help have suggested I try queries like the following. However ALL of these have the same query EXPLAIN plan and have the same performance problem (they all take about 2 seconds vs 0.1 seconds without the outer join).

SELECT a.accountID as AccountID, coalesce(a.name, a.accountID) as AccountName,
ap.plantype as AccountType, a.status as AccountStatus,
date(a.created_at) as Created,
sum(coalesce(du.Core, 0) + (coalesce(du.CoreDeluxe, 0) * 3)) as CoreData
FROM Accounts a LEFT JOIN
datausage du
on a.accountID = du.`accountID` AND
du.`LogDate` >= '2014-06-01' and du.`LogDate` < '2014-07-01'
LEFT JOIN
AccountPlans ap
on ap.`PlanID` = a.`PlanID`
GROUP BY a.accountID
ORDER BY AccountName asc ;


and a query like this:


SELECT a.accountID as AccountID, coalesce(a.name, a.accountID) as AccountName,
ap.plantype as AccountType, a.status as AccountStatus,
date(a.created_at) as Created,
sum(coalesce(du.Core, 0) + (coalesce(du.CoreDeluxe, 0) * 3)) as CoreData
FROM Accounts a LEFT JOIN
datausage du
on a.accountID = du.`accountID` AND
du.LogDate >= '2014-06-01' and du.LogDate < '2014-07-01'LEFT JOIN
AccountPlans ap
on ap.PlanID = a.PlanID
GROUP BY a.accountID
ORDER BY a.name, a.accountID ;

but again - all these queries, including mine, have the same exact EXPLAIN output and performance problem.

Here is the code to build the schema for testing to easily reproduce the query plan and issue. Note that this only has a few rows so you won't see the performance problem, but with 100,000 rows like I have in datausage the performance issue can easily be seen.

CREATE TABLE `Accounts` (
`accountID` varchar(25) COLLATE utf8_unicode_ci NOT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`status` int(11) NOT NULL,
`planID` int(10) unsigned NOT NULL DEFAULT '1',
PRIMARY KEY (`accountID`),
KEY `accounts_planid_foreign` (`planID`),
KEY `acctname_id_ndx` (`name`,`accountID`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `datausage` (
`UsageID` int(11) NOT NULL AUTO_INCREMENT,
`Core` int(11) DEFAULT NULL,
`CoreDeluxe` int(11) DEFAULT NULL,
`AccountID` varchar(25) DEFAULT NULL,
`LogDate` date NOT NULL,
PRIMARY KEY (`UsageID`),
KEY `acctusage` (`AccountID`,`LogDate`)
) ENGINE=Innodb AUTO_INCREMENT=104303 DEFAULT CHARSET=latin1 ;

CREATE TABLE `AccountPlans` (
`planID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(150) COLLATE utf8_unicode_ci NOT NULL,
`params` text COLLATE utf8_unicode_ci NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`plantype` varchar(25) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`planID`),
KEY `acctplans_id_type_ndx` (`planID`,`plantype`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO Accounts (accountID, name, created_at, status, planID) values ('asdfad','abc company', now(), 1, 1);
INSERT INTO Accounts (accountID, name, created_at, status, planID) values ('uiadjf','xyz company', now(), 1, 1);
INSERT INTO Accounts (accountID, name, created_at, status, planID) values ('349ajf','test company', now(), 1, 1);
INSERT INTO Accounts (accountID, name, created_at, status, planID) values ('huf7sd','zeebar co', now(), 1, 1);
INSERT INTO Accounts (accountID, name, created_at, status, planID) values ('bah38c','testing 123', now(), 1, 1);

INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (3, 1, 'asdfad', '2014-06-20');
INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (23, 11, 'asdfad', '2014-06-21');
INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (14, 3, 'asdfad', '2014-06-22');
INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (22, 41, 'asdfad', '2014-06-23');
INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (8, 2, 'asdfad', '2014-06-24');
INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (1, 0, 'asdfad', '2014-06-25');

INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (32, 71, 'uiadjf', '2014-06-20');
INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (233, 81, 'uiadjf', '2014-06-21');
INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (52, 51, 'uiadjf', '2014-06-22');
INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (23, 21, 'uiadjf', '2014-06-23');
INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (80, 4, 'uiadjf', '2014-06-24');
INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (0, 20, 'uiadjf', '2014-06-25');

INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (9, 17, '349ajf', '2014-06-20');
INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (51, 61, '349ajf', '2014-06-21');
INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (7, 3, '349ajf', '2014-06-22');
INSERT INTO datausage (Core, CoreDeluxe, AccountID, LogDate) values (253, 18, '349ajf', '2014-06-23');

INSERT INTO AccountPlans (planID, name, params, created_at, plantype) values (1, 'Plan 1', 'abc', now(), 'typexyz');
INSERT INTO AccountPlans (planID, name, params, created_at, plantype) values (2, 'Plan 2', 'def', now(), 'typexyz');


I have been trying to solve this now for a full month and would GREATLY appreciate some help.

I have a SQL fiddle online that I can reference but I did not know if it was OK to post a link to it here.

Thank you in advance for your help!

Options: ReplyQuote


Subject
Views
Written By
Posted
Performance problem with outer join
1905
July 11, 2014 07:53AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.