MySQL Forums
Forum List  »  Performance

performance problem with "NOT IN" subquery containing a join
Posted by: Michael Pardee
Date: May 14, 2010 04:45PM

This query is taking 50x longer than if I manually execute the queries separately:

SELECT cal_login FROM webcal_view_user WHERE cal_view_id=6 AND cal_login NOT IN (SELECT DISTINCT wvu.cal_login from webcal_view_user wvu LEFT JOIN webcal_entry we ON ( SUBSTR(wvu.cal_login,6)=we.cal_location ) WHERE wvu.cal_view_id=6 AND (we.pacrez_endtime > NOW() AND we.pacrez_starttime < NOW() + INTERVAL 60 MINUTE ) ORDER BY we.pacrez_endtime );

This is for a computer reservations system, and the query determines a list of computers currently NOT reserved at a specific site. Because of the nature of the problem it is much easier to find out which machines ARE reserved and substract those from the list instead of directly querying which machines are NOT reserved. The problem is that the performance of the query using a "NOT IN" is 50x worse than I'd expect. I've spent a few hours researching this and I can't figure out how to rewrite it just using joins to make it any quicker, since there is already a join in the subquery. I could just do some of the logic programming outside of mysql but after I solve this problem I'd like to add even more conditions to the query so I'd rather keep the logic in mysql as much as possible for now.


This is the subquery, it tells us which machines ARE reserved:
SELECT DISTINCT wvu.cal_login from webcal_view_user wvu LEFT JOIN webcal_entry we ON ( SUBSTR(wvu.cal_login,6)=we.cal_location ) WHERE wvu.cal_view_id=6 AND (we.pacrez_endtime > NOW() AND we.pacrez_starttime < NOW() + INTERVAL 60 MINUTE ) ORDER BY we.pacrez_endtime ;
+-------------+
| cal_login |
+-------------+
| _NUC_LSPW08 |
| _NUC_LSPW04 |
| _NUC_LSPW02 |
| _NUC_LSPW07 |
| _NUC_LSPW09 |
| _NUC_LSPW01 |
| _NUC_LSPW11 |
| _NUC_LSPW05 |
| _NUC_LSPW03 |
| _NUC_LSPW06 |
+-------------+
10 rows in set (0.09 sec)

And this is part of the outer query which tells us which machines are at that site:
SELECT cal_login FROM webcal_view_user WHERE cal_view_id=6 ;
+-------------+
| cal_login |
+-------------+
| _NUC_LSPW01 |
| _NUC_LSPW02 |
| _NUC_LSPW03 |
| _NUC_LSPW04 |
| _NUC_LSPW05 |
| _NUC_LSPW06 |
| _NUC_LSPW07 |
| _NUC_LSPW08 |
| _NUC_LSPW09 |
| _NUC_LSPW10 |
| _NUC_LSPW11 |
| _NUC_LSPW12 |
+-------------+
12 rows in set (0.01 sec)

But for some reason, when you combine the two to find out which machines are NOT reserved with a NOT IN subquery, the execution time gets 50x worse:
SELECT cal_login FROM webcal_view_user WHERE cal_view_id=6 AND cal_login NOT IN (SELECT DISTINCT wvu.cal_login from webcal_view_user wvu LEFT JOIN webcal_entry we ON ( SUBSTR(wvu.cal_login,6)=we.cal_location ) WHERE wvu.cal_view_id=6 AND (we.pacrez_endtime > NOW() AND we.pacrez_starttime < NOW() + INTERVAL 60 MINUTE ) ORDER BY we.pacrez_endtime );
+-------------+
| cal_login |
+-------------+
| _NUC_LSPW10 |
| _NUC_LSPW12 |
+-------------+
2 rows in set (5.52 sec)

EXPLAIN SELECT cal_login FROM webcal_view_user WHERE cal_view_id=6 AND cal_login NOT IN (SELECT DISTINCT wvu.cal_login from webcal_view_user wvu LEFT JOIN webcal_entry we ON ( SUBSTR(wvu.cal_login,6)=we.cal_location ) WHERE wvu.cal_view_id=6 AND (we.pacrez_endtime > NOW() AND we.pacrez_starttime < NOW() + INTERVAL 60 MINUTE ) ORDER BY we.pacrez_endtime );
+----+--------------------+------------------+--------+---------------+---------+---------+------------+-------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------------+--------+---------------+---------+---------+------------+-------+------------------------------------------+
| 1 | PRIMARY | webcal_view_user | ref | PRIMARY | PRIMARY | 4 | const | 13 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | wvu | eq_ref | PRIMARY | PRIMARY | 31 | const,func | 1 | Using index; Using temporary |
| 2 | DEPENDENT SUBQUERY | we | ALL | NULL | NULL | NULL | NULL | 69313 | Using where; Distinct; Using join buffer |
+----+--------------------+------------------+--------+---------------+---------+---------+------------+-------+------------------------------------------+
3 rows in set (0.00 sec)

DESCRIBE webcal_view_user; describe webcal_entry;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| cal_view_id | int(11) | NO | PRI | NULL | |
| cal_login | varchar(25) | NO | PRI | NULL | |
+-------------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

+-------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| cal_id | int(11) | NO | PRI | NULL | |
| cal_group_id | int(11) | YES | | NULL | |
| cal_ext_for_id | int(11) | YES | | NULL | |
| cal_create_by | varchar(25) | NO | | NULL | |
| cal_date | int(11) | NO | | NULL | |
| cal_time | int(11) | YES | | NULL | |
| pacrez_starttime | datetime | YES | | NULL | |
| pacrez_endtime | datetime | YES | | NULL | |
| cal_mod_date | int(11) | YES | | NULL | |
| cal_mod_time | int(11) | YES | | NULL | |
| cal_duration | int(11) | NO | | NULL | |
| cal_due_date | int(11) | YES | | NULL | |
| cal_due_time | int(11) | YES | | NULL | |
| cal_priority | int(11) | YES | | 5 | |
| cal_type | char(1) | YES | | E | |
| cal_access | char(1) | YES | | P | |
| cal_name | varchar(80) | NO | | NULL | |
| cal_location | varchar(100) | YES | | NULL | |
| cal_url | varchar(100) | YES | | NULL | |
| cal_completed | int(11) | YES | | NULL | |
| cal_description | text | YES | | NULL | |
| pacrez_multiplier | float | YES | | 1 | |
+-------------------+--------------+------+-----+---------+-------+


I originally had the query written using just a join but that performance was just as bad:
select wvu.cal_login from webcal_view_user wvu LEFT JOIN webcal_entry we ON ( SUBSTR(wvu.cal_login,6)=we.cal_location AND ( we.pacrez_endtime > NOW() AND we.pacrez_starttime < NOW() + INTERVAL 60 MINUTE ) ) WHERE ( wvu.cal_view_id= 6 AND we.pacrez_endtime IS NULL ) ORDER BY we.pacrez_endtime;
+-------------+
| cal_login |
+-------------+
| _NUC_LSPW10 |
| _NUC_LSPW12 |
+-------------+
9 rows in set (5.66 sec)

mysql> EXPLAIN select wvu.cal_login from webcal_view_user wvu LEFT JOIN webcal_entry we ON ( SUBSTR(wvu.cal_login,6)=we.cal_location AND ( we.pacrez_endtime > NOW() AND we.pacrez_starttime < NOW() + INTERVAL 60 MINUTE ) ) WHERE ( wvu.cal_view_id= 6 AND we.pacrez_endtime IS NULL ) ORDER BY we.pacrez_endtime;
+----+-------------+-------+------+---------------+---------+---------+-------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------+---------+-------+-------+----------------------------------------------+
| 1 | SIMPLE | wvu | ref | PRIMARY | PRIMARY | 4 | const | 13 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | we | ALL | NULL | NULL | NULL | NULL | 69315 | Using where |
+----+-------------+-------+------+---------------+---------+---------+-------+-------+----------------------------------------------+
2 rows in set (0.00 sec)


I'm not opposed to doing things in a completely different manner, I don't use mysql much and I know this could be a very silly question.

Options: ReplyQuote


Subject
Views
Written By
Posted
performance problem with "NOT IN" subquery containing a join
4756
May 14, 2010 04:45PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.