MySQL Forums
Forum List  »  Optimizer & Parser

Liberating Dependent Subqueries
Posted by: Shlomo Swidler
Date: November 27, 2006 05:50PM

Ever wonder why MySQL takes a really long time to execute a query containing a dependent subquery, even when the subquery is obviously not really "dependent" on the outer query? I wrote a stored procedure "constify" to help you out with this problem. But first, let me explain.

I have an n-to-n mapping of books to tags.
Here are the books:
mysql> describe books;
+---------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+----------------+
| book_id | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(50) | NO | | | |
| author | varchar(50) | NO | | | |
+---------+-------------+------+-----+---------+----------------+

Here are the tags:
mysql> describe tags;
+----------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+----------------+
| tag_id | int(11) | NO | PRI | NULL | auto_increment |
| tag_text | varchar(40) | NO | UNI | | |
+----------+-------------+------+-----+---------+----------------+

And here are the mappings:
mysql> describe book_tags;
+---------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------+------+-----+---------+-------+
| book_id | int(11) | NO | | | |
| tag_id | int(11) | NO | | | |
+---------+---------+------+-----+---------+-------+

Some of my users have been abusing the system and adding tags that are forbidden by my user agreement (obscene language, that sort of thing). In order not to run afoul of local laws, I need to remove all the book-to-tag mappings for prohibited tags - otherwise these tags will be shown on my website.

Let's say one of the forbidden words is "bunk". Some of my users have created tags that contain this word ("product is bunk", "total bunk"... you get the idea). I would like to run a query as follows:

DELETE FROM book_tags WHERE tag_id IN ( select tag_id from tags where LOCATE('bunk',tag_text)>0 );

(Okay, I wouldn’t really do that but let’s pretend....)

This will work, but if I am Amazon.com and I have many millions of books in my table, and many millions of mappings in the book_tags table, it will take prohibitively long: the dependent subquery is evaluated for every row in the outer query, that is, for every row in book_tags the inner query will be evaluated again:

mysql> explain select * from users_tags where tag_id in ( select tag_id from tags where LOCATE('bunk', tag_text)>0 );
+----+--------------------+-----------+-----------------+---------------+---------+---------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-----------+-----------------+---------------+---------+---------+------+----------+--------------------------+
| 1 | PRIMARY | book_tags | ALL | NULL | NULL | NULL | NULL | 49032352 | Using where |
| 2 | DEPENDENT SUBQUERY | tags | unique_subquery | PRIMARY | PRIMARY | 4 | func | 1 | Using index; Using where |
+----+--------------------+-----------+-----------------+---------------+---------+---------+------+----------+--------------------------+

Recomposing the query as a join gets me better performance, but not nearly good enough:

DELETE FROM book_tags bt USING book_tags bt, tags t WHERE bt.tag_id = t.tag_id AND LOCATE('bunk', t.tag_text)>0;

This is equivalent to the following SELECT, which has the performance as shown:
mysql> explain select * from book_tags bt, tags t WHERE bt.tag_id = t.tag_id AND LOCATE('bunk', t.tag_text)>0;
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+
| 1 | SIMPLE | bt | ALL | tag_id | NULL | NULL | NULL | 49032352 | |
| 1 | SIMPLE | t | eq_ref | PRIMARY | PRIMARY | 4 | temp.bt.tag_id | 1 | Using where |
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+

This form would execute more quickly.

But we can do much better. The real issue is that the subquery here is not really dependent on any of the columns in the outer query. The subquery will return the same result every time it is evaluated. You know that. I know that. But MySQL does not know that.

What I want to be able to do is as follows:

DELETE FROM book_tags WHERE tag_id IN ( constify("SELECT tag_id FROM tags WHERE LOCATE('bunk', tag_text)>0") );

If I had this function "constify", it would evaluate the subquery and turn it into a constant expression (the list of tag_ids that containing “bunk”), a result such as "1,2,56,1000,30405,1010200,10203439". The resulting query would be even faster still. It would look like this:

DELETE FROM book_tags WHERE tag_id IN ( 1,2,56,1000,30405,1010200,10203439 );

And its performance:
mysql> EXPLAIN SELECT * FROM book_tags WHERE tag_id IN ( 1,2,56,1000,30405,1010200,10203439 );
+----+-------------+-----------+-------+---------------+--------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+--------+---------+------+-------+-------------+
| 1 | SIMPLE | book_tags | range | tag_id | tag_id | 4 | NULL | 11433 | Using where |
+----+-------------+-----------+-------+---------------+--------+---------+------+-------+-------------+

This is the performance I’m looking for. This "constify" function would help me get it.

Well, I don’t have a "constify" function (more on this below), but I do have a "constify" stored procedure, which lets me do as follows:

mysql> call constify("SELECT tag_id FROM tags WHERE LOCATE('bunk', tag_text)>0", @result);
mysql> set @query=CONCAT("DELETE FROM book_tags WHERE tag_id IN (", @result, ")");
mysql> PREPARE st FROM @query;
mysql> EXECUTE st;
mysql> DEALLOCATE PREPARE st;

[Note: above commands edited. Please see a follow-up post for improved syntax.]

Its syntax is a little messier than with a function, but it's the best we can do given MySQL's limitations (again, see below). Here’s the constify stored procedure:

CREATE PROCEDURE constify (IN query_ TEXT, OUT constStr TEXT)
MODIFIES SQL DATA
SQL SECURITY INVOKER
COMMENT 'Turns the results of the given query into a comma-separated str.'
BEGIN
-- Important notes:
-- The provided query must specify a single-column result
-- Each row in the result must be no longer than 50 characters
-- (to change this, change the declaration of oneItem below)

declare flagLastItem INT DEFAULT 0;
declare oneItem VARCHAR(50);
declare itemCursor cursor for select * from constifyResults;
declare continue handler for SQLSTATE '02000' SET flagLastItem = 1;

set @constifyResult = "";
set @sep = "";
set @query = query_;
set @createStmt = CONCAT("create temporary table constifyResults ", @query);

drop table if exists constifyResults;
prepare stmt from @createStmt;
execute stmt;
deallocate prepare stmt;

open itemCursor;
theLoop: repeat
fetch itemCursor into oneItem;
if not flagLastItem then
set @thisItem = oneItem;
set @constifyResult = CONCAT(@constifyResult, @sep, @thisItem);
set @sep = ",";
end if;
until flagLastItem = 1 end repeat;
close itemCursor;

set constStr = @constifyResult;
END

Use constify with slight caution: don’t constify queries that return too many results! The return value (the string containing the comma-separated values) is limited by the max_packet_length variable, because the result must be sent back to the client, via the socket, in a packet.

Also, be careful when you use constify to select text that might contain MySQL delimiters (single quotes, double quotes, etc.): in this case be sure to SELECT QUOTE(text_field) in your query so the text is properly treated.

The reason why I didn’t make constify a function is because dynamic SQL (prepared statements) is not allowed in functions. But, it should be possible to refactor out the dynamic SQL into a separate stored procedure and then constify could be written as a function. <EDIT> Correction: Nope, MySQL does not support calling stored procedures that contain dynamic SQL from functions (or from triggers), so the constify stored procedure provides the best calling syntax that can be managed. Also, see my follow-up post in this thread for an even better calling syntax for constify.</EDIT>

Using the constify stored procedure above I have removed the dependent subquery and turned it into a constant expression, resulting in the fastest possible execution time for this type of query.



Edited 2 time(s). Last edit at 12/05/2006 07:02AM by Shlomo Swidler.

Options: ReplyQuote


Subject
Views
Written By
Posted
Liberating Dependent Subqueries
14491
November 27, 2006 05:50PM
3828
November 28, 2006 12:57AM
3480
November 28, 2006 03:09AM
3386
November 28, 2006 06:29PM
4179
November 30, 2006 05:39PM
3078
December 01, 2006 09:35AM
2784
December 03, 2006 06:21PM
2829
December 03, 2006 08:18PM
2851
December 05, 2006 06:11AM
4060
December 05, 2006 08:43AM
3841
December 07, 2006 06:26AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.