MySQL Forums
Forum List  »  Optimizer & Parser

Re: Liberating Dependent Subqueries
Posted by: Shlomo Swidler
Date: November 30, 2006 05:39PM

Perhaps a little more should be said about the performance of the “constify” method vs. the other ways of writing the query. Saying only a very little more sounds like this: "I tried the DELETE with a JOIN, and using constify is still faster". Read on for the detailed explanation, where I say a lot more about the performance.

There are three possible methods to use: the dependent subquery, the join, and the constify method.
1) The dependent subquery - this has the worst performance:

mysql> explain select * from book_tags where tag_id in ( select tag_id from tags where LOCATE('bunk', tag_text)>0 );
+----+--------------------+-----------+-----------------+---------------+---------+---------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-----------+-----------------+---------------+---------+---------+------+----------+--------------------------+
| 1 | PRIMARY | book_tags | ALL | NULL | NULL | NULL | NULL | 49032352 | Using where |
| 2 | DEPENDENT SUBQUERY | tags | unique_subquery | PRIMARY | PRIMARY | 4 | func | 1 | Using index; Using where |
+----+--------------------+-----------+-----------------+---------------+---------+---------+------+----------+--------------------------+

2) The join - better, but not fast enough:

mysql> EXPLAIN SELECT * FROM book_tags bt, tags t WHERE bt.tag_id = t.tag_id AND LOCATE('bunk', t.tag_text)>0;
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+
| 1 | SIMPLE | bt | ALL | tag_id | NULL | NULL | NULL | 49032352 | |
| 1 | SIMPLE | t | eq_ref | PRIMARY | PRIMARY | 4 | temp.bt.tag_id | 1 | Using where |
+----+-------------+-------+--------+---------------+---------+---------+----------------+----------+-------------+

3) The “constify” method, which performs best.

We can estimate constify’s performance by estimating the performance of each element within constify:

P(constifyMethodTotal) = P(constify SP) + P(DELETE).

Looking at constify we see that

P(constify SP) = P(select parameter) + P(simple cursor) + P(CONCAT) + P(communications),

where:
- P(select parameter) is the performance of the select passed in as a parameter (“select tag_id from tags where LOCATE('bunk', tag_text)>0”)

This clearly depends on the query you pass in. For the query in the example the performance reflects evaluating every row once:

mysql> explain select * from tags where LOCATE('bunk', tag_text)>0;
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | tags | ALL | NULL | NULL | NULL | NULL | 8985414 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

- P(simple cursor) is the performance of the cursor reading the results of the above select

This is a simple cursor, reading the rows from the table in “natural” order. This is the fastest type of cursor, but no specific numbers can be given. The performance here is correlated with the number of result rows returned from the query parameter.

- P(CONCAT) is the performance of CONACTing all the result rows together

Again, this performance depends on the number of rows returned by the query parameter.

- P(communications) reflects the performance of sending the result string back to the client

Also dependent on the number of result rows from the query parameter, since the returned string gets longer as each row is CONCATed on.

Thus, the sum total performance of the constify method is, approximately:

P(constifyMethodTotal) = P(select parameter) + P(simple cursor) + P(CONCAT) + P(communications) + P(DELETE)

As for the P(DELETE), the DELETE’s performance can be EXPLAINed, as in my original post, as follows:

mysql> EXPLAIN SELECT * FROM book_tags WHERE tag_id IN ( 1,2,56,1000,30405,1010200,10203439 );
+----+-------------+-----------+-------+---------------+--------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+--------+---------+------+-------+-------------+
| 1 | SIMPLE | book_tags | range | tag_id | tag_id | 4 | NULL | 11433 | Using where |
+----+-------------+-----------+-------+---------------+--------+---------+------+-------+-------------+

Performance of this query is also dependent on the number of rows returned by the select parameter to the constify SP.

So, for my example case,

P(constifyMethodTotal) = “every row of 9 million” + “every row of select result” + “fast string operation” + “socket communications” + “11K rows via a range lookup in the index”

Expressing the performance of the other two methods in words, we have:

P(dependent subquery) = “every row of 50 million” * “evalute a function on the primary key of another table”

and

P(join) = “every row of 50 million” * “lookup by exact match on the primary key of another table”

Notice the “*” in these two expressions. This indicates that performance of the dependent subquery and the join will be quadratic O(n*m), while the “+” in the expression for the constify method indicates linear O(n+m) performance.

Indeed, in practice (on my data set) the constify method is the fastest.

Options: ReplyQuote


Subject
Views
Written By
Posted
14650
November 27, 2006 05:50PM
3896
November 28, 2006 12:57AM
3545
November 28, 2006 03:09AM
3483
November 28, 2006 06:29PM
Re: Liberating Dependent Subqueries
4257
November 30, 2006 05:39PM
3161
December 01, 2006 09:35AM
2868
December 03, 2006 06:21PM
2883
December 03, 2006 08:18PM
2895
December 05, 2006 06:11AM
4184
December 05, 2006 08:43AM
3934
December 07, 2006 06:26AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.