MySQL Forums
Forum List  »  Optimizer & Parser

Re: Can we expect an enhancement for derived table merge optimization?
Posted by: Roy Lyseng
Date: January 11, 2013 09:18AM

Hi Rick,

some notes on the changes applied in MySQL 5.6:

Case 1:
FROM ( SELECT ... )
WHERE ...
* Since 4.1, the subquery would create a temp table.
* Since there won't be any INDEXes on the temp table, the WHERE clause filtering would be done on a "table scan" of the temp table.
* If the subquery had something that shrank the number of rows (GROUP BY, DISTINCT, LIMIT, etc), there sometimes could be a benefit from doing the query this way.

No change here in MySQL 5.6.

Case 2:
FROM ( SELECT ... )
JOIN tbl ON ...
* Since 4.1, the subquery would be evaluated and scanned first, and with no indexes. Then the JOIN to some other table would be done via "Nexted Loop Join".
* If the subselect had something that shrank the number of rows (GROUP BY, DISTINCT, LIMIT, etc), there sometimes could be a benefit from doing the query this way.

The derived table is still materialized into a temporary table, but we analyze the predicates between the derived table and the outer query and are able to create an index on the derived table. The join optimizer is invoked, and it will calculate the best order for the tables, as well as the best index to build on the materialized table.

Case 3:
FROM ( SELECT ... )
JOIN ( SELECT ... ) ON ...
* If both derived tables have multiple rows, this turns into a cross-join, with filtering (ON, WHERE) later. Grossly inefficient.
* With 5.6 (specifically 5.6.7?) (soon to be GA), the optimizer will "materialize" the tables and will try all possible indexes and create the optimal one(s). Grossly efficient! Far better than simply materializing.

I think this is equivalent to Case 2, but with two derived tables, and, as you say, we generate indexes on both materialized tables if it is appropriate.

Case 4:
FROM tbl
LEFT JOIN ( SELECT ... ) x ON x.foo = tbl.foo
WHERE x.foo IS [NOT] NULL
* Similar comments to Case 3.

Indeed, with the obvious restriction that join order is determined based on the outer join.

Case 5:
WHERE foo IN ( SELECT ... )
* Since 4.1, this has been one of the worst-performing constructs available. It reevaluates the subquery repeatedly.
* With 5.6 (soon to be GA) and MariaDB 5.5 (GA for some time) the subquery is evaluated only once ("materialized"). This greatly improves the construct.

There are actually two possible outcomes:
a) If the subquery is non-aggregated, it is converted to a semi-join. See below.
b) If the subquery is aggregated, or if semi-join is disabled, the subquery is materialized and an index is created on it. Subquery evaluation is then transformed to efficient primary-key lookups into the materialized table.

Case 6:
(I can't think of a specific example)
* A "Semijoin" (I think) is where one side can have duplicates, but it doesn't care. Examples of similar situations: IN, EXISTS. 5.6 has the 'optimization' optimization; MariaDB 5.5 may have it.

An IN subquery predicate with a non-aggregated subquery (and a few other restrictions) is converted to a semi-join operation. The nice thing about this is that is treated almost equally to a regular JOIN, so we are able to make a cost-based choice about table order and semi-join execution method. The execution can be done outside-in or inside-out. outside-in (also called FirstMatch) is the same as the standard execution method in 5.5, and is still efficient if there is an index on the subquery columns. If there is no index, then it sometimes pays to materialize the subquery and do key-based lookups into it (much like Case 6).
There are also two inside-out algorithms (ie the subquery tables are accessed first, followed by the outer tables). One is called LooseScan, the second is called MaterializeScan, which first materializes the subquery before it scans it and uses field values to do lookups into the outer tables.
Finally, there is a method called DuplicateWeedout that effectively performs a full join over all tables, and then filters out duplicates coming from the subquery tables.

Case 7:
VIEWs
* Often a VIEW performs worse than the equivalent SELECT. It is as if they only half-optimized VIEWs.
* I don't know if there are yet any optimizations in the new releases.

I am not aware of any principal differences between views and regular tables, but I know that we have fixed several bugs that affect the performance of access through views. Performance problems when accessing views should be reported, and we will do our best to fix them.

Thanks,
Roy Lyseng,
MySQL optimizer team.

Options: ReplyQuote




Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.