MySQL Forums
Forum List  »  Optimizer & Parser

Re: Need help understanding optimizer behavior in 5.6
Posted by: Øystein Grøvlen
Date: July 01, 2015 03:49AM

Hi Aaron,

Thanks for the bug report and the optimizer trace. It seems the verification team has not processed your bug report yet. However, I expect they might ask you if there is some way to reproduce this. Maybe you can either upload a compressed dump of your database or share the scripts you used to generate it.

Wrt what I wrote about 50% lower row estimates: This is an old hack in InnoDB from when InnoDB and MySQL was owned by different companies. In order to make MySQL use secondary indexes more often, InnoDB adjusted the statistics it reported. You can observe the effect of this in the result of SHOW INDEX as included in your bug report:

mysql> show indexes from entry_group;
+-------------+------------+------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| entry_group | 0 | PRIMARY | 1 | id | A | 24943 | NULL | NULL | | BTREE | | |
| entry_group | 1 | idx_jac_entry_group_on_category_id | 1 | category_id | A | 14 | NULL | NULL | | BTREE | | |
+-------------+------------+------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)

As you can see, the statistics says that there are 14 distinct values for category_id, but we know there is only 7. This way, the estimated number of matches in each lookup in underestimated by 50%. EXPLAIN says that there will be 1781 rows (24943/14) per look-up, while the actual number is 24943/7=3563. Hence, this plan appears to the optimizer to be less costly than it actually is.
This "faking" of statistics have been removed from MySQL 5.7.

Looking at the optimizer trace, we can see that it is actually this issue that cause the optimizer to choose the wrong query plan. First, we see this part:

"rest_of_plan": [
{
"plan_prefix": [
"`category` `c`",
"`entry_group` `g`"
] /* plan_prefix */,
"table": "`entry` `e`",
"best_access_path": {
"considered_access_paths": [
{
"access_type": "ref",
"index": "idx_jac_entry_on_group_id",
"rows": 4,
"cost": 49869,
"chosen": true
},
{
"access_type": "range",
"rows": 17480,
"cost": 4.07e8,
"chosen": false
}
] /* considered_access_paths */
} /* best_access_path */,
"cost_for_plan": 62354,
"rows_for_plan": 49868,
"chosen": true
}
] /* rest_of_plan */

This show that the estimated cost of the c - g - e order is 62354. If we take a look at the evaluation of the desired plan:

"rest_of_plan": [
{
"plan_prefix": [
"`entry` `e`",
"`entry_group` `g`"
] /* plan_prefix */,
"table": "`category` `c`",
"best_access_path": {
"considered_access_paths": [
{
"access_type": "ref",
"index": "PRIMARY",
"rows": 1,
"cost": 23306,
"chosen": true
},
{
"access_type": "scan",
"cause": "covering_index_better_than_full_scan",
"chosen": false
}
] /* considered_access_paths */
} /* best_access_path */,
"added_to_eq_ref_extension": true,
"cost_for_plan": 88564,
"rows_for_plan": 23306,
"pruned_by_cost": true
},

We see that the estimated cost of the e - g - c order is 88564, and the trace says that the plan will not be chosen due to higher cost. Note also from the first trace excerpt, that the total cost of look-ups on the idx_jac_entry_on_group_id is 49869 with 4 matching rows per look-up. If the number of matching rows are doubled to 8 which it would be in MySQL 5.7, the cost of this step would also double, and the total cost for the first join order would then exceed the cost of the second join order. (The cost of the second join order would not change since only primary indexes are used in that plan.)

In other words, I expect your issue to be fixed in MySQL 5.7. However, I am a bit surprised that you also see this issue in the production database where you say your filter is much more selective. I would have thought that high selectivity on the largest table would outweigh the underestimated cost of secondary index look-ups discuss above. In other words, I am not 100% convinced that you have reproduced the original issue. If you can provide the output of SHOW INDEX for these tables in the production database and the typical number of rows satisfying the range condition, I might be able to tell whether it is the same issue or not.

Anyhow, I do not think adding a STRAIGHT_JOIN hint will cause any risk in this case. I do think anything can beat starting with the only table that can be filtered when that also means that primary index look-ups can be used for all join
operations.

Hope this helps,

Øystein Grøvlen,
Senior Principal Software Engineer,
MySQL Group, Oracle,
Trondheim, Norway

Options: ReplyQuote




Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.