MySQL :: Re: Need help understanding optimizer behavior in 5.6

New Topic

Re: Need help understanding optimizer behavior in 5.6

Posted by: Øystein Grøvlen
Date: July 01, 2015 03:49AM

Hi Aaron,

Thanks for the bug report and the optimizer trace. It seems the verification team has not processed your bug report yet. However, I expect they might ask you if there is some way to reproduce this. Maybe you can either upload a compressed dump of your database or share the scripts you used to generate it.

Wrt what I wrote about 50% lower row estimates: This is an old hack in InnoDB from when InnoDB and MySQL was owned by different companies. In order to make MySQL use secondary indexes more often, InnoDB adjusted the statistics it reported. You can observe the effect of this in the result of SHOW INDEX as included in your bug report:

mysql> show indexes from entry_group;
+-------------+------------+------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| entry_group | 0 | PRIMARY | 1 | id | A | 24943 | NULL | NULL | | BTREE | | |
| entry_group | 1 | idx_jac_entry_group_on_category_id | 1 | category_id | A | 14 | NULL | NULL | | BTREE | | |
+-------------+------------+------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)

As you can see, the statistics says that there are 14 distinct values for category_id, but we know there is only 7. This way, the estimated number of matches in each lookup in underestimated by 50%. EXPLAIN says that there will be 1781 rows (24943/14) per look-up, while the actual number is 24943/7=3563. Hence, this plan appears to the optimizer to be less costly than it actually is.
This "faking" of statistics have been removed from MySQL 5.7.

Looking at the optimizer trace, we can see that it is actually this issue that cause the optimizer to choose the wrong query plan. First, we see this part:

"rest_of_plan": [
{
"plan_prefix": [
"`category` `c`",
"`entry_group` `g`"
] /* plan_prefix */,
"table": "`entry` `e`",
"best_access_path": {
"considered_access_paths": [
{
"access_type": "ref",
"index": "idx_jac_entry_on_group_id",
"rows": 4,
"cost": 49869,
"chosen": true
},
{
"access_type": "range",
"rows": 17480,
"cost": 4.07e8,
"chosen": false
}
] /* considered_access_paths */
} /* best_access_path */,
"cost_for_plan": 62354,
"rows_for_plan": 49868,
"chosen": true
}
] /* rest_of_plan */

This show that the estimated cost of the c - g - e order is 62354. If we take a look at the evaluation of the desired plan:

"rest_of_plan": [
{
"plan_prefix": [
"`entry` `e`",
"`entry_group` `g`"
] /* plan_prefix */,
"table": "`category` `c`",
"best_access_path": {
"considered_access_paths": [
{
"access_type": "ref",
"index": "PRIMARY",
"rows": 1,
"cost": 23306,
"chosen": true
},
{
"access_type": "scan",
"cause": "covering_index_better_than_full_scan",
"chosen": false
}
] /* considered_access_paths */
} /* best_access_path */,
"added_to_eq_ref_extension": true,
"cost_for_plan": 88564,
"rows_for_plan": 23306,
"pruned_by_cost": true
},

We see that the estimated cost of the e - g - c order is 88564, and the trace says that the plan will not be chosen due to higher cost. Note also from the first trace excerpt, that the total cost of look-ups on the idx_jac_entry_on_group_id is 49869 with 4 matching rows per look-up. If the number of matching rows are doubled to 8 which it would be in MySQL 5.7, the cost of this step would also double, and the total cost for the first join order would then exceed the cost of the second join order. (The cost of the second join order would not change since only primary indexes are used in that plan.)

In other words, I expect your issue to be fixed in MySQL 5.7. However, I am a bit surprised that you also see this issue in the production database where you say your filter is much more selective. I would have thought that high selectivity on the largest table would outweigh the underestimated cost of secondary index look-ups discuss above. In other words, I am not 100% convinced that you have reproduced the original issue. If you can provide the output of SHOW INDEX for these tables in the production database and the typical number of rows satisfying the range condition, I might be able to tell whether it is the same issue or not.

Anyhow, I do not think adding a STRAIGHT_JOIN hint will cause any risk in this case. I do think anything can beat starting with the only table that can be filtered when that also means that primary index look-ups can be used for all join
operations.

Hope this helps,

Øystein Grøvlen,
Senior Principal Software Engineer,
MySQL Group, Oracle,
Trondheim, Norway

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Views

Written By

Posted

Need help understanding optimizer behavior in 5.6

2208

Aaron Craven

June 22, 2015 08:30AM

Re: Need help understanding optimizer behavior in 5.6

1079

Aaron Craven

June 22, 2015 08:31AM

Re: Need help understanding optimizer behavior in 5.6

1086

Rick James

June 23, 2015 07:00PM

Re: Need help understanding optimizer behavior in 5.6

1075

Aaron Craven

June 24, 2015 06:22AM

Re: Need help understanding optimizer behavior in 5.6

983

Rick James

June 25, 2015 11:37PM

Re: Need help understanding optimizer behavior in 5.6

1054

Aaron Craven

June 26, 2015 06:56AM

Re: Need help understanding optimizer behavior in 5.6

1062

Øystein Grøvlen

June 30, 2015 05:46AM

Re: Need help understanding optimizer behavior in 5.6

1574

Aaron Craven

June 30, 2015 08:06AM

Re: Need help understanding optimizer behavior in 5.6

1038

Aaron Craven

June 30, 2015 08:16AM

Re: Need help understanding optimizer behavior in 5.6

1223

Øystein Grøvlen

July 01, 2015 03:49AM

Re: Need help understanding optimizer behavior in 5.6

1223

Aaron Craven

July 01, 2015 06:48AM

Re: Need help understanding optimizer behavior in 5.6

1070

Edwin Desouza

July 01, 2015 03:05PM

Re: Need help understanding optimizer behavior in 5.6

1058

Aaron Craven

July 02, 2015 07:48AM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.