When calculating the records per key, we see the MySQL using the "rec_per_key_t(tab->records()) / distinct_keys_est + 1;" ,my question is why it doesn't use the "rec_per_key_t(tab->records()) / distinct_keys_est ;" instead ,why it needs the "+ 1" here?
The code segment is below:
/*
Assume that the first key part matches 1% of the file
and that the whole key matches 10 (duplicates) or 1
(unique) records.
Assume also that more key matches proportionally more
records
This gives the formula:
records = (x * (b-a) + a*c-b)/(c-1)
b = records matched by whole key
a = records matched by first key part (1% of all records?)
c = number of key parts in key
x = used key parts (1 <= x <= c)
*/
rec_per_key_t rec_per_key;
if (keyinfo->has_records_per_key(
keyinfo->user_defined_key_parts - 1))
rec_per_key=
keyinfo->records_per_key(keyinfo->user_defined_key_parts - 1);
else
rec_per_key=
rec_per_key_t(tab->records()) / distinct_keys_est + 1;
The code segment above can be found here:
https://github.com/mysql/mysql-server/blob/5.7/sql/sql_planner.cc#L556