MySQL Forums
Forum List  »  Optimizer & Parser

Re: suggestion for optimizing with non-matching compound indexes
Posted by: Björn Steinbrink
Date: March 22, 2006 09:48AM

This got me interested and I played around a bit...

----

This is my test table:
mysql> desc a;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| a | int(11) | YES | MUL | NULL | |
| b | int(11) | YES | | NULL | |
| foo | varchar(10) | YES | | NULL | |
| bar | varchar(10) | YES | | NULL | |
| fasd | varchar(10) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+

mysql> show keys from a;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| a | 1 | foo | 1 | a | A | 3 | NULL | NULL | YES | BTREE | |
| a | 1 | foo | 2 | b | A | 299997 | NULL | NULL | YES | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

mysql> select count(*) from a;
+----------+
| count(*) |
+----------+
| 2399976 |
+----------+


'a' is 1, 2 or 3
'b' is from 1-99999
the other columns just have random values, and are there to make the queries not having "Using index"

----

The basic queries I tried were:
select * from a where b = 5;
select * from a where b = 5 and a in (select distinct a from a);
select a.* from a join (select distinct a from a) b where a.a=b.a and b=5;

Depending on the table state, I've sometimes seen some improvements with one of the last two queries, but those were gone after I optimized the tables. Then I started to increase the size
of the table (I started out with about 300k rows) and some problems with the optimizer became visible.

----

Number 1:

mysql> explain select * from a where b = 5 and a in (select distinct a from a);
+----+--------------------+-------+----------------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+----------------+---------------+------+---------+------+---------+-------------+
| 1 | PRIMARY | a | ALL | NULL | NULL | NULL | NULL | 2399976 | Using where |
| 2 | DEPENDENT SUBQUERY | a | index_subquery | foo | foo | 5 | func | 799992 | Using index |
+----+--------------------+-------+----------------+---------------+------+---------+------+---------+-------------+

Maybe I just lack knowledge here, but the subquery looks totally independent to me.

Number 2:

mysql> explain select a.* from a join (select distinct a from a) b where a.a=b.a and b=3;
+----+-------------+------------+-------+---------------+------+---------+-----------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+------+---------+-----------+---------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 1 | PRIMARY | a | ref | foo | foo | 10 | b.a,const | 8 | Using where |
| 2 | DERIVED | a | index | NULL | foo | 10 | NULL | 2399976 | Using index |
+----+-------------+------------+-------+---------------+------+---------+-----------+---------+-------------+
3 rows in set (1.71 sec)

The time the optimizer takes to determine how to do the query is what makes the query slow, the actual query takes about as long.

The problem can be reduced to:
mysql> explain select * from (select distinct a from a) b;
+----+-------------+------------+-------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+------+---------+------+---------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 2 | DERIVED | a | index | NULL | foo | 10 | NULL | 2399976 | Using index |
+----+-------------+------------+-------+---------------+------+---------+------+---------+-------------+
2 rows in set (1.70 sec)

Where the subquery on its own is blazingly fast:
mysql> select distinct a from a;
+------+
| a |
+------+
| 1 |
| 2 |
| 3 |
+------+
3 rows in set (0.00 sec)

----

Using a second table that just lists the possible values for 'a' you can get the performance you'd expect, because the table is small and the optimizer gets fast.
mysql> desc b;
+-------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| a | int(11) | NO | PRI | 0 | |
+-------+---------+------+-----+---------+-------+

mysql> explain select a.* from a join (select a from b) b where a.a=b.a and b=5;
+----+-------------+------------+-------+---------------+---------+---------+-----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+---------+---------+-----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 1 | PRIMARY | a | ref | foo | foo | 10 | b.a,const | 8 | Using where |
| 2 | DERIVED | b | index | NULL | PRIMARY | 4 | NULL | 3 | Using index |
+----+-------------+------------+-------+---------------+---------+---------+-----------+------+-------------+
3 rows in set (0.00 sec)

The query itself is as fast:
mysql> select a.* from a join (select a from b) b where a.a=b.a and b=7;
+------+------+------------+------------+------------+
| a | b | foo | bar | fasd |
+------+------+------------+------------+------------+
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
+------+------+------------+------------+------------+
24 rows in set (0.00 sec)


Versus:
mysql> select * from a where b=7;
+------+------+------------+------------+------------+
| a | b | foo | bar | fasd |
+------+------+------------+------------+------------+
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
| 1 | 7 | 0.42062982 | 0.53487795 | 0.21304945 |
| 2 | 7 | 0.20550768 | 0.15841783 | 0.60064315 |
| 3 | 7 | 0.01212023 | 0.72405366 | 0.72916268 |
+------+------+------------+------------+------------+
24 rows in set (0.89 sec)


Unfortunately you cannot create that table on the fly, as you have the optimizer overhead again:
mysql> insert into b select distinct a from a;
Query OK, 3 rows affected (1.77 sec)
Records: 3 Duplicates: 0 Warnings: 0

So, if you want to use this workaround, you have to keep a table that is small and just lists the possible values for your key prefix, until the issue with the optimizer being slow is solved.

One more thing worth mentioning regarding the optimizer problem is this:
mysql> explain select distinct a from a;
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+
| 1 | SIMPLE | a | range | NULL | foo | 5 | NULL | 4 | Using index for group-by |
+----+-------------+-------+-------+---------------+------+---------+------+------+--------------------------+

Here we have "Using index for group-by", while the same thing as a subquery above only has "Using index".

Edit: Fixed the paste of the "blazingly fast" query, I accidently pasted the slow one when it was answered from the query cache, instead of pasting the subquery alone.

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: suggestion for optimizing with non-matching compound indexes
2848
March 22, 2006 09:48AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.