I am trying to optimize a query that joins two tables, filtering on a column in one table and sorting and limiting on a column in the other.
Here's how the tables are defined:
DROP TABLE IF EXISTS `biz`;
CREATE TABLE `biz` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `description` varchar(4000) DEFAULT NULL,
  `created_ts` bigint(20) NOT NULL,
  `type` bigint(20) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`),
  KEY `biz_ct_idx` (`created_ts`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `biz_groups`;
CREATE TABLE `biz_groups` (
  `biz_id` bigint(20) NOT NULL,
  `group_uuid` varchar(255) NOT NULL,
  KEY `bi_idx` (`biz_id`),
  KEY `gubi_idx` (`group_uuid`,`biz_id`),
  CONSTRAINT `biz_groups_bi_fk` FOREIGN KEY (`biz_id`) REFERENCES `biz` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The data is like this:
* biz table has 200,000 rows
* biz_groups table has 400,000 rows (2 group uuids per biz id)
* there are only 40 unique group uuids, so each group uuid matches roughly 10,000 biz ids
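For reference, data with roughly this shape can be generated along these lines (a sketch only; the seed values, the doubling trick, and the uuid numbering scheme are made up for illustration):

-- seed one row, then repeat the doubling insert ~18 times until biz
-- has ~200,000 rows; UUID() keeps the unique `name` column collision-free
INSERT INTO biz (name, created_ts, type) VALUES ('biz-seed', 1300000000, 1);
INSERT INTO biz (name, created_ts, type)
  SELECT UUID(), created_ts + id, type FROM biz;

-- give every biz row two of the 40 group uuids ('group-uuid-01' .. 'group-uuid-40')
INSERT INTO biz_groups (biz_id, group_uuid)
  SELECT id, CONCAT('group-uuid-', LPAD(id % 40 + 1, 2, '0')) FROM biz;
INSERT INTO biz_groups (biz_id, group_uuid)
  SELECT id, CONCAT('group-uuid-', LPAD((id + 20) % 40 + 1, 2, '0')) FROM biz;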
This is my query:
select SQL_NO_CACHE distinct b.id
from biz b
inner join biz_groups g on g.biz_id = b.id
where g.group_uuid in ('group-uuid-01','group-uuid-03','group-uuid-05', 'group-uuid-07')
order by b.created_ts desc
limit 20 offset 20;
20 rows in set (1.30 sec)
+----+-------------+-------+--------+-----------------+----------+---------+-------------------+-------+----------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+-----------------+----------+---------+-------------------+-------+----------+-----------------------------------------------------------+
| 1 | SIMPLE | g | range | bi_idx,gubi_idx | gubi_idx | 767 | NULL | 56074 | 100.00 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 8 | appiandb.g.biz_id | 1 | 100.00 | |
+----+-------------+-------+--------+-----------------+----------+---------+-------------------+-------+----------+-----------------------------------------------------------+
I assume this query takes over a second because the filter is not very selective: the four group uuids match about 40,000 rows in biz_groups (4 uuids × ~10,000 biz ids each), and all of them go through the temporary table and filesort before the LIMIT is applied. If I change the WHERE condition to use group uuids that are not in the biz_groups table at all, the query is very fast (~50 ms).
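For comparison, the fast ~50 ms variant is the same query with uuids that have no rows in biz_groups (the placeholder values below are made up):

select SQL_NO_CACHE distinct b.id
from biz b
inner join biz_groups g on g.biz_id = b.id
where g.group_uuid in ('no-such-uuid-1','no-such-uuid-2')
order by b.created_ts desc
limit 20 offset 20;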
I've tried rewriting the query using a subquery instead of a join:
select SQL_NO_CACHE b.id
from biz b
where b.id in (
select SQL_NO_CACHE g.biz_id
from biz_groups g
where g.group_uuid in ('group-uuid-01','group-uuid-03','group-uuid-05','group-uuid-07')
)
order by b.created_ts desc
limit 20 offset 20;
20 rows in set (0.02 sec)
+----+--------------------+-------+----------------+-----------------+------------+---------+------+------+-----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+----------------+-----------------+------------+---------+------+------+-----------+--------------------------+
| 1 | PRIMARY | b | index | NULL | biz_ct_idx | 8 | NULL | 40 | 496027.50 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | g | index_subquery | bi_idx,gubi_idx | bi_idx | 8 | func | 1 | 100.00 | Using where |
+----+--------------------+-------+----------------+-----------------+------------+---------+------+------+-----------+--------------------------+
This makes the query very fast when the filter is not selective. However, if the group uuids used for filtering are not in the biz_groups table at all, the query becomes very slow (~3 sec). I think this makes sense: the plan walks biz_ct_idx in created_ts order and probes the subquery for each row, so when no row ever matches, MySQL ends up scanning through the entire index (all 200,000 biz rows) before it can give up.
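If it helps, that full-scan theory should be checkable with the session handler counters (a diagnostic sketch; exact counts will vary):

FLUSH STATUS;
-- run the subquery version here with group uuids that have no matches
SHOW SESSION STATUS LIKE 'Handler_read%';
-- a Handler_read_next count near the biz row count (~200,000) would
-- confirm that the whole biz_ct_idx index is being walked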
My question is whether it's possible to write this kind of query (or change the indexes/tables) in such a way that MySQL can figure out the best way to execute it on its own. I would like to keep the time under 100 ms regardless of the filter parameters.
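For example, I'd even be open to denormalizing created_ts into biz_groups if that let a single composite index cover both the filter and the sort, along these lines (an untested sketch; the new column and the index name are made up):

-- untested sketch: copy created_ts into biz_groups so one composite index
-- can serve the group_uuid filter and the created_ts sort together
ALTER TABLE biz_groups ADD COLUMN created_ts bigint(20) NOT NULL DEFAULT 0;
UPDATE biz_groups g JOIN biz b ON b.id = g.biz_id SET g.created_ts = b.created_ts;
ALTER TABLE biz_groups ADD KEY guct_idx (group_uuid, created_ts, biz_id);

I'm not sure this is the right direction, though: the rows matching each uuid would come out of the index pre-sorted, but with several uuids in the IN list they would still have to be merged and sorted across uuids.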
Thank you,
Nikita
NOTE: I'm using MySQL 5.1.55-winx64 with the my-huge.ini config.