MySQL Forums
Forum List  »  Optimizer & Parser

Queries using JOIN and ORDER BY LIMIT
Posted by: Nikita Dubrovsky
Date: March 10, 2012 01:11PM

I am trying to optimize a query that joins 2 tables, using a column in one table to filter the data, and using a column in the other table to sort and limit the results.

Here's how the tables are defined:
DROP TABLE IF EXISTS `biz`;
CREATE TABLE `biz` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `description` varchar(4000) DEFAULT NULL,
  `created_ts` bigint(20) NOT NULL,
  `type` bigint(20) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`),
  KEY `biz_ct_idx` (`created_ts`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `biz_groups`;
CREATE TABLE `biz_groups` (
  `biz_id` bigint(20) NOT NULL,
  `group_uuid` varchar(255) NOT NULL,
  KEY `bi_idx` (`biz_id`),
  KEY `gubi_idx` (`group_uuid`,`biz_id`),
  CONSTRAINT `biz_groups_bi_fk` FOREIGN KEY (`biz_id`) REFERENCES `biz` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The data is like this:
* biz table has 200,000 rows
* biz_groups table has 400,000 rows (2 group uuids per biz id)
* there are only 40 unique group uuids, so for a given group uuid, there are 10,000 biz ids

This is my query:
select SQL_NO_CACHE distinct b.id
from biz b
inner join biz_groups g on g.biz_id = b.id
where g.group_uuid in ('group-uuid-01','group-uuid-03','group-uuid-05', 'group-uuid-07')
order by b.created_ts desc
limit 20 offset 20;

20 rows in set (1.30 sec)

+----+-------------+-------+--------+-----------------+----------+---------+-------------------+-------+----------+-----------------------------------------------------------+
| id | select_type | table | type   | possible_keys   | key      | key_len | ref               | rows  | filtered | Extra                                                     |
+----+-------------+-------+--------+-----------------+----------+---------+-------------------+-------+----------+-----------------------------------------------------------+
|  1 | SIMPLE      | g     | range  | bi_idx,gubi_idx | gubi_idx | 767     | NULL              | 56074 |   100.00 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | b     | eq_ref | PRIMARY         | PRIMARY  | 8       | appiandb.g.biz_id |     1 |   100.00 |                                                           |
+----+-------------+-------+--------+-----------------+----------+---------+-------------------+-------+----------+-----------------------------------------------------------+

I guess this query takes > 1 sec, because the filter is not very selective (40,000 rows). If I change the where condition to use group uuids that are not in the biz_groups table, the query is very fast (~50ms).

I've tried re-writing the query using a sub-query instead of a join:
select SQL_NO_CACHE b.id
from biz b
where b.id in (
  select SQL_NO_CACHE g.biz_id
  from biz_groups g
  where g.group_uuid in ('group-uuid-01','group-uuid-03','group-uuid-05','group-uuid-07')
)
order by b.created_ts desc
limit 20 offset 20;

20 rows in set (0.02 sec)

+----+--------------------+-------+----------------+-----------------+------------+---------+------+------+-----------+--------------------------+
| id | select_type        | table | type           | possible_keys   | key        | key_len | ref  | rows | filtered  | Extra                    |
+----+--------------------+-------+----------------+-----------------+------------+---------+------+------+-----------+--------------------------+
|  1 | PRIMARY            | b     | index          | NULL            | biz_ct_idx | 8       | NULL |   40 | 496027.50 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | g     | index_subquery | bi_idx,gubi_idx | bi_idx     | 8       | func |    1 |    100.00 | Using where              |
+----+--------------------+-------+----------------+-----------------+------------+---------+------+------+-----------+--------------------------+

This makes the it very fast when the filter is not selective. However, if the group uuids used for filtering are not in the biz_groups table, the query becomes very slow (3 sec). I think this makes sense, since in that case MySQL is scanning through the entire index.

My question is whether it's possible to write this kind of query (or change the indexes/tables) in such a way that would allow MySQL to figure out the best way to execute it? I would like to keep the time under 100 ms, regardless of the filter params.

Thank you,
Nikita

NOTE: I'm using MySQL 5.1.55-winx64 with the my-huge.ini config.

Options: ReplyQuote


Subject
Views
Written By
Posted
Queries using JOIN and ORDER BY LIMIT
19475
March 10, 2012 01:11PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.