I am having trouble optimizing some sorted/paged queries. I've read these posts
http://www.mysqlperformanceblog.com/2006/09/01/order-by-limit-performance-optimization/,
http://www.mysqlperformanceblog.com/2007/09/18/possible-optimization-for-sort_merge-and-union-order-by-limit/, but I'm trying to figure out if those suggestions are my only option, or if I'm just screwing something up.
Schema:
create table topic (
id bigint not null auto_increment,
name varchar(50) not null,
/* a few more columns here */
primary key (id)
) ENGINE=InnoDB;
create table entry (
id bigint not null auto_increment,
created_ts bigint not null,
updated_ts bigint not null,
author varchar(255) not null,
group_uuid varchar(255),
body varchar(1000) not null,
type tinyint not null, /* enum in java, currently only 2 values */
topic_id bigint,
/* a bunch more columns here, including a longtext column */
primary key (id)
) ENGINE=InnoDB;
alter table entry
add index FKE738523512C33807 (topic_id),
add constraint FKE738523512C33807
foreign key (topic_id)
references topic (id);
create index entry_c_idx on entry (created_ts);
create index entry_u_idx on entry (updated_ts);
create index entry_ac_idx on entry (author, created_ts);
create index entry_au_idx on entry (author, updated_ts);
create index entry_ac_idx on entry (group_uuid, created_ts);
create index entry_au_idx on entry (group_uuid, updated_ts);
create index entry_tc_idx on entry (type, created_ts);
create index entry_tu_idx on entry (type, updated_ts);
create index entry_tpc_idx on entry (topic_id, created_ts);
create index entry_tpu_idx on entry (topic_id, updated_ts);
Table sizes:
* topic = 100 rows
* entry = 327,000 rows (27K where type=1, 300K where type=2)
* distinct authors: 100,000
* distinct group uuids: 101 (~2400 entries per group, ~90,000 with null group)
Fast queries (~ 0.040 sec):
select e.id from entry e order by e.created_ts desc, e.id desc limit 20;
select e.id from entry e order by e.updated_ts desc, e.id desc limit 20;
select e.id from entry e where e.type = 2 order by e.created_ts desc, e.id desc limit 20;
select e.id from entry e where e.type = 2 order by e.updated_ts desc, e.id desc limit 20;
select e.id from entry e where e.topic_id = 5 order by e.created_ts desc, e.id desc limit 20;
select e.id from entry e where e.topci_id = 5 order by e.updated_ts desc, e.id desc limit 20;
Problem query 1 (~ 1.500 sec):
select e.id from entry e
where (e.author = 'my.usr.not-selective' or e.group_uuid is null or e.group_uuid in ('5d7e7111-55d9-43f5-bca5-d107f122735c', '00f9c21b-f877-4e67-8964-a67438426c81'))
order by e.created_ts desc, e.id desc limit 20;
/*
Explain:
1, SIMPLE, e, index_merge, ..., entry_ac_idx,entry_gu_idx, 767,768, , 12936, Using sort_union(entry_ac_idx,entry_gu_idx); Using where; Using filesort
*/
Problem query 2 (~ 1.500 sec):
select e.id from entry e
where (e.author = 'my.usr.not-selective' or e.group_uuid is null or e.group_uuid in ('5d7e7111-55d9-43f5-bca5-d107f122735c', '00f9c21b-f877-4e67-8964-a67438426c81'))
and (e.type = 2 or (e.topic_id in (1,2,3,4,5)))
order by e.created_ts desc, e.id desc limit 20;
/*
Explain:
1, SIMPLE, e, index_merge, ..., entry_ac_idx,entry_gu_idx, 767,768, , 12936, Using sort_union(entry_ac_idx,entry_gu_idx); Using where; Using filesort
*/
Problem query 3 (~ 3.000 sec):
select e.id from entry e
where (e.author = 'my.usr.selective' or e.group_uuid is null or e.group_uuid in ('no-entries-with-this-uuid-1', 'no-entries-with-this-uuid-2'))
and (e.type = 2)
order by e.created_ts desc, e.id desc limit 20;
/*
Explain:
1, SIMPLE, e, ref, entry_tc_idx,entry_ac_idx,entry_gc_idx, entry_tc_idx, 1, const, 172286, Using where
*/
The problem seems to be that MySQL is picking the wrong approach for the queries. For the first two queries, the filter is not very selective, so there are lots of rows to merge and sort. If I add a "force index (entry_c_idx)" (or if I just add a bogus where condition: e.created_ts > 0), the query becomes very fast, because MySQL only has to look through a few rows to find 20 that satisfy the filter (that's my understanding at least).
For the 3rd query, it's the opposite case -- the author/group filter is more selective than the type filter (there are 300K rows with type=2, but only 90K using the author/group filter). Using "force index (entry_ac_idx,entry_gc_idx), the query time drops from 3sec to 1sec.
Is there any way to make this fast for the various cases? Did I create the right indexes? I'm using Hibernate from Java, so it's not possible to use optimizer hints or SQL's UNION unless I use native SQL queries, which I want to avoid since I want to support databases other than MySQL. Actually, I think optimizer hints wouldn't work anyway, because if I force it down one path every time, it will be slow for the opposite case.
Thank you!
Nikita
For reference:
* I'm using MySQL 5.1.55 (winx64, all default settings)