MySQL :: suggestion for optimizing with non-matching compound indexes

New Topic

suggestion for optimizing with non-matching compound indexes

Posted by: Nathan Cheng
Date: March 08, 2006 12:39PM

Perhaps there is a name for this in DB theory, and perhaps this has already been discussed on here elsewhere, and perhaps there is a good reason not to do this; I don't know and I haven't checked. But here is what I think is a good idea for optimizing queries on a table with index (a, b) and a WHERE clause that includes b but NOT a.

Consider this example from the open source Java mail server known as "James":

-----------------------------------------------------

mysql> show create table deadletter;
CREATE TABLE `deadletter` (
`message_name` varchar(200) NOT NULL default '',
`repository_name` varchar(255) NOT NULL default '',
`message_state` varchar(30) NOT NULL default '',
`error_message` varchar(200) default NULL,
`sender` varchar(255) default NULL,
`recipients` text NOT NULL,
`remote_host` varchar(255) NOT NULL default '',
`remote_addr` varchar(20) NOT NULL default '',
`message_body` longblob NOT NULL,
`message_attributes` longblob,
`last_updated` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`repository_name`,`message_name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=4294967295

mysql> select repository_name, count(*) from deadletter group by repository_name;
+-----------------+----------+
| repository_name | count(*) |
+-----------------+----------+
| address-error | 62454 |
| error | 70 |
| spam | 281567 |
+-----------------+----------+
3 rows in set (0.87 sec)

mysql> select repository_name from deadletter where message_name = 'Mail1139592483352-191';
+-----------------+
| repository_name |
+-----------------+
| spam |
+-----------------+
1 row in set (33.45 sec)

mysql> select repository_name from deadletter where repository_name = 'spam' and message_name = 'Mail1139592483352-191';
+-----------------+
| repository_name |
+-----------------+
| spam |
+-----------------+
1 row in set (0.00 sec)

mysql> select repository_name from deadletter where repository_name = 'address-error' and message_name = 'Mail1139592483352-191';
Empty set (0.00 sec)

mysql> select repository_name from deadletter where repository_name = 'error' and message_name = 'Mail1139592483352-191';
Empty set (0.00 sec)
-----------------------------------------------------

So can you anticipate my suggestion? It is simple: When you have a table Q with index (a, b) and WHERE clause containing a condition on b but NOT on a, instead of treating the query as one with no matching indexes, the query optimizer should use the index (a, b) to segment the table into N = COUNT( DISTINCT a ) segments, and perform a UNION of N queries with the appropriate condition on column a prepended to the WHERE clause for each of the N queries. This technique should work equally well if column a is actually a set of columns.

If the number of segments is not much less than the number of rows, then of course this technique will only slow things down, so it should not be used when !( N << row_count ); this is an easy and fast determination for the query optimizer to make, and it would only be tested for when there is no "properly" matching index.

So I think this is a great idea, but would be very surprised if it were a novel one.

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Views

Written By

Posted

suggestion for optimizing with non-matching compound indexes

4949

Nathan Cheng

March 08, 2006 12:39PM

Re: suggestion for optimizing with non-matching compound indexes

2295

Harrison Fisk

March 16, 2006 11:40AM

Re: suggestion for optimizing with non-matching compound indexes

2193

Nathan Cheng

March 16, 2006 12:15PM

Re: suggestion for optimizing with non-matching compound indexes

2973

Björn Steinbrink

March 22, 2006 09:48AM

Re: suggestion for optimizing with non-matching compound indexes

2159

Björn Steinbrink

March 22, 2006 09:12PM

Re: suggestion for optimizing with non-matching compound indexes

2130

Björn Steinbrink

March 24, 2006 11:02AM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.