MySQL Forums
Forum List  »  Optimizer & Parser

suggestion for optimizing with non-matching compound indexes
Posted by: Nathan Cheng
Date: March 08, 2006 12:39PM

Perhaps there is a name for this in DB theory, and perhaps this has already been discussed on here elsewhere, and perhaps there is a good reason not to do this; I don't know and I haven't checked. But here is what I think is a good idea for optimizing queries on a table with index (a, b) and a WHERE clause that includes b but NOT a.

Consider this example from the open source Java mail server known as "James":

-----------------------------------------------------

mysql> show create table deadletter;
CREATE TABLE `deadletter` (
`message_name` varchar(200) NOT NULL default '',
`repository_name` varchar(255) NOT NULL default '',
`message_state` varchar(30) NOT NULL default '',
`error_message` varchar(200) default NULL,
`sender` varchar(255) default NULL,
`recipients` text NOT NULL,
`remote_host` varchar(255) NOT NULL default '',
`remote_addr` varchar(20) NOT NULL default '',
`message_body` longblob NOT NULL,
`message_attributes` longblob,
`last_updated` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`repository_name`,`message_name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=4294967295

mysql> select repository_name, count(*) from deadletter group by repository_name;
+-----------------+----------+
| repository_name | count(*) |
+-----------------+----------+
| address-error | 62454 |
| error | 70 |
| spam | 281567 |
+-----------------+----------+
3 rows in set (0.87 sec)

mysql> select repository_name from deadletter where message_name = 'Mail1139592483352-191';
+-----------------+
| repository_name |
+-----------------+
| spam |
+-----------------+
1 row in set (33.45 sec)

mysql> select repository_name from deadletter where repository_name = 'spam' and message_name = 'Mail1139592483352-191';
+-----------------+
| repository_name |
+-----------------+
| spam |
+-----------------+
1 row in set (0.00 sec)

mysql> select repository_name from deadletter where repository_name = 'address-error' and message_name = 'Mail1139592483352-191';
Empty set (0.00 sec)

mysql> select repository_name from deadletter where repository_name = 'error' and message_name = 'Mail1139592483352-191';
Empty set (0.00 sec)
-----------------------------------------------------

So can you anticipate my suggestion? It is simple: When you have a table Q with index (a, b) and WHERE clause containing a condition on b but NOT on a, instead of treating the query as one with no matching indexes, the query optimizer should use the index (a, b) to segment the table into N = COUNT( DISTINCT a ) segments, and perform a UNION of N queries with the appropriate condition on column a prepended to the WHERE clause for each of the N queries. This technique should work equally well if column a is actually a set of columns.

If the number of segments is not much less than the number of rows, then of course this technique will only slow things down, so it should not be used when !( N << row_count ); this is an easy and fast determination for the query optimizer to make, and it would only be tested for when there is no "properly" matching index.

So I think this is a great idea, but would be very surprised if it were a novel one.

Options: ReplyQuote


Subject
Views
Written By
Posted
suggestion for optimizing with non-matching compound indexes
4704
March 08, 2006 12:39PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.