suggestion for optimizing with non-matching compound indexes
Perhaps there is a name for this in DB theory, and perhaps this has already been discussed on here elsewhere, and perhaps there is a good reason not to do this; I don't know and I haven't checked. But here is what I think is a good idea for optimizing queries on a table with index (a, b) and a WHERE clause that includes b but NOT a.
Consider this example from the open source Java mail server known as "James":
-----------------------------------------------------
mysql> show create table deadletter;
CREATE TABLE `deadletter` (
`message_name` varchar(200) NOT NULL default '',
`repository_name` varchar(255) NOT NULL default '',
`message_state` varchar(30) NOT NULL default '',
`error_message` varchar(200) default NULL,
`sender` varchar(255) default NULL,
`recipients` text NOT NULL,
`remote_host` varchar(255) NOT NULL default '',
`remote_addr` varchar(20) NOT NULL default '',
`message_body` longblob NOT NULL,
`message_attributes` longblob,
`last_updated` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`repository_name`,`message_name`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=4294967295
mysql> select repository_name, count(*) from deadletter group by repository_name;
+-----------------+----------+
| repository_name | count(*) |
+-----------------+----------+
| address-error | 62454 |
| error | 70 |
| spam | 281567 |
+-----------------+----------+
3 rows in set (0.87 sec)
mysql> select repository_name from deadletter where message_name = 'Mail1139592483352-191';
+-----------------+
| repository_name |
+-----------------+
| spam |
+-----------------+
1 row in set (33.45 sec)
mysql> select repository_name from deadletter where repository_name = 'spam' and message_name = 'Mail1139592483352-191';
+-----------------+
| repository_name |
+-----------------+
| spam |
+-----------------+
1 row in set (0.00 sec)
mysql> select repository_name from deadletter where repository_name = 'address-error' and message_name = 'Mail1139592483352-191';
Empty set (0.00 sec)
mysql> select repository_name from deadletter where repository_name = 'error' and message_name = 'Mail1139592483352-191';
Empty set (0.00 sec)
-----------------------------------------------------
So can you anticipate my suggestion? It is simple: When you have a table Q with index (a, b) and WHERE clause containing a condition on b but NOT on a, instead of treating the query as one with no matching indexes, the query optimizer should use the index (a, b) to segment the table into N = COUNT( DISTINCT a ) segments, and perform a UNION of N queries with the appropriate condition on column a prepended to the WHERE clause for each of the N queries. This technique should work equally well if column a is actually a set of columns.
If the number of segments is not much less than the number of rows, then of course this technique will only slow things down, so it should not be used when !( N << row_count ); this is an easy and fast determination for the query optimizer to make, and it would only be tested for when there is no "properly" matching index.
So I think this is a great idea, but would be very surprised if it were a novel one.