Regression: Query optimized in 4.1, not at all in 5.0, why?
I have a pair of large (>10,000,000 rows) tables that look approximately like:
CREATE TABLE a (
id INT8 PRIMARY KEY,
...);
CREATE TABLE b (
a_id INT8,
c_id INT4,
...,
INDEX(a_id),
INDEX(c_id, a_id)
);
A common query of mine is:
SELECT * FROM a, b WHERE c_id = ? AND a_id > ? AND b.a_id = a.id;
In 4.1, the optimizer worked great. It would use the c_id and a_id values together to make a "range" index query on b, then join that with a by the primary key. Quite fast! Usually under 1,000 matching rows in table b, each with exactly one matching row from table a.
But, incredibly, 5.0 instead does a *full table scan* on a (all 10,000,000+ rows!), then joins that with b. Huh? Why?
To make it stranger, I found that if I change my query to read "SELECT * FROM b, a", then 5.0 optimizes correctly!!! Why would the order that tables are listed in the query affect the optimizer? It makes no sense.
Now that I know this, my performance is OK again, but is this a bug in MySQL 5.0? Or is there some reason that the database is now being forced to join tables in the order that there are listed in the FROM clause?
Any info would be really appreciated. Thanks.
Edited 1 time(s). Last edit at 06/21/2006 12:40AM by William Shubert.