MySQL Forums
Forum List  »  Performance

Optimizing a query so that the smaller table is handled first
Posted by: Jeff Goldberg
Date: April 04, 2011 06:06AM

I've got a simple query that isn't behaving the way I expected and I'm hoping somebody can help me. It may be that I'm doing it wrong, but it also may be that I misunderstood how MySQL will optimize queries for me.

My simple but problematic query is follows:

>
SELECT SUM(d.units_sold) TOTAL_UNITS_SOLD
FROM date_dimension dd, daily_sales_fact d
WHERE dd.dt = '2011-04-02' AND d.sales_date = dd.date_dimension_id;
>


This query is summing all unit sales for a given date. My expectation was that MySQL would first prune the date_dimension down to one row based on the first part of the WHERE clause:
> dd.dt = '2011-04-02'


But instead it's first joining the daily_sales_fact table with the date_dimension and, I guess, only then is it pruning out all daily_sales_fact rows that don't match the date.

The "dt" column is a non-indexed DATETIME column. I realize this is going to be a column that doesn't perform as well, but the date_dimension is pretty small, so I didn't think it would be an issue.

The EXPLAIN for the query is as follows:
>
id, select_type, table, type, possible_keys, key, ken_len, ref, rows, Extra
1, SIMPLE, d, ALL, PRIMARY,fk_daily_sales_sales_date1, , , , 5855892,
1, SIMPLE, dd, eq_ref, PRIMARY, PRIMARY, 4, dg.d.sales_date, 1, Using where
>

If I use the primary key of the date_dimension, everything works the way I'd expect, with the date_dimension WHERE clause happening first.

>
SELECT SUM(d.units_sold) TOTAL_UNITS_SOLD
FROM date_dimension dd, daily_sales_fact d
WHERE dd.date_dimension_id = 2747 AND d.sales_date = dd.date_dimension_id;
>

The EXPLAIN:
>
id, select_type, table, type, possible_keys, key, ken_len, ref, rows, Extra
1, SIMPLE, dd, const, PRIMARY, PRIMARY, 4, const, 1, Using index
1, SIMPLE, d, range, PRIMARY,fk_daily_sales_sales_date1, PRIMARY, 4, , 136850, Using where
>

(In this second case, I could have obviously just skipped the join altogether and put the primary key directly into the "d.sales_date = 2747" but I'm trying to optimize the first query, not the second.)

I've tried fixing the first query using explicit JOIN statements rather than the WHERE clause, but I'm still getting the same poor performance results.

Is there a way to restructure the first query so that it helps/forces MySQL to first limit the date_dimension table and only then join it to the daily_sales_fact table?

Do I have to index the "date_dimension.dt" column for this to work?

I could break this into two queries so I can first get the primary key myself and only then run the second query. But if it's that straightforward for me to do it in two queries, I'm hoping it's just a gentle nudge to get MySQL to do something similar in one query.

Thank you very much for any help you can provide.

Jeff

Options: ReplyQuote




Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.