MySQL Forums
Forum List  »  Performance

Re: Optimizing a query so that the smaller table is handled first
Posted by: Jeff Goldberg
Date: April 07, 2011 12:45PM

This is great feedback and I really appreciate it. We will definitely look into the implications of switching date from a dimension to a simple metric column.

The book "The Data Warehouse Toolkit" by Ralph Kimball is a big proponent of having dates be a dimension. This book isn't the holy word, so I'm not saying this means it's the right thing to do, I'm just saying that's why we went that way.

We do get certain improvements by having a date dimension, and if you have feedback I'd love to hear it.

The query in my original message is not really indicative of the kind of queries we're running when it comes to dates. When dealing with the date dimensions the system usually already has the date ID and so a join isn't necessary; it's a straight int.

One of the wins we get out of this is that date calculations can be sped up. Incrementing a date dimension ID by 1 is also incrementing the date by one day. So when we need to do date differentials in our SQL we use simple addition and subtraction. To get a sale from a week ago, we can simply say "daily_sales_fact.sales_date - 7" rather than do a date calculation on a DATETIME column. This is something we use in many large queries.

I have no comparison at the moment, so I don't know what kind of optimization that really provides, if at all. I'm going to alter one of our staging databases to convert the date dimension to a metric column and then rerun our bigger queries for benchmarking. When that's done I'll post a follow up, though it may be a few weeks before I get to it.

Thanks again!

Other note: In terms of your point on ranges, I discovered that a lot of the SQL in the system is querying for date ranges of the form "WHERE (largetable.date >= X and largetable.date <= Y)". This, as I recently learned, only uses the indexed column for half of that range and then manually scans the table for the other half. I've run test queries as suggested here (http://explainextended.com/2009/10/07/in-list-vs-range-condition-mysql/) and it's more than 10x faster, so I'm going to chase these down and fix them.)

Options: ReplyQuote




Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.