MySQL :: Re: Optimizing a query so that the smaller table is handled first

New Topic

Re: Optimizing a query so that the smaller table is handled first

Posted by: Jeff Goldberg
Date: April 07, 2011 12:45PM

This is great feedback and I really appreciate it. We will definitely look into the implications of switching date from a dimension to a simple metric column.

The book "The Data Warehouse Toolkit" by Ralph Kimball is a big proponent of having dates be a dimension. This book isn't the holy word, so I'm not saying this means it's the right thing to do, I'm just saying that's why we went that way.

We do get certain improvements by having a date dimension, and if you have feedback I'd love to hear it.

The query in my original message is not really indicative of the kind of queries we're running when it comes to dates. When dealing with the date dimensions the system usually already has the date ID and so a join isn't necessary; it's a straight int.

One of the wins we get out of this is that date calculations can be sped up. Incrementing a date dimension ID by 1 is also incrementing the date by one day. So when we need to do date differentials in our SQL we use simple addition and subtraction. To get a sale from a week ago, we can simply say "daily_sales_fact.sales_date - 7" rather than do a date calculation on a DATETIME column. This is something we use in many large queries.

I have no comparison at the moment, so I don't know what kind of optimization that really provides, if at all. I'm going to alter one of our staging databases to convert the date dimension to a metric column and then rerun our bigger queries for benchmarking. When that's done I'll post a follow up, though it may be a few weeks before I get to it.

Thanks again!

Other note: In terms of your point on ranges, I discovered that a lot of the SQL in the system is querying for date ranges of the form "WHERE (largetable.date >= X and largetable.date <= Y)". This, as I recently learned, only uses the indexed column for half of that range and then manually scans the table for the other half. I've run test queries as suggested here (http://explainextended.com/2009/10/07/in-list-vs-range-condition-mysql/) and it's more than 10x faster, so I'm going to chase these down and fix them.)

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Views

Written By

Posted

Optimizing a query so that the smaller table is handled first

2611

Jeff Goldberg

April 04, 2011 06:06AM

Re: Optimizing a query so that the smaller table is handled first

967

Jeff Goldberg

April 04, 2011 09:33AM

Re: Optimizing a query so that the smaller table is handled first

938

Jørgen Løland

April 04, 2011 12:34PM

Re: Optimizing a query so that the smaller table is handled first

1525

Jeff Goldberg

April 04, 2011 01:27PM

Re: Optimizing a query so that the smaller table is handled first

945

Jørgen Løland

April 05, 2011 08:05AM

Re: Optimizing a query so that the smaller table is handled first

1026

Rick James

April 05, 2011 09:16PM

Re: Optimizing a query so that the smaller table is handled first

922

Jeff Goldberg

April 07, 2011 12:45PM

Re: Optimizing a query so that the smaller table is handled first

787

Rick James

April 07, 2011 07:12PM

Re: Optimizing a query so that the smaller table is handled first

966

Rick James

April 07, 2011 09:34PM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.