
Re: Killed by joins. 1 minute selects in Access, 15 minutes in mySQL
Posted by: Rick James
Date: July 18, 2011 09:22AM

> the filesort and temp table go away. 100 seconds down to 1 second.
Hmmmm--- Are the EXPLAINs different? I would not expect even a factor of 2 difference.

> data warehousing so well established, my expectations were that a solution was out there if I could just be clever enough with indices.
Yeah, but it's as if the DW textbook writers never tried with tables too big to cache in RAM.

> How do Amazon, Google, etc do searches in microseconds with the massive data sets they have?
Lots of shortcuts. Lots of machines. Lots of combining the results from multiple machines.

Ever notice "Results 1-20 out of about 1,340,000"? They never counted the 1.34 million; they estimate. In fact, they probably estimate only once a day.

> BigTable and clustering alone can't account for it.
BigTable is a locked down DB -- you have to jump through DBA hoops to get your app on it.
NDB Clustering (if that is what you are referring to) has limitations. Your billion rows might choke it, too.

> The Kimball books do a good job outlining schemas, but don't say enough about where tables should be capped, or how they should be indexed.
This forum (and the Performance forum, more so) is littered with carcasses.

> Do you know how much more efficient the big players are?
No. But, I would venture to say that 90% of queries run as fast or faster on MySQL. MySQL quickly finds a query plan that is very likely to be optimal, then executes it. Others spend a lot of extra time trying to find some clever way to optimize.
Notably, others are better at subqueries. I often say "Avoid 'IN(SELECT...)'; turn it into a JOIN", "Turn that subquery into a JOIN", and (in certain cases) "Turn that JOIN into a subquery 'FROM(SELECT...)'". Others do these tricks for you.
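To illustrate (table and column names here are made up; assume customers.id is the PRIMARY KEY, so the rewrite returns the same rows):

```sql
-- Slow in older MySQL: the IN(SELECT...) may be re-evaluated per row
SELECT o.total
FROM orders o
WHERE o.customer_id IN (SELECT id FROM customers WHERE region = 'EU');

-- Same result as a JOIN, which the optimizer handles much better
SELECT o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.region = 'EU';
```

The JOIN form lets the optimizer pick the driving table and use indexes on both sides, instead of treating the subquery as a black box.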
MySQL recently added "index merge", wherein it can use two indexes to perform a single query. I have yet to find a case where a better compound index wouldn't do a better job.
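A sketch of what I mean (hypothetical table and columns): instead of hoping the optimizer merges INDEX(region) and INDEX(order_date), build one compound index that satisfies the whole WHERE clause in a single range scan:

```sql
-- One compound index: = column first, then the range column
ALTER TABLE orders ADD INDEX idx_region_date (region, order_date);

SELECT COUNT(*)
FROM orders
WHERE region = 'EU'
  AND order_date >= '2011-01-01';
```

Rule of thumb: put the columns tested with = first, then at most one range column last.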

> Now I realize I need to think about caching as well.
In DW, "Count the disk hits!" Nothing else matters. Visualize how many disk blocks the query has to read. Then think about
* better indexes
* rearranging the data so that many rows are in each block, rather than being scattered
* not reading data, only to filter it out later
* shrinking the data (TINYINT instead of INT, UNSIGNED, utf8 issues, CHAR vs VARCHAR, etc)
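As a concrete (hypothetical) shrink, assuming a fact table with columns like these -- every byte saved per row means more rows per disk block, hence fewer disk hits:

```sql
ALTER TABLE facts
  MODIFY status TINYINT UNSIGNED NOT NULL,         -- was INT: 4 bytes -> 1
  MODIFY country_code CHAR(2) CHARACTER SET ascii, -- was utf8 VARCHAR(20)
  MODIFY qty SMALLINT UNSIGNED NOT NULL;           -- was BIGINT: 8 bytes -> 2
```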

Oh, consider DECIMAL instead of DOUBLE for money fields.
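Something like this (column name and precision are just examples); DECIMAL stores exact values, while DOUBLE can accumulate rounding error when you SUM millions of rows:

```sql
-- DECIMAL(11,2) holds up to 999,999,999.99 exactly, in 5 bytes
ALTER TABLE facts MODIFY price DECIMAL(11,2) NOT NULL;
```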

> I've learned a few things from the posts so far.
:)

