MySQL Forums
Forum List  »  Newbie

Re: Killed by joins. 1 minute selects in Access, 15 minutes in mySQL
Posted by: Mark Daly
Date: July 17, 2011 08:53PM

> These are pretty yucky:
Agreed.

> and fa.PCode like '%CL%'
> and fa.BCode >2
> What are they doing?
Mostly helping me feel comfortable that the new indices are working. Which it turns out they aren't. This is just a test query with fields I expect to end up in a lot of queries. The categorical fields in the fact table are ordered from most to least frequently used.

!Important! - the goal here is not to make this specific query run faster. What I'm trying to achieve is the creation of a set keys that will work on most queries our users put together. This db is for ad hoc data mining, and not for transactions. As a result, very few queries are run more than once or twice, so we can't just rely on a single primary key. It's going to have to a set of indices that cover most permutations of commonly selected fields. If I can't join tables efficiently, then I have to denormalize the whole thing and have a monster table, which is horrible to think about.

Anyway, to your original question:
1. PCode is a descriptive label we apply to each record. Sometimes these labels can be grouped together by substrings that are common, but these groupings are fluid and can change from query-to-query, so there's no point in setting up a reference table for them. I expect a lot of LIKE and IN statements on this field.
2. BCode is much less exotic. These are normalized IDs with their own dimension table . Roughly 100+ unique values

> But SymbolID is not in the table!? I can't analyze it with inconsistencies like this.
I've corrected the original post. I'm not completely comfortable posting production code on a public forum, so you'll have to excuse the odd typo since there's a lot of manual touch-ups going on.

> Part of the issue comes from not being able to use the PRIMARY KEY
As I mentioned above, queries will seldomly be reused, so we won't be able to rely on the primary key very often. Is there really a huge performance difference between PRIMARY KEY, UNIQUE KEY and KEY?

> plus having Dimension value(s) in the GROUP BY
Ah, I didn't know that had an impact. Interesting.

> Usually Summary tables are the performance salvation.
This IS a monthly summary table. :) The raw data is much, much larger.


> What is the is the output from SELECT COUNT(DISTINCT PCode), COUNT(DISTINCT BCode) FROM fact_amounts;
You saw how my original attempt at BBCode went, so in the interests of making it readable, here's a formatted version of the output.

Explain
---------
id: 1
select_type:simple
table: fact_amounts
type: index
possible_keys: NULL
key: Index 3
key_len: 59
ref: NULL
rows: 8968481
Extra:Using index

Results
---------
"COUNT(DISTINCT cb.PCode)","COUNT(DISTINCT cb.BCode)"
"68","108"

Run time
---------
/* 0 rows affected, 1 rows found. Duration for 1 query: 101.812 sec. */


> SELECT COUNT DISTINCT PCode, BCode FROM fact_tsx_cbilling;
I received syntax erors when I ran this. I see what you're aiming for, but I don't know how to do it without significantly changing the structure of the query and using a nested design. That would make the EXPLAIN results useless though.

Is that sufficient information to describe the problem?



Edited 4 time(s). Last edit at 07/17/2011 08:59PM by Mark Daly.

Options: ReplyQuote




Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.