Re: Killed by joins. 1 minute selects in Access, 15 minutes in mySQL
Posted by:
Mark Daly
Date: July 17, 2011 08:53PM
> These are pretty yucky:
Agreed.
> and fa.PCode like '%CL%'
> and fa.BCode >2
> What are they doing?
Mostly helping me feel comfortable that the new indices are working. Which it turns out they aren't. This is just a test query with fields I expect to end up in a lot of queries. The categorical fields in the fact table are ordered from most to least frequently used.
!Important! - the goal here is not to make this specific query run faster. What I'm trying to achieve is the creation of a set keys that will work on most queries our users put together. This db is for ad hoc data mining, and not for transactions. As a result, very few queries are run more than once or twice, so we can't just rely on a single primary key. It's going to have to a set of indices that cover most permutations of commonly selected fields. If I can't join tables efficiently, then I have to denormalize the whole thing and have a monster table, which is horrible to think about.
Anyway, to your original question:
1. PCode is a descriptive label we apply to each record. Sometimes these labels can be grouped together by substrings that are common, but these groupings are fluid and can change from query-to-query, so there's no point in setting up a reference table for them. I expect a lot of LIKE and IN statements on this field.
2. BCode is much less exotic. These are normalized IDs with their own dimension table . Roughly 100+ unique values
> But SymbolID is not in the table!? I can't analyze it with inconsistencies like this.
I've corrected the original post. I'm not completely comfortable posting production code on a public forum, so you'll have to excuse the odd typo since there's a lot of manual touch-ups going on.
> Part of the issue comes from not being able to use the PRIMARY KEY
As I mentioned above, queries will seldomly be reused, so we won't be able to rely on the primary key very often. Is there really a huge performance difference between PRIMARY KEY, UNIQUE KEY and KEY?
> plus having Dimension value(s) in the GROUP BY
Ah, I didn't know that had an impact. Interesting.
> Usually Summary tables are the performance salvation.
This IS a monthly summary table. :) The raw data is much, much larger.
> What is the is the output from SELECT COUNT(DISTINCT PCode), COUNT(DISTINCT BCode) FROM fact_amounts;
You saw how my original attempt at BBCode went, so in the interests of making it readable, here's a formatted version of the output.
Explain
---------
id: 1
select_type:simple
table: fact_amounts
type: index
possible_keys: NULL
key: Index 3
key_len: 59
ref: NULL
rows: 8968481
Extra:Using index
Results
---------
"COUNT(DISTINCT cb.PCode)","COUNT(DISTINCT cb.BCode)"
"68","108"
Run time
---------
/* 0 rows affected, 1 rows found. Duration for 1 query: 101.812 sec. */
> SELECT COUNT DISTINCT PCode, BCode FROM fact_tsx_cbilling;
I received syntax erors when I ran this. I see what you're aiming for, but I don't know how to do it without significantly changing the structure of the query and using a nested design. That would make the EXPLAIN results useless though.
Is that sufficient information to describe the problem?
Edited 4 time(s). Last edit at 07/17/2011 08:59PM by Mark Daly.