MySQL Forums
Forum List  »  Performance

covariance calculation - need help optimizing query
Posted by: Harsha Nagesh
Date: September 06, 2005 10:41AM

Hi,

I have time series data for about 500 entities and I am trying to evaluate its covariance matrix. The data is in a table (Observations)with the following format

name, char(20)
observationType, enum {'daily','weekly','monthly'}
date, date
values, double
mean1, double
mean2, double
covar, double

The query is

select avg(a.values) as mean1, avg(b.values) as mean2,
avg(a.values * b.values) - avg(a.values) * avg(b.values) as covar
max(a.date) as timeStamp
from Observations a, Obseravtions b where
STRCMP(a.name,b.name)<=0 and
a.date=b.date and
a.observationType=b.ObservationType
group by a.name, b.name

With about a million records in the Observations table, it takes about
2 hours to run this query and it results in about 125,000 rows.
There are about 500 (=485) unique names for the column name. The
condition STRCMP(a.name,b.name)<=0 is used as covariance(m,n) is the
same as covariance(n,m). This is on a 1.3GHz intel machine.

I have a joint index on observationType and date.
Explain table says that it does use those indices and for table a
it gives about 200K rows and for table b it gievs about 300 rows.

I am wondering if this query could be improved ?
- My hunch is that the STRCMP function may be evaluated for each combination of data point while evaluating avg(a.values * b.values)
and thus maybe slowing down the query.

I am wondering if there are other ways to speed up this query ?

Thanks,
Harsha

Options: ReplyQuote


Subject
Views
Written By
Posted
covariance calculation - need help optimizing query
6595
September 06, 2005 10:41AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.