MySQL Forums
Forum List  »  General

Joining a distinct list with a single row
Posted by: Jeffrey Massung
Date: April 25, 2018 03:51PM

I have an extremely large table (2B+ rows) of genomic data with an index that provides me with a couple pieces of information quite quickly for their individual queries. For example:

select distinct phenotype from G

That returns ~100 phenotypes in ~ 0.05 sec. I can also find a single row from the table within a given range in the genome for a phenotype. For example:

select * from G where phenotype=? and chromosome=? and position between ? and ? order by p_value limit 1

This also does well, returning the exact row I care about in ~ 0.05 sec.

What I would like is a query that allows me to combine these two together: for each distinct phenotype, find the best row in a given range. But no matter what I do, I can't seem to combine them efficiently.

If I hand-unroll the distinct list of phenotypes and do ~100 UNION ALL statements together in a single query, I get the answer I'm looking for in < 2 seconds. But, those all perform serially and I'd prefer to not do that in code (first getting the unique phenotype list and then issuing 100 selects in parallel only to merge them together myself).

Just about every query combination I've tried of joins or correlated sub-queries ends up causing either a full table scan or extremely large temporary tables created. The simplest form of the end result I'd look for is:

select phenotype, min(p_value) from G
where chromosome = ? and position between ? and ?
group by phenotype;

Except that can take considerably longer (even when indexed due to the size of the data). It also requires another join afterwards to get at the original rows data, which I'd like to avoid if possible.

Any help with this much appreciated!

Options: ReplyQuote


Subject
Written By
Posted
Joining a distinct list with a single row
April 25, 2018 03:51PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.