I have following example table t
id| fruit_id | language | fruit | color | sugarlevel
--+-----------------------------------------------
1 | apple_id | english | apple | green | 10%
2 | apple_id | german | apfle | greenx | 10%
3 | apple_id | french | pomme | green | 10%
4 | banana_id | english | banana | yellow | 25%
5 | banana_id | german | bananen | yellow | 25%
6 | grape_id | english | grape | purple | 15%
7 | grape_id | french | raisin | purple | 95%
8 | strawberry_id | english | strawberry| red | 20%
id is unique, fruit_id is a link to another table (fruit). I have the fruitnames in different languages, but the color and sugarlevel is language independant. Of course, this is not normalized, but this is the data I need to handle.
Constraints: there can be many different languages (and thus "fruit" translations), but they always have the same color and sugar level.
I want to find the row with id 1-2-3 and 6-7, because in row 2 the color is "greenx" (and they should all be "green" or "greenx", no mixture), and because in row 6 and 7 the sugar level is not matching.
I could group by fruit_id, color and sugerlevel, use that in a subquery like this:
select fruit_id
from
(
select fruit_id
from t
group by fruit_id, color, sugarlevel
) as subquery
group by fruit_id
having count(fruit_id) > 1
This takes about 20 seconds for 500k rows, but it feels so ugly that I know there should be a cleaner way.
Any hints?