Performance problems querying large data sets
I'm having trouble finding the right approach for querying statistics over a fairly large data set.
The simplified table setup is like this:
Items: ~100 million rows
- id
- user_id
- name
- property1
- property2
- ...
- property10
Statistics: ~2 billion rows
- item_id
- date
- value1
- value2
- ...
- value10
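For reference, a rough sketch of the simplified schema; the column types and keys here are only assumptions, and the real tables have a few more columns:

CREATE TABLE items (
    id         BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    user_id    BIGINT UNSIGNED NOT NULL,
    name       VARCHAR(255),
    property1  VARCHAR(64),
    -- ... property2 through property9 look the same ...
    property10 VARCHAR(64)
);

CREATE TABLE statistics (
    item_id BIGINT UNSIGNED NOT NULL,
    date    DATE NOT NULL,
    value1  BIGINT NOT NULL,
    -- ... value2 through value9 look the same ...
    value10 BIGINT NOT NULL,
    PRIMARY KEY (item_id, date)  -- assumed: one row per item per day
);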
I now want to build a query based on the items table that can filter and sort both by the various item properties and by the collected statistics.
A "normal" approach looks like this:
SELECT *
FROM items
LEFT JOIN statistics ON items.id = statistics.item_id AND statistics.date = '2022-01-01'
WHERE items.user_id = 123
  AND items.property3 = 'foo' AND items.property5 = 'bar'
ORDER BY statistics.value1
Now I'm a bit lost as to how to approach this. I can't create an index for every possible combination of filters, can I? And as soon as the filters or the sorting involve columns of the (left) joined statistics table, no index seems to help at all.
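For illustration, a composite index tailored to exactly this one query might look something like this (the index name is made up; the join itself can already use the (item_id, date) key on statistics):

-- Covers the filters on the items side of the query above:
ALTER TABLE items
    ADD INDEX idx_user_prop3_prop5 (user_id, property3, property5);

But that only covers this one combination of property filters, and as far as I can tell the ORDER BY on statistics.value1 still cannot use any index, because the sort column sits on the joined table.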
How would you handle such a large data set?
Is denormalization maybe a good approach? (There is a sketch of what I mean by that a bit further down.)
And/or putting a single-column index on every column that is used for filtering?
Or is that just beyond the limits of what you should do with MySQL, and I should look at a different technology?
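What I have in mind with denormalization is roughly this: copy the statistics values that get filtered and sorted on most often onto the items row itself (the column and index names below are made up):

-- Copy e.g. the latest value1 onto the item, so filtering/sorting no longer needs the join:
ALTER TABLE items ADD COLUMN latest_value1 BIGINT;

UPDATE items
JOIN statistics ON statistics.item_id = items.id
               AND statistics.date = '2022-01-01'
SET items.latest_value1 = statistics.value1;

-- Then the ORDER BY could use an index on items directly:
ALTER TABLE items ADD INDEX idx_user_latest_value1 (user_id, latest_value1);

But keeping such copied values up to date for 100 million items is exactly the part I'm unsure about.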
The DB instance is already very big (over 200 GB of RAM), so there is not much room for improvement on the hardware side.
The date range of the statistics table has already been cut down significantly (only the last few months instead of years).
The problem also gets a lot worse when I want to consider date ranges (using SUM on the values, GROUP BY on the items, and HAVING for the filters), but the example above is already enough to cause timeouts.
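A simplified example of such a date-range query (the dates and the HAVING threshold are just placeholders):

SELECT items.*, SUM(statistics.value1) AS total_value1
FROM items
LEFT JOIN statistics ON items.id = statistics.item_id
    AND statistics.date BETWEEN '2022-01-01' AND '2022-03-31'
WHERE items.user_id = 123
  AND items.property3 = 'foo' AND items.property5 = 'bar'
GROUP BY items.id
HAVING total_value1 > 1000
ORDER BY total_value1 DESC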
Thanks for any hints!