MySQL Forums
Forum List  »  Performance

Performance problems querying large data sets
Posted by: Patrick Schiel
Date: November 24, 2022 01:38PM

I have problems finding the correct approach to query statistics for a quite large data set.

The simplified table setup is like this:

Items: ~100 million rows
- id
- user_id
- name
- property1
- property2
- ...
- property10

Statistics: ~2 billion rows
- item_id
- date
- value1
- value2
- ...
- value10

I want now build a query based on the items table, that can filter and sort both by the various item properties, and the collected statistics.

A "normal" approach looked like this:

FROM items
LEFT JOIN statistics on AND"2022-01-01"
WHERE user_id=123
AND items.property3="foo" AND items.property5="bar"
ORDER BY statistics.value1

Now I'm a bit lost how to approach this. I can't create indexes for every combination of filters? And when the filters or sorting are applied on the (left) joined table, nothing seems to work.

How could you handle such a large data set?
Is denormalization maybe a good approach?
And/or putting single indexes on every used column?
Or is that just beyond the limits of what you should do with mysql, and use a different technology?

The DB instance is very big already, over 200GB RAM, so there is not much room to improve hardware side.

The date range for the statistics table is already cut down significantly (only the last few months instead years).

The problem also gets a lot worse when I want to consider date ranges (using SUM on values, GROUP on items and HAVING for filters), but the above example is already enough to result in timeouts.

Thanks for any hints!

Options: ReplyQuote

Written By
Performance problems querying large data sets
November 24, 2022 01:38PM

Sorry, only registered users may post in this forum.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.