MySQL Forums
Forum List  »  Data Warehouse

Dimension Table Cleanup
Posted by: Shaun Martinec
Date: December 19, 2016 12:39PM

While we have a decent method of maintaining our fact tables (partitioned by month), we have no way to remove unreferenced data from our dimension tables. Thus, our DB size is always growing and dimension indexes are far too large.

Small sample of data looks like this:

dim_url - 1B rows - 500GB
fact_ranks - 2.8B rows - 300GB
fact_prices - 2.3B rows - 250GB

Currently we are trying to generate intermediate "used dimension keys" tables using a combination of Percona Toolkit Archiver and triggers. The idea being that we can then use the key table to rebuild the dimension with only the rows that are actually still referenced by the fact tables. However, it appears it will take us weeks per dimension table to complete the process.

Is there a better way to handle this? We would prefer to do this online. It seems like there should be an off-the-shelf solution to handle this common issue. Any direction is appreciated, including telling us we are way off the mark.

Options: ReplyQuote

Written By
Dimension Table Cleanup
December 19, 2016 12:39PM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.