MySQL Forums
Forum List  »  Archive Storage Engine

MySQL Archive Tablespace for FACTs
Posted by: Edwin DeSouza
Date: April 18, 2007 04:49PM

I’m visiting a Pentaho customer right now whose current “transaction” volume is 200 million rows per day. Relatively speaking, this puts their planned warehouse in the top quintile of size. They will face significant issues with load times, data storage, processing reliability, etc. Kettle is the tool they selected and it is working really well. Distributed record processing using Kettle and a FOSS database is a classic case study for Martens scale out manifesto.

This organization doesn’t have unlimited budget. Specifically, they don’t have a telecom type budget for their telecom like volume of data. One of the issues that has come up with their implementation has been the tradeoff between space, and keeping the base level fact records. For example, at 200 million / day and X bytes per fact you start to get into terabytes of storage quickly. It was assumed, from the start of the project, only summary level data could be stored for any window of time exceeding 10 days or so.

The overall math is sound.

Size per record (S) x number of records per day (N) = size per day of data growth (D)

In this equation, there’s really not much we can do, if we want to keep the actual base level transaction, about the number of records per day. N becomes a fixed parameter in this equation. We do have some control over the S value, which is mostly about what this blog is about.

Can we reduce the size of S by such an amount that D becomes more realistic? The answer is the ARCHIVE engine in MySQL.

The Archive table uses an order of magnitude LESS space to store the same set of FACTS. What about query performance? I’ll have to do another blog on a more scientific approach but the anecdotal query performance on typical OLAP queries (select sum() from fact group by dim1_id) seemed related (less than a 15% difference).


Options: ReplyQuote

Written By
MySQL Archive Tablespace for FACTs
April 18, 2007 04:49PM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.