Re: Storing a large, dense tensor (35gb scientific data) -- is mysql appropriate?
Posted by: Rick James
Date: November 07, 2010 11:42PM

Slight error in the suggested Summary table
INDEX(year, country_origin, country_dest)
-->
INDEX(country_origin, country_dest, year)

In reality you might need both indexes. Those indexes would be cheap (several MB) because of how small the table is.

Eh? Your 300M rows has grown to 55G?

Suggest you look at
http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_myisam_data_pointer_size
and tune that also. 55G _Fixed_ width rows can live with data-pointer-size = 5 (1T row max). That would save you 55GB per index on the raw (Fact) table versus the default value of 6.

If value is Dollars, then a FLOAT can exactly represent up to 16 million dollars and can represent $16B to a precision of $1000. In doing summations, roundoffs tend to cancel out each other. (You could consider using DOUBLE in the summary tables to avoid some of the rounding errors.) In any case, I suggest you display results as 000s or millions.

To explain "gallingly inefficient" -- In MyISAM, an index is a BTree where the leaf rows contain
1. The key(s) in the index
2. The data pointer (see link above)
So, your original code was mostly 4+6 bytes per row per index. With the improvements, it will be 1+5 or 2+5 bytes. Multiply that by 55G, then increase it by 1% to 45% for BTree overhead. (Of course, this is moot if you get rid of the indexes.)

Options: ReplyQuote




Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.