Storing a large, dense tensor (~35 GB of scientific data) -- is MySQL appropriate?
Posted by: dan moran
Date: November 04, 2010 08:15PM

We have an 8 dimensional tensor with approximately 300 million elements. The dimensions are country of origin, state of origin, country of destination, state of destination, year, etc.

Currently I have a 9-column table that holds an ID value for each of the eight dimensions, plus the data-point value in the final column. I use CREATE INDEX to speed up lookups into this table.
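For concreteness, the schema looks something like this (column names simplified here; the last three dimension columns are stand-ins, since I didn't list all the dimensions above):

CREATE TABLE tensor (
    origin_country INT,    -- ID into a countries lookup table
    origin_state   INT,
    dest_country   INT,
    dest_state     INT,
    year           INT,
    dim6           INT,    -- placeholders for the remaining dimensions
    dim7           INT,
    dim8           INT,
    value          DOUBLE  -- the actual data point
);

CREATE INDEX tensor_coords ON tensor
    (origin_country, origin_state, dest_country, dest_state,
     year, dim6, dim7, dim8);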

But I think this is horribly inefficient. The 300 million data points at 8 bytes each come to ~2.2 GB of actual data, yet the 9-column table storing each data point along with its full coordinates takes ~36 GB!
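For reference, my back-of-the-envelope math (assuming 4-byte INT ids, before any index or engine overhead):

    payload only:      300e6 rows *  8 bytes        ~=  2.2 GB
    rows with coords:  300e6 rows * (8*4 + 8) bytes ~= 11 GB
    observed on disk:  ~36 GB, i.e. ~120 bytes/row once the secondary
                       index from CREATE INDEX and per-row storage
                       overhead are counted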

My questions for the hive-mind:
(1) This structure seems very inefficient. I only have 200 countries, yet every single row of the table must store a full-width value for country of origin. How can I avoid storing so much redundant index data? (See the sketch after this list.)
(2) Is it possible to avoid indexing the data twice, i.e. with one set of indices stored explicitly in the table and a second set created by CREATE INDEX? (Also touched on in the sketch below.)
(3) Do you know of a storage engine (or other DBMS) that is more appropriate for this? Must I write my own storage engine?
(4) Would OLAP help here at all?
(5) As a generalization, it seems to me that SQL is very good for storing sparse matrices but very bad for storing dense ones. Is this intuition correct?
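Here is a sketch of what I am imagining for (1) and (2), assuming InnoDB: shrink each ID column to the smallest integer type the dimension needs, and make the full coordinate the PRIMARY KEY so the rows are clustered on it, rather than keeping a second copy of every coordinate in a secondary index. (Column names are the same simplified ones as above.)

CREATE TABLE tensor (
    origin_country TINYINT  UNSIGNED NOT NULL, -- ~200 countries fit in 1 byte
    origin_state   SMALLINT UNSIGNED NOT NULL,
    dest_country   TINYINT  UNSIGNED NOT NULL,
    dest_state     SMALLINT UNSIGNED NOT NULL,
    year           SMALLINT UNSIGNED NOT NULL,
    dim6           SMALLINT UNSIGNED NOT NULL, -- placeholders, as above
    dim7           SMALLINT UNSIGNED NOT NULL,
    dim8           SMALLINT UNSIGNED NOT NULL,
    value          DOUBLE NOT NULL,
    PRIMARY KEY (origin_country, origin_state, dest_country, dest_state,
                 year, dim6, dim7, dim8)
) ENGINE=InnoDB;

That would cut the stored coordinate from 32 bytes to 14 per row, but I have no idea whether it actually addresses the overhead I am seeing, hence the questions.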

We can use custom code to pull slices out of this matrix, but it is so lovely to be able to use SQL to query this sucker.
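For example, pulling a 2-D slice is a one-liner (literal ID values made up, column names as in the sketches above):

SELECT dest_country, dest_state, value
FROM tensor
WHERE origin_country = 12
  AND origin_state   = 3
  AND year           = 2008
  AND dim6 = 0 AND dim7 = 0 AND dim8 = 0;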
