Re: Storing a large, dense tensor (35gb scientific data) -- is mysql appropriate?
Posted by:
Rick James
Date: November 07, 2010 06:54PM
Guess what? An INDEX stores redundant data. Granted, the storing of it is automatic, so it is not as bad as maintaining the redundant data yourself.
8 bytes? DOUBLE? FLOAT (4 bytes) is not precise enough?
Don't write your own storage engine. That would be a lot of work, and you might not achieve much.
I can't say much more without getting a better feel for the SELECTs you need to apply.
Also, please provide SHOW CREATE TABLE; this will give me some concrete names to talk around.
"with its full coordinates is ~36gb" -- That's an average of 15 bytes per dimension! Normalize! You might get down to 7GB.
The YEAR datatype is 2 bytes (no normalization needed). You could shrink that to 1 byte if you can be sure you have a range of under 255 years, say, by subtracting 1900. That could still allow 'range' queries (like: WHERE year BETWEEN 2003-1900 AND 2010-1900; do _not_ do WHERE year+1900 BETWEEN 2003 AND 2010).
Country could at least be the 2-byte Intl codes, or possibly a 1-byte number plus a lookup table. If "state" is a subdivision of "country", then these could be combined, thereby eliminating dimension(s). On the other hand, if you need to 'report' on a 'country' or report on a 'state', then arrange for the state-country pairs to be consecutive within a country. Maybe something 'simple' like (assuming no more than 300 states in a country):
country_id * 300 + state_id
and store the pair in a SMALLINT UNSIGNED (2 bytes (0..65535) for both dimensions).
Are any other dimensions not 'independent'?
Indexing... Having lots of indexes may not be worthwhile. Again, let's see the SELECTs before commenting.