Re: Storing a large, dense tensor (35gb scientific data) -- is mysql appropriate?
Posted by: Rick James
Date: November 07, 2010 11:19PM

That sounds like an excellent use case for Summary Tables. There would probably be several such tables.

For the sample query this would be excellent; variants, including the first two you mention, would be ok:
CREATE TABLE ByOriginDest (
    year TINYINT UNSIGNED NOT NULL,
    country_origin TINYINT UNSIGNED NOT NULL,
    country_dest TINYINT UNSIGNED NOT NULL,
    ct INT UNSIGNED NOT NULL,   -- from COUNT(*)
    tot FLOAT NOT NULL,   -- from SUM(value); get average from SUM(tot)/SUM(ct)
    INDEX(year, country_origin, country_dest)
);
That table would also let you get totals by origin (across all dests), dests (across all origins), with or without GROUP BY year.
With only 200*200*3 rows (120K), even a table scan would be comparatively fast.

The INDEX could be a PRIMARY KEY, but (depending on how you incrementally populate the table) it may be simpler to INSERT dup keys instead of doing UPDATEs.

- Sum of all trade between High Income Countries and Low Income Countries --
This would involve categorizing country_origin/dest perhaps via JOINs to another table, something like:
SELECT  SUM(tot) AS AllTrade
    FROM ByOriginDest b
    JOIN Countries o  ON o.country = b.country_origin AND o.income >  1234567
    JOIN Countries d  ON d.country = b.country_dest   AND d.income <= 1234567
That would ignore any indexes on b and do a table scan, but the JOINs would be moderately efficient. That's about 360K rows hit, versus probably a 1B row hit without the summary table.

Oops, maybe you need to do that twice, once for high->low and once low->high, and then sum the subtotals. Still 720K beats 1B.
SELECT
    ( SELECT ... ) +   -- high->low
    ( SELECT ... );    -- low->high

- Value of iron ore exports from Chinese Industries to US Government Purchases -- This needs a different Summary Table.

Meanwhile, the main table does not need any indexes. Well, I recommend an AUTO_INCREMENT so you can keep track of where the summarization "left off".

Options: ReplyQuote




Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.