MySQL :: Re: Storing a large, dense tensor (35gb scientific data) -- is mysql appropriate?

New Topic

Re: Storing a large, dense tensor (35gb scientific data) -- is mysql appropriate?

Posted by: Rick James
Date: November 07, 2010 11:19PM

That sounds like an excellent use case for Summary Tables. There would probably be several such tables.

For the sample query this would be excellent; variants, including the first two you mention, would be ok:

CREATE TABLE ByOriginDest (
    year TINYINT UNSIGNED NOT NULL,
    country_origin TINYINT UNSIGNED NOT NULL,
    country_dest TINYINT UNSIGNED NOT NULL,
    ct INT UNSIGNED NOT NULL,   -- from COUNT(*)
    tot FLOAT NOT NULL,   -- from SUM(value); get average from SUM(tot)/SUM(ct)
    INDEX(year, country_origin, country_dest)
);

That table would also let you get totals by origin (across all dests), dests (across all origins), with or without GROUP BY year.
With only 200*200*3 rows (120K), even a table scan would be comparatively fast.

The INDEX could be a PRIMARY KEY, but (depending on how you incrementally populate the table) it may be simpler to INSERT dup keys instead of doing UPDATEs.

- Sum of all trade between High Income Countries and Low Income Countries --
This would involve categorizing country_origin/dest perhaps via JOINs to another table, something like:

SELECT  SUM(tot) AS AllTrade
    FROM ByOriginDest b
    JOIN Countries o  ON o.country = b.country_origin AND o.income >  1234567
    JOIN Countries d  ON d.country = b.country_dest   AND d.income <= 1234567

That would ignore any indexes on b and do a table scan, but the JOINs would be moderately efficient. That's about 360K rows hit, versus probably a 1B row hit without the summary table.

Oops, maybe you need to do that twice, once for high->low and once low->high, and then sum the subtotals. Still 720K beats 1B.

SELECT
    ( SELECT ... ) +   -- high->low
    ( SELECT ... );    -- low->high

- Value of iron ore exports from Chinese Industries to US Government Purchases -- This needs a different Summary Table.

Meanwhile, the main table does not need any indexes. Well, I recommend an AUTO_INCREMENT so you can keep track of where the summarization "left off".

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Written By

Posted

Storing a large, dense tensor (35gb scientific data) -- is mysql appropriate?

dan moran

November 04, 2010 08:15PM

Re: Storing a large, dense tensor (35gb scientific data) -- is mysql appropriate?

Rick James

November 07, 2010 06:54PM

Re: Storing a large, dense tensor (35gb scientific data) -- is mysql appropriate?

dan moran

November 07, 2010 10:04PM

Re: Storing a large, dense tensor (35gb scientific data) -- is mysql appropriate?

Rick James

November 07, 2010 11:19PM

Re: Storing a large, dense tensor (35gb scientific data) -- is mysql appropriate?

Rick James

November 07, 2010 11:42PM

Re: Storing a large, dense tensor (35gb scientific data) -- is mysql appropriate?

dan moran

November 08, 2010 12:13AM

Re: Storing a large, dense tensor (35gb scientific data) -- is mysql appropriate?

Rick James

November 09, 2010 12:11AM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.