Re: Storing a large, dense tensor (35gb scientific data) -- is mysql appropriate?
That sounds like an excellent use case for Summary Tables. There would probably be several such tables.
For the sample query this would be excellent; variants, including the first two you mention, would be ok:
CREATE TABLE ByOriginDest (
year TINYINT UNSIGNED NOT NULL,
country_origin TINYINT UNSIGNED NOT NULL,
country_dest TINYINT UNSIGNED NOT NULL,
ct INT UNSIGNED NOT NULL, -- from COUNT(*)
tot FLOAT NOT NULL, -- from SUM(value); get average from SUM(tot)/SUM(ct)
INDEX(year, country_origin, country_dest)
);
That table would also let you get totals by origin (across all dests), dests (across all origins), with or without GROUP BY year.
With only 200*200*3 rows (120K), even a table scan would be comparatively fast.
The INDEX could be a PRIMARY KEY, but (depending on how you incrementally populate the table) it may be simpler to INSERT dup keys instead of doing UPDATEs.
- Sum of all trade between High Income Countries and Low Income Countries --
This would involve categorizing country_origin/dest perhaps via JOINs to another table, something like:
SELECT SUM(tot) AS AllTrade
FROM ByOriginDest b
JOIN Countries o ON o.country = b.country_origin AND o.income > 1234567
JOIN Countries d ON d.country = b.country_dest AND d.income <= 1234567
That would ignore any indexes on b and do a table scan, but the JOINs would be moderately efficient. That's about 360K rows hit, versus probably a 1B row hit without the summary table.
Oops, maybe you need to do that twice, once for high->low and once low->high, and then sum the subtotals. Still 720K beats 1B.
SELECT
( SELECT ... ) + -- high->low
( SELECT ... ); -- low->high
- Value of iron ore exports from Chinese Industries to US Government Purchases -- This needs a different Summary Table.
Meanwhile, the main table does not need any indexes. Well, I recommend an AUTO_INCREMENT so you can keep track of where the summarization "left off".