Hey all, I was hoping for some advice on how best to structure (and possibly index) a fairly large table I've been trying to build.
The data I wish to store is quite simple:
Basically, it is an hourly forecast published ~185 times for a particular date with 88 integer values. In other words, for any given day, there will be 24 * 88 = 2112 integers reported, and that report will be revised ~185 times, for a total of ~391k values per day.
By and large, I would be selecting the latest hourly value for a set of dates for a subset of the 88 parameters.
As it stands now, I have the table laid out as follows:
CREATE TABLE `datatable` (
`field` enum('param1' ... 'param88') NOT NULL,
`value` mediumint(9) DEFAULT NULL,
`hour` tinyint(3) UNSIGNED NOT NULL,
`date` date NOT NULL,
`published` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Also, as of right now I have the following indices defined:
ALTER TABLE `datatable`
ADD UNIQUE KEY `ukey` (`field`,`hour`,`date`,`published`) USING BTREE,
ADD KEY `field` (`field`,`date`),
ADD KEY `ptsidx` (`published`);
I don't know that that makes perfect sense, but my logic was that I would generally be selecting on a particular field, date and publish time (akin to version)
For my most common query, I'm using something like this:
SELECT
date,
AVG(IF(field='param5',value,NULL)) param5,
AVG(IF(field='param6',value,NULL)) param6,
AVG(IF(field='param7',value,NULL)) param7,
AVG(IF(field='param8',value,NULL)) param8,
AVG(IF(field='param9',value,NULL)) param9,
AVG(IF(field='param10',value,NULL)) param10,
AVG(IF(field='param11',value,NULL)) param11
FROM
datatable
JOIN
(
SELECT hour hour2, date date2, MAX(published) published2
FROM datatable
WhourRE
field = 'param5'
AND date >= CURDATE()
GROUP BY date, hour
) sq
ON
date = date2
AND hour = hour2
AND published = published2
WHERE
field IN ('param5','param6','param7','param8','param9','param10','param11')
AND date >= CURDATE()
GROUP BY date
ORDER BY date
However, that generally takes ~20-25s to run, so I'm hoping to cut it down if possible.
I'm currently using 5.7.10, but will be switching to 8.0.19, which I expect will yield improvements, but I'd rather improve through better structure and queries than updated software...
Thanks for any advice you can provide!