MySQL Forums
Forum List  »  General

Large Table structure recommendations
Posted by: Michael Cook
Date: February 20, 2020 08:39AM

Hey all, I was hoping for some advice on how best to structure (and possibly index) a fairly large table I've been trying to build.

The data I wish to store is quite simple:
Basically, it is an hourly forecast published ~185 times for a particular date with 88 integer values. In other words, for any given day, there will be 24 * 88 = 2112 integers reported, and that report will be revised ~185 times, for a total of ~391k values per day.

By and large, I would be selecting the latest hourly value for a set of dates for a subset of the 88 parameters.

As it stands now, I have the table laid out as follows:
CREATE TABLE `datatable` (
  `field` enum('param1' ... 'param88') NOT NULL,
  `value` mediumint(9) DEFAULT NULL,
  `hour` tinyint(3) UNSIGNED NOT NULL,
  `date` date NOT NULL,
  `published` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Also, as of right now I have the following indices defined:
ALTER TABLE `datatable`
  ADD UNIQUE KEY `ukey` (`field`,`hour`,`date`,`published`) USING BTREE,
  ADD KEY `field` (`field`,`date`),
  ADD KEY `ptsidx` (`published`);

I don't know that that makes perfect sense, but my logic was that I would generally be selecting on a particular field, date and publish time (akin to version)

For my most common query, I'm using something like this:
SELECT
	date,
	AVG(IF(field='param5',value,NULL)) param5,
	AVG(IF(field='param6',value,NULL)) param6,
	AVG(IF(field='param7',value,NULL)) param7,
	AVG(IF(field='param8',value,NULL)) param8,
	AVG(IF(field='param9',value,NULL)) param9,
	AVG(IF(field='param10',value,NULL)) param10,
	AVG(IF(field='param11',value,NULL)) param11
FROM 
	datatable
JOIN
	(
		SELECT hour hour2, date date2, MAX(published) published2
		FROM datatable
		WhourRE
			    field  = 'param5' 
			AND date  >= CURDATE() 
		GROUP BY date, hour
	) sq
ON
	    date = date2 
	AND hour = hour2 
	AND published = published2
WHERE
	    field IN ('param5','param6','param7','param8','param9','param10','param11') 
	AND date >= CURDATE()
GROUP BY date
ORDER BY date

However, that generally takes ~20-25s to run, so I'm hoping to cut it down if possible.

I'm currently using 5.7.10, but will be switching to 8.0.19, which I expect will yield improvements, but I'd rather improve through better structure and queries than updated software...

Thanks for any advice you can provide!

Options: ReplyQuote


Subject
Written By
Posted
Large Table structure recommendations
February 20, 2020 08:39AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.