Re: Database design for monitoring tool
Posted by: Rick James
Date: May 16, 2012 08:00AM

> GROUP BY YEAR(dateTime),MONTH(dateTime), DAY(dateTime), HOUR(dateTime)
--> GROUP BY LEFT(dateTime, 13)
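For example (a sketch, assuming a table readings with a DATETIME column dateTime and a numeric column value -- names hypothetical):

SELECT LEFT(dateTime, 13) AS hr,   -- 'YYYY-MM-DD HH'
       AVG(value) AS avg_value
FROM readings
GROUP BY LEFT(dateTime, 13);

This works because MySQL casts the DATETIME to the string 'YYYY-MM-DD HH:MM:SS', so the first 13 characters are exactly the hour bucket.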

How many rows come in every 5 minutes? Sounds like 1??

Have columns for COUNTs, SUMs, etc. Compute averages via SUM(sum)/SUM(ct) -- averaging the pre-computed averages would weight every bucket equally, regardless of how many readings each contains, and give the wrong answer.
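A sketch, assuming a summary table with a row counter ct and a running total sum_val (names hypothetical; see the daily_summary table further down):

SELECT SUM(sum_val) / SUM(ct) AS avg_value
FROM daily_summary
WHERE day >= '2012-08-01'
  AND day <  '2012-08-01' + INTERVAL 3 MONTH;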

If it adds up to more than a million rows, consider PARTITIONing the data on months (PARTITION by RANGE, use TO_DAYS()). That will make purging old data easier. (Don't worry about setting up PARTITIONing until you have, say, a month's worth of data.)
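A minimal sketch (partition names and boundaries are illustrative; note that dateTime must then be part of every PRIMARY/UNIQUE key on the table):

ALTER TABLE readings
PARTITION BY RANGE (TO_DAYS(dateTime)) (
    PARTITION p201205 VALUES LESS THAN (TO_DAYS('2012-06-01')),
    PARTITION p201206 VALUES LESS THAN (TO_DAYS('2012-07-01')),
    PARTITION pFuture VALUES LESS THAN MAXVALUE
);

-- Purging a month then becomes nearly instantaneous:
ALTER TABLE readings DROP PARTITION p201205;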

> dateTime BETWEEN x AND y
Watch out for the "midnight bug".
WHERE dateTime BETWEEN '2012-08-01' AND '2012-11-01'
includes midnight of Nov 1. Recommend this coding pattern:
WHERE dateTime >= '2012-08-01'
AND dateTime < '2012-08-01' + INTERVAL 3 MONTH

> (=4*120*12*nrYears) tables
NO!

> some tables with aggregation data for the cases where I'd like to plot several months.
That depends. If the tables are not too big, then aggregating on the fly is fine. Otherwise, I would suggest aggregating by day, since that is about midway through the spectrum. A 2-year plot could easily come from daily data; a 1-month plot might be OK from daily data, or tolerable straight from the 5-minute data.
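The daily layer might look like this (a sketch; column names hypothetical, and you would add a SUM/MIN/MAX column per metric you want to graph):

CREATE TABLE daily_summary (
    day     DATE NOT NULL,
    ct      INT UNSIGNED NOT NULL,   -- number of 5-min readings that day
    sum_val FLOAT NOT NULL,          -- daily total, so averages combine correctly
    PRIMARY KEY (day)
);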

Suggest you not worry about the aggregation until you have a month's worth of data. At that point you can experiment with options. Furthermore, rebuilding the aggregate data (which you are likely to do) is pretty cheap over 1 month's data (only 4% of the eventual 2-year total).

Be sure to build the aggregate tables incrementally. And consider what happens if some of the data arrives late -- after its time slot has already been aggregated.
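An incremental build might look like this (a sketch using the hypothetical tables above); the ON DUPLICATE KEY UPDATE clause also covers the late-data case, since re-running it simply rebuilds the day's row:

INSERT INTO daily_summary (day, ct, sum_val)
    SELECT DATE(dateTime), COUNT(*), SUM(value)
    FROM readings
    WHERE dateTime >= CURDATE() - INTERVAL 1 DAY
      AND dateTime <  CURDATE()
    GROUP BY DATE(dateTime)
ON DUPLICATE KEY UPDATE
    ct = VALUES(ct),
    sum_val = VALUES(sum_val);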

> the average Monday of the last 2 years
Do NOT make the mistake of adding an index on day-of-week -- the query planner won't use it. It will be cheap enough just to scan all the rows in the date range and filter out the other 6/7.
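For example (a sketch, using the hypothetical readings table from above):

SELECT AVG(value) AS avg_monday
FROM readings
WHERE dateTime >= CURDATE() - INTERVAL 2 YEAR
  AND DAYOFWEEK(dateTime) = 2;   -- 1=Sunday, so 2=Monday

The dateTime range can use an index (or partition pruning); the DAYOFWEEK test is just a filter applied to the rows scanned.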

> nice if the queries finish within a couple of seconds
Easily doable.

As the log files come in, I assume you have some code (Perl?) to process them and do the INSERTs. Suggest you save the logs for the first few days/weeks -- you are likely to find something you forgot, say "ah, s***", and need to start over, reprocessing all the data.

You may find that there is no need for purging data after 2 years.

Use the smallest datatypes practical. Don't use BIGINT when, say, MEDIUMINT is big enough; DOUBLE is probably never needed; normalize; etc. More tips in
http://mysql.rjweb.org/doc.php/ricksrots
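For instance, a sketch of the raw table, assuming one reading per 5 minutes (names hypothetical; byte sizes are for MySQL 5.x):

CREATE TABLE readings (
    dateTime DATETIME NOT NULL,   -- 8 bytes; TIMESTAMP would be 4
    value    FLOAT NOT NULL,      -- 4 bytes; DOUBLE (8 bytes) is rarely needed
    PRIMARY KEY (dateTime)        -- one row per 5-minute sample
) ENGINE=InnoDB;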

Timeline -- what to focus on when
1. Ingestion of the data
2. Building graphs
3. UI -- to let others try your graphs
4. Check your INDEXes
5. Aggregation (for performance)
6. PARTITIONing (for purging; probably not useful for any other performance issues)
Be prepared to iterate, even back to step 1, as you discover issues.
