Big table (too big?) how to handle?
I’m looking for some experience MySQL advice regarding a project.
Background:
I have a MySQL server which receives data from about 40 weather stations in the US and world. Most stations upload 1x per minute. That’s about 525,000 records per station per year.
There have been debates over the years as my software engineers (mainly Computer Science students) come and go, on the best way to handle and archive that much data. Originally, each station’s raw data was in it’s own table. For reasons I have forgotten, the last programmers put it all in one big table.
That table has 28 columns of weather parameters, and as of this writing, now has 66,135,065 rows in it. Some stations have been uploading data since 2008.
Up until now I have also benefitted from comped server hosting in my employer's local colo rack. I will be shortly losing that bennie, which as led me down this path.
I need to figure out, once and for all, the proper thing to do with this database as I clean thing up and make the move to a paid host.
That table is too big to move over the Internet in any reasonable length of time.
Random thoughts:
While most clients like to see their data updated quickly, I don’t suppose I really need 1 minute data indefinitely. The database could be purged to, say every 10 minutes and preserve the flavor of it. It might be useful to intelligently filer it to retain interesting records. 10 minutes is a standard length of time over which wind gusts are measured.
There is already a separate table of station daily summary data. It has 40,298 rows in it.
There is a separate table just for a few stations that send more than the basic data. It currently has 37,876,061 rows in it. There’s another biggass table to deal with.
Doing anything that locks those tables takes forever, and keeps stations from updating. Some of the newer stations buffer data until the server is available, some just lose data.
So, the question is:
What’re the best practices for this situation?
What should I do to make this a better system, easier and quicker to query, and make it easier to move to a paid host?
Feel free to ask any thing I can clarify. I probably left something out, and this post is getting long.
Thanks!
Chris