Re: Aggregate intraday data
Posted by:
Rick James
Date: January 17, 2014 02:25PM
> I should not only create daily tables, but weekly and monthly too
No. Weekly and monthly rollups can be derived from daily info.
> If I should have a data error in an intraday data table, this error will be also in daily, weekly and monthly related tables.
True. If you go back and update an erroneous entry, or add a missing entry, the rollup tables will be wrong. So, the script should be designed to either do 'yesterday' or redo a specific date. And it should do INSERT ... ON DUPLICATE KEY UPDATE ... instead of blindly doing INSERT and getting a duplicate-key error.
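A minimal sketch of such a re-runnable rollup, assuming F(ticker, dt, open, high, low, close, volume) and a Daily table keyed on (ticker, `date`) -- the table and column names here are illustrative, not from your schema:

```sql
-- Rollup for one day; safe to re-run after fixing bad rows in F.
-- SUBSTRING_INDEX(GROUP_CONCAT(...)) is a common MySQL idiom for
-- "first value when ordered by dt" / "last value".
INSERT INTO Daily (ticker, `date`, open, high, low, close, volume)
SELECT  ticker,
        DATE(dt),
        SUBSTRING_INDEX(GROUP_CONCAT(open  ORDER BY dt), ',', 1),      -- first open
        MAX(high),
        MIN(low),
        SUBSTRING_INDEX(GROUP_CONCAT(close ORDER BY dt DESC), ',', 1), -- last close
        SUM(volume)
    FROM F
    WHERE dt >= '2014-01-16'
      AND dt <  '2014-01-16' + INTERVAL 1 DAY
    GROUP BY ticker
ON DUPLICATE KEY UPDATE
    open = VALUES(open), high = VALUES(high), low = VALUES(low),
    close = VALUES(close), volume = VALUES(volume);
```

Run it with yesterday's date from cron, or with a specific date when you have repaired some minutes in F.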
> I should keep 216 tables update in real time
What do you mean? Do you mean that you want the high & low for _today_, even before the market closes? Probably the best approach for that is to do something like:
( SELECT ... FROM Daily ... )
UNION ALL
( SELECT ... FROM F WHERE ((just today)) GROUP BY ... );
That is, get old data efficiently from the rollup table; get today's data from the minute-by-minute quotes. Probably each SELECT in that UNION will take about the same amount of time. The overall time will still be _much_ better than getting all the data from F.
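Filled in with assumed table and column names (adjust to your schema), that UNION might look like:

```sql
-- Closed days come from the small rollup table; today's partial day is
-- aggregated from the minute-by-minute quotes.
( SELECT `date`, high, low, volume
    FROM Daily
    WHERE ticker = 'IBM'
      AND `date` >= CURDATE() - INTERVAL 30 DAY )
UNION ALL
( SELECT DATE(dt), MAX(high), MIN(low), SUM(volume)
    FROM F
    WHERE ticker = 'IBM'
      AND dt >= CURDATE()         -- ((just today))
    GROUP BY DATE(dt) );
```

The second SELECT touches only one day's worth of minutes, so it stays cheap even before the close.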
Yes:
SELECT open, high, low, close, volume FROM F WHERE `date` >= @ago;
To roll up Daily into W or M, the first `open` of the period is the open, the last `close` is the close, MAX(high) is the high, MIN(low) is the low, SUM(volume) is the total volume, etc.
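For example, a monthly view computed on the fly from Daily (names assumed; the GROUP_CONCAT/SUBSTRING_INDEX trick picks the first open and last close within each month):

```sql
SELECT  ticker,
        DATE_FORMAT(`date`, '%Y-%m')                                      AS yr_mo,
        SUBSTRING_INDEX(GROUP_CONCAT(open  ORDER BY `date`), ',', 1)      AS open,
        MAX(high)                                                         AS high,
        MIN(low)                                                          AS low,
        SUBSTRING_INDEX(GROUP_CONCAT(close ORDER BY `date` DESC), ',', 1) AS close,
        SUM(volume)                                                       AS volume
    FROM Daily
    GROUP BY ticker, yr_mo;
```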
> How to aggregate to 5 minute data
I would probably not build a rollup table for that.
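Instead, 5-minute bars can be computed on demand with a GROUP BY over F; a sketch with assumed names:

```sql
-- Bucket each minute into its 5-minute bar by subtracting the
-- minutes-past-the-bucket from dt.
SELECT  ticker,
        dt - INTERVAL (MINUTE(dt) % 5) MINUTE - INTERVAL SECOND(dt) SECOND AS bar_start,
        MAX(high)   AS high,
        MIN(low)    AS low,
        SUM(volume) AS volume
    FROM F
    WHERE ticker = 'IBM'
      AND dt >= CURDATE()
    GROUP BY ticker, bar_start;
```

Restricted to one ticker and one day, this touches so few rows that a rollup table would not pay for itself.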
> I have now 54 intraday table
Where does the 54 come from? I hope you are not putting each ticker in a separate table. That is not scalable. I don't see the need for more than 2 tables: F (minute-by-minute) and a Daily rollup. Daily will be about 511x smaller than F, and similarly faster. Sure, a Monthly would be another ~22x smaller and faster, but dynamically using Daily with GROUP BY will probably be fast enough that you won't feel the need for Monthly.
Do the math... How soon will F fill up your disk? What then? If this will be a problem, let's chat about PARTITIONing a huge table on time and using DROP PARTITION.
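A sketch of what that could look like -- partition names, ranges, and column types are illustrative:

```sql
-- RANGE partitions by month on the quote timestamp. Dropping a partition
-- removes a whole month almost instantly, versus a long, log-heavy DELETE.
-- Note: the partitioning column (dt) must be part of every unique key.
CREATE TABLE F (
    ticker  VARCHAR(10)   NOT NULL,
    dt      DATETIME      NOT NULL,
    open    DECIMAL(10,4) NOT NULL,
    high    DECIMAL(10,4) NOT NULL,
    low     DECIMAL(10,4) NOT NULL,
    close   DECIMAL(10,4) NOT NULL,
    volume  INT UNSIGNED  NOT NULL,
    PRIMARY KEY (ticker, dt)
)
PARTITION BY RANGE (TO_DAYS(dt)) (
    PARTITION p201312 VALUES LESS THAN (TO_DAYS('2014-01-01')),
    PARTITION p201401 VALUES LESS THAN (TO_DAYS('2014-02-01')),
    PARTITION pFuture VALUES LESS THAN MAXVALUE
);

-- Purge the oldest month:
ALTER TABLE F DROP PARTITION p201312;
```

A nightly or monthly job would ALTER TABLE ... REORGANIZE PARTITION pFuture to add the next month's partition before data arrives for it.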