MySQL Forums
Forum List  »  Newbie

Database design suggestions for a data scraping/warehouse application?
Posted by: Surabhil Reubro
Date: August 26, 2014 10:47PM

Im looking into the database design for a data ware house kind of project which involves large number of inserts daily.The data archives will be further used to generate reports.I will have a list of user's (for example a user set of 2 million ) , for which I need to monitor daily social networking activities associated with them.

For example, let there be a set of 100 users say U1,U2,...,U100

I need to insert their daily status count into my database.

Consider the total status count obtained for a user U1 for period June 30 - July 6, is as follows

June 30 - 99
July 1 - 100
July 2 - 102
July 3 - 102
July 4 - 105
July 5 - 105
July 6 - 107

The database should keep daily status count of each users ,like

For user U1,

July 1- 1 (100-99)
July 2- 2 (102-100)
July 3- 0 (102-102)
July 4- 3 (105-102)
July 5- 0 (105-105)
July 6- 2 (107-105)

Similarly the database should hold archived details of the full set of user's.

And on a later phase , I envision to take aggregate reports out of these data like total points scored on each day,week,month,etc; and to compare it with older data.

I need to start things from the scratch.I am experienced with PHP as a server side script and MYSQL. I am confused on the database side ? Since I need to process about a million insertion daily,what all things should be taken care of ?

I am confused on how to to design a mysql database in this regard ? On which storage engine to be used and design patterns to be followed keeping in my mind the data could later used effectively with aggregate functions.

Currently I envision the DB design with one table storing all the user id's with a foreign key and separate status count table for each day.Does lot of table's could create some overhead?

Does MySQL fits my requirement? 2 million or more DB operations are done every day.How the server and other things are to be considered in this case.


1)The database should handle concurrent inserts,which should enable 1-2 million inserts per day.

Before inserting I suggest to calculate daily status count,i.e the difference today's count with yesterday's.

2)on a later phase,the archives data (collected over past days) is used as a data warehouse and aggregation tasks are to be performed on it.

Comments:

I have read MyISAM is the best choice for data warehousing projects and at the same time heard INNODb excels in many ways.Many have suggested on proper tuning to get it done,I would like to get thoughts on that as well.

Please advise?

Options: ReplyQuote


Subject
Written By
Posted
Database design suggestions for a data scraping/warehouse application?
August 26, 2014 10:47PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.