Best Practices for Many Millions of Rows
Posted by: Darren Brooks
Date: October 15, 2016 12:30AM

Our web application will be quite a large-scale affair, with potentially many thousands of users. Each of our clients is a health care clinic (we focus on small-business natural health clinics). As this is our first large-scale project, I am weighing two options for managing the database:

1) When a client signs up to use our application, automate the creation of several new tables named after their account ID (e.g. patients_1, patients_2, etc.).

2) The alternative to thousands of client-specific tables is a handful of shared tables that grow fairly quickly into millions of rows, with each row tied to a client by an account ID. I'd rather follow this option, as thousands of tables are difficult to manage, not to mention the memory overhead they incur (thoughts on this would be awesome, as well). However, how fast these shared tables would grow concerns me. A rough sketch of this layout follows.
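To be concrete, here is a minimal sketch of what I mean by option 2 (table and column names are placeholders, not our real schema):

```sql
-- Option 2 sketch: one shared table, with the owning clinic identified
-- by account_id. Names and sizes are placeholders for illustration.
CREATE TABLE patients (
    patient_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    account_id INT UNSIGNED    NOT NULL,  -- which clinic owns this row
    created_at DATETIME        NOT NULL,
    data       VARBINARY(2048) NOT NULL,  -- encrypted patient record (~2KB)
    PRIMARY KEY (patient_id),
    KEY idx_account (account_id, created_at)  -- every query filters by clinic first
) ENGINE=InnoDB;
```

The composite index on (account_id, created_at) is there because, in this design, every application query would be scoped to a single clinic.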

My question:

My question is not really about table access speed (I can set up proper indexes, and I'm looking into the possible benefits of range-partitioning some tables), but about how large a table can get before I need to reach for more complex techniques that I know little about, such as splitting the database across separate servers. A sketch of the kind of partitioning I'm considering follows.
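For example, I'm experimenting with range partitioning along these lines (the table is a simplified stand-in and the year boundaries are made up):

```sql
-- Partitioning experiment: split a fast-growing table by year.
-- Simplified placeholder schema; boundaries are for illustration only.
CREATE TABLE patient_ledger (
    ledger_id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    account_id INT UNSIGNED    NOT NULL,
    entry_date DATE            NOT NULL,
    amount     DECIMAL(10,2)   NOT NULL,
    PRIMARY KEY (ledger_id, entry_date)  -- the partitioning column must be part of the PK
)
PARTITION BY RANGE (YEAR(entry_date)) (
    PARTITION p2016 VALUES LESS THAN (2017),
    PARTITION p2017 VALUES LESS THAN (2018),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```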

Here are some numbers on how we anticipate the database will grow:

1) A table with encrypted patient data holds 3K-5K rows for each of our clients. Some clinics will have fewer patients, but this range covers our high-end scenario, and it's good to be prepared. At roughly 2KB per row and an average of 4K rows per client, that works out to about 8MB of disk space per client. So 1,000 customers means about 8GB of disk space for this one table alone (4 million rows).

2) We then expect yearly growth of about 500-1,000 new patients per customer, i.e. up to 2MB per client per year. For 1,000 customers that is up to 2GB more disk space per year, so this one table could grow by up to 1 million rows annually.

3) Another table, which keeps patient ledger data, will grow by about 1 million rows per year for 1,000 clients.

4) Since this is a HIPAA-compliant application, another table tracks user actions as they log in/out and access patient information. Even after minimizing what this table needs to record to satisfy HIPAA, my latest estimate is about 1 million rows per year for 1,000 clients. Not too bad, but it adds to the overall disk footprint. A rough schema sketch follows this list.
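Here is roughly what I have in mind for that audit-trail table (again, names and event types are placeholders):

```sql
-- Rough sketch of the HIPAA audit-trail table. Append-only: rows are
-- written on login/logout and on each access to a patient record.
CREATE TABLE access_log (
    log_id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    account_id INT UNSIGNED    NOT NULL,
    user_id    INT UNSIGNED    NOT NULL,
    patient_id BIGINT UNSIGNED NULL,      -- NULL for login/logout events
    action     ENUM('login','logout','view','update') NOT NULL,
    logged_at  DATETIME        NOT NULL,
    PRIMARY KEY (log_id),
    KEY idx_patient (patient_id, logged_at)  -- "who accessed this chart?" queries
) ENGINE=InnoDB;
```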

Again, as this is our first large-scale project, I don't know how to judge whether these row counts are cause for concern. I know MySQL can handle many millions of rows, but at what point, in your experience, will we need to set things up differently with the database and our web host's servers? And given our need to keep things as simple as possible (we are a small startup, and I don't want to spend all my time on database administration), what options should we consider? In the meantime, I plan to keep an eye on actual growth with a query like the one below.
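This is the monitoring query I have in mind, so that I at least know when we're approaching whatever thresholds you all suggest ('clinic_app' is a placeholder schema name):

```sql
-- Row counts and on-disk size per table. Note: table_rows is an
-- estimate for InnoDB, but close enough for watching growth trends.
SELECT table_name,
       table_rows,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS total_mb
FROM   information_schema.tables
WHERE  table_schema = 'clinic_app'
ORDER  BY total_mb DESC;
```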
