Re: MySQL Partition Feature or Sharding
Posted by: Rick James
Date: November 07, 2010 07:07PM

A billion rows is a lot. But, do not partition without a specific reason. Will you be purging old data? What do the SELECTs look like?

Do you have enough disk space to hold all the data? If not then you need to "shard" the data. Again, the SELECTs will determine what to shard on.

When searching, will you sometimes get a million rows in the resultset? This will take minutes, no matter how you structure the data. Generally the web server and/or the browser will time out before that. What will you do then?

Do you need the actual raw data rows? Or are you calculating counts, etc, when doing the queries? This sounds like an excellent case for developing "summary" table(s). Perhaps hourly would be a good frequency. Then have the queries access them wherever possible.

Another issue... How fast are the inserts coming in? A billion row table with one random index cannot be inserted into faster than about 10 million rows a day. This is because of disk speed, not MySQL. Let me see your tentative schema (SHOW CREATE TABLE), and I will see if this applies, and explain further.

(Two different random indexes -> only 5M inserts/day.)

With summary table(s), you can usually avoid having any random indexes, thereby easily get more like 100M inserts/day.

The other way to avoid the disk speed limit is by Sharding. Split the data into 10 shards (10 servers), and distribute the inserts somewhat evenly (dshard by something other than date/time), and you can get up to 10x the INSERT speed. But the code is much more complex.

Options: ReplyQuote


Subject
Written By
Posted
Re: MySQL Partition Feature or Sharding
November 07, 2010 07:07PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.