MySQL Forums

Re: trillion records?
Posted by: Rick James
Date: September 07, 2011 09:18AM

>> each with a lot of RAM; how's your budget?) 100TB
>> / 256GB/machine = 400 machines would be needed to
>> cache everything. (Because of overhead, etc,

> I don't understand why 400 machines would be needed for 100TB of disk space.

(Note: I was assuming that everything was cached in RAM.)
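The 400-machine figure in the quote is simply data size divided by RAM per machine. Here is a minimal sketch of that arithmetic. The `usable_fraction` parameter is a hypothetical knob for the overhead the quote mentions, and 0.75 is an illustrative value, not a measurement. With binary units (TiB/GiB) the quoted 400 comes out exactly. With decimal units (TB/GB) it is about 391.

```python
import math

TiB = 1024 ** 4
GiB = 1024 ** 3

def machines_to_cache(data_bytes: float, ram_per_machine_bytes: float,
                      usable_fraction: float = 1.0) -> int:
    """Machines needed so the whole dataset fits in RAM.

    usable_fraction (hypothetical) is the share of each machine's RAM that
    can actually hold data; the OS, buffers, and index overhead use the rest.
    """
    usable = ram_per_machine_bytes * usable_fraction
    return math.ceil(data_bytes / usable)

print(machines_to_cache(100 * TiB, 256 * GiB))        # 400
print(machines_to_cache(100 * TiB, 256 * GiB, 0.75))  # 534, if only 75% of RAM is usable
```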

> Sharding doesn't seem to be that complicated.
Well, it depends. If you have effectively one huge table, and it is easy to split on some key, and you don't need to do JOINs, and you don't need to scan a range of rows that would lead to scanning all shards, then sharding is simple.

Even if your data does not meet all of those 'requirements', sharding may still be possible, but it may be complicated. Could you post the output of SHOW CREATE TABLE? The SELECTs matter a lot in judging how easy or hard sharding will be. Fetching a single row is easy if you know which shard to talk to. Fetching a range of rows is easy if you know they all live on the same shard. Once you need to talk to multiple shards, you have to write special code. It boils down to deciding how to split the rows among the shards.
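To make the single-shard vs multi-shard distinction concrete, here is a minimal sketch. It assumes the rows are range-partitioned on one integer key. The boundaries, the `shard_for` / `shards_for_range` / `fetch_range` names, and the in-memory dicts standing in for MySQL servers are all hypothetical. A point lookup maps to exactly one shard. A range query may fan out to several shards, and the application has to merge the results itself (the "special code"). With hash partitioning instead of range partitioning, almost any range query would touch every shard.

```python
from bisect import bisect_right

# Hypothetical range-based shard map: shard 0 holds keys < BOUNDARIES[0],
# shard i holds keys in [BOUNDARIES[i-1], BOUNDARIES[i]), the last shard the rest.
BOUNDARIES = [250_000_000_000, 500_000_000_000, 750_000_000_000]
NUM_SHARDS = len(BOUNDARIES) + 1

def shard_for(key: int) -> int:
    """Single-row lookup: exactly one shard to talk to."""
    return bisect_right(BOUNDARIES, key)

def shards_for_range(lo: int, hi: int) -> list[int]:
    """Shards touched by WHERE key BETWEEN lo AND hi (inclusive)."""
    return list(range(shard_for(lo), shard_for(hi) + 1))

def fetch_range(shards: list[dict[int, str]], lo: int, hi: int) -> list[tuple[int, str]]:
    """Fan out to every shard the range touches, then merge in key order."""
    rows = []
    for s in shards_for_range(lo, hi):
        rows.extend((k, v) for k, v in shards[s].items() if lo <= k <= hi)
    return sorted(rows)

print(shard_for(42))                                           # 0
print(shards_for_range(100, 200))                              # [0]
print(shards_for_range(200_000_000_000, 600_000_000_000))      # [0, 1, 2]
```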

NDB Cluster effectively does a form of sharding. MyISAM and InnoDB have no built-in support for sharding.

> At least 50 disks should be used, so that should greatly relieve the load on disk I/O.
Even so, it would take a year to do a table scan (assuming a single machine). This leads to seriously considering sharding.
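How long a single-machine table scan takes depends heavily on whether the disks can stream sequentially or effectively do one random read per row. Below is a rough calculator for both bounds. The 100 MB/s sequential throughput and 100 random IOPS per disk are illustrative assumptions, not measurements. The 1e12 rows × 100 bytes comes from the question in this thread. The real figure for a given table and query plan lies somewhere in this very wide band.

```python
SECONDS_PER_DAY = 86_400

def sequential_scan_days(data_bytes: float, disks: int, mb_per_sec_per_disk: float) -> float:
    """Best case: every disk streams sequentially at full speed, in parallel."""
    bytes_per_sec = disks * mb_per_sec_per_disk * 1_000_000
    return data_bytes / bytes_per_sec / SECONDS_PER_DAY

def random_scan_days(rows: float, disks: int, iops_per_disk: float) -> float:
    """Worst case: roughly one random disk read per row
    (e.g. a scan in a badly fragmented order, or a lookup per row)."""
    return rows / (disks * iops_per_disk) / SECONDS_PER_DAY

ROWS = 1e12
ROW_BYTES = 100  # from the question; assumed average row size

print(f"{sequential_scan_days(ROWS * ROW_BYTES, 50, 100):.2f} days")  # 0.23 days
print(f"{random_scan_days(ROWS, 50, 100):.2f} days")                  # 2314.81 days
```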

A table scan that needs a temp table -- yikes! I don't want to think about the extra disk space and I/O time to do that.

NDB Cluster is free with MySQL. I do not have a feel for how scalable it is; maybe I can ask around. (Note: Cluster, InnoDB, and MyISAM are totally different storage engine implementations. Each has its own pluses, minuses, and limitations.)

> If each row takes 100 bytes, then it's around 93TB of data, and I think I must design a 400TB solution employing multiple machines... but how many machines?
How large a disk (or set of disks) is practical in a machine you can afford? Putting several big drives in each machine behind a RAID-5 controller might be the "sweet spot". Leave some spare room on each machine (20%?). Then calculate how many machines you need. Then double that for backup.
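The recipe above can be sketched as a small calculator. It assumes RAID-5 (one disk's worth of parity per array), the suggested 20% spare room, and a factor of 2 for backup. The example of 8 × 2TB drives per machine and 100TB of data is hypothetical, chosen only to show the steps.

```python
import math

def raid5_usable_bytes(disks: int, disk_bytes: float) -> float:
    """RAID-5 spends one disk's worth of space on parity: n disks give n-1 of space."""
    if disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return (disks - 1) * disk_bytes

def machines_needed(data_bytes: float, disks_per_machine: int, disk_bytes: float,
                    spare_fraction: float = 0.20, copies: int = 2) -> int:
    """Machines for the primary data, then multiplied by `copies` for backup."""
    usable = raid5_usable_bytes(disks_per_machine, disk_bytes) * (1 - spare_fraction)
    primary = math.ceil(data_bytes / usable)
    return primary * copies

# Hypothetical: 100TB of data, 8 x 2TB drives per machine.
print(machines_needed(100e12, 8, 2e12))  # 18 (9 primary + 9 backup)
```

Changing `copies` to 1 shows the cost of skipping the backup copy, which is the trade-off raised next.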

Another thing to ponder... What if you lost the data on a shard? Would that be disastrous? Or just a minor perturbation in the data? That is, is the data so valuable that you really need to keep a second copy just in case there is a hardware failure?
