Re: How to scale MySQL Cluster to support PB storage
Posted by: Rick James
Date: January 16, 2009 12:03AM
PB = Petabyte = 10**15 bytes, right?
Give up on MySQL Cluster. Plan on sharding manually: that is, split the database on some key and spread the connections across multiple machines. You will need two layers of machines:
* client layer (apache?) talking to the users
* database layer
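A minimal sketch of the routing step the client layer would do. It assumes the shard key is a string such as a user id and that rows are placed by hashing the key modulo the number of db machines; that is one common choice, swapped in here for illustration (range-based or lookup-table sharding work too). The host names and the 1000-shard count are hypothetical placeholders. MD5 is used only because it is stable across processes, unlike Python's built-in `hash()` for strings.

```python
import hashlib

# Hypothetical shard map: shard number -> db host (placeholder names).
SHARD_HOSTS = [f"db{n:04d}.example.internal" for n in range(1000)]


def shard_for(key: str, num_shards: int = len(SHARD_HOSTS)) -> int:
    """Map a sharding key (e.g. a user id) to a shard number.

    MD5 is used only as a stable, well-spread hash; Python's built-in
    hash() of a str changes between processes, so it can't be used here.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def host_for(key: str) -> str:
    """Client layer: pick the db machine that owns this key."""
    return SHARD_HOSTS[shard_for(key)]


if __name__ == "__main__":
    for user_id in ("alice", "bob", "carol"):
        print(user_id, "->", host_for(user_id))
```

Every front end must use the same function and the same shard count, or they will disagree about where a row lives.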
OK, each shard could be a MySQL Cluster. But ignoring that for the moment...
If you are going to have 1M+ _simultaneous_ connections, you may need 1000 db machines, each with 1000 connections, and perhaps 10K front-end machines. If every front end holds a connection to every db machine, that is 10K * 1000 = 10M TCP connections between the two layers; this could stress the network.
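If you are talking about only hundreds of simultaneous connections, then a few dozen front ends plus a few thousand db machines should work. (At that point the number of db machines is driven by disk space, not connections: 1PB spread over 3000 machines is roughly 330GB each.)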
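The 1M-connection arithmetic as a quick estimate. The inputs are the figures from the post; the full-mesh assumption (every front end keeps one open connection to every db machine) is an illustrative worst case, and connection pooling or a proxy tier between the layers would cut the count considerably. `connection_estimate` is a hypothetical helper name.

```python
def connection_estimate(concurrent_conns: int, conns_per_db: int, front_ends: int):
    """Return (db machines needed, TCP connections if front ends fully mesh to dbs)."""
    db_machines = -(-concurrent_conns // conns_per_db)  # ceiling division
    # Assumption: every front end holds one open connection to every db machine.
    full_mesh = front_ends * db_machines
    return db_machines, full_mesh


if __name__ == "__main__":
    dbs, mesh = connection_estimate(1_000_000, 1000, 10_000)
    print(f"{dbs:,} db machines, {mesh:,} front-end-to-db connections")
```

With the post's numbers this prints `1,000 db machines, 10,000,000 front-end-to-db connections`.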
Now you are talking about lots of things that can fail, so you need failover procedures. One way is to have a dual-master setup for each shard. This way, if (or rather, when) a master fails, you can switch to the other master for that shard.
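A minimal sketch of the failover decision for one shard, assuming a dual-master pair where only one master takes writes at a time and some health check is available. `ShardPair`, `pick_master`, the `is_alive` callback, and the host names are all hypothetical. A real setup also has to handle replication lag and make sure a failed master does not come back and accept writes on its own.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ShardPair:
    """Hypothetical dual-master pair for one shard; host names are placeholders."""
    primary: str
    secondary: str
    active: str = field(init=False)  # the master currently taking writes

    def __post_init__(self) -> None:
        self.active = self.primary


def pick_master(pair: ShardPair, is_alive: Callable[[str], bool]) -> str:
    """Return the master to write to, failing over to the other one if needed."""
    if is_alive(pair.active):
        return pair.active
    other = pair.secondary if pair.active == pair.primary else pair.primary
    if is_alive(other):
        pair.active = other  # fail over: writes for this shard now go here
        return other
    raise RuntimeError(f"both masters down: {pair.primary}, {pair.secondary}")


if __name__ == "__main__":
    pair = ShardPair("shard7-a", "shard7-b")
    down = {"shard7-a"}  # pretend master A just died
    print(pick_master(pair, lambda host: host not in down))  # shard7-b
```

Note that the pair stays on the second master after failover rather than flipping back automatically; moving back is a deliberate, manual step once the old master has caught up.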
How fast is the data coming in?
BTW, there is an astronomy group working on a 15PB database, but they plan on designing and building it out over a 10-year period. (Each photo is a few GB.) How soon do you need yours up?
Is this standard "data warehousing"? That is, a "fact" table plus normalization tables hanging off it?
Have you designed your summary tables? A table scan of 1PB (if you could even put it on one machine) would take months, if not years.
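Back-of-envelope scan times. The sustained single-machine throughputs (10 MB/s and 100 MB/s) and the 1000x size reduction for the summary table are assumptions for illustration, not measurements; plug in your own numbers.

```python
PB = 10**15  # bytes, per the definition at the top of the post
SECONDS_PER_DAY = 86_400


def scan_seconds(total_bytes: float, bytes_per_sec: float) -> float:
    """Time for one full sequential scan at a sustained throughput."""
    return total_bytes / bytes_per_sec


if __name__ == "__main__":
    # Assumed sustained scan rates for one machine (illustrative only).
    for label, rate in [("10 MB/s", 10e6), ("100 MB/s", 100e6)]:
        days = scan_seconds(PB, rate) / SECONDS_PER_DAY
        print(f"full scan of 1PB at {label}: {days:,.0f} days")

    # Hypothetical summary table 1000x smaller than the raw fact data.
    hours = scan_seconds(PB / 1000, 100e6) / 3600
    print(f"scan of a 1TB summary table at 100 MB/s: {hours:.1f} hours")
```

That works out to about 1,157 days (roughly 3 years) at 10 MB/s and about 116 days at 100 MB/s, versus under 3 hours for the summary table.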
With sharding, you need to plan for migrating data between shards (for example, when you add machines). This gets tricky, especially if you don't want any downtime.
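A sketch of why migration planning matters, assuming rows were placed by hashing the key modulo the number of shards (`shard_for` is a hypothetical helper). Going from 4 to 5 shards moves about 80% of keys, because a hash keeps its shard only when h mod 4 == h mod 5, which holds for 4 of every 20 residues. A common alternative, also sketched, hashes into many fixed buckets and keeps an editable bucket-to-shard directory, so adding a shard moves only the buckets you reassign (about 20% here). This only counts how much data has to move; it does not show the copy-and-cut-over process needed to avoid downtime.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Stable hash of the key, modulo the number of shards (or buckets)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose shard changes under plain hash-mod-N placement."""
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)


def directory_rebalance_fraction(keys, num_buckets: int = 1024) -> float:
    """Fraction of keys that move when a 5th shard is added via a bucket directory."""
    # Initial directory: buckets spread round-robin over shards 0..3.
    directory = {b: b % 4 for b in range(num_buckets)}
    before = {k: directory[shard_for(k, num_buckets)] for k in keys}
    # Add shard 4 by reassigning every 5th bucket to it (about 1/5 of the data).
    for b in range(0, num_buckets, 5):
        directory[b] = 4
    moved = sum(1 for k in keys if directory[shard_for(k, num_buckets)] != before[k])
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"user{i}" for i in range(100_000)]
    print(f"hash mod N, 4 -> 5 shards: {fraction_moved(keys, 4, 5):.0%} of rows move")
    print(f"bucket directory, 4 -> 5 shards: {directory_rebalance_fraction(keys):.0%} move")
```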
Compress the data wherever possible!
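A small illustration using Python's `zlib`, assuming repetitive CSV-like fact rows (the sample data is made up, so the ratio says nothing about your data). Warehouse-style data often compresses well. Inside MySQL, the built-in `COMPRESS()`/`UNCOMPRESS()` functions compress individual string values when the server is built with zlib.

```python
import zlib

# Made-up, repetitive fact rows (date, sensor, status, code), as CSV bytes.
rows = "\n".join(
    f"2009-01-16,sensor{i % 50},OK,{i % 7}" for i in range(100_000)
).encode("ascii")

compressed = zlib.compress(rows, 6)  # level 6 is zlib's usual default
assert zlib.decompress(compressed) == rows  # lossless round trip

ratio = len(rows) / len(compressed)
print(f"{len(rows):,} bytes -> {len(compressed):,} bytes ({ratio:.0f}x smaller)")
```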
Give me more details and I'll give you more advice.