Re: How to scale MySQL Cluster to support PB storage
Posted by:
Rick James
Date: January 16, 2009 09:05PM
I assume you have looked into VIPs, AkaDNS, and other load balancing and DNS-like routing.
I assume your dual-masters will be in separate colocations (for protection against colo-failure).
You cannot get automated failover, so plan on monitoring, paging, and quick response.
For better uptime, you should have a slave hanging off each of the paired masters. Consider this nasty scenario: one master fails and cannot be recovered. This forces you to rebuild the master from somewhere; but where? The simplest is to take down the other master and copy its data, but that leaves you with no master running during the copy. With slave(s), you can, instead, take one of them down.
Or, if you work out a shard migration plan, use it to recover from an unrecoverable master.
Cluster vs non-cluster: I don't know. But do remember that cluster has some restrictions that may get in your way. Be sure to check out the replication options.
May I ask what kind of data and clients you have?
Apache -- good.
Db and storage as two separate layers? I gather you are assuming Cluster.
Sharding tools -- no opinion. Some may be too general to sustain "several thousand requests per second".
How to shard --
* hash of id
* "dictionary", that is a table where you look up which machine it is on
* in between -- hash to a 16-bit value, then do dictionary lookup.
The first option makes it difficult to move data between shards or to add a shard: with a plain hash-modulo scheme, changing the number of shards changes where most ids land. The others have various tradeoffs between maintenance effort, downtime while moving, etc.
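To make the three options concrete, here is a rough Python sketch. Everything in it is my own illustration, not something from your setup: the function and class names, the shard names, and the choice of CRC32 as the hash are all assumptions. It only shows how the lookup differs between the approaches.

```python
import zlib

NUM_BUCKETS = 1 << 16  # 16-bit bucket space for the "in between" option


def _hash(item_id: int) -> int:
    # CRC32 is an arbitrary choice here; any stable hash would do.
    return zlib.crc32(str(item_id).encode("ascii"))


def shard_by_hash(item_id: int, num_shards: int) -> int:
    """Option 1: hash of id. No lookup table, but changing num_shards
    changes the answer for most ids."""
    return _hash(item_id) % num_shards


class DictionarySharder:
    """Option 2: a dictionary with one entry per id. In practice this
    would be a table in a database; a dict stands in for it here."""

    def __init__(self):
        self.location = {}

    def assign(self, item_id: int, shard: str) -> None:
        self.location[item_id] = shard

    def lookup(self, item_id: int) -> str:
        return self.location[item_id]


class BucketSharder:
    """Option 3: hash to a 16-bit bucket, then look the bucket up.
    The dictionary has at most 65536 rows, and data moves one bucket
    at a time."""

    def __init__(self, shards):
        # Initial layout: spread buckets round-robin over the shards.
        self.bucket_to_shard = [shards[b % len(shards)]
                                for b in range(NUM_BUCKETS)]

    def bucket(self, item_id: int) -> int:
        return _hash(item_id) & 0xFFFF

    def lookup(self, item_id: int) -> str:
        return self.bucket_to_shard[self.bucket(item_id)]

    def move_bucket(self, bucket: int, new_shard: str) -> None:
        # Call this after the bucket's rows have been copied over.
        self.bucket_to_shard[bucket] = new_shard
```

With the bucket approach, adding a shard means copying some buckets to the new machine and then calling move_bucket for each; ids in the other buckets are not affected.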
Consider turning the primary key into a BIGINT, possibly via the sharding dictionary?
Any estimate of how long before it is live?
I'm interested in hearing how it turns out.
Consider giving a talk at the MySQL Conference (usually in April). And maybe meet with the astronomer guy and his proposed 15PB system. :)