MySQL Forums
Forum List  »  General

Re: Skyblog, Friendster, twitter, yahoo mail..... how to design database for huge no. of users and rows
Posted by: Rick James
Date: May 07, 2009 10:25AM

Replication -- Master plus many slaves -- this allows read scaling.

Sharding -- splitting the data among many machines, based on the primary key.
But that assumes you don't need to collect multiple items together.

All of _your_ emails / pictures / blog entries will easily fit on one server. And a thousand others can live on that same server. Another thousand live on another server. When you login, your id is hashed / looked up in a 'dictionary' / or otherwise turned into a machine name. Then your query goes to that machine.

Wikipedia could (probably does not) store all the words beginning with 'a' on one machine, etc. Since a user request is for one word, that is easy to implement.

It gets tricker when things need to be indexed in multiple ways. Consider "movie reviews". You want to see all the reviews _you_ wrote. Or you want to see _everyone's_ reviews about this _one_ movie. Assuming the reviews are too big to fit on one machine, then either they are duplicated, or there is a messy 'gather' process to collect them from all the machines.

Another thing sometimes possible -- NetApps, EMC, (and others) make huge disk systems -- many terabytes. But this is only a stopgap before needing to shard.

A big system has many layers, sometimes like
* "front end" -- many servers dealing with creating HTML.
* Optionally a caching layer (see memcached, squid, etc) -- to soften the load on the back end.
* An API layer -- do some business logic, route to database shard, etc
* Database slaves for read scaling, shards for write scaling.

Each layer has many machines, and is designed such that scaling is done by adding more machines. The machines are all "close" to each other so that the network latency is on the order of a millisecond. (BCP concerns greatly complicate things.)

In between the layers are "load balancers"; these route each request from a 'client' machine (toward the front end) to any of several 'server' machines, usually in a round-robin manner. Small sites use software for this, big ones use hardware VIPs.

Writes complicate matters -- MySQL (at least) requires a single source of truth (the Master).

Also read about "cloud computing", Hadoop, Hive, etc. At a smaller level, MySQL's NDB cluster, Oracle RAC.

Options: ReplyQuote


Subject
Written By
Posted
Re: Skyblog, Friendster, twitter, yahoo mail..... how to design database for huge no. of users and rows
May 07, 2009 10:25AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.