Re: trillion records?
Posted by:
Rick James
Date: August 09, 2011 07:57PM
> one hundred trillion bytes divided by 1 terabyte = ~90 TB
Close enough for discussion. Anyway, I don't know if your records will be 100 bytes.
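Here is the arithmetic as a quick sketch, in case you want to plug in a different record size. The 100-byte record is only an assumption, and this counts raw data only; indexes and storage-engine overhead are not included.

```python
# Back-of-envelope dataset size. The record size is an assumption.
RECORDS = 10**12          # one trillion records
BYTES_PER_RECORD = 100    # assumed average record size
TIB = 2**40               # bytes in one binary terabyte

def dataset_tib(records, bytes_per_record):
    """Raw data size in binary terabytes, ignoring indexes and overhead."""
    return records * bytes_per_record / TIB

print(round(dataset_tib(RECORDS, BYTES_PER_RECORD), 1))
```

Doubling the record size doubles the answer, so it is worth measuring a sample of real rows before buying hardware.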
> ds3500
Nice. Since it can handle RAID-5/6/10, you are protected against a single-drive failure, and many cases of multiple drive failures.
RAID-10 cuts the capacity essentially by half. So, the device would have a max around 200TB, with the big, slow, 2TB drives, and all 192 expansion bays populated.
RAID-5 or RAID-6 loses more like 20% to parity (the exact figure depends on how many drives are in each array). The capacity here is more like 300TB.
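The capacity figures above come from simple multiplication. A rough sketch, assuming 192 drives of 2TB each and treating the 50% and 20% overheads as round numbers rather than exact values for any particular array layout:

```python
# Rough usable capacity per RAID level. Overhead fractions are approximations.
DRIVES = 192      # all bays populated
DRIVE_TB = 2      # the big, slow drives

def usable_tb(drives, drive_tb, overhead_fraction):
    """Capacity left after mirroring or parity takes its share."""
    return drives * drive_tb * (1 - overhead_fraction)

raid10_tb = usable_tb(DRIVES, DRIVE_TB, 0.5)   # mirroring costs about half
raid56_tb = usable_tb(DRIVES, DRIVE_TB, 0.2)   # parity costs roughly 20%

print(f"RAID-10:  {raid10_tb:.0f} TB")
print(f"RAID-5/6: {raid56_tb:.0f} TB")
```

Hot spares and filesystem overhead would shave these numbers down further.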
Loss of the device, or power, or the building it is in, or ... -- Those are other failure scenarios to either consider, or sweep under the rug.
Battery-backed cache in the RAID is a must.
For computing performance:
A disk drive can handle 100-200 I/Os per second.
RAID striping multiplies that by roughly the striping factor -- IF there is enough parallelism in the application.
SSDs are very expensive, and do not have the capacity, even in the ds3500, to handle your size requirement. But they can get on the order of 1000 I/Os per sec.
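To put rough numbers on the striping point, here is a crude model. The 150 I/Os per second per drive and the 8-way stripe are assumed values for illustration, and the model ignores caching, queueing, and RAID write penalties.

```python
# Crude model: a stripe only helps when enough requests are in flight.
def striped_iops(per_drive_iops, stripe_width, concurrent_requests):
    """Estimate I/Os per second for a striped array.

    Only as many drives can be busy as there are concurrent requests,
    so a single-threaded application sees single-drive performance.
    """
    busy_drives = min(stripe_width, concurrent_requests)
    return per_drive_iops * busy_drives

print(striped_iops(150, 8, 1))    # one thread: no gain from striping
print(striped_iops(150, 8, 32))   # plenty of threads: all 8 drives busy
```

The takeaway is that the hardware alone does not deliver the speedup; the application has to issue requests in parallel.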
After striping (and/or SSDs), the next way to get more _bandwidth_ to disk is by sharding -- spreading the data among multiple machines.
Sharding does NOT provide protection against data loss, unless you explicitly store every record on more than one shard. Sharding is complicated enough; adding redundancy makes it even more complicated.
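As an illustration of what "store every record on more than one shard" could look like, here is a minimal sketch. The function names, the hash-based placement, and the "next shard over" replica rule are all my assumptions, not a feature of MySQL or any particular tool.

```python
import hashlib

def home_shard(key, num_shards):
    """Map a record key to its primary shard with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def replica_shards(key, num_shards, copies=2):
    """Return the shards that should each hold a copy of this record.

    Copies go on consecutive shards, so they always land on distinct
    machines as long as copies <= num_shards.
    """
    if copies > num_shards:
        raise ValueError("cannot place more copies than there are shards")
    first = home_shard(key, num_shards)
    return [(first + i) % num_shards for i in range(copies)]

# Every write for this key must go to all shards in the list.
print(replica_shards("user:42", num_shards=10, copies=2))
```

Even this toy version hints at the extra complexity: writes must succeed on every copy, reads need a fallback when a shard is down, and changing the number of shards moves most keys.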
Sounds like the box, drives, connectors, etc., would cost $50-100K?