Wednesday, June 9, 2010

Amazon and 11 nines

Amazon has claimed 11 nines of availability. That is a very, very hard feat to accomplish, and if they have done it (I would love to know how they arrived at that number), it is a groundbreaking achievement.

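To see why, let's look at what it means. Availability is the fraction of time a system is up: MTTF / (MTTF + MTTR), usually written as a percentage, where MTTF is the mean time to failure and MTTR is the mean time to recovery. The unavailable fraction is therefore roughly MTTR / MTTF: the time to recover after a failure, divided by the mean time until such a failure happens. Reliability is commonly quoted as the number of nines in the availability figure. Eleven nines means an unavailable fraction of 10^-11, so Amazon S3 would be down for only one second in every 10^11 seconds, or 10^11/(365*24*60*60) = about 3,171 years!!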
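The arithmetic is easy to check with a few lines of Python. This is only a rough sketch: it assumes a 365-day year and reads "N nines" as an unavailable fraction of 10^-N, and the function names are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def availability(mttf, mttr):
    """Fraction of time the system is up, given mean time to failure
    and mean time to recovery in the same unit."""
    return mttf / (mttf + mttr)


def years_per_second_of_downtime(nines):
    """With N nines the unavailable fraction is 10**-N, so one second of
    downtime is expected per 10**N seconds. Convert that span to years."""
    return 10 ** nines / SECONDS_PER_YEAR


if __name__ == "__main__":
    # A system that runs 99 hours between failures and takes 1 hour to recover
    print(availability(99, 1))
    # How long 11 nines lets you run per second of downtime
    print(round(years_per_second_of_downtime(11)))
```

For 11 nines the second figure comes out at roughly 3,171 years, while 9 nines would give a bit under 32 years.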

In their seminal paper "High Availability Computer Systems", Jim Gray and Daniel Siewiorek defined availability classes as follows:

unmanaged 90% - 50,000 mins/year downtime
managed 99% - 5,000 mins/year downtime
well-managed 99.9% - 500 mins/year downtime
fault-tolerant 99.99% - 50 mins/year downtime
high-availability 99.999% - 5 mins/year downtime
very-high-availability 99.9999% - 0.5 mins/year downtime
ultra-availability 99.99999% - 0.05 mins/year downtime
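The downtime figures in this list are rounded. A quick sketch of where they come from, assuming a 365-day year (525,600 minutes); the class names are copied from the list, and the helper name is just for illustration:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, assuming a 365-day year

# Availability classes from the list, keyed by number of nines
CLASSES = {
    1: "unmanaged",
    2: "managed",
    3: "well-managed",
    4: "fault-tolerant",
    5: "high-availability",
    6: "very-high-availability",
    7: "ultra-availability",
}


def downtime_minutes_per_year(nines):
    """Expected downtime per year when the unavailable fraction is 10**-nines."""
    return MINUTES_PER_YEAR / 10 ** nines


if __name__ == "__main__":
    for nines, name in CLASSES.items():
        print(f"{name}: {downtime_minutes_per_year(nines):g} mins/year")
    # 11 nines, expressed in seconds per year instead of minutes
    print(f"11 nines: {downtime_minutes_per_year(11) * 60:g} secs/year")
```

Each extra nine cuts the allowed downtime by a factor of ten, so by 11 nines the budget is a fraction of a millisecond per year.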

As you will notice, even they defined classes only up to 7 nines, so we do not have a name for what Amazon has claimed.
