Wednesday, July 13, 2011

Netflix architecture talk at Cassandra SF 2011

The talk was given by Adrian Cockcroft.

Netflix has about 23 million subscribers and serves about 20 billion requests per month. It runs almost completely in the cloud. These are some of the reasons given for using the cloud:
  • Frictionless deployment - no need to order hardware and wait for it, which removes many kinds of delay.
  • Better business agility - they can bring up the whole architecture in a new zone within a few hours, so they can act very fast.
  • They cannot build data centers fast enough to meet demand - a data center takes at least 9 months to build.
  • Support for zones, scaling on demand, and global deployment.

They keep the movies themselves separate and distribute them through a CDN. Most of the dynamic (transactional) data is kept in Cassandra. They do not use EBS; instead, they periodically back up the data on the Cassandra nodes. Each Cassandra node takes its backup independently, with only loose synchronization across nodes. If there are any inconsistencies between the backups of different nodes, Cassandra repairs them at startup. They also back up data to S3 incrementally to avoid losing data between backups (presumably through callbacks that are invoked when SSTables in Cassandra are updated).
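To make the incremental backup idea concrete, here is a minimal sketch in Python. It is not Netflix's tooling: it assumes a local directory stands in for the S3 bucket, and the function name incremental_backup and the file names are made up for illustration. It relies on the fact that SSTables are immutable once written, so each pass only needs to copy files that are not in the backup location yet.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Tuple


def incremental_backup(data_dir: Path, bucket_dir: Path) -> List[str]:
    """Copy files present in data_dir but not yet in bucket_dir.

    bucket_dir is a local stand-in for an S3 bucket (an assumption of
    this sketch). Returns the names copied in this pass, sorted.
    """
    bucket_dir.mkdir(parents=True, exist_ok=True)
    already_stored = {p.name for p in bucket_dir.iterdir()}
    copied = []
    for path in sorted(data_dir.iterdir()):
        # Files are assumed immutable, so the name alone decides
        # whether a file still needs to be uploaded.
        if path.is_file() and path.name not in already_stored:
            shutil.copy2(path, bucket_dir / path.name)
            copied.append(path.name)
    return copied


def demo() -> Tuple[List[str], List[str], List[str]]:
    """Run three backup passes against a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        bucket = Path(tmp) / "bucket"
        data.mkdir()
        (data / "sstable-1.db").write_text("a")  # hypothetical file name
        first = incremental_backup(data, bucket)
        (data / "sstable-2.db").write_text("b")
        second = incremental_backup(data, bucket)
        third = incremental_backup(data, bucket)  # nothing new to copy
        return first, second, third
```

Calling demo() runs three passes: the first copies the initial file, the second copies only the file added since, and the third copies nothing.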

They can point their system at an S3 bucket and boot it up; an initializer then loads and initializes the system using the data in that bucket. They use Chaos Monkey testing: a process randomly kills nodes, and the system recovers. They do this often to make sure that recovery really works.
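The Chaos Monkey idea can be shown with a toy simulation. This is only a sketch under stated assumptions, not the actual Netflix tool: the Cluster class, the node names, and the "restart every dead node" recovery step are all hypothetical, and the service counts as available as long as at least one node is up.

```python
import random
from typing import List


class Cluster:
    """Toy cluster: tracks which (hypothetical) nodes are alive."""

    def __init__(self, node_count: int) -> None:
        self.alive = {f"node-{i}": True for i in range(node_count)}

    def kill(self, name: str) -> None:
        self.alive[name] = False

    def recover(self) -> List[str]:
        """Restart every dead node and return the names restarted."""
        restarted = [name for name, up in self.alive.items() if not up]
        for name in restarted:
            self.alive[name] = True
        return restarted

    def is_available(self) -> bool:
        # Assumption: one live node is enough to keep serving.
        return any(self.alive.values())


def chaos_round(cluster: Cluster, rng: random.Random) -> str:
    """Kill one randomly chosen node and return its name."""
    victim = rng.choice(sorted(cluster.alive))
    cluster.kill(victim)
    return victim


def run_chaos(rounds: int, node_count: int = 3, seed: int = 0) -> bool:
    """Return True if the cluster survived and recovered every round."""
    rng = random.Random(seed)
    cluster = Cluster(node_count)
    for _ in range(rounds):
        chaos_round(cluster, rng)
        if not cluster.is_available():
            return False
        cluster.recover()
        if not all(cluster.alive.values()):
            return False
    return True
```

Running run_chaos(100) repeats the kill-and-recover cycle a hundred times; doing this often is what gives confidence that recovery works and not just that it worked once.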
