Saturday, October 29, 2011

List of Known Scalable Architecture Templates

For most Architects, "Scale" is the most illusive aspect of software architectures. Not surprisingly, it is also one of the most sort-out goals of todays software design. However, computer scientists do not yet know of a single architecture that can scale for all scenarios. Instead, we design scalable architectures case by case, composing known scalable patterns together and trusting our instincts. Simply put, building a scalable system has become more an art than a science.

We learn art by learning masterpieces, and scale should not be different! In this post I am listing several architectures that are known to be scalable. Often, architects can use those known scalable architectures as patterns to build new scalable architectures.
  1. LB (Load Balancers) + Shared nothing Units - This model includes a set of units that does not share anything with each other fronted with a load balancer that routes incoming messages to a unit based on some criteria (round-robin, based on load etc.). A unit can be a single node or a cluster of tightly coupled nodes. As the Load balancer, users can use DNS round robin, hardware load balancers, or software load balancers. It is also possible to build a hierarchy of load balancers that includes combination of above load balancers. The article, "The Case for Shared Nothing Architecture" by Michael Stonebraker, discusses these architectures
  2. LB + Stateless Nodes + Scalable Storage - Classical Three tire Web architectures follows this model. This model includes several stateless nodes talking to a scalable storage, and a load balancer distributes load among the nodes. In this model, the storage is the limiting factor, but with NoSQL storages, it is possible to build very scalable systems using this model.
  3. Peer to Peer Architectures (Distributed Hash Table (DHT) and Content Addressable Networks (CAN)) - This model provides several classical scalable algorithm, which almost all aspects about the algorithm scaling up logarithmically. Example systems are Chord, Pastry (FreePastry), and CAN. Also several of the NoSQL systems like Cassandra is based on P2P architectures. The article "Looking up data in P2P systems" discuss these models in detail.
  4. Distributed Queues – This model is based on a Queue implementation (FIFO delivery) implemented as a network service. This model has found wide adoption through JMS queues. Often this is used as task queues, and scalable versions of task queues scale out by keeping a hierarchy of queues where lower levels sends jobs to upper levels when they cannot handle the load.
  5. Publish/Subscribe Paradigm - implemented using network publish subscribe brokers that route messages to each other. The classical paper "The many faces of publish/subscribe" describes this model in detail and among examples of this model are NaradaBroker and EventJava.
  6. Gossip and Nature-inspired Architectures - This model follows the idea of gossip in normal life, and the idea is that each node randomly pick and exchange information with follow nodes. Just like in real life, Gossip algorithms spread the information surprisingly fast. Another aspect of this are Biology inspired algorithms. Natural world has remarkable algorithm for coordination and scale. For example, Ants, Folks, Bees etc., are capable of coordinating in scalable manner with minimal communication. These algorithms borrow ideas from such occurrences. The paper "From epidemics to distributed computing" discusses the models.
  7. Map Reduce/ Data flows - First introduced by Google, MapReduce provides a scalable pattern to describe and execute Jobs. Although simple, it is the primary pattern in use for OLAP use cases. Data flows are a more advanced approach to express executions, and projects like Dryad and Pig provide scalable frameworks to execute data flows. The paper,  "MapReduce: Simplified data processing on large clusters"  provides a detailed discussion of the topic. Apache Hadoop is an implementation of this model.
  8. Tree of responsibility - This model breaks the problem recursively and assign to a tree, each parent node delegating work to children nodes. This model is scalable and used within several scalable architectures.
  9. Stream processing - This model is used to process data streams, data that is keep coming. This type of processing is supported through a network for processing nodes. (e.g. Aurora, Twitter Strom, Apache S4)
  10. Scalable Storages – ranges from Databases, NoSQL storages, Service Registries, to File systems. Following article discusses their scalability aspects.
Having said that there are only 3 way to scale: that is distribution, caching, and asynchronous processing. Each of the above architectures uses combinations of those in their implementation. On the other hand, the scalability killer, apart from bad coding, is global coordination. Simply put, any kind of global coordination will limit the scalability of your system. Each of aforementioned architectures has managed to achieve local coordination instead of global coordination.

However, combining them to create a scalable architecture is not at all trivial undertaking. Unless you are trying to discover new scalable pattern, often it is a good idea to solve problems by adopting known scalable solutions than coming up with new architecture using first principals.

Hope this was useful. If you enjoyed this post you might also like Mastering the 4 Balancing Acts in Microservices Architecture and Distributed Caching Woes: Cache Invalidation


Sergio Bossa said...

I think there's another kind of "scalable architecture", recently highlighted by the "CQRS" movement and then LMAX: I'll call it "Memory Peers", but others may somewhat refer to it as CQRS itself or Event Sourcing.

It is based on message-based communication between peer components, in-memory storage/processing for performance and low latency with asynchronous durable storage, and peer-replication for fault tolerance and high availability.

May deserve a whole blog post, but that's the idea :)


Sergio B.

stefi said...

thanks mate, great summary!

Srinath Perera said...

Hi Sergio,

Thanks!! Could you point me a link or a paper that describes the concept in bit more detail? If convinced, I could write a follow up post adding that.


Unknown said...

More about the LMAX architecture