Srinath's Blog :My views of the World: Scaling Distributed Queues: A Short Survey

Following is a part of the related works survey from the paper, "Andes: a highly scalable persistent messaging system". However, following has nothing about Andes, but only how different distributed queue implementations work. I will write about Andes later.

What is a Distributed Queue?

Distributed Queue is a FIFO data structure that is accessed by entities in a distributed environment. Working of a distributed queue will be as follows.

There are two types of users (publishers and subscribers)
A users creates a queue (or queues may be created on demand)
Subscribers subscribe to a queue
Publisher send a message (publish) to the queue
Published message is sent to a one of the subscribers who has subscribed to the queue

Distributed Queues provides strict or best effort support for in-order delivery where subscribers receives messages at the same order they have been published. (It is very hard to enforce this across all subscribers, and therefore, often implementations enforce this within each subscriber. For example, if messages m1, m2 .. m100 are published in order, each subscriber will see a subset of messages in ascending order. But there are no guarantee about the global order seen across subscribers).

What does Scaling Distributed Queues means?

Scaling is handling larger workload by adding more resources. Workload can be increased in many ways, and we call those different dimensions of scale. There are three main dimensions.

Scaling to Handle large number of queues
Scaling to handle a queue that has large workload
Scaling to handle large messages

Distributed Queue Implementations

There are many distributed queue implementations in JMS servers like ActiveMQ, HorentMQ etc. Focus of our discussion is that how can they scale up.

There are four choices

Master-Salve topology – queue is assigned to a master node, and all changes to the queue are also replicated to a salve node. If the master has failed, the slave can take over. (e.g. Qpid and ActiveMQ, RabbitMQ).
Queue Distribution - queues are created and live in a single node, and all nodes know about all the queues in the system. When a node receives a request to a queue that is not available in the current node, it routes the request to the node that has the queue. (e.g. RabbitMQ)
Cluster Connections – Clients may define cluster connections giving a list of broker nodes, and messages are distributed across those nodes based on a defined policy (e.g. Fault Tolerance Policy, Load Balancing Policy). It also supports message redistribution, which means if the broker does not have any subscriptions, the messages received by that broker are rerouted to other brokers that have subscriptions. It is worth nothing that server side (brokers) plays a minor role in this setup.
Broker networks - The brokers are arranged in a topology, and messages are propagated through the topology until the messages reach a subscriber. Usually, this uses Consumer priority mode where brokers that are close to the point of origin are more likely to receive the messages. The challenge is how to load balance those messages. (e.g. ActiveMQ)

Replication in distributed queues is inefficient as delivering messages in-order needs replication of state immediately.

In cluster connections and broker networks, in order message delivery provides a best effort guarantee only. If a subscriber has failed or subscription has been deleted, the broker nodes are force to either drop the message or to redistribute them out of order to the other brokers in the network.

Any of the above modes do not handle scaling for large messages

Summery

Topology	Pros	Cons	Supporting Systems
Master Slave	Support HA	No Scalability	Qpid, ActiveMQ, RabbitMQ
Queue Distribution	Scale to large number of Queues	Does not scale for large number of messages for a queue	RabbitMQ
Cluster Connections	Support HA	Might not support in-order delivery Logic runs in the client side takes local decisions.	HorentMQ
Broker/Queue Networks	Load balancing and distribution	Fair load balancing is hard	ActiveMQ

Srinath's Blog :My views of the World

Wednesday, May 2, 2012

Scaling Distributed Queues: A Short Survey

What is a Distributed Queue?

What does Scaling Distributed Queues means?

Distributed Queue Implementations

Summery

1 comment: