Saturday, April 28, 2012

Introduction to Open Source, Apache and Apache Way

This is the slide deck for introduction to Open Source and Apache Way talk I did at Apache Bar Camp 2012 at Engineering Faculty, University of Peradeniya. More info at

Sunday, April 22, 2012

Generating a Distributed Sequence Number

This is a very common problem in distributed systems (e.g. Message brokers, implementing "At most once deliver", Group communication etc). I was doing some reading for WSO2 Andes project.
There are several options.
  1. Using Zookeeper: Following two threads talk about this. It should be reasonably fast. Twitter guys have tried this and says it was bit slow.
  2. Cassandra:  This has been raised several times, and answer was to use UUIDs (which does not work for us)

    Then Cassandra introduced counters, but it does not support incrementAndGet() and no plan to do the future as well. So that does not work.
  4. Write a custom server: This is easy, basically create a service that give a increasing ID. But very hard to cluster this and behavior in case of a failure is complicated.
  5. "A timestamp, worker number and sequence number": Twitter Guys created solution based on "a timestamp, worker number and sequence number" (this is kind of that we use as well, except that ran few dedicated servers for this)
  6. Other Algos: Only looked at these briefly. But they are complicated.
    Using DHTs:
    A Fault-Tolerant Protocol for Generating Sequence Numbers for Total Ordering Group Communication in Distributed System,
IMHO, "a timestamp, worker number and sequence number" is the best option. Only downside of this is that this assumes that broker nodes are loosely synced in time. Only other option I see is Zookeeper.

Good overview -