Saturday, April 28, 2012

Sunday, April 22, 2012

Generating a Distributed Sequence Number

This is a very common problem in distributed systems (e.g. Message brokers, implementing "At most once deliver", Group communication etc). I was doing some reading for WSO2 Andes project.
There are several options.
  1. Using Zookeeper: Following two threads are talking about this. It should be reasonably fast. Twitter guys have tried this says it was bit slow.
    http://zookeeper-user.578899.n2.nabble.com/Sequence-Number-Generation-With-Zookeeper-td5378618.html
    http://www.mail-archive.com/zookeeper-user@hadoop.apache.org/msg01976.html
  2. Cassandra:  This has been raised several times, the nd answer was to use UUIDs (which does not work for us)
    http://comments.gmane.org/gmane.comp.db.cassandra.user/3304
    http://stackoverflow.com/questions/3935915/how-to-create-auto-increment-ids-in-cassandra.

    Then Cassandra introduced counters, but it does not support incrementAndGet() and no plan to do the future as well. So that does not work.
  3. http://www.datastax.com/dev/blog/whats-new-in-cassandra-0-8-part-2-counters
  4. Write a custom server: This is easy, basically create a service that give a increasing ID. But very hard to cluster this and behavior in case of a failure is complicated.
  5. "A timestamp, worker number and sequence number": Twitter Guys created solution based on "a timestamp, worker number and sequence number" (this is kind of that we use as well, except that ran few dedicated servers for this) http://engineering.twitter.com/2010/06/announcing-snowflake.html
  6. Other Algos: Only looked at these briefly. But they are complicated.
    Using DHTs: http://horicky.blogspot.com/2007/11/distributed-uuid-generation.html
    A Fault-Tolerant Protocol for Generating Sequence Numbers for Total Ordering Group Communication in Distributed System, http://www.iis.sinica.edu.tw/page/jise/2006/200609_16.pdf
IMHO, "a timestamp, worker number and sequence number" is the best option. Only downside of this is that this assumes that broker nodes are loosely synced in time. Only other option I see is Zookeeper.

Good overview - http://stackoverflow.com/questions/2671858/distributed-sequence-number-generation/5685869.

Hope this was useful. If you enjoyed this post you might also like Mastering the 4 Balancing Acts in Microservices Architecture and Distributed Caching Woes: Cache Invalidation