Thursday, July 14, 2011

Cassandra: Surprises you Might Get

Now do not get me wrong: Cassandra is a great tool, and we use it. Following are a few things you might assume work differently and only figure out later, after you have been using it for some time. I am spelling them out in case they are useful to someone else as well.

  1. No transactions, no JOINs. Hopefully this is no surprise. If it is, go and read a bit about NoSQL before touching Cassandra. 
  2. No foreign keys, and keys are effectively immutable. There are no JOINs and no foreign-key integrity checks, so if you change a key, anything that refers to the old key is not updated. Use surrogate keys if you need keys that can change. 
  3. Keys have to be unique (use composite keys to work around this one). 
  4. Super Columns and the order-preserving partitioner are discouraged - the developers keep repeating that life will be much easier without them. 
  5. Searching is complicated - There is no search in the core. You either have to use secondary indexes or create indexes yourself. Secondary indexes are layered on top and are not part of the main architecture. They also do not do range searches or pattern searches, which means they are only good enough for exact string retrievals: they support =, but not <, >, <=, >= or SQL LIKE searches. When secondary indexes do not work for what you need, you have to learn to build your own indexes using sort orders and slices. 
  6. Sort orders are complicated - Columns are always sorted by name, but row order depends on the partitioner. If you use sort orders to build your indexes, you have to worry about this. 
  7. Failed operations may leave changes - If an operation succeeds, all is well. However, if it fails, some of its changes may actually have been applied. But operations are idempotent, so you can retry until successful. 
  8. Batch operations are not atomic, but you can retry until successful (as operations are idempotent). 
  9. If a node fails, Cassandra does not figure it out and heal itself. Assuming you have replicas, things will continue to work, but the whole system recovers only when a manual recovery operation is done. 
  10. It remembers deletes - When we delete a data item, a node may be down at the time and come back after the delete is done, still holding the old data. To handle this, Cassandra marks deleted items as deleted (tombstones) but does not actually remove them until a configurable timeout or a repair. Space is actually freed up only then. 
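
To make item 10 a bit more concrete, here is a toy sketch in Python of why a delete has to be remembered. It is not Cassandra code: the two dictionaries standing in for replicas, the integer timestamps, and every function name (write, delete_with_tombstone, repair, read, run_scenario) are made up for illustration, and it assumes a simple "newest timestamp wins" rule when replicas are reconciled.

```python
# Toy model, not Cassandra code: each replica maps key -> (timestamp, value).
TOMBSTONE = object()  # hypothetical marker standing in for a deleted value


def write(replica, key, value, ts):
    """Last-write-wins: keep the entry only if it is newer than what we have."""
    current = replica.get(key)
    if current is None or ts > current[0]:
        replica[key] = (ts, value)


def delete_with_tombstone(replica, key, ts):
    """Record the delete as a timestamped marker instead of removing the key."""
    write(replica, key, TOMBSTONE, ts)


def repair(a, b):
    """Copy entries both ways so each replica ends up with the newest version."""
    for key in set(a) | set(b):
        for src, dst in ((a, b), (b, a)):
            if key in src:
                ts, value = src[key]
                write(dst, key, value, ts)


def read(replica, key):
    """Return the stored value, or None if the key is missing or tombstoned."""
    entry = replica.get(key)
    if entry is None or entry[1] is TOMBSTONE:
        return None
    return entry[1]


def run_scenario(use_tombstone):
    up, down = {}, {}
    write(up, "user:1", "alice", ts=1)
    write(down, "user:1", "alice", ts=1)
    # The "down" replica is offline while the delete happens on "up".
    if use_tombstone:
        delete_with_tombstone(up, "user:1", ts=2)
    else:
        up.pop("user:1")  # naive delete: nothing remembers it happened
    # "down" comes back and the two replicas are reconciled.
    repair(up, down)
    return read(up, "user:1")


print(run_scenario(use_tombstone=True))   # the delete survives the repair
print(run_scenario(use_tombstone=False))  # the old value comes back
```

With the naive delete, the returning replica simply hands the old value back during reconciliation. With the tombstone, the newer marker wins and the item stays deleted, which is why the space cannot be freed right away.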

1 comment:

Peter said...

Thanks for this list!