Sunday, January 23, 2011

NoSQL and Search

One notable change that comes with most NoSQL solutions is the lack of ad-hoc search support. For example, with Cassandra, you either have to store the data in a layout designed around the queries you plan to run (that is, model the storage around your application) or use Map-Reduce.
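
As a rough illustration of storing data around the queries, here is a minimal sketch that uses plain Python dictionaries as hypothetical stand-ins for two Cassandra-style column families (this is not the Cassandra API). The record is written once per query pattern, so the query "which users live in a given city?" becomes a single row lookup. The names `users_by_id`, `users_by_city`, `save_user` and `users_in` are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical in-memory stand-ins for two Cassandra-style column families:
# one keyed by user id, one laid out for the query "which users live in X?".
users_by_id: Dict[str, Dict[str, str]] = {}
users_by_city: Dict[str, Dict[str, Dict[str, str]]] = defaultdict(dict)


def save_user(user_id: str, name: str, city: str) -> None:
    """Write the record once for every query pattern the app needs."""
    old = users_by_id.get(user_id)
    if old is not None:
        # Keep the denormalized copy consistent if the city changed.
        users_by_city[old["city"]].pop(user_id, None)
    record = {"name": name, "city": city}
    users_by_id[user_id] = record
    users_by_city[city][user_id] = record  # row key = the queried value


def users_in(city: str) -> List[str]:
    """One row lookup instead of a scan over every user."""
    return sorted(users_by_city.get(city, {}))
```

The cost of this design is that every new kind of query needs another copy of the data, which is exactly what a component that cannot predict its queries is unable to plan for.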

For most end-user apps this is acceptable: apps (at least the simple ones) often run only a few queries, so developers can anticipate them and store the data accordingly. However, the same is not true for middleware frameworks like registries, which do not know what users are going to store or what they will search for.

There are a few possible solutions. Note that most NoSQL databases partition the data across several nodes, so the challenge is running ad-hoc queries across all of those nodes.
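
To make the partitioning problem concrete, here is a minimal sketch assuming a hypothetical three-node cluster simulated with in-process dictionaries, with rows placed by a CRC32 hash of the row key (real systems use their own partitioners, such as consistent hashing). A lookup by key contacts only one node, but a query on any other field cannot be routed, so it has to be sent to every node and the partial results merged. All names here are illustrative.

```python
import zlib
from typing import Dict, List, Optional

NUM_NODES = 3  # hypothetical cluster size

# Each "node" maps row key -> record; an in-process stand-in for a server.
nodes: List[Dict[str, Dict[str, str]]] = [{} for _ in range(NUM_NODES)]


def node_for(key: str) -> int:
    """Choose the owning node from a stable hash of the row key."""
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES


def put(key: str, record: Dict[str, str]) -> None:
    nodes[node_for(key)][key] = record


def get(key: str) -> Optional[Dict[str, str]]:
    """Key lookups are routed: only the owning node is contacted."""
    return nodes[node_for(key)].get(key)


def find(field: str, value: str) -> List[str]:
    """Ad-hoc query on a non-key field: ask every node, then merge."""
    matches: List[str] = []
    for node in nodes:  # scatter: every partition must be searched
        matches.extend(k for k, rec in node.items() if rec.get(field) == value)
    return sorted(matches)  # gather: combine the partial results
```

A stable hash such as CRC32 is used instead of Python's built-in `hash()`, because the latter is randomized per process for strings and would place keys differently on every run.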

  1. Map Reduce - the commonly advocated solution, but results typically take a while to compute. This is great for batch processing and data-warehouse-style applications. 
  2. Scatter Gather - the idea is to send the request to all nodes and collect the results. MongoDB uses this, and its developers claim that most queries can be answered by talking to only one node. The downside is that it is expensive when a query has to touch all the nodes. 
  3. Word Indexing (e.g. Lucene) - For document-like use cases this is ideal. However, it loses the context of the data within the document (it can only find documents or records that contain the given words). So it is not ideal for key-value and column-family databases.
  4. Tree Indexes - the idea is to build a search tree (relational databases typically use B-trees) for each property (field or column) stored in the storage; this is how most relational databases implement their indexes. This supports not only exact-match searches but also range searches. The overhead of maintaining these indexes is not yet clear. 
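
To contrast items 3 and 4, the sketch below keeps both a word (inverted) index and per-field sorted indexes over a few made-up records. A sorted Python list searched with `bisect` stands in for the B-tree a relational database would use; it is a single-node illustration that ignores partitioning, and the records and the `FieldIndex` class are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple

# Hypothetical records: row key -> fields.
records: Dict[str, Dict[str, Any]] = {
    "r1": {"city": "London", "note": "moved to Paris", "year": 2008},
    "r2": {"city": "Paris", "note": "visited London", "year": 2010},
    "r3": {"city": "Colombo", "note": "new office", "year": 2011},
}

# Word (inverted) index: word -> row keys, with no record of the field.
word_index: Dict[str, Set[str]] = defaultdict(set)
for key, rec in records.items():
    for value in rec.values():
        for word in str(value).lower().split():
            word_index[word].add(key)


class FieldIndex:
    """Sorted (value, key) pairs for one field; a simple stand-in for a B-tree."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Any, str]] = []

    def insert(self, value: Any, key: str) -> None:
        bisect.insort(self._entries, (value, key))  # O(n) shift, unlike a B-tree

    def between(self, low: Any, high: Any) -> List[str]:
        """Keys whose field value v satisfies low <= v <= high."""
        # (low,) sorts before every (low, key) pair, so this finds the first match.
        start = bisect.bisect_left(self._entries, (low,))
        keys: List[str] = []
        for value, key in self._entries[start:]:
            if value > high:
                break
            keys.append(key)
        return keys

    def exact(self, value: Any) -> List[str]:
        return self.between(value, value)


# One index per property, as item 4 describes.
indexes = {field: FieldIndex() for field in ("city", "year")}
for key, rec in records.items():
    for field, index in indexes.items():
        index.insert(rec[field], key)
```

Here `word_index["london"]` holds both `r1` and `r2`, because the word appears in a different field of each record, whereas `indexes["city"].exact("London")` returns only `["r1"]` and `indexes["year"].between(2009, 2011)` returns `["r2", "r3"]`.
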
As you can see, none of these is perfect (except perhaps the last). These are a few thoughts on the subject; in my opinion, this is a problem NoSQL databases have to solve. 
