Sunday, April 3, 2011

Getting Started with MPI on Your Laptop

Yes, to do something serious with MPI you need a cluster. But to try it out, you can use your laptop. The main setup should take only 10-15 minutes. Here is how to do it.


1. Download Open MPI from http://www.open-mpi.org/software/ompi/v1.4/
2. Build and install it by running ./configure, make, and make install
3. Write the MPI program and compile it with mpicc -o hello hello.c (mpic++ is the equivalent wrapper for C++ sources)
4. Run it with mpirun -np 10 hello. Here "hello" is the program, and -np gives the number of processes to start.

Note that you can run an N-process program on a single CPU; MPI simply starts N OS processes.

Learning MPI might take some thinking. This and this might help. One thing to remember is that the MPI model is a bit different: if process 0 broadcasts a message and all other processes receive it, every process calls MPI_Bcast, and the call behaves differently on the root than on the other processes. Good luck.





Sunday, January 23, 2011

NoSQL and Search

One notable change that comes with most NoSQL solutions is the lack of ad-hoc search support. For example, with Cassandra you have to either store the data with the queries you plan to run in mind (that is, design storage around your application) or use MapReduce.

For most end-user apps this is acceptable: apps (at least the simple ones) run only a few kinds of queries, so developers can plan for those and store the data accordingly. However, the same is not true for middleware frameworks such as registries, which do not know what users are going to store or what they will search for.

There are a few possible solutions. Note that most NoSQL databases partition the data across several nodes, so the challenge is doing ad-hoc queries across those nodes.

  1. MapReduce - the commonly advocated solution, but results typically take some time to compute. This is great for batch processing and data-warehouse-type applications.
  2. Scatter Gather - the idea is to send the request to all nodes and collect the results. MongoDB uses this, and they claim most queries can be answered by talking to only one node. The downside is that it is expensive when a query has to talk to all the nodes.
  3. Word Indexing (e.g. Lucene) - this is ideal for document-like use cases. However, it lacks the context of the data within the document (it can only find documents or records that contain the given words), so it is not ideal for key-value and column-family databases.
  4. Tree Indexes (e.g. B-trees) - the idea is to build an index tree for each property (field or column) stored in the storage, which is how most relational databases implement indexes. This supports not only exact-match searches but also range-based searches. The overhead of doing this is not very clear yet.
As you can see, none of them is perfect (except maybe the last). These are a few thoughts on the subject; in my opinion, this is a problem NoSQL databases have to solve.

Monday, January 17, 2011

Cloud Computing: An Introduction




I gave this public talk at the FITIS Tech Eve (Federation of the Information Technology Industry in Sri Lanka). It is a high-level overview of the cloud, designed to answer questions such as what, why, and when. You can download the slides from Cloud Computing: An Introduction.

Monday, January 3, 2011

Congratulations Dr. Jaliya Ekanayake!

http://sanjiva.weerawarana.org/2011/01/congratulations-dr-jaliya-ekanayake.html :)

Wednesday, December 29, 2010

Carbon: Towards a Server Building Framework for SOA Platform

Following are the slides I presented at the 5th International Workshop on Middleware for Service Oriented Computing (co-located with Middleware 2010, India). The paper can be found at http://portal.acm.org/citation.cfm?id=1890914 and here.

The paper presents Carbon, the underlying platform for all WSO2 products. Based on OSGi, Carbon provides a server-building framework for SOA with a kernel that handles most of the details of building SOA servers. The paper discusses its design decisions, potential impact, and relationship to the state of the art in Component-Based Software Engineering.

BISSA: Empowering Web gadget Communication with Tuple Spaces

The following is a talk I gave at the 8th International Workshop on Middleware for Grids, Clouds and e-Science (co-located with Supercomputing 2010 in New Orleans). The research paper with the same title can be found at http://portal.acm.org/citation.cfm?id=1890809.

The paper presents Bissa, a scalable tuple space implementation on top of a Distributed Hash Table, and its application as an inter-gadget communication medium. This is the outcome of a University of Moratuwa final-year project done by Pradeep Fernando, Charith Wickramarachchi, Dulanjanie Sumanasena, and Udayanga Wickramasinghe, and supervised by Dr. Sanjiva Weerawarana and myself. I am elated that MRT final-year projects can go this far!

A Scaling Pattern: Life beyond Distributed Transactions: an Apostate’s Opinion

The paper “Life beyond Distributed Transactions: an Apostate’s Opinion” presents an interesting pattern for coping without transactions at very large scale.

The idea can be described using three things: entities, activities, and workflows used instead of transactions.
  • An entity is a single collection of data that lives within a single scope of serializability (e.g. a cluster). Each entity has a unique key, and transactions cannot span multiple entities.
  • An activity is the relationship between two entities (it holds all the state about that relationship).
  • Finally, the key idea is that entities do not run transactions with each other; instead, they handle the uncertainty arising from the lack of transactions through a workflow.
  • The author makes an interesting observation that the real world does not have transactions, but copes with uncertainty through time limits, compensations, cancellations, etc., in other words, through a workflow. So he argues that activities should be carried out with a tentative message followed by a confirmation or cancellation.
So does that scale? Yes: handling serializability within one entity is very much possible, and the rest is data partitioning, which is well understood. The only deciding question is whether we can model the uncertainties that arise from the lack of transactions through workflows. My gut feeling is that most of the time we can, but there certainly are exceptions.