Thursday, March 27, 2014

Implementing Bigdata Lambda Architecture using WSO2 CEP and BAM

Most real-world big data use cases involve both stream (real-time) processing and batch processing. To address both concerns, Nathan Marz introduced an architectural style called the “Big Data Lambda Architecture”.

The following figure shows the outline of the Lambda Architecture, which includes batch, speed, and serving layers. Incoming data are sent to both the batch and speed layers: the batch layer pre-calculates a historical view of the system, and the speed layer calculates the most recent view. The serving layer combines the two views to answer the given queries.
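To make the three layers concrete, here is a minimal sketch in plain Python (not WSO2 code) using a per-key event count as the query. The function and class names (`compute_batch_view`, `SpeedLayer`, `serve_query`) are hypothetical. The batch layer recomputes its view from the full master dataset, the speed layer incrementally updates a view from events that arrived after the last batch run, and the serving layer merges the two at query time.

```python
from collections import Counter
from typing import Iterable


def compute_batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer: recompute counts from the full, append-only history."""
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events not yet covered by the batch view."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, key: str) -> None:
        self.realtime_view[key] += 1

    def reset(self) -> None:
        # Simplification: assumes no events arrive while the batch run is in progress.
        self.realtime_view.clear()


def serve_query(key: str, batch_view: Counter, speed: SpeedLayer) -> int:
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch_view[key] + speed.realtime_view[key]


master_dataset = ["page_a", "page_b", "page_a"]
batch_view = compute_batch_view(master_dataset)  # result of the last batch run
speed = SpeedLayer()

for event in ["page_a", "page_c"]:  # new data is sent to both layers
    master_dataset.append(event)    # stored for the next batch run
    speed.on_event(event)           # reflected in query results immediately

print(serve_query("page_a", batch_view, speed))  # 3
print(serve_query("page_c", batch_view, speed))  # 1

# Once the next batch run covers all data, the speed view can be discarded.
batch_view = compute_batch_view(master_dataset)
speed.reset()
print(serve_query("page_a", batch_view, speed))  # 3
```

The point of the design is that the batch view is always recomputed from raw data, so errors in the speed layer are temporary: they disappear once a new batch view replaces them.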

You can find more information about the Lambda Architecture in the following articles:
  1. Big Data Lambda Architecture
  2. The Lambda architecture: principles for architecting realtime Big Data systems
  3. Applying the Big Data Lambda Architecture 
The following picture shows how we can implement the Lambda Architecture using WSO2 products.

As the picture depicts, you can use WSO2 BAM to implement the batch layer and WSO2 CEP to implement the speed layer. We send incoming data to both BAM and CEP using a high-performance data transport called "Data bridge", which can achieve throughput of up to 300,000 events/second. BAM runs user-defined Hive queries to calculate the batch views, and CEP runs user-defined CEP queries to calculate the real-time views. We can then combine both views using “Event tables” in WSO2 CEP, which map the batch views stored in a database into CEP windows, to answer the queries posed by users.

For example, the next figure shows how to implement the following query using the Lambda Architecture. You can find more information in my Strata talk.

“If the velocity of the ball after a kick differs from the season average by more than three times the season’s standard deviation, trigger an event with the player ID and speed”

Here we combine CEP and BAM to answer the query.
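The query can be sketched in plain Python to show how the two halves fit together. This is not Hive or WSO2 CEP syntax. The names (`Kick`, `compute_season_stats`, `check_kick`), the sample speeds, and the assumption that the season statistics are kept per player are all hypothetical. The batch part plays BAM's role by computing each player's season mean and standard deviation. The streaming part plays CEP's role: for every live kick, it looks up those precomputed statistics, as an event table would, and emits an event when the speed is more than three standard deviations from the mean.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Kick:
    player_id: str
    speed: float  # ball speed right after the kick (units assumed, e.g. m/s)


def compute_season_stats(kicks: Iterable[Kick]) -> Dict[str, Tuple[float, float]]:
    """Batch view (BAM's role): per-player season mean and population std deviation."""
    speeds: Dict[str, List[float]] = {}
    for kick in kicks:
        speeds.setdefault(kick.player_id, []).append(kick.speed)
    return {pid: (statistics.mean(v), statistics.pstdev(v)) for pid, v in speeds.items()}


def check_kick(kick: Kick, season_stats: Dict[str, Tuple[float, float]],
               k: float = 3.0) -> Optional[Tuple[str, float]]:
    """Speed layer (CEP's role): compare a live kick against the batch view."""
    stats = season_stats.get(kick.player_id)
    if stats is None:
        return None  # no season history for this player yet
    mean, stdev = stats
    if abs(kick.speed - mean) > k * stdev:
        return (kick.player_id, kick.speed)  # the triggered event
    return None


season = [Kick("p1", s) for s in [10.0, 12.0, 11.0, 13.0, 9.0]]
stats = compute_season_stats(season)  # p1: mean 11.0, stdev sqrt(2)

for live in [Kick("p1", 12.0), Kick("p1", 20.0), Kick("p2", 30.0)]:
    event = check_kick(live, stats)
    if event:
        print("ALERT", event)  # ALERT ('p1', 20.0)
```

Using the population standard deviation (`pstdev`) rather than the sample one (`stdev`) is a design choice for this sketch. With a full season of kicks the difference is small.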

Wednesday, March 26, 2014

WSO2Con Talk: Accelerating Mobile Development with Mobile Enterprise Application Platforms (MEAP)

Following are the slides for my WSO2Con talk about the upcoming MEAP product. The talk describes WSO2 MEAP, a product that lets users develop and manage the complete lifecycle of mobile applications. MEAP supports both mobile app development and back-end service development.

You can download the slides from here.

Saturday, March 22, 2014

Tools and Techniques to make sense of Bigdata

A couple of weeks back, I did a talk titled "Big Data Analysis: Deciphering the haystack", which is about the different tools available for big data analysis.

I categorised the tools using the following taxonomy, based on what they do. Note that there are tools for streaming (a.k.a. real-time) analytics as well as for store-and-process analytics.

I also categorised analysis techniques (in other words, ways of making sense of data) into three subtopics based on their goals:
  1. To know what happened - this is basic analytics.
  2. To explain what happened - this is detecting patterns, e.g. data mining.
  3. To forecast what will happen - this is forecasting models, e.g. regression, numerical models (such as weather models that use simulation), and other machine learning algorithms.

Following is the slide deck I used; please check it out for more information.


Slides for the talk Internet of Things and Big Data

A couple of weeks back, I did a talk about the Internet of Things and Big Data at the Export Development Board auditorium. Following are the slides for the talk. A write-up about the seminar is also available, and the recording is on YouTube.


Strata 2014 Talk: Tracking a Soccer Game with Big Data

In January, I did a talk at O'Reilly Strata SF 2014 about how we solved the DEBS Grand Challenge, which involved processing data collected from sensors in the ball and in the players' boots during a football game. I have blogged about some of the details before. Following is the slide deck I used for the talk.


Following is the abstract.

Mobile devices, sensors and GPS are driving the demand to handle big data in both batch and real time. This presentation discusses how we used complex event processing (CEP) and MapReduce-based technologies to track and process data from a soccer match as part of the annual DEBS event processing challenge. In 2013, the challenge included a data set generated by a real soccer match in which sensors were placed in the soccer ball and players’ shoes. This session will review how we used CEP to implement the DEBS challenge and achieved throughput in excess of 100,000 events/sec. It will also examine how we extended the solution to conduct batch processing with business activity monitoring (BAM) using the same framework, enabling users to obtain both instant analytics and more detailed batch-processing-based results.

View, Act, and React: Shaping Business Activity with Analytics, BigData Queries, and Complex Event Processing

Following are the slides for my talk at WSO2Con San Francisco 2013. The talk covers how to use WSO2 BAM and WSO2 CEP to build big data solutions that handle both real-time processing and batch processing.

View, Act, and React: Shaping Business Activity with Analytics, BigData Queries, and Complex Event Processing from Srinath Perera

Following is the abstract.

Sun Tzu said “if you know your enemies and know yourself, you can win a hundred battles without a single loss.” Those words have never been truer than in our time. We are faced with an avalanche of data. Many believe the ability to process and gain insights from a vast array of available data will be the primary competitive advantage for organizations in the years to come.

To make sense of data, you will have to face many challenges: how to collect, how to store, how to process, and how to react fast. Although you can build these systems from the bottom up, doing so is a significant undertaking. There are many technologies, both open source and proprietary, that you can put together to build your analytics solution, which will likely save you effort and give you a better result.

In this session, Srinath will discuss WSO2’s middleware offerings for big data and explain how you can put them together to build a solution that makes sense of your data. The session will cover technologies like Thrift for collecting data, Cassandra for storing data, Hadoop for analyzing data in batch mode, and complex event processing for analyzing data in real time.

Where the mind is without fear

Following is a poem by Rabindranath Tagore, which to me is in the same class as "Invictus", "Man in the arena", "IF" or "Desiderata (Desired Things)". (If you have not read those, check them out too.) It is amazing how close he gets to the ideals of a "free society" and to ideas like open source. I love the part "Where the clear stream of reason has not lost its way Into the dreary desert sand of dead habit".

Where the mind is without fear and the head is held high
Where knowledge is free
Where the world has not been broken up into fragments by narrow domestic walls
Where words come out from the depth of truth
Where tireless striving stretches its arms towards perfection
Where the clear stream of reason has not lost its way Into the dreary desert sand of dead habit
Where the mind is led forward by thee into ever-widening thought and action
Into that heaven of freedom, my Father, let my country awake.