Thursday, March 27, 2014

Implementing Bigdata Lambda Architecture using WSO2 CEP and BAM

Most real-world Bigdata use cases involve both stream (real-time) processing and batch processing. To address both concerns, Nathan Marz introduced an architectural style called the “Bigdata Lambda architecture”.

The following figure shows the outline of the Lambda architecture, which consists of batch, speed, and serving layers. Incoming data are sent to both the batch and speed layers. The batch layer pre-calculates a historical view of the system, and the speed layer calculates the most recent view of the system. The serving layer combines the two views to answer the given queries.
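To make the division of labour concrete, here is a minimal, hypothetical sketch of the three layers as a single in-process Python class that counts events per key. The class and method names (`LambdaSketch`, `ingest`, `run_batch`, `query`) are illustrative assumptions, not part of any product; in a real deployment each layer runs on a separate system.

```python
from collections import defaultdict


class LambdaSketch:
    """Toy model of the Lambda layers, counting events per key.

    Hypothetical illustration only: real batch and speed layers run on
    separate systems, and the batch run is not atomic as it is here.
    """

    def __init__(self):
        self.master = []                    # immutable log of all events
        self.batch_view = {}                # precomputed by the batch layer
        self.speed_view = defaultdict(int)  # events since the last batch run

    def ingest(self, key):
        # Incoming data goes to both the batch store and the speed layer.
        self.master.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        # Batch layer: recompute the historical view from the full log,
        # then drop speed-layer state that the new batch view now covers.
        view = defaultdict(int)
        for key in self.master:
            view[key] += 1
        self.batch_view = dict(view)
        self.speed_view.clear()

    def query(self, key):
        # Serving layer: merge the historical view with the recent view.
        return self.batch_view.get(key, 0) + self.speed_view.get(key, 0)
```

For example, after `ingest("a")` twice, `run_batch()`, and one more `ingest("a")`, `query("a")` combines 2 events from the batch view with 1 from the speed view.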

You can find more information about the Lambda architecture in the following:
  1. Big Data Lambda Architecture
  2. The Lambda architecture: principles for architecting realtime Big Data systems
  3. Applying the Big Data Lambda Architecture

The following picture shows how we can implement the Lambda architecture using WSO2 products.

As the picture depicts, you can use WSO2 BAM to implement the batch layer and WSO2 CEP to implement the speed layer. We send incoming data to both BAM and CEP using a high-performance data transport called "Data bridge", which can achieve a throughput of up to 300,000 events/second. BAM runs user-defined Hive queries to calculate the batch views, and CEP runs user-defined CEP queries to calculate the real-time views. We can then combine both views using “Event tables” in WSO2 CEP, which map the batch views stored in a database into CEP windows, to answer the queries posed by users.
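The data flow described above can be sketched in plain Python under loudly stated assumptions: an in-memory SQLite database stands in for the database where BAM stores its batch views, a plain SQL aggregate stands in for a Hive query, and a per-event function that looks up the batch table stands in for a CEP query joined with an Event table. None of the names (`publish`, `batch_store`, `run_batch_job`, `on_event`) are WSO2 APIs; they are hypothetical and only illustrate the wiring.

```python
import sqlite3

# Hypothetical stand-ins, not WSO2 APIs: SQLite plays the database that
# holds the batch views, and `raw_events` plays BAM's stored event data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_events (player TEXT, speed REAL)")
db.execute("CREATE TABLE batch_view (player TEXT PRIMARY KEY, avg_speed REAL)")

enriched = []  # output of the speed layer: (player, speed, diff_from_avg)


def publish(event, sinks):
    # Like the Data bridge in the post: each event is sent to every layer.
    for sink in sinks:
        sink(event)


def batch_store(event):
    # Batch layer input: append the raw event to the master data set.
    db.execute("INSERT INTO raw_events VALUES (?, ?)", event)


def run_batch_job():
    # Analogous to a BAM Hive script: recompute the batch view from all data.
    db.execute("DELETE FROM batch_view")
    db.execute("INSERT INTO batch_view "
               "SELECT player, AVG(speed) FROM raw_events GROUP BY player")


def on_event(event):
    # Analogous to a CEP query joining the live stream with an Event table
    # backed by the batch view; events with no batch row yet are skipped.
    player, speed = event
    row = db.execute("SELECT avg_speed FROM batch_view WHERE player = ?",
                     (player,)).fetchone()
    if row is not None:
        enriched.append((player, speed, speed - row[0]))
```

With `sinks = [batch_store, on_event]`, publishing `("p1", 10.0)` and `("p1", 20.0)` produces no output because no batch view exists yet; after `run_batch_job()`, publishing `("p1", 25.0)` yields `("p1", 25.0, 10.0)`, comparing the live value with the batch average of 15.0.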

For example, the next figure shows how to implement the following query using the Lambda architecture. You can find more information in my Strata talk.

“If the velocity of the ball after a kick differs from the season average by 3 times the season’s standard deviation, trigger an event with the player ID and speed”

Here we combine CEP and BAM to answer the query.
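The split of this query between the two layers can be sketched as follows: the batch layer (BAM in the post) computes the season mean and standard deviation, and the speed layer (CEP in the post) checks each live kick against those precomputed values. This is a hypothetical Python sketch, not the Hive or CEP queries from the talk; using the population standard deviation and a strict `>` comparison are assumptions, since the query wording does not pin them down.

```python
import statistics
from collections.abc import Iterable, Iterator
from typing import NamedTuple


class Kick(NamedTuple):
    player_id: str
    speed: float


def batch_season_stats(history: Iterable[float]) -> tuple[float, float]:
    # Batch layer: season mean and standard deviation over all kicks.
    # Population std dev (pstdev) is an assumption; the query doesn't say.
    speeds = list(history)
    return statistics.fmean(speeds), statistics.pstdev(speeds)


def speed_layer(kicks: Iterable[Kick], mean: float, std: float,
                k: float = 3.0) -> Iterator[tuple[str, float]]:
    # Speed layer: compare each live kick with the precomputed batch view
    # and emit (player_id, speed) when it deviates by more than k std devs.
    for kick in kicks:
        if abs(kick.speed - mean) > k * std:
            yield kick.player_id, kick.speed
```

For a season history of `[10.0, 14.0, 10.0, 14.0]` (mean 12.0, standard deviation 2.0, so the threshold is 6.0), the kicks `Kick("p7", 19.0)`, `Kick("p3", 17.0)`, and `Kick("p9", 5.0)` trigger events for `p7` and `p9` only. Keeping the statistics in the batch layer means the speed layer does constant work per event, which is the point of combining BAM and CEP here.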
