Tuesday, March 17, 2015

Introducing WSO2 Analytics Platform: Note for Architects

WSO2 have had several analytics products:WSO2 BAM and WSO2 CEP for some time (or Big Data products if you prefer the term).  We are adding WSO2 Machine Learner, a product to create, evaluate, and deploy predictive models, very soon to that mix. This post describes how all those fit within to a single story. 

Following Picture summarises what you can do with the platform. 



Lets look at each stages depicted the picture in detail. 

Stage 1: Collecting Data

There are two things for you to do.

Define Streams - Just like you create tables before you put data into a database, first you define streams before sending events. Streams are description of how your data look like (Schema). You will use the same Streams to write queries at the second stage. You do this via CEP or BAM's admin console (https://host:9443/carbon) or via Sensor API described in the next step.


Publish Event - Now you can publish events. We provide a one Sensor API to publish events for both batch and realtime pipelines. Sensor API available as Java clients (Thrift, JMS, Kafka), java script clients* ( Web Socket and REST) and 100s of connectors via WSO2 ESB. See How to Publish Your own Events (Data) to WSO2 Analytics Platform (BAM, CEP)  for details on how to write your own data publisher. 

Stage 2: Analyse Data

Now time to analyse the data. There are two ways to do this: analytics and predictive analytics. 

Write Queries

For both batch and realtime processing you can write SQL like queries. For batch queries, we support HIVE SQL and for realtime queries we support Siddhi Event Query Language

Example 1: Realtime Query (e.g. Calculate Average Temperature over 1 minute sliding window from the Temperature Stream) 

from TemperatureStream#window.time(1 min)
  select roomNo, avg(temp) as avgTemp
  insert into HotRoomsStream ;

Example 2: Batch Query (e.g. Calculate Average Temperature per each hour from the Temperature Stream)

insert overwrite table TemperatureHistory
  select hour, average(t) as avgT, buildingId
  from TemperatureStream group by buildingId, getHour(ts);


Build Machine Learning (Predictive Analytics) Models

Predictive analytics let us learn “logic” from examples where such logic is complex. For example, we can build “a model” to find fraudulent transactions. To that end, we can use machine learning algorithms to train the model with historical data about Fraudulent and non-fraudulent transactions.


WSO2 Analytics platform supports predictive analytics in multiple forms
  1. Use WSO2 Machine Learner ( 2015 Q2) Wizard to build Machine Learning models, and we can use them with your Business Logic. For example, WSO2 CEP, BAM and ESB would support running those models.
  2. R is a widely used language for statistical computing, and we can build model using R, export them as PMML ( a XML description of Machine Learning Models), and use the model within WSO2 CEP. Also you can directly call R Scripts from CEP queries
  3. WSO2 CEP also includes several streaming Regression and Anomaly Detection Operators

Stage 3: Communicate the Results

OK now we have some results, and we communicate those results to users or systems that cares for these results. That communications can be done in three forms.
  1. Alerts detects special conditions and cover the last mile to notify the users ( e.g. Email, SMS, and Push notifications to a Mobile App, Pager, Trigger physical Alarm ). This can be easily done with CEP.
  2. Visualising data via Dashboards provide the “Overall idea” in a glance (e.g. car dashboard). They supports customising and creating user's own dashboards. Also when there is a special condition, they draw the user's attention to the condition and enable him to drill down and find details. Upcoming WSO2 BAM and CEP 2015 Q2 releases will have a Wizard to start from your data and build custom visualisation with the support for drill downs as well.
  3. APIs expose Data as to users external to the organisational boundary, which are often used by mobile phones. WSO2 API Manager is one of the leading API solutions, and you can use it to expose your data as APIs. In the later releases, we are planning to add support to expose data as APIs via a Wizard.

Why choose WSO2 Analytics Platform?

Reason 1: One Platform for both Realtime, Batch, and Combined Processing - with Single API for publish events, and with support to implement combined usecases like following
  1. Run the similar query in batch pipeline and realtime pipeline ( a.k.a Lambda Architecture)
  2. Train a Machine Learning model (e.g. Fraud Detection Model) in the batch pipeline, and use it in the realtime pipeline (usecases: Fraud Detections, Segmentation, Predict next value, Predict Churn)
  3. Detect conditions in the realtime pipeline, but switch to detail analysis using the data stored in the batch pipeline (e.g. Fraud, giving deals to customers in a e-commerce site)
Reason 2: Performance - WSO2 CEP can process 100K+ events per second and one of the fastest realtime processing engines around. WSO2 CEP was a Finalist for DEBS Grand Challenge 2014 where it processed 0.8 Million events per second with 4 nodes.

Reason 3: Scalable Realtime Pipeline with support for running SQL like CEP Queries Running on top of Storm. - Users can provide queries using SQL like Siddhi Event Query Language. SQL like query language provides higher level operators to build complex realtime queries. See SQL-like Query Language for Real-time Streaming Analytics for more details. 
For batch processing, we use Apache Spark ( 2015 Q2 release forward), and for realtime processing, users can run those queries in one of the two modes.
  1. Run those queries using a two CEP nodes, one nodes as the HA backup for the other. Since WSO2 CEP can process in excess of hundred thousand events per second, this choice is sufficient for many usecases.
  2. Partition the queries and streams, build a Apache Storm topology running CEP nodes as Storm Sprouts, and run it on top of Apache Storm. Please see the slide deck Scalable Realtime Analytics with declarative SQL like Complex Event Processing Scripts. This enable users to do complex queries as supported by Complex Event Processing, but still scale the computations for large data streams. 
Reason 4: Support for Predictive analytics support building Machine learning models, comparing them and selecting the best model, and using them within real life distributed deployments.


Almost forgot, all these are opensource under Apache Licence. Most design decisions are discussed publicly at architecture@wso2.org.


Refer to following talk at wso2con Europe for more details. ( slides).



If you find this interesting, please try it out. Please reach out to me or through http://wso2.com/contact/ if you want to know more information.




No comments: