
Apache Eagle, Originally from eBay, Graduates to Top-Level Project

by Alexandre Rodrigues on Jan 24, 2017. Estimated reading time: 2 minutes

Apache Eagle, an open-source solution for identifying security and performance issues on big data platforms, graduated to Apache top-level project status on January 10, 2017.

First open-sourced by eBay in October 2015, Eagle was created to instantly detect access to sensitive data or malicious activities and to take action in a timely fashion. In addition to data-activity monitoring, Eagle also provides node anomaly detection as well as cluster and job performance analytics.

Job performance is analyzed by crunching YARN application logs and by taking snapshots of all jobs running in YARN. Eagle can detect trends in individual jobs, data-skew problems, and failure causes, and can assess overall cluster performance by taking into account all running jobs. Eagle calculates task failure ratios per node to detect nodes that behave differently from the rest and require attention. For cluster performance, Eagle accounts for the resources used by each YARN job and correlates them with the metrics of shared services (e.g. the HDFS NameNode's) to help identify the causes of overall cluster slowness.
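The per-node failure-ratio idea can be sketched as follows. This is an illustrative reconstruction, not Eagle's actual implementation: the function names, the input format (a list of `(node, succeeded)` task-attempt records), and the outlier threshold are all assumptions.

```python
from collections import defaultdict

def node_failure_ratios(task_attempts):
    """Compute each node's task failure ratio from (node, succeeded) records.
    Hypothetical input format; Eagle derives this from YARN application logs."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for node, succeeded in task_attempts:
        totals[node] += 1
        if not succeeded:
            failures[node] += 1
    return {node: failures[node] / totals[node] for node in totals}

def flag_outlier_nodes(ratios, margin=0.5):
    """Flag nodes whose failure ratio is far above the cluster average.
    The 50% margin is an arbitrary illustrative choice."""
    avg = sum(ratios.values()) / len(ratios)
    return [node for node, r in ratios.items() if r > avg * (1 + margin)]
```

A node failing most of its tasks while its peers rarely fail would stand out immediately under this scheme, which is the "behaving differently from others" signal the article describes.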

Apache Eagle relies on Apache Storm for stream processing of data activity and operational logs, and can perform policy-based detection and alerting. It provides multiple APIs: a streaming API as an abstraction on top of the Storm API, and a policy-engine provider API that exposes WSO2's open-source Siddhi CEP engine as a first-class citizen. The Siddhi CEP engine supports hot deployment of alerting rules, and alerts can be defined with attribute filtering and window-based rules (e.g. more than three accesses within a 10-minute interval).
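The windowed-rule semantics can be illustrated with a small sliding-window sketch mirroring the article's "more than three accesses in a 10-minute interval" example. This is not Siddhi's query language, just a plain-Python model of the same rule; the class and parameter names are invented for illustration.

```python
from collections import defaultdict, deque

class AccessRateRule:
    """Alert when a user exceeds max_events within window_seconds.
    Illustrative stand-in for a Siddhi window-based policy."""

    def __init__(self, max_events=3, window_seconds=600):
        self.max_events = max_events
        self.window = window_seconds
        self.events = defaultdict(deque)  # user -> recent event timestamps

    def on_event(self, user, ts):
        q = self.events[user]
        q.append(ts)
        # Evict timestamps that have fallen out of the sliding window.
        while q and ts - q[0] > self.window:
            q.popleft()
        return len(q) > self.max_events  # True means: raise an alert
```

A real Siddhi policy expresses the same filter-plus-window logic declaratively and can be hot-deployed without restarting the topology, which is the operational advantage the article highlights.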

A machine-learning-based policy provider is also included. It learns from past user behaviour to classify a data access as anomalous or not. The machine-learning policy provider evaluates models trained offline in the Apache Spark framework. Eagle ships with two machine-learning methods to calculate a user profile: a density estimation that computes a Gaussian probability density and a threshold for each user/activity pair, and an eigenvalue decomposition that captures behavioural patterns by reducing the dimensionality of user and activity features.
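The density-estimation approach can be sketched in a few lines: fit a Gaussian to a user's historical activity feature, then flag new activity whose likelihood falls below a threshold. This is a minimal single-feature illustration under assumed names and a made-up threshold; Eagle's actual models are trained offline in Spark over multi-dimensional features.

```python
import math

def fit_gaussian(samples):
    """Estimate mean and variance of a user's historical activity feature."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var

def gaussian_pdf(x, mean, var):
    """Gaussian probability density at x for the fitted profile."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def is_anomalous(x, mean, var, threshold=1e-3):
    """Flag activity whose likelihood under the learned profile is below
    the threshold (the threshold value here is purely illustrative)."""
    return gaussian_pdf(x, mean, var) < threshold
```

An access pattern far from the user's historical mean gets a vanishingly small density and is flagged, which is the intuition behind profiling "normal" behaviour per user.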

Data integration is achieved with Apache Kafka, via a Logstash forwarder agent or via a log4j Kafka appender. Log entries from multiple Hadoop daemons (e.g. NameNode, DataNode, etc.) are fed into Kafka and consumed by the Storm topology. Eagle supports classification of data assets into multiple sensitivity types.
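As one possible wiring of the log4j route, the HDFS NameNode's audit logger can be pointed at Kafka's log4j appender. This is a hedged sketch: the broker address and topic name are placeholders, and the exact appender class and property names depend on the Kafka version deployed.

```
# log4j.properties fragment (illustrative; verify against your Kafka version)
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=INFO, KAFKA
log4j.appender.KAFKA=org.apache.kafka.log4jappender.KafkaLog4jAppender
log4j.appender.KAFKA.brokerList=broker1:9092
log4j.appender.KAFKA.topic=hdfs_audit_log
log4j.appender.KAFKA.layout=org.apache.log4j.PatternLayout
log4j.appender.KAFKA.layout.ConversionPattern=%m%n
```

With this in place, every HDFS audit entry flows into the `hdfs_audit_log` topic, where Eagle's Storm topology can consume and evaluate it against the deployed policies.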

[Image: Apache Eagle user profiling architecture]

Eagle supports Apache HBase as well as relational databases for alert persistence. Alerts can be delivered via e-mail or Kafka, or stored in an Eagle-supported storage backend. It is also possible to develop your own alert notification plugin.
