Ayasdi Partners with Cloudera

by Jérôme Serrano on Jun 06, 2014

Ayasdi announced last month a partnership with Cloudera, the biggest distributor of Apache Hadoop. The partnership will ensure the compatibility of Ayasdi's solution with Cloudera Enterprise 5, the latest version of Cloudera’s big data platform based on Apache Hadoop.

Ayasdi, which means "to seek" in Cherokee, is a data analysis start-up founded in 2008 by three mathematicians to commercialize a novel approach to discovering insights in large, high-dimensional data sets. Initially funded by the Defense Advanced Research Projects Agency (DARPA) and the National Science Foundation (NSF), this approach is based on a new area of mathematics called Topological Data Analysis (TDA). It allows customers such as General Electric, Merck, the US Food and Drug Administration (FDA) and the Centers for Disease Control and Prevention (CDC) to highlight the geometric structures of their data and build compact summaries that are easier to visualize and explore interactively, without the need to write queries or algorithms.

The fundamental idea behind TDA is that data has shape and shape has meaning. In the context of TDA, data are typically represented as a large finite set of points in space, and shapes are used to describe how the points are related to each other. For example, a simple aspect of shape, such as "how many pieces does the data break into?" or "how does it break into clusters?", can reveal conceptually different parts of a phenomenon. Another aspect, "is there any loop?", can indicate recurrent or periodic behavior. Topology is precisely the branch of mathematics that studies this notion of shape, and TDA aims to extend its mathematical formalism for defining and measuring qualitative geometric information to large and noisy point clouds.
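To make the "pieces" and "loops" questions concrete, here is a minimal Python sketch. It is not Ayasdi's method: it assumes a simple neighborhood graph in which two points are linked when they lie within a chosen distance `eps`, then counts the graph's connected components and independent cycles. The function name `graph_shape`, the parameter `eps` and the sample point clouds are all hypothetical, and the cycle count is only a crude proxy for loops, since it also counts small triangles that a real TDA pipeline would treat as filled in.

```python
import math


def graph_shape(points, eps):
    """Return (components, cycles) for the eps-neighborhood graph of a point cloud.

    Assumption: two points are linked when their Euclidean distance is <= eps.
    'cycles' is the graph's cycle rank (edges - vertices + components), a rough
    stand-in for loop detection, not a full homology computation.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n
    edges = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                edges += 1
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj  # merge two pieces into one
                    components -= 1
    cycles = edges - n + components
    return components, cycles


# Hypothetical sample data: two well-separated clusters of three points each.
clusters = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]

# Hypothetical sample data: eight points evenly spaced on a unit circle.
circle = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
          for k in range(8)]

clusters_shape = graph_shape(clusters, 0.5)  # two pieces
circle_shape = graph_shape(circle, 1.0)      # one piece containing one loop
```

The result depends heavily on `eps`: with a very small value every point is its own piece, and with a very large value everything merges into one. Tracking how these counts change as the scale varies, rather than fixing one scale, is the kind of question TDA is designed to handle.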

From a technical perspective, the Ayasdi Platform leverages CDH 5's Hadoop Distributed File System (HDFS) for managing large amounts of customer data and uses HBase for storing some of its operational metadata that is randomly accessed and frequently updated. According to Lawrence Spracklen, Chief Architect at Ayasdi, the company also uses Spark for various ETL activities and is partnering with Intel to help drive performance, robustness and minimal-overhead security. However, it uses a custom non-Hadoop stack to compute and distribute topological networks.

Asked about the size of the largest Cloudera cluster currently used by Ayasdi's customers, Spracklen explained that scale is only one part of the story:

The Ayasdi Data Platform is horizontally scalable, and readily scales to many tens of nodes. However, the size of the Cloudera cluster is far less interesting than the overall size of the data set coupled with the complexity of the data. As a frame of reference, advancements in data analysis technology haven’t kept up with this fast paced data environment of change and growth. The lack of development around data analysis technology stems from the inherent challenges posed by the computations and techniques needed in order to analyze extremely complex, high dimensional data. Ayasdi’s solution was built to find insights in extremely complex data. Examples include data sets of medical disorders and disease e.g. cancer, PTSD, and diabetes. These data sets are not always big but tend to be extremely complex.

YouTube hosts videos (one, two) that explore TDA in more detail. The International Conference on Machine Learning (ICML) offers a PDF to the same end, as does a Topological Data Analysis and Machine Learning Theory workshop held in 2012. For a more concrete example, the Institute for Computational and Mathematical Engineering at Stanford University (ICME) presents a computational method for extracting simple descriptions of high-dimensional data.

InfoQ.com and all content copyright © 2006-2014 C4Media Inc.