
HPCC Systems Launches Big Data Delivery Engine on EC2

by Jean-Jacques Dubray on Dec 01, 2011

HPCC (High Performance Computing Cluster) is an open source, massively parallel processing platform for Big Data problems. This week, HPCC Systems announced that its Data Delivery Engine is now available on EC2. Its architecture is composed of a Data Refinery Cluster (Thor) and a Query Cluster (Roxie):

The HPCC system architecture incorporates the Thor and Roxie clusters as well as common middleware components, an external communications layer, client interfaces which provide both end-user services and system management tools, and auxiliary components to support monitoring and to facilitate loading and storing of filesystem data from external sources.

Thor is responsible for consuming vast amounts of data, transforming, linking and indexing that data. It functions as a distributed file system with parallel processing power spread across the nodes. A cluster can scale from a single node to thousands of nodes. Roxie (the Query Cluster) provides separate high-performance online query processing and data warehouse capabilities.
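The split between a batch cluster that distributes and indexes data and a query cluster that serves lookups can be pictured with a small Python sketch. This is a toy model, not HPCC code: the `partition`, `build_index` and `query` helpers, the hash-based placement and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict
from zlib import crc32


def partition(records, key, num_nodes):
    """Spread records across nodes by hashing the key field (toy data distribution)."""
    nodes = [[] for _ in range(num_nodes)]
    for rec in records:
        nodes[crc32(str(rec[key]).encode()) % num_nodes].append(rec)
    return nodes


def build_index(node_records, key):
    """Batch step: index one node's records by key."""
    index = defaultdict(list)
    for rec in node_records:
        index[rec[key]].append(rec)
    return dict(index)


def query(indexes, key_value):
    """Query step: route the lookup to the only node that can hold the key."""
    node = crc32(str(key_value).encode()) % len(indexes)
    return indexes[node].get(key_value, [])


# Hypothetical sample data.
records = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "carol"},
    {"id": 2, "name": "bob b."},
]

nodes = partition(records, "id", num_nodes=3)
indexes = [build_index(node, "id") for node in nodes]
result = query(indexes, 2)
```

Because records with the same key always land on the same node, a lookup touches one index rather than scanning the whole data set, which is the general idea behind separating heavy batch preparation from fast online queries.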

Both Thor and Roxie are based on ECL (Enterprise Control Language), a parallel-processing programming language optimized for extraction, transformation, loading, sorting, indexing and linking. It is "implicitly parallel", non-procedural and dataflow-oriented. It combines data representation and algorithm implementation and can easily be extended with C++ libraries.
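To give a feel for what "implicitly parallel" and "dataflow oriented" mean in practice, here is a minimal Python sketch. It is not ECL and does not reflect how the ECL compiler works; the `Dataflow` class and its `project`, `filter` and `run` methods are hypothetical names chosen for this example. The point is that the caller only declares the steps, while the runtime decides how to spread them over partitions.

```python
from concurrent.futures import ThreadPoolExecutor


class Dataflow:
    """A toy declarative pipeline: steps are recorded, not executed, until run()."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def project(self, fn):
        # Record a row-by-row transformation.
        return Dataflow(self.steps + (("project", fn),))

    def filter(self, pred):
        # Record a row-by-row selection.
        return Dataflow(self.steps + (("filter", pred),))

    def _run_partition(self, rows):
        for kind, fn in self.steps:
            if kind == "project":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows

    def run(self, partitions, workers=4):
        # The caller never loops over partitions or manages threads;
        # the runtime chooses how to distribute the work.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(self._run_partition, partitions))
        return [row for part in results for row in part]


flow = (
    Dataflow()
    .filter(lambda r: r["amount"] > 10)
    .project(lambda r: {"id": r["id"], "amount_cents": r["amount"] * 100})
)

# Hypothetical data, already split into two partitions.
partitions = [
    [{"id": 1, "amount": 5}, {"id": 2, "amount": 20}],
    [{"id": 3, "amount": 15}],
]
output = flow.run(partitions)
```

Only row-local steps are shown here because they can run on each partition independently. Operations such as sorting or joining would also need data to be redistributed between partitions, which this sketch leaves out.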

HPCC also provides ESP (Enterprise Services Platform), which exposes XML, HTTP, SOAP and REST interfaces to ECL queries. The access model is based on SAML.

HPCC sees several key differentiators from Hadoop:

  • the Enterprise Control Language, which is implicitly parallel and declarative, and contains high-level primitives such as JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP...
  • HPCC is integrated and supports multiple data types (including fixed and variable length delimited records and XML)
  • the data model is defined by the user and is not constrained by the limitations of a strict key-value paradigm
  • in Hadoop MapReduce, nearly all complex data transformations require several MapReduce cycles executed in sequence, which is not the case in HPCC
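The last point can be illustrated with a toy comparison in Python. The sketch below is neither Hadoop nor ECL: `map_reduce`, `join` and `rollup` are hypothetical helpers, and the customer and order records are made up. It computes order totals per country twice, first as two chained map/shuffle/reduce cycles and then by composing two higher-level primitives.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """One cycle: map to (key, value) pairs, shuffle by key, reduce each group."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out


customers = [{"cid": 1, "country": "FR"}, {"cid": 2, "country": "US"}]
orders = [{"cid": 1, "total": 30}, {"cid": 2, "total": 50}, {"cid": 1, "total": 20}]

# Cycle 1: reduce-side join on cid. Inputs are tagged so the reducer can tell them apart.
tagged = [("C", c) for c in customers] + [("O", o) for o in orders]


def join_mapper(item):
    tag, rec = item
    yield rec["cid"], (tag, rec)


def join_reducer(cid, values):
    countries = [rec["country"] for tag, rec in values if tag == "C"]
    for tag, rec in values:
        if tag == "O":
            for country in countries:
                yield {"country": country, "total": rec["total"]}


joined = map_reduce(tagged, join_mapper, join_reducer)

# Cycle 2: aggregate the joined rows by country.
totals_mr = map_reduce(
    joined,
    lambda row: [(row["country"], row["total"])],
    lambda country, values: [(country, sum(values))],
)


# The same computation composed from two higher-level primitives.
def join(left, right, key):
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index[l[key]]]


def rollup(rows, key, value):
    sums = defaultdict(int)
    for row in rows:
        sums[row[key]] += row[value]
    return sorted(sums.items())


totals_direct = rollup(join(orders, customers, "cid"), "country", "total")
```

Both versions produce the same totals. The difference is in what the author has to spell out: the first requires tagging inputs and wiring one cycle's output into the next, while the second states the join and the aggregation and leaves the execution plan to the underlying functions.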

Big Data solutions continue to evolve at a rapid pace, fuelled by their advances, scalability and ultimately the desire to process and query very large amounts of data. Even NPR is talking about Big Data! Have you participated in a Big Data project? What is your take on it? Where do you see Big Data going from here?

InfoQ.com and all content copyright © 2006-2013 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.