
Elephant in the Cloud - Hadoop as a Service

by Srini Penchikala on May 02, 2016

 

Ashish Thusoo, CEO and co-founder of Qubole, recently spoke at the Enterprise Data World (EDW) conference about "The Elephant in the Cloud", a look at Hadoop as a Service offerings. Part of a wider trend toward treating big data as a service category rather than a product category, Hadoop as a Service offerings are intended to help organizations deal with the challenges and costs of running Hadoop at scale. These cloud-based services also benefit from other properties of the cloud, such as dynamic provisioning, elasticity of compute and storage, and availability in multiple geographies.

Ashish started the discussion by saying that data now includes high volumes of interaction data, which is typically unstructured, rather than just the structured transactional data that applications have been processing for a long time.

The nature of analytics has also changed. Ashish talked about the "Analytic Value Escalator" that shows the transition from descriptive to prescriptive analytics.

  • Descriptive Analytics (What happened?)
  • Diagnostic Analytics (Why did it happen?)
  • Predictive Analytics (What will happen?)
  • Prescriptive Analytics (How can we make it happen?)

The cloud provides benefits such as on-demand, elastic, and adaptable infrastructure, along with highly scalable object stores and processing. Running big data platforms in the cloud disrupts the on-premises model by offering faster time to production, more agile and flexible infrastructure, and significant cost reduction.

A Virtual Private Cloud (VPC) helps isolate access to compute and storage and supports security best practices. Security options in a VPC include encryption of data both at rest and in transit over the network, as well as role-based access to compute and storage.

A modern data platform needs multiple engines, such as those listed below, to address the diverse data processing use cases in a typical organization:

  • Hive for complex batch SQL
  • Spark for data science
  • Presto for interactive simple SQL
  • MapReduce for batch ETL
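
As a rough illustration of the engine-per-use-case idea in the list above, the pairing can be sketched as a simple lookup. This is only a sketch: the `Workload` enum, the `ENGINE_FOR_WORKLOAD` table, and the `pick_engine` function are hypothetical names made up for this example and are not part of Qubole's service or any Hadoop API. Only the four pairings themselves come from the talk.

```python
from enum import Enum


class Workload(Enum):
    """Hypothetical workload categories, named after the list above."""
    COMPLEX_BATCH_SQL = "complex batch SQL"
    DATA_SCIENCE = "data science"
    INTERACTIVE_SIMPLE_SQL = "interactive simple SQL"
    BATCH_ETL = "batch ETL"


# The pairings below mirror the list in the article; the table itself
# is an illustrative assumption, not a real platform configuration.
ENGINE_FOR_WORKLOAD = {
    Workload.COMPLEX_BATCH_SQL: "Hive",
    Workload.DATA_SCIENCE: "Spark",
    Workload.INTERACTIVE_SIMPLE_SQL: "Presto",
    Workload.BATCH_ETL: "MapReduce",
}


def pick_engine(workload: Workload) -> str:
    """Return the engine the article associates with a workload type."""
    return ENGINE_FOR_WORKLOAD[workload]


if __name__ == "__main__":
    for workload in Workload:
        print(f"{workload.value}: {pick_engine(workload)}")
```

A real platform would likely weigh more than the workload type (data size, latency needs, cost), so a static table like this is best read as a summary of the list rather than a routing policy.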

Ashish also discussed a reference architecture for big data on the cloud. This model includes services such as multi-user data access, engine unification, and cloud orchestration and portability. He concluded the presentation by saying that a Hadoop as a Service offering is a compelling option to consider when deciding on big data infrastructure.

 


Wow by Eric Worthy

Nice commercial for Qdoba. Why would anyone want to put all of their data in the cloud and be charged a huge sum to get it out. Yuk.

Re: Wow by Shamus Ack

People who already have their data in the cloud, and people who don't want to invest in high-cost infrastructure.

"charged a huge sum to get it out."

Can you elaborate here?
