Distributed Systems Content on InfoQ

  • Spoilt for Choice – How to choose the right Big Data / Hadoop Platform?

    In his new article, Kai Wähner compares several alternatives for installing a Hadoop distribution and implementing big data processes. He compares distributions and tooling from Apache and many other vendors, including Cloudera, Hortonworks, MapR, Amazon, IBM, Oracle, and Microsoft. He also describes the pros and cons of each distribution and provides a decision tree for choosing the most appropriate one.

  • Interview and Video Review: Working with Big Data: Infrastructure, Algorithms, and Visualizations

    Paul Dix leads a practical exploration of Big Data in this video training series. The first five lessons span multiple server systems, focusing on the end-to-end processing of large quantities of XML data from real Stack Exchange posts. He completes the training with a lesson on developing visualizations that provide insights from macro-level analysis of Big Data.

  • Hadoop Virtual Panel

    In this virtual panel, InfoQ talks to several Hadoop vendors and users about their views on the current and future state of Hadoop and on what matters most for Hadoop’s further adoption and success.

  • Interview with Arun Murthy on Apache YARN

    Apache Hadoop YARN – a new Hadoop resource manager – has just been promoted to a high-level Hadoop subproject. InfoQ had the chance to discuss YARN with Arun Murthy, founder and architect at Hortonworks.

  • Generating Avro Schemas from XML Schemas Using JAXB

    Apache Avro is an up-and-coming binary marshalling framework. In his new article, Benjamin Fagin explains how one can leverage existing XSD tooling to create data definitions and then use an XJC plugin to directly generate Avro schemas and marshalling classes.
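
    As a rough illustration of what the generated artifacts enable, here is a minimal, self-contained sketch using Avro's Java API. The "User" record is a hypothetical stand-in for a schema that would, in the article, be generated from an XSD:

        import java.io.ByteArrayOutputStream;

        import org.apache.avro.Schema;
        import org.apache.avro.generic.GenericData;
        import org.apache.avro.generic.GenericDatumWriter;
        import org.apache.avro.generic.GenericRecord;
        import org.apache.avro.io.BinaryEncoder;
        import org.apache.avro.io.EncoderFactory;

        public class AvroMarshallingSketch {
            public static void main(String[] args) throws Exception {
                // Hypothetical schema; the article derives such definitions from XSDs.
                Schema schema = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"User\","
                    + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},"
                    + "{\"name\":\"age\",\"type\":\"int\"}]}");

                GenericRecord user = new GenericData.Record(schema);
                user.put("name", "alice");
                user.put("age", 42);

                // Marshal the record into Avro's compact binary encoding.
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
                new GenericDatumWriter<GenericRecord>(schema).write(user, encoder);
                encoder.flush();
                System.out.println("Serialized " + out.size() + " bytes");
            }
        }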

  • Exploring Hadoop OutputFormat

    As more companies adopt Hadoop, its integration with other applications is becoming more important. A key to such integration is the use of an appropriate OutputFormat, which allows a job to produce output data in the form most suitable for those applications.
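
    To make the idea concrete, here is a minimal sketch of a custom OutputFormat (the class name and key/value types are hypothetical, not from the article) that writes each record as a "key=value" line:

        import java.io.IOException;

        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.RecordWriter;
        import org.apache.hadoop.mapreduce.TaskAttemptContext;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

        public class KeyValueLineOutputFormat extends FileOutputFormat<Text, Text> {

            @Override
            public RecordWriter<Text, Text> getRecordWriter(TaskAttemptContext context)
                    throws IOException, InterruptedException {
                // Write to the task's work file; the framework promotes it on commit.
                Path file = getDefaultWorkFile(context, ".txt");
                FSDataOutputStream out =
                    file.getFileSystem(context.getConfiguration()).create(file, false);
                return new RecordWriter<Text, Text>() {
                    @Override
                    public void write(Text key, Text value) throws IOException {
                        out.writeBytes(key + "=" + value + "\n");
                    }
                    @Override
                    public void close(TaskAttemptContext ctx) throws IOException {
                        out.close();
                    }
                };
            }
        }

    A driver would then select it with job.setOutputFormatClass(KeyValueLineOutputFormat.class).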

  • Extending Oozie

    In this article the authors show how to leverage Oozie's extensibility to implement custom language extensions. This approach can be viewed as specializing the workflow language for a given company or line of business.
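
    As a sketch of what such an extension can involve, the following outlines a custom synchronous action executor. The ActionExecutor contract shown here is abbreviated from memory of the Oozie 3.x API, so treat the exact signatures as assumptions:

        import java.util.Properties;

        import org.apache.oozie.action.ActionExecutor;
        import org.apache.oozie.action.ActionExecutorException;
        import org.apache.oozie.client.WorkflowAction;

        // Hypothetical "echo" action; a real extension would parse
        // action.getConf() and perform actual work in start().
        public class EchoActionExecutor extends ActionExecutor {

            public EchoActionExecutor() {
                super("echo"); // the element name used in workflow XML
            }

            @Override
            public void start(Context context, WorkflowAction action)
                    throws ActionExecutorException {
                context.setExecutionData("OK", new Properties());
            }

            @Override
            public void end(Context context, WorkflowAction action)
                    throws ActionExecutorException {
                context.setEndData(WorkflowAction.Status.OK, "OK");
            }

            @Override
            public void check(Context context, WorkflowAction action)
                    throws ActionExecutorException {
                // Synchronous action: nothing to poll.
            }

            @Override
            public void kill(Context context, WorkflowAction action)
                    throws ActionExecutorException {
                context.setEndData(WorkflowAction.Status.KILLED, "KILLED");
            }

            @Override
            public boolean isCompleted(String externalStatus) {
                return true;
            }
        }

    The executor would then be registered with the Oozie server (for example via the oozie.service.ActionService.executor.ext.classes property), along with an XML schema for the new action element.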

  • An Open, Interoperable Cloud

    This article describes how interoperable clouds can be created, today, through the integration of open standards such as the Open Cloud Computing Interface (OCCI), the Open Virtualisation Format (OVF), and CDMI. Together they provide the means to package virtual infrastructure deployments, an API for the runtime management of storage infrastructure, and an API for the runtime management of infrastructure as a service.

  • Oozie by Example

    An end-to-end Oozie example, including process design, coordinator, and workflow implementation.
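
    For a flavor of the client side of such an example, here is a minimal sketch that submits and monitors a workflow with Oozie's Java client API; the URL and paths are hypothetical placeholders:

        import java.util.Properties;

        import org.apache.oozie.client.OozieClient;
        import org.apache.oozie.client.WorkflowJob;

        public class SubmitWorkflow {
            public static void main(String[] args) throws Exception {
                OozieClient client = new OozieClient("http://oozie-host:11000/oozie");

                Properties conf = client.createConfiguration();
                conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode/apps/demo-workflow");
                conf.setProperty("nameNode", "hdfs://namenode:8020");
                conf.setProperty("jobTracker", "jobtracker:8021");

                String jobId = client.run(conf); // submit and start the workflow
                while (client.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
                    Thread.sleep(10 * 1000); // poll until the workflow finishes
                }
                System.out.println("Workflow " + jobId + " finished: "
                    + client.getJobInfo(jobId).getStatus());
            }
        }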

  • Introduction to Oozie

    A basic introduction to Oozie - a framework that allows combining multiple MapReduce jobs into a logical unit of work.

  • Using Apache Avro

    Boris Lublinsky presents an introduction to Avro and evaluates its usage for schema componentization, inheritance, and polymorphism. He also discusses backward-compatibility issues and Avro's solutions to them.
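
    One of Avro's answers to backward compatibility is schema resolution: data written with an old schema can be read with a newer one, provided new fields carry defaults. A minimal sketch (record and field names are hypothetical):

        import java.io.ByteArrayOutputStream;

        import org.apache.avro.Schema;
        import org.apache.avro.generic.GenericData;
        import org.apache.avro.generic.GenericDatumReader;
        import org.apache.avro.generic.GenericDatumWriter;
        import org.apache.avro.generic.GenericRecord;
        import org.apache.avro.io.BinaryEncoder;
        import org.apache.avro.io.Decoder;
        import org.apache.avro.io.DecoderFactory;
        import org.apache.avro.io.EncoderFactory;

        public class SchemaEvolutionSketch {
            public static void main(String[] args) throws Exception {
                Schema writer = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"User\","
                    + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}");
                // The reader schema adds an "age" field with a default,
                // so records written with the old schema remain readable.
                Schema reader = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"User\","
                    + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},"
                    + "{\"name\":\"age\",\"type\":\"int\",\"default\":-1}]}");

                GenericRecord old = new GenericData.Record(writer);
                old.put("name", "alice");
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
                new GenericDatumWriter<GenericRecord>(writer).write(old, enc);
                enc.flush();

                // Decode old data against the new schema; the default fills the gap.
                Decoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
                GenericRecord decoded =
                    new GenericDatumReader<GenericRecord>(writer, reader).read(null, dec);
                System.out.println(decoded); // {"name": "alice", "age": -1}
            }
        }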

  • Data Mining in the Swamp: Taming Unruly Data With Cloud Computing

    Matrix presents a white paper on using the open-source tool Hadoop to implement MapReduce together with a cloud computing strategy to solve business intelligence problems.
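
    For readers new to the model, a minimal MapReduce pair in Hadoop's Java API (the classic word count, not taken from the white paper) looks like this:

        import java.io.IOException;

        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.Reducer;

        public class WordCount {

            // Map phase: emit (word, 1) for every token in the input line.
            public static class TokenMapper
                    extends Mapper<LongWritable, Text, Text, IntWritable> {
                private static final IntWritable ONE = new IntWritable(1);
                private final Text word = new Text();

                @Override
                protected void map(LongWritable key, Text value, Context context)
                        throws IOException, InterruptedException {
                    for (String token : value.toString().split("\\s+")) {
                        if (!token.isEmpty()) {
                            word.set(token);
                            context.write(word, ONE);
                        }
                    }
                }
            }

            // Reduce phase: sum the counts for each word.
            public static class SumReducer
                    extends Reducer<Text, IntWritable, Text, IntWritable> {
                @Override
                protected void reduce(Text key, Iterable<IntWritable> values,
                        Context context) throws IOException, InterruptedException {
                    int sum = 0;
                    for (IntWritable v : values) {
                        sum += v.get();
                    }
                    context.write(key, new IntWritable(sum));
                }
            }
        }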
