InfoQ eMag: Hadoop


Hadoop 2, a major update over Hadoop 1, is no longer just about MapReduce. This eMag delves into the changes in Hadoop 2, including new projects such as Storm, Tez, Spark and others. Through case studies, it examines Hadoop architectures and some useful frameworks, and looks at how teams leverage Hadoop for real-world projects.

Contents of the Hadoop eMag include:

  • Introduction - Apache Hadoop is an open-source framework that runs applications on large clusters of servers. It is designed to scale from a single server to thousands of machines, with a very high degree of fault tolerance.
  • Building Applications With Hadoop - When building applications using Hadoop, it is common to have input data arriving from various sources in various formats. In his presentation, “New Tools for Building Applications on Apache Hadoop”, Eli Collins gives an overview of how to build better products with Hadoop and of various tools that can help, such as Apache Avro, Apache Crunch, Cloudera ML and the Cloudera Development Kit.
  • What is Apache Tez? - Apache Tez is a new distributed execution framework that is targeted towards data-processing applications on Hadoop. But what exactly is it? How does it work? In the presentation, “Apache Tez: Accelerating Hadoop Query Processing”, Bikas Saha and Arun Murthy discuss Tez’s design, highlight some of its features and share initial results obtained by making Hive use Tez instead of MapReduce.
  • Modern Healthcare Architectures Built with Hadoop - We have heard plenty in the news lately about healthcare challenges and the difficult choices faced by hospital administrators, technology and pharmaceutical providers, researchers, and clinicians. At the same time, consumers are experiencing increased costs without a corresponding increase in health security or in the reliability of clinical outcomes.
  • How LinkedIn Uses Apache Samza - Apache Samza is a stream processor that LinkedIn recently open-sourced. In his presentation, “Samza: Real-time Stream Processing at LinkedIn”, Chris Riccomini discusses Samza's feature set, how Samza integrates with YARN and Kafka, how it's used at LinkedIn, and what's next on the roadmap.

About InfoQ eMags

InfoQ eMags are professionally designed, downloadable collections of popular InfoQ content - articles, interviews, presentations, and research - covering the latest software development technologies, trends, and topics.