
All things Hadoop
In this interview Ted Dunning talk about Hadoop, its current usage and its future. He explains the reasons for Hadoop's success and make recommendations on how to start using it.

In this interview Ted Dunning talk about Hadoop, its current usage and its future. He explains the reasons for Hadoop's success and make recommendations on how to start using it.
In his new article “MapReduce Patterns, Algorithms, and Use Cases”, Ilya Katsov gives a systematic view of the different MapReduce patterns, algorithms and techniques that can be found on the web or in scientific articles along with several practical use case studies.
After six years of gestation, Big data framework Apache Hadoop 1.0.0 was recently released. Core features in the release include Kerberos Authentication, support for Apache HBase and RESTful API to HDFS. InfoQ spoke with Arun Murthy, VP of Apache Hadoop, about the new release.
![]()
As more companies adopt Hadoop, its integration with other applications is becoming more important. A key to such integration is usage of the appropriate OutputFormat allowing to produce output data in a form most appropriate for other applications.

In this article authors show how leverage Oozie extensibility to implement custom language extensions. This approach can be viewed a specializing workflow language for a given company/line of business.
Ryan King presents how Twitter uses NoSQL technologies - Gizzard, Cassandra, Hadoop, Redis - to deal with increasing data amounts forcing them to scale out beyond what the traditional SQL has to offer.
Kevin Weil presents how Twitter does data analysis using Scribe for logging, base analysis with Pig/Hadoop, and specialized data analysis with HBase, Cassandra, and FlockDB.
Ville Tuulos talks about Disco, the Map/Reduce framework for Python and Erlang, real-world data mining with Python, the advantages of Erlang for distributed and fault tolerant software, and more.
Ron Bodkin discusses big data architecture, real-time analytics, batch processing, map-reduce, and data science.