InfoQ

Hadoop Content on InfoQ


Latest featured content about Hadoop

Grid Gain vs. Hadoop. Why Elephants Can't Fly

Topics
QCon London 2012,
Big Data,
Database Design,
QCon,
Data Analysis,
GridGain,
Database,
Hadoop,
Conferences

Dmitriy Setrakyan introduces GridGain, compares it with Hadoop, and outlines the cases where it is a better fit, accompanied by a live demo showing how to set up a GridGain job.

News about Hadoop

Hadoop And Microsoft

Topics
SQL Server,
Relational Databases,
Microsoft,
.NET,
Database,
Companies,
Hadoop,
Programming,
PowerPivot

Want to try out Hadoop with the Microsoft stack and find out what capabilities it brings? We point to some resources that can help.

VMware Introduces Spring Hadoop

Topics
Spring Data,
Spring,
Dependency Injection,
SpringSource,
Java,
Big Data,
Design Pattern,
Languages,
NoSQL,
Database Design,
VMWare,
Patterns,
Hadoop,
Object Oriented Design,
Design,
Database,
Companies,
Programming,
Spring Hadoop

VMware have announced the availability of Spring Hadoop, which integrates the Spring Framework and the Apache Hadoop platform.

MapReduce Patterns, Algorithms, and Use Cases

Topics
Map-Reduce,
Big Data,
Database Design,
Design Pattern,
Database,
Object Oriented Design,
Patterns,
Hadoop,
Design

In his new article “MapReduce Patterns, Algorithms, and Use Cases”, Ilya Katsov gives a systematic view of the different MapReduce patterns, algorithms and techniques that can be found on the web or in scientific articles along with several practical use case studies.
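One of the most basic MapReduce patterns is counting and summing, with word count as its best-known instance. The following is a minimal single-process Python sketch of that pattern, not code from Katsov's article: `mapper`, `combiner`, `reducer`, and `run_job` are hypothetical names, and the in-memory sort stands in for the shuffle that a real Hadoop cluster performs across machines.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

def mapper(doc_id, text):
    """Emit (term, 1) for every term in one document."""
    for term in text.lower().split():
        yield term, 1

def combiner(pairs):
    """Pre-aggregate a single mapper's output locally to shrink the shuffle."""
    local = defaultdict(int)
    for term, count in pairs:
        local[term] += count
    return local.items()

def reducer(term, counts):
    """Sum all partial counts that arrived for one term."""
    return term, sum(counts)

def run_job(documents):
    """Simulate map -> combine -> shuffle/sort -> reduce in one process."""
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(combiner(mapper(doc_id, text)))
    # Stand-in for the shuffle: sort by key so equal keys become adjacent.
    intermediate.sort(key=itemgetter(0))
    return dict(
        reducer(term, (count for _, count in group))
        for term, group in groupby(intermediate, key=itemgetter(0))
    )

if __name__ == "__main__":
    docs = {1: "big data big clusters", 2: "big elephants"}
    print(run_job(docs))  # {'big': 3, 'clusters': 1, 'data': 1, 'elephants': 1}
```

The combiner is optional; it is safe here because addition is associative and commutative, so pre-summing per mapper does not change the final counts.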

Articles about Hadoop

Generating Avro Schemas from XML Schemas Using JAXB

Topics
Avro,
Hadoop

Apache Avro is an up-and-coming binary marshalling framework. In his new article, Benjamin Fagin explains how to leverage existing XSD tooling to create data definitions and then use an XJC plugin to directly generate Avro schemas and marshalling classes.
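To give a feel for what such generation involves, here is a minimal Python sketch that maps a single XSD `complexType` to an Avro record schema. It is not Fagin's XJC-based approach: the type table `XSD_TO_AVRO`, the function `complex_type_to_avro`, and the `example.avro` namespace are hypothetical, the code assumes the XSD uses the `xs` prefix, and it handles only a flat sequence of built-in types.

```python
import json
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # ElementTree's qualified-name prefix for XSD tags

# Hypothetical mapping from a few XSD built-in types to Avro primitive types.
XSD_TO_AVRO = {
    "xs:string": "string",
    "xs:int": "int",
    "xs:long": "long",
    "xs:boolean": "boolean",
    "xs:float": "float",
    "xs:double": "double",
    "xs:base64Binary": "bytes",
}

def complex_type_to_avro(xsd_text, namespace="example.avro"):
    """Translate the first top-level complexType into an Avro record schema (a dict)."""
    root = ET.fromstring(xsd_text)
    ctype = root.find(f"{XS}complexType")
    fields = []
    for elem in ctype.iter(f"{XS}element"):
        avro_type = XSD_TO_AVRO[elem.get("type")]
        if elem.get("minOccurs") == "0":
            # Optional XML element -> nullable Avro field that defaults to null.
            fields.append({"name": elem.get("name"), "type": ["null", avro_type], "default": None})
        else:
            fields.append({"name": elem.get("name"), "type": avro_type})
    return {"type": "record", "name": ctype.get("name"), "namespace": namespace, "fields": fields}

PERSON_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Person">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="age" type="xs:int"/>
      <xs:element name="email" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""

if __name__ == "__main__":
    print(json.dumps(complex_type_to_avro(PERSON_XSD), indent=2))
```

A real generator also has to deal with nested types, enumerations, repeated elements (Avro arrays), and namespaces, which is where reusing mature XSD tooling pays off.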

Exploring Hadoop OutputFormat

Topics
Big Data,
Database Design,
Hadoop,
Database

As more companies adopt Hadoop, its integration with other applications is becoming more important. A key to such integration is choosing the appropriate OutputFormat, which lets a job produce its output data in the form that other applications can consume most easily.
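In Hadoop, an OutputFormat supplies the RecordWriter that turns each reducer key/value pair into bytes; the default TextOutputFormat writes the key and the value separated by a tab, one pair per line. The Python sketch below models that idea without Hadoop: `TextRecordWriter` imitates the default line layout, while `JsonLinesRecordWriter` and `write_job_output` are hypothetical, showing how swapping the writer changes what a downstream consumer sees without touching the job logic.

```python
import io
import json

class TextRecordWriter:
    """Imitates the line layout of Hadoop's default TextOutputFormat: key<TAB>value."""

    def __init__(self, stream, separator="\t"):
        self.stream = stream
        self.separator = separator

    def write(self, key, value):
        self.stream.write(f"{key}{self.separator}{value}\n")

class JsonLinesRecordWriter:
    """Hypothetical writer for a downstream application that expects JSON Lines."""

    def __init__(self, stream):
        self.stream = stream

    def write(self, key, value):
        self.stream.write(json.dumps({"key": key, "value": value}) + "\n")

def write_job_output(records, writer):
    """The 'job' only emits pairs; the writer alone decides the on-disk form."""
    for key, value in records:
        writer.write(key, value)

if __name__ == "__main__":
    results = [("big", 3), ("data", 1)]
    for writer_cls in (TextRecordWriter, JsonLinesRecordWriter):
        buf = io.StringIO()
        write_job_output(results, writer_cls(buf))
        print(buf.getvalue(), end="")
```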

Extending Oozie

Topics
Business Process Management,
Java,
Big Data,
Database Design,
SOA,
Enterprise Architecture,
Business,
Languages,
Architecture,
Database,
Programming,
Hadoop

In this article, the authors show how to leverage Oozie's extensibility to implement custom language extensions. This approach can be viewed as specializing the workflow language for a given company or line of business.
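An Oozie workflow is an XML document whose action nodes each contain an element naming the action type (such as `map-reduce` or `pig`). The toy Python interpreter below illustrates the extension idea only: it is not Oozie's Java API, the handler registry and the domain-specific `load-ledger` action are hypothetical, XML namespaces and ok/error transitions are ignored, and actions simply run in document order.

```python
import xml.etree.ElementTree as ET

# Registry mapping an action's element name to the handler that runs it.
ACTION_HANDLERS = {}

def action_handler(tag):
    """Decorator that registers a handler for one action element name."""
    def register(func):
        ACTION_HANDLERS[tag] = func
        return func
    return register

@action_handler("shell")
def run_shell(elem, context):
    context["log"].append(f"shell: {elem.findtext('exec')}")

# Hypothetical company-specific extension: a domain verb instead of a generic step.
@action_handler("load-ledger")
def run_load_ledger(elem, context):
    context["log"].append(f"load-ledger: region={elem.get('region')}")

def run_workflow(xml_text):
    """Run every <action> in document order by dispatching on its first child's tag."""
    root = ET.fromstring(xml_text)
    context = {"log": []}
    for action in root.iter("action"):
        body = action[0]  # the element that names the action type
        handler = ACTION_HANDLERS.get(body.tag)
        if handler is None:
            raise ValueError(f"unknown action type: {body.tag}")
        handler(body, context)
    return context["log"]

WORKFLOW = """<workflow-app name="daily-close">
  <action name="prep"><shell><exec>prepare.sh</exec></shell></action>
  <action name="load"><load-ledger region="emea"/></action>
</workflow-app>"""

if __name__ == "__main__":
    for line in run_workflow(WORKFLOW):
        print(line)
```

Because handlers are looked up by element name, a line of business can add its own vocabulary without changing the interpreter itself.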

Presentations about Hadoop

Distributed Data Analysis with Hadoop and R

Topics
Strange Loop 2011,
Big Data,
Strange Loop,
Data Analysis,
Database Design,
Conferences,
Hadoop,
Statistics,
Database,
R

Jonathan Seidman and Ramesh Venkataramaiah present how they run R on Hadoop to perform distributed analysis on large data sets, and discuss some alternatives to their solution.

Panel: Hadoop for the Enterprise Architect

Topics
Big Data,
QCon San Francisco 2011,
Database Design,
QCon,
Hadoop,
Database,
Conferences

Peter Sirota, Amr Awadallah, Eric Baldeschwieler, Ted Dunning, Guy Bayes, and moderator Ron Bodkin discuss various existing Hadoop use cases, ecosystems, and disaster recovery.

Interviews about Hadoop

Big Data Architecture at LinkedIn

Topics
Neo4j,
MongoDB,
Neo,
Cassandra,
Riak,
Companies,
Graph Database,
Key-Value Store,
BigTable,
Distributed Document Oriented Database,
Big Data,
NoSQL,
Database Design,
Cloud Computing,
Voldemort,
Database,
Hadoop,
Lucene,
Dynamo DB

In this interview at QCon London, LinkedIn’s Sid Anand discusses the problems they face when serving high-traffic, high-volume data. Sid explains how they’re moving some use cases from Oracle to gain headroom, and lifts the hood on their open source search and data replication projects, including Kafka, Voldemort, Espresso and Databus.

Optimizing for Big Data at Facebook

Topics
Clusters,
Big Data Infrastructure,
Clustering & Caching,
Cloud Computing,
Hive,
Performance & Scalability,
Hadoop,
Infrastructure

Hive co-creator Ashish Thusoo describes the Big Data challenges Facebook faced and presents solutions in two areas: reducing the data footprint and reducing CPU utilization. Facebook generates 300 to 400 terabytes per day; to get better compression, they store data in RCFiles, which are split into blocks but lay the data out column by column within each block. He also talks about the current Big Data ecosystem and trends for companies going forward.
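The layout described here, where rows are split into groups (blocks) and each group is stored column by column, can be sketched in a few lines of Python. This is a simplified model, not the RCFile implementation: `rcfile_like`, `read_rcfile_like`, and the sample rows are hypothetical, values are assumed to be strings without newlines, and the per-group metadata the real format keeps is omitted. Compressing each column separately places similar values next to each other, which is why such a layout can compress better than a row-wise one; the actual gain depends on the data.

```python
import zlib

# Hypothetical sample: a small click log with highly repetitive columns.
SAMPLE_ROWS = [
    ["2012-03-07", "us", "click"],
    ["2012-03-07", "us", "view"],
    ["2012-03-07", "uk", "click"],
    ["2012-03-08", "us", "click"],
    ["2012-03-08", "uk", "view"],
]

def row_wise(rows):
    """Baseline: serialize row by row and compress everything as one stream."""
    return zlib.compress("\n".join(",".join(r) for r in rows).encode())

def rcfile_like(rows, group_size):
    """Split rows into row groups, then compress each column of a group separately."""
    groups = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        columns = zip(*group)  # transpose: one tuple per column
        groups.append([zlib.compress("\n".join(col).encode()) for col in columns])
    return groups

def read_rcfile_like(groups):
    """Decompress each column and transpose back into rows."""
    rows = []
    for group in groups:
        columns = [zlib.decompress(c).decode().split("\n") for c in group]
        rows.extend(list(row) for row in zip(*columns))
    return rows

if __name__ == "__main__":
    groups = rcfile_like(SAMPLE_ROWS, group_size=2)
    print("row-wise bytes:", len(row_wise(SAMPLE_ROWS)))
    print("columnar bytes:", sum(len(c) for g in groups for c in g))
```

On an input this small the fixed per-stream zlib overhead can outweigh any benefit, so the printed sizes are only meaningful with realistic block sizes.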