Hadoop, the distributive file system and MapReduce are just a few of the topics covered in this interview recorded live at QCon San Francisco 2013. Industry-standard Agile implementation and a lot of testing, assures the development team at Ancestry.com that they have an app that can handle the large traffic demands of the popular genealogy site.
Cliff Click explains 0xdata's H20, a clustering and in-memory math and statistics solution (available for Hadoop and standalone), writing H20's memory representation and compression in Java, low latency Java vs GCs, and much more.
Dean Wampler explains Scalding and the other Hadoop support libraries, the return of SQL, how (big) data is the killer application for functional programming, Java 8 vs Scala, and much more.
Eva Andreasson explains the various Hadoop technologies and how they interact, real-time queries with Impala, the Hadoop ecosystem including Hue, Oozie, YARN, and much more.
Eli Collins discusses Cloudera's CDH4 release, which tasks are well suited for Hadoop, Hadoop and MapReduce vs SQL, the state of Hadoop, and much more.
In this interview at QCon London, LinkedIn’s Sid Anand discusses the problems they face when serving high-traffic, high-volume data. Sid explains how they’re moving some use cases from Oracle to gain headroom, and lifts the hood on their open source search and data replication projects, including Kafka, Voldemort, Espresso and Databus.
Hive co-creator Ashish Thusoo describes the Big Data challenges Facebook faced and presents solutions in 2 areas: Reduction in the data footprint and CPU utilization. Generating 300 to 400 terabytes per day, they store RC files as blocks, but store as columns within a block to get better compression. He also talks about the current Big Data ecosystem and trends for companies going forward.
In this interview Ted Dunning talk about Hadoop, its current usage and its future. He explains the reasons for Hadoop's success and make recommendations on how to start using it.
In this interview recorded at JavaOne 2011 Conference, Spring Hadoop project lead Costin Leau talks about the current state and upcoming features of Spring Data and Spring Hadoop projects. He also talks about the Caching and Data Grid architecture patterns.
Ville Tuulos talks about Disco, the Map/Reduce framework for Python and Erlang, real-world data mining with Python, the advantages of Erlang for distributed and fault tolerant software, and more.
Ron Bodkin discusses big data architecture, real-time analytics, batch processing, map-reduce, and data science.
Adrian Cole discusses his jclouds project, which is an open source library that helps Java developers get started in the cloud and reuse their Java development skills. Cole also talks about some of the challenges of creating a cloud agnostic library, such as the use of different hypervisors and that various cloud implementations are written in different languages, such as VB, Python, Ruby, etc.