Hadoop, its distributed file system, and MapReduce are just a few of the topics covered in this interview recorded live at QCon San Francisco 2013. Industry-standard Agile practices and extensive testing assure the development team at Ancestry.com that their app can handle the heavy traffic of the popular genealogy site.
Cliff Click explains 0xdata's H2O, a clustered, in-memory math and statistics solution (available for Hadoop and standalone). He also covers writing H2O's memory representation and compression in Java, low-latency Java versus garbage collection, and much more.
Xavier Amatriain discusses how Netflix uses specialized roles, including the Data Scientist and the Machine Learning Engineer, to deliver valuable data to Netflix's customer base at the right time through a mix of offline, online, and nearline data processing. Xavier also discusses what it takes to become a Machine Learning Engineer and how to gain real experience in the field.
Eva Andreasson explains the various Hadoop technologies and how they interact, real-time queries with Impala, the Hadoop ecosystem including Hue, Oozie, YARN, and much more.
Etsy's approach to big data has been to give the entire organization visibility into the different sources of data its product generates, as well as access to the experts who know how to use that data. Nell Thomas explains her role at Etsy and how Etsy's view of big data has shaped the evolution of its product.
Big Data means more than just the size of a dataset. Pavlo Baron explains different ways of applying Big Data concepts in various situations, from analytics to content delivery to medical applications. His larger vision for Big Data ranges from specialized Data Scientists, to learning Decision Support Systems, to helping mankind itself.
Erik Meijer explains the various aspects needed to categorise data stores, how reactive programming fits in with databases, the return to data transformation, denotational semantics, and much more.
Eli Collins discusses Cloudera's CDH4 release, which tasks are well suited for Hadoop, Hadoop and MapReduce vs SQL, the state of Hadoop, and much more.
Stuart Halloway explains Datomic, programming transactional behavior with Datomic, Datalog and logic programming, programming with values, Clojure Reducers and much more.
Max Sklar talks about machine learning at Foursquare, the use of Bayesian statistics and other methods to build Foursquare's recommendation system, and much more.
In this interview at QCon London, LinkedIn’s Sid Anand discusses the problems they face when serving high-traffic, high-volume data. Sid explains how they’re moving some use cases from Oracle to gain headroom, and lifts the hood on their open source search and data replication projects, including Kafka, Voldemort, Espresso and Databus.
Ron Bodkin of Think Big Analytics discusses early adoption of Hadoop, NoSQL, and big data technologies. He describes common patterns and explains how developers can write low-level primitives to optimize MapReduce jobs. Other topics include Hive, Pig, multi-tenancy, and security.