Hadoop, the distributed file system and MapReduce are just a few of the topics covered in this interview recorded live at QCon San Francisco 2013. An industry-standard Agile implementation and a lot of testing assure the development team at Ancestry.com that their app can handle the large traffic demands of the popular genealogy site.
Cliff Click explains 0xdata's H2O, a clustering and in-memory math and statistics solution (available for Hadoop and standalone), writing H2O's memory representation and compression in Java, low-latency Java vs. GCs, and much more.
Martin Thompson discusses the building of complex systems with regard to the Reactive Manifesto. Many web-based systems are built in a synchronous manner, and that style of development may be their greatest barrier to scale and could greatly limit their production lifespan. Martin discusses these shortcomings and gives some advice on how to make systems truly reactive.
Martin Thompson discusses how an understanding of the hardware is central to the creation of high-performance software, even when using platform-independent languages like Java.
Martijn Verburg discusses his new start-up jClarity, which offers performance tooling for the Cloud. He also provides an update on the Adopt a JSR and Adopt OpenJDK programs.
Bob Lee explains the popularity of Java, future language features like Lambdas, DI with Guice vs. the Dagger framework, the role of Java vs. Ruby at Square, hiring at Square, security and much more.
Serkan Piantino explains how Facebook has managed to scale up, what types of errors occur in an architecture that size and how to handle them, RAM vs disk, and much more.
Hive co-creator Ashish Thusoo describes the Big Data challenges Facebook faced and presents solutions in two areas: reducing the data footprint and reducing CPU utilization. With 300 to 400 terabytes generated per day, they store RC files as blocks but store the data as columns within each block to get better compression. He also talks about the current Big Data ecosystem and trends for companies going forward.
Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.
Ron Bodkin of Big Data Analytics discusses early adoption of Hadoop, NoSQL and big data technologies. He discusses common patterns and explains how developers can write low-level primitives to optimize MapReduce functions. Other topics include Hive, Pig, multi-tenancy, and security.
Gil Tene talks to Charles Humble about different garbage collection techniques, and specific collectors including Azul's C4, IBM's Balanced GC, and Oracle's Garbage First, before moving on to discuss both the JCP and OpenJDK.
Martin Thompson and David Farley discuss how to use the scientific method to create high-performance systems by measuring performance and adapting the implementation to approach the limits of current hardware. The Disruptor architecture is an open-source result of their work on low-latency, high-throughput systems for the retail trading platform of LMAX Ltd.