Hive co-creator Ashish Thusoo describes the Big Data challenges Facebook faced and presents solutions in two areas: reducing the data footprint and reducing CPU utilization. With 300 to 400 terabytes generated per day, they store data as RC files, which are split into blocks, with the data inside each block stored column by column to get better compression. He also talks about the current Big Data ecosystem and trends for companies going forward.
Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.
Ron Bodkin of Big Data Analytics discusses early adoption of Hadoop, NoSQL and big data technologies. He discusses common patterns and explains how developers can write low-level primitives to optimize MapReduce functions. Other topics include Hive, Pig, multi-tenancy, and security.
Gil Tene talks to Charles Humble about different garbage collection techniques, and specific collectors including Azul's C4, IBM's Balanced GC, and Oracle's Garbage First, before moving on to discuss both the JCP and OpenJDK.
Martin Thompson and David Farley discuss how to use the scientific method to create high-performance systems by measuring performance and adapting the implementation to approach the limits of current hardware. The Disruptor architecture is an open-source result of their work on low-latency, high-throughput systems for the retail trading platform of LMAX Ltd.
In this interview, recorded at the JavaOne 2011 conference, Spring Hadoop project lead Costin Leau talks about the current state and upcoming features of the Spring Data and Spring Hadoop projects. He also talks about the Caching and Data Grid architecture patterns.
Steve Vinoski and Bob Ippolito discuss web development with MochiWeb and Yaws and extending Erlang with native code. Also: async I/O in Python and Node.js vs Erlang.
Jonas Bonér explains the Akka project and the types of actors it offers as well as its transactional features. Also: a preview of how Akka 2.0 changes the management of (remote) actors.
Orion Henry explains what makes Heroku's PaaS tick, in particular the new extensible Cedar stack as well as Doozer, the implementation of the Paxos algorithm created at Heroku.
Terracotta creator Ari Zilka talks about the idea that RAM is the new disk and argues for scaling up before scaling out, comparing the architectural approaches of lots of VMs with small heaps vs. a few JVMs with very large heaps. Ari introduces BigMemory, a Java add-on to Enterprise Ehcache, which allows app designs with huge amounts of memory accessible in-process, with minimal garbage collection.
Aaron Patterson talks about performance in Ruby and Rails, some of the challenges Rails and Rack pose for the Ruby GC, and much more.
Justin Sheehy and Damien Katz discuss Riak and CouchDB, the strengths and trade-offs of different approaches to NoSQL, and why both databases are written in Erlang.