Hive co-creator Ashish Thusoo describes the Big Data challenges Facebook faced and presents solutions in two areas: reducing the data footprint and reducing CPU utilization. Generating 300 to 400 terabytes per day, Facebook stores its data as RC files, which are organized as blocks but lay the data out by column within each block to get better compression. He also talks about the current Big Data ecosystem and trends for companies going forward.
Attila Szegedi talks about performance tuning Java and Scala programs at Twitter: how to approach GC problems, the importance of asynchronous I/O, when to use MySQL/Cassandra/Redis, and much more.
InfoQ catches up with Manik Surtani to discuss JSR 347, data grids and Infinispan. Manik also discusses the overlap with NoSQL and support for the Memcached and HotRod wire protocols.
In this interview, recorded at the JavaOne 2011 conference, Spring Hadoop project lead Costin Leau talks about the current state and upcoming features of the Spring Data and Spring Hadoop projects. He also talks about the Caching and Data Grid architecture patterns.
Terracotta creator Ari Zilka talks about the idea that RAM is the new disk and argues for scaling up before scaling out, comparing the architectural approaches of lots of VMs with small heaps vs. a few JVMs with very large heaps. Ari introduces BigMemory, a Java add-on to Enterprise Ehcache, which allows app designs with huge amounts of memory accessible in-process, with minimal garbage collection.
Billy Newport talks to InfoQ about the need for higher-level abstractions for effective parallel programming on multi-core systems. The interview explores some approaches taken with MapReduce products such as Cascading and Pig for a Hadoop cluster, explores the limitations of the actor model and message passing, and touches on IBM's WebSphere eXtreme Scale (ObjectGrid) product.
Ari Zilka, co-founder and CTO of Terracotta, talks about the capabilities of Terracotta, the use cases it supports, and the rationale and impact of taking Terracotta to an open source model.