Scale-up or Scale-out? Or both?
Over the last twenty years the main trend in the IT industry has been scaling out. It manifested itself in the move from mainframes to networks of Unix and/or Windows servers and culminated in the MapReduce system introduced by Google and picked up by Apache Hadoop. But now there is an interesting discussion in the LinkedIn Hadoop user group (group membership needed) on scaling-up GPUs for "Big Data" analytics using MapReduce and Fat Nodes.
The discussion was started by Suleiman Shehu and continues his five-month-old blog post, in which he states that:
Over the last two years large clusters comprising 1,000s of commodity CPUs, running Hadoop MapReduce, have powered the analytical processing of "Big data" involving hundreds of terabytes of information. Now a new generation of CUDA GPUs... have created the potential for a new disruptive technology for "Big data" analytics based on the use of much smaller hybrid CPU-GPU clusters. These small - medium size hybrid CPU-GPU clusters will be available at 1/10 the hardware cost and 1/20 the power consumption costs, and, deliver processing speed-ups of up to 500x or more when compared with regular CPU clusters running Hadoop MapReduce. This disruptive technological advance will enable small business units and organisations to compete with the much larger businesses who can afford to deploy very large Hadoop CPU clusters for "Big data" analytics.
So, considering these potentially significant cost savings and performance gains, the main question in Shehu’s mind is whether:
Given the constraints of the Hadoop MapReduce, can we leverage the parallel processing power of the latest generation of Fermi GPUs, coupled with the simplicity of the MapReduce model, to create much smaller and affordable CPU-GPU clusters that can be used for real-time "Big data" analytics?
In his blog, Shehu asserts that the most appropriate approach for implementing such clusters is scaling up data nodes to Fat Nodes. He suggests that the purpose of Fat Nodes is to keep as much processing as possible on the local node using the following architectural design features:
- Use dual 12 core CPUs each with 64GB or more RAM on each CPU giving 24 CPU cores and 128GB RAM on each node.
- Connect 10 or more... GPUs to the dual CPUs to provide 4,800 GPU processing cores and delivering over 10 TFLOPs of processing power on each node.
- Replace local hard disks with high-speed solid-state drives, each with 200K IOPS or more per SSD using PCI Express. Multiple SSDs can be combined to run in parallel to achieve more than 2.2 million read input/output operations per second (IOPS) on a single node...
- Use where possible 40 Gb/s InfiniBand network connections... for inter-node network traffic... This coupled with a network transfer speed up to 90M MPI messages per second across PCIe 2 bus to another node substantially exceeds the message-passing capabilities of large Hadoop clusters.
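The per-node totals in the list above are simple products of device counts and per-device figures. The following Python sketch reproduces that arithmetic. The per-GPU core count (480) and the number of SSDs (11) are not stated in the blog post; they are assumptions obtained by dividing the quoted totals, and the class and method names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class FatNode:
    """Back-of-the-envelope model of one Fat Node (illustrative only)."""
    cpus: int = 2
    cores_per_cpu: int = 12
    ram_gb_per_cpu: int = 64
    gpus: int = 10
    cores_per_gpu: int = 480      # assumption: 4,800 quoted cores / 10 GPUs
    ssds: int = 11                # assumption: 2.2M quoted IOPS / 200K per SSD
    iops_per_ssd: int = 200_000

    def cpu_cores(self) -> int:
        return self.cpus * self.cores_per_cpu

    def ram_gb(self) -> int:
        return self.cpus * self.ram_gb_per_cpu

    def gpu_cores(self) -> int:
        return self.gpus * self.cores_per_gpu

    def ssd_iops(self) -> int:
        return self.ssds * self.iops_per_ssd


node = FatNode()
print(node.cpu_cores(), node.ram_gb(), node.gpu_cores(), node.ssd_iops())
```

Changing any single field (for example, fewer GPUs per node) shows how quickly the aggregate figures move, which is useful when comparing a handful of Fat Nodes against a large number of ordinary servers.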
Based on this, in Shehu’s opinion:
Designing a MapReduce variant that is able to leverage the impressive GPU technology that is available now will significantly lower the upfront cluster build and power consumption costs for "Big data" analytics, whilst at the same time take advantage of the MapReduce model’s ability to lower software development costs.
Although this MapReduce implementation can technically increase the performance of the overall cluster, a discussion participant, Vladimir Rodionov, raised the question of the data capacity of such a cluster. One of the advantages of a traditional Hadoop cluster is the ability to store up to petabytes of data, whereas a smaller cluster of "Fat Nodes" requires every node to have far more disk storage with independent controllers, which increases the price tag of the cluster.
Another commenter, Gerrit Jansen van Vuuren, shares this opinion and goes further, stating that:
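The capacity concern can be made concrete with a rough estimate of how much raw disk each node must carry. The sketch below assumes data is spread evenly across nodes and uses HDFS's default replication factor of 3; the dataset size and node counts are hypothetical values chosen for illustration, not figures from the discussion.

```python
def raw_storage_per_node_tb(dataset_tb: float, nodes: int, replication: int = 3) -> float:
    """Raw disk needed per node, assuming data is spread evenly.

    replication=3 matches the HDFS default; real clusters also need
    headroom for intermediate data, which this estimate ignores.
    """
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return dataset_tb * replication / nodes


# Hypothetical comparison for a 1 PB (1,000 TB) dataset.
large_cluster = raw_storage_per_node_tb(1000, nodes=1000)
fat_cluster = raw_storage_per_node_tb(1000, nodes=50)
print(f"1,000 commodity nodes: {large_cluster:.0f} TB per node")
print(f"50 fat nodes:          {fat_cluster:.0f} TB per node")
```

Under these assumptions each of the 50 nodes has to hold twenty times as much raw storage as a node in the 1,000-node cluster, which is the cost pressure the comment points to.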
... Hadoop was never designed to be used for processor intensive tasks but rather for data intensive tasks - "Big Data". ... no matter how fast your RAM, CPU/GPU etc is, you will always have to read those bytes from a disk, be it SSD or other... Maybe a better software framework for running this computational orientated platform on is something like Grid Gain.
Answering these comments, Shehu notes that:
... there are many Hadoop users who are now using Hadoop to conduct analytical processing on terabytes of data and they also define their data as "Big Data". So the term is fluid. However since Hadoop was never designed to do analytical processing a number of MapReduce variants have been developed such as Twister Iterative MapReduce, Hadoop++ (and others) which are more focused on running analytical M/R queries. This is the initial area that I believe for M/R GPU clusters.
The definition of the commodity servers used in Hadoop clusters is changing rapidly. What was, by price, a high-end server two or three years ago is now a commodity. So today’s clusters are gaining more and more compute power and, as a result, MapReduce calculations become faster whether we realize it or not. The real question is whether we can have both huge data storage and execution of computationally intensive tasks (leveraging specialized hardware) in a single cluster without breaking the bank, or whether we really need to start treating the two problems as different and define two different approaches.
BigMemory + Hadoop perhaps
I have instead been working with customers on the notion of using Ehcache + BigMemory in a Hadoop grid to get more density than usual (w/o SSD and w/o disk I/O bottlenecks). We are researching it in detail now over at Terracotta.
Ronny Kohavi Dec 12, 2013