Cloud Foundry: Design and Architecture
Derek Collison discusses the goals, the design premises and patterns employed in creating the architecture of Cloud Foundry, VMware’s open source PaaS, unveiling internal architectural details.
Posted by Abel Avram on Oct 26, 2011
Companies rely more and more on big data when making their decisions. Amazon, Cloudera, and IBM have announced their Hadoop-as-a-Service offerings, while Microsoft promises to do the same next year.
Amazon was the first, launching AWS Elastic MapReduce back in 2009 to run Apache Hadoop on EC2 and S3. Like many other IaaS offerings from Amazon, this service provides the minimum hardware and software needed to run analytics on big data and leaves much of the work of configuring and programming against the framework to the customer, a daunting task that requires considerable expertise. Provided the required skills are available, a company can set up and successfully run Hadoop jobs, as The New York Times demonstrated when it converted 11 million images, representing public-domain articles published between 1851 and 1922, into 1.5 TB of PDF documents by running a 24-hour Hadoop job on 100 Amazon EC2 instances at a very low price.
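To put the New York Times job in perspective, the quoted figures can be turned into per-instance rates. This is a back-of-envelope sketch, assuming the work was spread evenly over all 100 instances for the full 24 hours and using decimal units (1 TB = 10^9 KB); the article itself gives no such breakdown, and the "average output per image" figure does not assume one PDF per image.

```python
# Back-of-envelope rates for the New York Times Hadoop job.
# Inputs are the figures quoted in the article; the derived rates ASSUME
# an even spread of work across instances for the whole run.


def nyt_job_rates(images=11_000_000, output_tb=1.5, instances=100, hours=24):
    """Return derived throughput figures for a job of the given size."""
    instance_hours = instances * hours
    per_instance_hour = images / instance_hours
    return {
        "instance_hours": instance_hours,
        "images_per_instance_hour": per_instance_hour,
        "images_per_instance_second": per_instance_hour / 3600,
        # 1 TB = 10**9 KB in decimal units.
        "avg_output_kb_per_image": output_tb * 1e9 / images,
    }


rates = nyt_job_rates()
for name, value in rates.items():
    print(f"{name:28s} {value:,.2f}")
```

Roughly 2,400 instance-hours, about 4,600 images per instance-hour (a little over one image per second per instance), and on the order of 136 KB of PDF output per input image.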
Cloudera takes Amazon’s MapReduce service a step further in the right direction, offering CDH3, a tuned Hadoop AMI that includes many additional software products for administering and running complex jobs on Hadoop, such as Apache Mahout, Flume, Sqoop, Pig, Oozie, Hive, HBase, ZooKeeper, Whirr and others, most if not all of them open source projects. One remaining problem is the sheer amount of expertise and resources needed to install, configure and run this package: the CDH3 Installation Guide (PDF) runs to no fewer than 175 pages of guidelines on setting up components ranging from the JDK to CDH3 itself, Snappy and all the other parts of the system.
Microsoft recently announced at PASS Summit 2011 that it will provide Hadoop-as-a-Service integrated into Windows Azure and SQL Server some time in 2012, for companies interested in crunching large amounts of data on its platform. Few details are available, except that Microsoft promised to maintain compatibility with the Apache Hadoop codebase and to contribute back to the open source project. Microsoft has also made available a Sqoop-based SQL Server-Hadoop Connector that enables bidirectional data transfer between SQL tables and Hadoop’s HDFS. Such a connector is essential, since Hadoop needs the data to reside in its own file system in order to process large volumes of it efficiently.
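Conceptually, a connector of this kind copies rows from a SQL table into delimited text files in a distributed file system, and copies such files back into a table. The sketch below imitates both directions with the Python standard library only: an in-memory SQLite database stands in for SQL Server and a local temporary directory stands in for HDFS. The function names, the `part-m-NNNNN` file naming, the comma delimiter and the rows-per-file split are illustrative assumptions, not the actual connector's API or behavior.

```python
import csv
import os
import sqlite3
import tempfile


def table_to_files(conn, table, out_dir, rows_per_file=2):
    """DB -> file system: write a table's rows as comma-delimited part files."""
    # The table name is interpolated directly; this sketch assumes it is trusted.
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    paths = []
    for start in range(0, len(rows), rows_per_file):
        path = os.path.join(out_dir, f"part-m-{start // rows_per_file:05d}")
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(rows[start:start + rows_per_file])
        paths.append(path)
    return paths


def files_to_table(conn, table, in_dir):
    """File system -> DB: insert every record from the part files into a table."""
    # CSV yields strings; SQLite's column type affinity converts numeric text
    # back to INTEGER/REAL. NULLs are NOT preserved by this simple sketch.
    count = 0
    for name in sorted(os.listdir(in_dir)):
        if not name.startswith("part-"):
            continue
        with open(os.path.join(in_dir, name), newline="") as f:
            for record in csv.reader(f):
                placeholders = ", ".join("?" * len(record))
                conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", record)
                count += 1
    conn.commit()
    return count


if __name__ == "__main__":
    schema = "(id INTEGER, region TEXT, amount REAL)"
    src = sqlite3.connect(":memory:")
    src.execute(f"CREATE TABLE sales {schema}")
    src.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                    [(1, "east", 9.5), (2, "west", 4.0), (3, "east", 1.25)])
    dst = sqlite3.connect(":memory:")
    dst.execute(f"CREATE TABLE sales_copy {schema}")
    with tempfile.TemporaryDirectory() as staging:  # stands in for an HDFS directory
        files = table_to_files(src, "sales", staging)
        loaded = files_to_table(dst, "sales_copy", staging)
    print(len(files), loaded, dst.execute("SELECT * FROM sales_copy ORDER BY id").fetchall())
```

In Sqoop's terminology, moving data from the database into HDFS is an import and moving it back is an export; a real connector runs these transfers as parallel MapReduce tasks rather than as a single loop.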
Another player to make an announcement this month is IBM, which offers to run Hadoop on its SmartCloud Enterprise using IBM InfoSphere BigInsights software. BigInsights comes in two editions: Basic, which is free and suited to evaluation projects, and Enterprise, for production use. IBM’s solution seems to be the most mature so far, being based on Watson technology, the AI system that beat two of the best Jeopardy! players this year. Watson does not just answer questions by running Hadoop on a large cluster of nodes; it includes over 100 techniques to “analyze natural language, identify sources, find and generate hypotheses, find and score evidence, and merge and rank hypotheses”. It is therefore not just a platform for running big data jobs: it also provides intelligence on how to approach and interpret the data, which is one of the most difficult parts of working with it.
Like Cloudera’s solution, IBM’s BigInsights includes, besides Hadoop, a number of open source programs, such as
BigInsights also includes custom-made technology developed by IBM: a text analysis engine, a data exploration tool for business analysts, integration with enterprise software, and Hadoop enhancements that simplify administration and improve performance.
BigInsights does not replace online analytical processing (OLAP) or online transaction processing (OLTP) applications, but it can be integrated with them in order to “filter through high volumes of raw data and combine the results with structured data stored in your DBMS or warehouse”.
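The quoted pattern, filtering high volumes of raw data and then combining the results with structured data, can be illustrated with a toy map/shuffle/reduce pipeline followed by a SQL join. This is a single-process Python sketch, not BigInsights or Hadoop code: the log format, the `customers` table and all function names are hypothetical, and an in-memory SQLite database stands in for the DBMS or warehouse.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw, semi-structured input (e.g. web server events).
RAW_LOG = [
    "2011-10-26T10:00:01 cust=17 action=view",
    "2011-10-26T10:00:02 cust=17 action=purchase",
    "2011-10-26T10:00:05 cust=42 action=purchase",
    "malformed line",
    "2011-10-26T10:01:00 cust=17 action=purchase",
]


def map_phase(lines):
    """Filter step: keep well-formed purchase events and emit (customer_id, 1)."""
    for line in lines:
        # Skip the timestamp token, then parse the remaining key=value pairs.
        fields = dict(tok.split("=", 1) for tok in line.split()[1:] if "=" in tok)
        if fields.get("action") == "purchase" and fields.get("cust", "").isdigit():
            yield int(fields["cust"]), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate step: total purchases per customer."""
    return {key: sum(values) for key, values in groups.items()}


def make_warehouse():
    """Stand-in for the structured DBMS/warehouse side (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(17, "Acme", "east"), (42, "Globex", "west"), (99, "Initech", "east")],
    )
    return conn


def run_pipeline(lines, conn):
    """Filter and aggregate raw lines, then join the results with structured data."""
    counts = reduce_phase(shuffle(map_phase(lines)))
    # Load the (small) aggregated result next to the warehouse table and join.
    # Expects a fresh connection: the temp table is created on every call.
    conn.execute("CREATE TEMP TABLE purchase_counts (customer_id INTEGER, purchases INTEGER)")
    conn.executemany("INSERT INTO purchase_counts VALUES (?, ?)", counts.items())
    return conn.execute(
        """
        SELECT c.name, c.region, p.purchases
        FROM customers AS c
        JOIN purchase_counts AS p ON p.customer_id = c.customer_id
        ORDER BY p.purchases DESC, c.name
        """
    ).fetchall()


if __name__ == "__main__":
    for row in run_pipeline(RAW_LOG, make_warehouse()):
        print(row)
```

The design point is that only the small aggregated result crosses over to the relational side; the bulky raw data is filtered and reduced where it lives.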
IBM’s Hadoop solution is up and running and can be tested by customers.
Another solution worth mentioning is the EMC Greenplum Analytics Workbench, a cluster of more than 1,000 nodes for running Hadoop integration tests, provided by EMC in partnership with Intel, Mellanox Technologies, Micron, Seagate, SuperMicro, Switch, and VMware. Greenplum is not offering Hadoop-as-a-Service; rather, it provides a platform of over 10,000 virtual nodes and 24 PB of storage for testing Hadoop itself.
According to a 2011 TDWI survey, 34% of companies use big data analytics to help them make decisions. Big data and Hadoop seem set to play an important role in the future.