Standards and Open Source for Cloud Computing
Three recent announcements show the cloud ecosystem moving toward openness and standards.
- Red Hat has moved its Deltacloud effort to the Apache Incubator. According to David Lutterkort:
The main reason for this move is that we’ve heard from several people that they really liked the idea of Deltacloud, and the concept of a true open source cloud API, but didn’t like it as a Red Hat project. Apache Incubator is a well-established place for parties with different interests to collaborate on a specific code base, so it seemed the logical place to address these concerns.
- Rackspace announced its OpenStack project:
On July 19, 2010, we announced that we are opening the code on our cloud infrastructure. ... The initial components being offered through this project include the code that powers our Cloud Files (available today) and Cloud Servers (expected available late 2010).
- The Distributed Management Task Force (DMTF) has released two documents, "Architecture for Managing Clouds" and "Use Cases and Interactions for Managing Clouds", that are intended to lay the groundwork for DMTF's next step: naming an API working group to draft APIs for "infrastructure as a service."
OpenStack and Apache Deltacloud have similar goals: building lightweight REST APIs that give access to cloud providers over HTTP. OpenStack is more focused on public cloud service providers, while Deltacloud is more focused on private clouds.
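To make "a lightweight REST API over HTTP" concrete, here is a minimal sketch of what a client request to such an API might look like. The base URL, the `/instances` path, and the `image_id` and `size` fields are hypothetical and are not taken from OpenStack or Deltacloud; the request is only built, never sent.

```python
import json
import urllib.request

# Hypothetical endpoint; neither OpenStack nor Deltacloud is claimed
# to use this URL, path, or these field names.
BASE_URL = "https://cloud.example.com/api"


def build_create_instance_request(image_id: str, size: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for a new instance."""
    body = json.dumps({"image_id": image_id, "size": size}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/instances",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


request = build_create_instance_request("img-1", "small")
print(request.get_method(), request.full_url)
```

The appeal of this style is that any language with an HTTP client and a JSON library can talk to the provider without a vendor-specific SDK.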
The DMTF work is more basic. First, it is trying to establish a common vocabulary of cloud computing terms. Then it hopes to write a set of public APIs that cloud vendors can use to supply standard cloud services. Ultimately the DMTF would name a working group that would draft APIs for infrastructure as a service, specifically an interface for each stage of cloud operations: e.g. submitting an external workload, loading the virtual machine, starting the VM, storing results, and termination.
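The stages listed above can be pictured as a small state machine. The following is only an illustrative sketch: the `Stage` and `Workload` names are invented here, and the assumption that a workload may be terminated early from any stage is ours, not the DMTF's.

```python
from enum import Enum


class Stage(Enum):
    SUBMITTED = "workload submitted"
    LOADED = "virtual machine loaded"
    RUNNING = "VM started"
    STORED = "results stored"
    TERMINATED = "terminated"


# Allowed transitions, following the order in which the stages are listed.
# Assumption: early termination is allowed from any non-final stage.
NEXT_STAGES = {
    Stage.SUBMITTED: {Stage.LOADED, Stage.TERMINATED},
    Stage.LOADED: {Stage.RUNNING, Stage.TERMINATED},
    Stage.RUNNING: {Stage.STORED, Stage.TERMINATED},
    Stage.STORED: {Stage.TERMINATED},
    Stage.TERMINATED: set(),
}


class Workload:
    """Hypothetical workload that tracks which stage it has reached."""

    def __init__(self, name):
        self.name = name
        self.stage = Stage.SUBMITTED
        self.history = [Stage.SUBMITTED]

    def advance(self, target):
        # Reject any transition that is not listed in NEXT_STAGES.
        if target not in NEXT_STAGES[self.stage]:
            raise ValueError(f"cannot go from {self.stage.name} to {target.name}")
        self.stage = target
        self.history.append(target)


job = Workload("nightly-report")
for stage in (Stage.LOADED, Stage.RUNNING, Stage.STORED, Stage.TERMINATED):
    job.advance(stage)
print([s.name for s in job.history])
```

A standard interface per stage would let the same sequence of calls drive different providers, which is the portability argument behind the DMTF effort.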
One possible snag in the DMTF effort is the lack of participation by Amazon and its EC2. Winston Bumpus, president of the DMTF and VMware's standards director, believes that the DMTF effort should move forward regardless: "If the APIs are well drafted and widely followed, the pressure will build on Amazon to support it."
Standards and open source projects for cloud computing infrastructure and management seem to be needed, and needed quickly. A June 2010 survey by Information Week Analytics shows that 40% of surveyed companies are already using cloud services and another 20% plan to use them within the next 24 months. Charles Babcock, at techweb.com, suggests:
We need all these efforts to evolve the cloud, so it connects to many different customers and implements varied styles of computing.
Will your company find value in standards or be able to take advantage of the open source projects focused on cloud computing as an infrastructure service?
Machine image portability is not the problem
Developers expect cloud applications to be portable. This needs standardization of middleware (e.g. stateless architecture) and datastore (e.g. NoSQL: BigTable, Azure storage, Force.com DB) at the PaaS layer.
Re: Machine image portability is not the problem
One project that tries to address this issue is Apache Nuvem, which recently joined Apache and is currently undergoing incubation at the ASF. Its main goals are to define an open API that abstracts common cloud platform services, helping to decouple the application logic from the particulars of a specific proprietary cloud, and to implement the Nuvem API for popular clouds such as Google AppEngine, Amazon EC2 and Microsoft Azure. If people are interested in this area, feel free to join us on the Apache Nuvem mailing lists.
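As a rough illustration of what "decoupling the application logic from a specific cloud" means in code, consider the sketch below. It is not the Nuvem API: `KeyValueStore`, `InMemoryStore`, and `register_user` are hypothetical names, and the in-memory class simply stands in for a provider-specific implementation.

```python
from abc import ABC, abstractmethod
from typing import Optional


class KeyValueStore(ABC):
    """Hypothetical provider-neutral interface (not the actual Nuvem API)."""

    @abstractmethod
    def put(self, key: str, value: str) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> Optional[str]:
        ...


class InMemoryStore(KeyValueStore):
    """Stand-in for a provider-specific implementation."""

    def __init__(self):
        self._items = {}

    def put(self, key: str, value: str) -> None:
        self._items[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._items.get(key)


def register_user(store: KeyValueStore, user_id: str, email: str) -> bool:
    """Application logic that depends only on the abstract interface."""
    key = f"user:{user_id}"
    if store.get(key) is not None:
        return False  # already registered
    store.put(key, email)
    return True


store = InMemoryStore()
print(register_user(store, "42", "dev@example.com"))
```

Porting the application to another cloud would then mean writing one more `KeyValueStore` subclass, while `register_user` stays untouched.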
Re: Machine image portability is not the problem
GridGain is a distributed cloud computing middleware that combines compute and data grid technology with unique auto-scaling capabilities on any managed infrastructure, from a single laptop to a large hybrid cloud consisting of thousands of nodes.
Using GridGain you can quickly build distributed applications that work natively in the cloud environment: scale up or down based on the demand, cache distributed data for high availability, and speed up long running tasks using MapReduce.
GridGain = Compute + Data + Cloud
I like vmforce, especially from the point of view that it enables java developers to leverage force.com marketplace. But this "portability" pitch of vmforce without substantiation is simply overrated.
vmforce is based on Spring and runs on Force.com DB, a proprietary database of salesforce.com. Once your application is written using the persistence and query language specific to Force.com DB, it is difficult to port, even though it is written in Java!
That's why I was talking earlier about standardization of datastore access (e.g. NoSQL: BigTable, Azure storage, Force.com DB).
But JPA was spec-ed for relational databases (unlike JDO). Trying to retrofit these NoSQL databases into the same standard is not a good idea, IMO. Apart from the join and query limitations, there are problems with the ACID properties:
- Atomicity is not guaranteed across partitions for these databases (both BigTable & Force.com DB that DataNucleus supports). It has to be built around idempotency, instead of 2-phase commits.
- Transaction isolation. JPA assumes pessimistic concurrency control, instead of optimistic. The workaround that we do around session.flush() either screws up read consistency or atomicity. It is also not clear how columns that are stored as rows in Force.com DB work with row locks in READ_COMMITTED transaction isolation.
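To illustrate the atomicity point, here is a minimal sketch of building a cross-partition update around idempotency rather than a two-phase commit. The `Partition` class and `transfer` function are hypothetical in-memory stand-ins, and the sketch assumes that a single partition can record an operation ID and apply the change together, which a real datastore would have to guarantee.

```python
class Partition:
    """One independently committed partition (in-memory stand-in)."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.applied = set()  # IDs of operations already applied here

    def apply(self, op_id, account, delta):
        """Apply a change at most once; a repeated op_id is a no-op."""
        if op_id in self.applied:
            return False
        self.balances[account] = self.balances.get(account, 0) + delta
        self.applied.add(op_id)
        return True


def transfer(op_id, source, src_account, target, dst_account, amount):
    # Two separate single-partition writes, with no coordinator.
    # If the caller fails between them, calling transfer() again with
    # the same op_id finishes the work without debiting twice.
    source.apply(f"{op_id}:debit", src_account, -amount)
    target.apply(f"{op_id}:credit", dst_account, amount)


a = Partition({"alice": 100})
b = Partition({"bob": 0})
transfer("tx-1", a, "alice", b, "bob", 30)
transfer("tx-1", a, "alice", b, "bob", 30)  # retry of the same operation
print(a.balances, b.balances)
```

The retry is what replaces the commit protocol: the caller keeps re-sending the same operation until both partitions have acknowledged it, so the system converges without ever locking both partitions at once.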
So I reiterate: standardization of datastore access, please!