
Posted by Andy Edmonds, Thijs Metsch, Eugene Luster on Jul 19, 2011
In this article we describe how interoperable clouds can be created today by leveraging mature open standards related to clouds, grids and storage. We demonstrate how the major contemporary cloud interface specifications can be used to realise an open, standards-based, interoperable cloud offering.
The motivation for this article is to demonstrate that the features currently available in cloud standards from various standards-developing organisations (SDOs) are sufficient to create a cloud offering like the one described in the following sections. This article can be seen as one of the first steps in exploring cloud standards integration, especially between the Open Cloud Computing Interface (OCCI) [1], the Open Virtualization Format (OVF) [4] and the Cloud Data Management Interface (CDMI) [3]. Finally, the work detailed in this document can be used to drive collaboration between OCCI and the DMTF Cloud Management Working Group (CMWG) [10], as well as efforts such as cloud-standards.org, the Standards and Interoperability for eInfrastructure Implementation Initiative (SIENA) [5], the National Institute of Standards and Technology (NIST) [6] and others.
To describe how the standards can be integrated, we use a simple scenario in which a startup service provider wants to deploy, scale, migrate and redeploy its new Hadoop-based [7] MapReduce service.
To implement the scenario, three enabling standards are used: OCCI as the runtime management API, OVF for portable service descriptions, and CDMI for storage and data management.
Authorisation and authentication are not dealt with in this article. These issues are largely orthogonal and are addressed by other specifications and technologies (e.g. OAuth and OpenID). With respect to security, OCCI and CDMI rely on the design of the HTTP protocol suite.
In the scenario, a startup company wishes to offer a MapReduce service to its clients. As the service is new, it will initially be offered to a limited set of beta users. This limitation allows the startup's architects to design an initial deployment architecture sized to the service's initial resource requirements. The service's deployment architecture is as follows:

Once the service has been deployed, a growing number of users makes it necessary for the service to scale, which means additional resources must be added on the fly to meet demand. Besides scaling, migration is part of the overall scenario: if the infrastructure provider suffers a major outage, the startup is forced to migrate the service. This forms the final piece of the scenario, in which the service provider moves the complete deployment to a new, more suitable infrastructure provider.
The scenario consists of two distinct phases: the deployment and scaling of the service, and its migration and redeployment.
Each phase consists of a number of steps, which are detailed below in relation to the standards used to execute them.
The startup’s overall goal is to use the infrastructure provider’s CDMI and OCCI interfaces and to supply the provider with OVF service descriptions that it can understand. It is these provider-offered capabilities that give the startup interoperability.
OCCI and CDMI are used to create the initial deployment shown in the figure above and to demonstrate scaling. This phase therefore consists of two parts: the initial service deployment and the horizontal scaling of the service.
The figure below shows the deployment of the startup’s service in the context of these management APIs.

At this point the aim is to create the initial service deployment, based on the architecture above, so that it can be offered to beta users. The process of setting up the service offering consists of a set of steps that reflect the setup described in the previous diagram:
Once the service is deployed, numerous RESTful resources will exist, each manageable through its respective API. The overall service could also be represented by a RESTful resource, for example an OCCI representation of the whole service with its Actions, Links, etc.
Over time the deployed service might need to scale horizontally, either up (adding nodes) or down (removing nodes). In this scenario, scaling is needed either because the current beta users find great value in the new service and place more workload on it, or because the provider admits more beta users as the service matures. Consequently, OCCI-managed virtual machines running Hadoop’s TaskTracker and DataNode daemons must be added on demand. Hadoop’s NameNode and JobTracker reside on the master node and might not need scaling at this stage.
The scaling part consists of a single step: adding a new compute instance to the service, thereby scaling it horizontally. This can easily be achieved through calls to the OCCI-compatible interface.
It is assumed that the Hadoop slave node has been configured a priori so that it registers with the Hadoop master node.
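As a rough sketch of such a call, the following Python function builds, but does not send, an OCCI HTTP request in the text/occi rendering that would create one additional compute resource for a Hadoop slave node. It assumes OCCI 1.1-style Category and X-OCCI-Attribute headers; the endpoint URL, the hostname and the hadoop_slave OS template mixin with its scheme are hypothetical placeholders, since a real provider advertises its own templates through its query interface.

```python
from typing import List, Optional, Tuple

# Scheme of the OCCI Infrastructure kinds (compute, network, storage, ...).
INFRA_SCHEME = "http://schemas.ogf.org/occi/infrastructure#"

Request = Tuple[str, str, List[Tuple[str, str]]]


def build_create_compute(base_url: str, cores: int, memory_gb: float,
                         hostname: str,
                         os_tpl: Optional[Tuple[str, str]] = None) -> Request:
    """Return (method, url, headers) for creating an OCCI compute resource.

    os_tpl is an optional (term, scheme) pair naming a provider-defined
    OS template mixin; both values are placeholders in this sketch.
    """
    headers = [
        ("Content-Type", "text/occi"),
        # The kind of the new resource: an OCCI Compute instance.
        ("Category", f'compute; scheme="{INFRA_SCHEME}"; class="kind"'),
    ]
    if os_tpl is not None:
        term, scheme = os_tpl
        # A template is applied by adding its mixin to the new resource.
        headers.append(("Category", f'{term}; scheme="{scheme}"; class="mixin"'))
    attributes = [
        f"occi.compute.cores={cores}",
        f"occi.compute.memory={memory_gb}",  # in gigabytes
        f'occi.compute.hostname="{hostname}"',
    ]
    # Several attributes may share one header as comma-separated values.
    headers.append(("X-OCCI-Attribute", ", ".join(attributes)))
    return "POST", base_url.rstrip("/") + "/compute/", headers


if __name__ == "__main__":
    method, url, headers = build_create_compute(
        "http://cloud-a.example.com", cores=2, memory_gb=4.0,
        hostname="hadoop-slave-3",
        os_tpl=("hadoop_slave", "http://cloud-a.example.com/occi/os_tpl#"))
    print(method, url)
    for name, value in headers:
        print(f"{name}: {value}")
```

Authentication, which this article leaves out of scope, is omitted from the sketch.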
In the scenario, having experienced poor service delivery from its current provider, the startup decides to migrate its deployment to a new provider. In this case all three standards need to work together so that the service can be migrated from cloud A and redeployed in cloud B. The overall process takes four steps:
If the migration is a ‘cold’ migration (the service is passivated and offline during the migration), OCCI’s features for stopping and restarting virtual machines, network resources, etc. can be used. Supporting a live migration with little or no service downtime currently requires the relevant capabilities (e.g. live migration across subnets) to be provided by the participating providers themselves, akin to peering agreements, rather than by the specifications.
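For a cold migration, the stop and restart operations mentioned above correspond to OCCI actions on compute resources. The sketch below, assuming the OCCI 1.1 HTTP rendering and hypothetical resource URLs, builds the action-trigger requests that would passivate the service before its data and descriptions are moved, and start it again afterwards. Only the start and stop actions are covered; network and storage resources would be handled analogously with their own action schemes.

```python
from typing import List, Tuple

# Scheme of the OCCI Infrastructure compute actions.
ACTION_SCHEME = "http://schemas.ogf.org/occi/infrastructure/compute/action#"
STOP_METHODS = ("graceful", "acpioff", "poweroff")

Request = Tuple[str, str, List[Tuple[str, str]]]


def build_action(compute_url: str, action: str,
                 method: str = "graceful") -> Request:
    """Return (method, url, headers) that trigger a start or stop action."""
    headers = [
        ("Content-Type", "text/occi"),
        ("Category", f'{action}; scheme="{ACTION_SCHEME}"; class="action"'),
    ]
    if action == "stop":
        if method not in STOP_METHODS:
            raise ValueError(f"unsupported stop method: {method}")
        # The stop action takes a method attribute (graceful, acpioff, poweroff).
        headers.append(("X-OCCI-Attribute", f'method="{method}"'))
    elif action != "start":
        raise ValueError(f"only start and stop are handled here, not {action}")
    return "POST", f"{compute_url}?action={action}", headers


def passivate(compute_urls: List[str]) -> List[Request]:
    """Stop every compute resource so the service is offline while it moves."""
    return [build_action(url, "stop") for url in compute_urls]


def reactivate(compute_urls: List[str]) -> List[Request]:
    """Start compute resources again, e.g. at the destination provider."""
    return [build_action(url, "start") for url in compute_urls]
```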
The scenario shows that several currently available cloud standards need to interact. It demonstrates the need both to scale and to migrate, and in both cases three standards play an important role. OCCI is suited to be the runtime management API that triggers operations, OVF provides portability, and CDMI is well suited to data movement and format migration. CDMI could also be used for some management tasks (e.g. importing and uploading images), but this would require extensions to the current specification.
To realise the scenario, the standards need to be integrated. The next sections focus on how the standards could interact, and on changes to some of them that might be required or desirable.
One of the biggest current challenges is data migration. This means not only transporting raw data from cloud A to cloud B but possibly also converting data formats. For example, VMware’s file format, supported by cloud service provider A, might need to be converted to a VirtualBox format for cloud service provider B.
Data conversion and migration are currently not well documented and probably deserve further investigation. The migration process between two clouds might be triggered through OCCI but would also involve CDMI and OVF. CDMI could take care of the data conversion, although this is more an aspect of the service implementation. Descriptions in the OVF representation might change during such an operation.
OCCI presents a simple representation of storage management capabilities, which allows the most basic storage-related operations to be carried out. One of OCCI’s goals is to integrate existing standards rather than reinvent the wheel. Where a provider wishes to expose a richer storage management interface, OCCI recommends using SNIA’s CDMI. This recommendation is reflected in the specification, which details how CDMI-managed storage can be represented in the OCCI infrastructure model (see section 3.4.3 of the OCCI Infrastructure Model Specification [8]).
Since the two specifications already build on each other, they can be integrated at a high level today. The following sections provide the information needed for this integration and lead to ideas for a tighter integration of the two standards.
As defined in OCCI’s Infrastructure extension, OCCI can be used in conjunction with SNIA’s cloud storage standard, CDMI, to provide enhanced management of cloud storage and data. To integrate the two, OCCI’s StorageLink should be used; it links OCCI-managed Resources to CDMI resources.
If a service provider implements both the OCCI and CDMI interfaces, users can initiate and execute the migration through them. If the provider does not offer OCCI and CDMI interfaces, migration cannot happen without direct user interaction. Whether a provider supports OCCI and CDMI can be discovered through the /.well-known/ interface.
The migration follows the steps set out in the section "Service Migration and Redeployment" above. Another cloud provider that also exposes OCCI and CDMI interfaces would be queried to confirm that the required service capabilities are present. If they are, the data is migrated between the clouds and the necessary resources are then provisioned.
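The capability check described above amounts to comparing what the service needs with what the candidate provider advertises. A minimal sketch, assuming the advertised capabilities have already been fetched from the provider's OCCI query interface and CDMI capabilities interface and flattened into sets of names; the example identifiers (OCCI kind identifiers and a CDMI capability name) are illustrative only.

```python
from typing import Dict, Set

Capabilities = Dict[str, Set[str]]  # interface name -> capability names


def missing_capabilities(required: Capabilities,
                         offered: Capabilities) -> Capabilities:
    """Return, per interface, the required capabilities the provider lacks."""
    gaps: Capabilities = {}
    for interface, needed in required.items():
        lacking = needed - offered.get(interface, set())
        if lacking:
            gaps[interface] = lacking
    return gaps


def provider_is_suitable(required: Capabilities, offered: Capabilities) -> bool:
    """Migration proceeds only if nothing required is missing."""
    return not missing_capabilities(required, offered)


if __name__ == "__main__":
    infra = "http://schemas.ogf.org/occi/infrastructure#"
    required = {"occi": {infra + "compute", infra + "storagelink"},
                "cdmi": {"cdmi_dataobjects"}}
    offered_by_b = {"occi": {infra + "compute", infra + "network"},
                    "cdmi": {"cdmi_dataobjects"}}
    print(provider_is_suitable(required, offered_by_b))
    print(missing_capabilities(required, offered_by_b))
```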
CDMI can be used to migrate the data: a new data object is created at the destination cloud provider, with the original data object specified as its source (see section 15 of the CDMI specification [9]).
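A sketch of such a copy request follows. It builds, without sending, a CDMI request that creates a data object at the destination provider and names the original object in the request body's copy field instead of supplying the value inline. The URLs are hypothetical, the version header value assumes CDMI 1.0.1, and authentication, which this article leaves out of scope, is omitted.

```python
import json
from typing import Dict, Tuple


def build_cdmi_copy(dest_url: str, source_url: str,
                    version: str = "1.0.1") -> Tuple[str, str, Dict[str, str], str]:
    """Return (method, url, headers, body) for a CDMI create-by-copy request."""
    headers = {
        "Accept": "application/cdmi-object",
        "Content-Type": "application/cdmi-object",
        "X-CDMI-Specification-Version": version,
    }
    # No inline "value": the body only names the source data object, which
    # the destination server is then expected to copy from.
    body = json.dumps({"copy": source_url})
    return "PUT", dest_url, headers, body


if __name__ == "__main__":
    method, url, headers, body = build_cdmi_copy(
        "https://cloud-b.example.com/cdmi/hadoop-input/part-0000",
        "https://cloud-a.example.com/cdmi/hadoop-input/part-0000")
    print(method, url)
    print(body)
```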
The following topics are suggested for further integrating OCCI and OVF with CDMI.
When importing an OVF file through a CDMI interface, it should be possible to assign the network and compute resources defined in the OVF; currently, however, this is not possible. CDMI focuses on the storage requirements of clouds, but when an OVF document is imported, a CDMI implementation could not only take care of storage resource assignments but also interact with an OCCI interface to satisfy all of the OVF document’s resource needs.
Using OCCI together with CDMI is described in section 13 of the CDMI specification. Currently, resource instances of the OCCI Infrastructure model can link to the CDMI model: OCCI’s StorageLink binds compute resources to their storage[1]. This link therefore points from OCCI model instances towards CDMI containers.
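For illustration, such a StorageLink can be rendered as an OCCI Link header on the compute resource, with a CDMI container as its target. The sketch assumes the OCCI 1.1 HTTP rendering of links; the target URL and the device id value are hypothetical, while occi.storagelink.deviceid is an attribute defined for StorageLink by the OCCI Infrastructure specification.

```python
# Scheme of the OCCI Infrastructure kinds and links.
INFRA_SCHEME = "http://schemas.ogf.org/occi/infrastructure#"


def storage_link_header(target_url: str, device_id: str) -> str:
    """Render the value of a Link header for an OCCI StorageLink.

    The link points from the compute resource (the request target) to the
    storage at target_url, e.g. a CDMI container.
    """
    return (f"<{target_url}>; "
            f'rel="{INFRA_SCHEME}storage"; '
            f'category="{INFRA_SCHEME}storagelink"; '
            f'occi.storagelink.deviceid="{device_id}"')


if __name__ == "__main__":
    print("Link: " + storage_link_header(
        "https://cloud-a.example.com/cdmi/vm-disks/slave-1/", "ide:0:0"))
```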
Building on this integration, it would be useful to allow links directed from CDMI towards resource instances of the OCCI models (a complementary reverse linkage). Semantically, this means linking storage resources in CDMI back to their associated OCCI Compute and/or Storage resources. CDMI users could then see which services are associated with their data objects. In this scenario, it would allow discovery of which disks are used by the virtual machines and which data is used by the MapReduce service itself.
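Neither specification defines this reverse linkage today. One way to approximate it, sketched below purely as an assumption, is to record the URLs of the associated OCCI resources in the user metadata of the CDMI data object so that a CDMI client can follow them back. The metadata key occi_links and all URLs are hypothetical; because a metadata update may replace the object's existing user metadata, the sketch merges with the current metadata first.

```python
import json
from typing import Dict, List, Tuple

LINK_KEY = "occi_links"  # hypothetical user-metadata key, not a CDMI name


def build_reverse_link_update(object_url: str, occi_urls: List[str],
                              existing: Dict[str, str],
                              version: str = "1.0.1"
                              ) -> Tuple[str, str, Dict[str, str], str]:
    """Return (method, url, headers, body) adding OCCI links to metadata."""
    headers = {
        "Accept": "application/cdmi-object",
        "Content-Type": "application/cdmi-object",
        "X-CDMI-Specification-Version": version,
    }
    metadata = dict(existing)  # keep the user metadata already present
    # Metadata values are plain strings here, so URLs are space-separated.
    metadata[LINK_KEY] = " ".join(occi_urls)
    return "PUT", object_url, headers, json.dumps({"metadata": metadata})


def linked_resources(metadata: Dict[str, str]) -> List[str]:
    """Recover the OCCI resource URLs recorded in a data object's metadata."""
    return metadata.get(LINK_KEY, "").split()
```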
Following RFC 5785, we recommend also exposing the CDMI capabilities interface, currently available under the path ‘cdmi_capabilities’, under the path ‘/.well-known/com/snia/cdmi’. OCCI could follow the same approach and expose its query interface under ‘/.well-known/org/ogf/occi’. A client would then have a unified way of accessing and querying the capabilities of a service offering. Eventually these namespaces should be registered with IANA.
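Under this proposal a client could discover both interfaces from a single provider base URL. The sketch below simply derives the two proposed locations; note that /.well-known/com/snia/cdmi and /.well-known/org/ogf/occi are the paths suggested in this article, not registered well-known URIs, and the provider URL is hypothetical.

```python
from typing import Dict
from urllib.parse import urljoin

# Locations proposed in this article; they are not registered with IANA.
PROPOSED_WELL_KNOWN = {
    "cdmi": "/.well-known/com/snia/cdmi",
    "occi": "/.well-known/org/ogf/occi",
}


def discovery_urls(provider_url: str) -> Dict[str, str]:
    """Map each interface to its proposed discovery URL on the provider host."""
    # An absolute path replaces any path already present in provider_url.
    return {name: urljoin(provider_url, path)
            for name, path in PROPOSED_WELL_KNOWN.items()}


if __name__ == "__main__":
    for name, url in discovery_urls("https://cloud-b.example.com/api/").items():
        print(name, url)
```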
Besides storage and data, the portability of services must be ensured. DMTF’s OVF specification provides a way of describing complete services portably, and the OCCI interface could be used to import and export service definitions in OVF format. The following sections elaborate on these ideas.
OCCI and OVF could simply coexist. The only addition currently needed would be support for a MIME type that tells the OCCI service provider that the client wants to retrieve or supply information in OVF format. Such a MIME type is not defined in the OCCI specification, which at the moment only supports text/occi, text/plain and text/uri-list.
The following tables give an overview of how the OCCI attributes can be mapped to the OVF attributes and vice versa.
| Description | OVF | OCCI | Details |
|---|---|---|---|
| Architecture of the CPU (x86, x64) | <vssd:VirtualSystemType>[...String...]</vssd:VirtualSystemType> | occi.compute.architecture | Described in a VirtualHardwareSection |
| Number of cores | <rasd:ResourceType>3</rasd:ResourceType> <rasd:VirtualQuantity>1</rasd:VirtualQuantity> | occi.compute.cores | Described in a VirtualHardwareSection |
| Hostname | <Property ovf:key="hostname" ovf:type="string"> </Property> | occi.compute.hostname | Part of the ProductSection |
| Speed of the CPU | <rasd:ResourceType>3</rasd:ResourceType> <rasd:AllocationUnits>hertz * 10^6</rasd:AllocationUnits> <rasd:Reservation>500</rasd:Reservation> | occi.compute.speed | Described in a VirtualHardwareSection |
| Amount of memory | <rasd:ResourceType>4</rasd:ResourceType> <rasd:VirtualQuantity>512</rasd:VirtualQuantity> | occi.compute.memory | Described in a VirtualHardwareSection |
| Status of the resource | N/A | occi.compute.state | |

| Description | OVF | OCCI | Details |
|---|---|---|---|
| A network label | <Property ovf:key="label" ovf:type="string"/> | occi.network.label | Defined via Properties in the ProductSection |
| A VLAN name | <Property ovf:key="vlan" ovf:type="string"/> | occi.network.vlan | Defined via Properties in the ProductSection |
| Status of the resource | N/A | occi.network.state | |

| Description | OVF | OCCI | Details |
|---|---|---|---|
| Size of the storage device | <Disk ovf:diskId="vmdisk2" ovf:capacity="536870912"/> | occi.storage.size | Described in the DiskSection |
| Status of the resource | N/A | occi.storage.state | |
Other OCCI entities defined by the OCCI Infrastructure extension, such as some Links and Mixins, might also need a mapping. Those mappings are straightforward and in line with the tables above.
It is currently up to the service provider to decide what happens to unmapped attributes. Retrieving an OCCI-managed service in OVF format might therefore result in a minimal OVF file that holds only the attributes also present in the OCCI infrastructure representation. Alternatively, the service provider could keep the OVF representation alongside the service deployment in the OCCI infrastructure model, in which case the complete OVF representation could be retrieved. We currently encourage the latter approach.
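To illustrate the minimal case, the sketch below turns the OCCI compute attributes for cores and memory into OVF VirtualHardwareSection items, following the mappings in the compute table above (ResourceType 3 for processors, 4 for memory). It produces a fragment rather than a complete, valid OVF envelope; the InstanceID numbering, the Info text and the assumption that occi.compute.memory (given in gigabytes) is converted to MiB for OVF are choices made for this sketch only.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

OVF_NS = "http://schemas.dmtf.org/ovf/envelope/1"
RASD_NS = ("http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/"
           "CIM_ResourceAllocationSettingData")
ET.register_namespace("ovf", OVF_NS)
ET.register_namespace("rasd", RASD_NS)


def _item(parent: ET.Element, instance_id: int, resource_type: int,
          quantity: int, units: Optional[str] = None) -> None:
    """Append one ovf:Item describing a virtual hardware resource."""
    item = ET.SubElement(parent, f"{{{OVF_NS}}}Item")
    if units is not None:
        ET.SubElement(item, f"{{{RASD_NS}}}AllocationUnits").text = units
    ET.SubElement(item, f"{{{RASD_NS}}}InstanceID").text = str(instance_id)
    ET.SubElement(item, f"{{{RASD_NS}}}ResourceType").text = str(resource_type)
    ET.SubElement(item, f"{{{RASD_NS}}}VirtualQuantity").text = str(quantity)


def occi_compute_to_ovf(attrs: Dict[str, str]) -> str:
    """Render a minimal VirtualHardwareSection from OCCI compute attributes."""
    section = ET.Element(f"{{{OVF_NS}}}VirtualHardwareSection")
    ET.SubElement(section, f"{{{OVF_NS}}}Info").text = "Generated from OCCI"
    # occi.compute.cores -> ResourceType 3 (processor)
    _item(section, 1, 3, int(attrs["occi.compute.cores"]))
    # occi.compute.memory (gigabytes) -> ResourceType 4 (memory), in MiB
    mib = int(float(attrs["occi.compute.memory"]) * 1024)
    _item(section, 2, 4, mib, units="byte * 2^20")
    return ET.tostring(section, encoding="unicode")


if __name__ == "__main__":
    print(occi_compute_to_ovf({"occi.compute.cores": "2",
                               "occi.compute.memory": "0.5"}))
```

Attributes without a mapping (occi.compute.state, for example) are simply dropped, which is exactly the loss of information that keeping the full OVF representation alongside the deployment avoids.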
Horizontal and vertical scaling are currently not covered in detail by the OVF specification, although ranges can be defined in the service description. Ranges allow a client to specify the valid values of an attribute (such as the number of virtual machines), but they may not be enough for scaling, since scaling decisions must be driven by some logic. Horizontal and vertical scaling of the service can therefore only be achieved by using OCCI in conjunction with OVF.
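The division of labour described here can be sketched as follows. The OVF description contributes only the permitted range for the number of slave nodes; the decision within that range is made by the service provider's own logic and then carried out through OCCI create or delete calls. The load metric (queued jobs), the jobs-per-node threshold and the function names are hypothetical.

```python
import math


def target_slave_count(queued_jobs: int, jobs_per_slave: int,
                       min_slaves: int, max_slaves: int) -> int:
    """Desired number of Hadoop slave nodes, clamped to the OVF-declared range."""
    if jobs_per_slave <= 0 or min_slaves > max_slaves:
        raise ValueError("invalid scaling parameters")
    wanted = math.ceil(queued_jobs / jobs_per_slave)
    return max(min_slaves, min(max_slaves, wanted))


def scaling_delta(current: int, queued_jobs: int, jobs_per_slave: int,
                  min_slaves: int, max_slaves: int) -> int:
    """Positive: compute resources to create via OCCI; negative: to delete."""
    target = target_slave_count(queued_jobs, jobs_per_slave,
                                min_slaves, max_slaves)
    return target - current


if __name__ == "__main__":
    # Three slaves running, 17 queued jobs, 4 jobs per slave, range 2..10.
    print(scaling_delta(3, 17, 4, 2, 10))
```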
The OCCI Infrastructure extension describes how users can define resource and OS templates. Templates allow clients of an OCCI implementation to quickly and conveniently apply predefined configurations to OCCI Infrastructure types, and are implemented as OCCI Mixin instances. The OS and resource templates build upon each other:
It would be desirable for these templates to be describable in an OVF representation as well.
The following two topics are not directly related to OCCI as a boundary protocol. Rather, they concern how service providers implementing OCCI should handle certain operations.
The OVF format allows some operations to be configured, for example what should happen on power-on and power-off. Future versions of OVF might go into further detail by describing ActivationEngines and related aspects. OVF therefore captures some of the semantics of how a service should be handled. Service providers that allow a complete service to be provisioned using OVF over OCCI need to honour these configuration details and configure their resource management framework accordingly.
Closely related to the previous topic are the conformance levels described by OVF. Service providers that support service provisioning using OVF over OCCI must ensure that these levels are met when instantiating the service; otherwise the client could regard this as a violation of a service-level agreement (SLA). Therefore, if a service provider cannot implement the conformance level requested in the OVF description, the provisioning request should be denied.
The following topics should be considered to enable further integration of OVF and OCCI.
Network features can already be described in OVF, but it would be desirable to be able to describe complete network setups, like the one used in the scenario section, in the service description. OVF 2.x is expected to offer better support for such descriptions.
An OCCI client relies on correct MIME types when requesting operations from, or receiving information from, a service. The service provider uses the MIME type information (content negotiation) to understand what kind of information it has received, and the client can always specify the MIME type in which it wants to receive the requested information. To import and export OVF files, clients would POST OVF files to an OCCI service or request the current state in OVF format. For best interoperability, we recommend that a MIME type for OVF be registered.
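A minimal sketch of the server-side content negotiation described here, choosing a rendering from the client's Accept header. The types text/occi, text/plain and text/uri-list are the renderings listed in this article; application/ovf+xml is a hypothetical placeholder for the proposed OVF type. Quality values are honoured, but wildcards other than */* are ignored to keep the sketch short.

```python
from typing import List, Optional

# Renderings listed for OCCI in this article, plus a placeholder OVF type.
SUPPORTED = ["text/occi", "text/plain", "text/uri-list",
             "application/ovf+xml"]  # hypothetical placeholder for OVF


def choose_rendering(accept: str,
                     supported: List[str] = SUPPORTED) -> Optional[str]:
    """Pick the supported media type with the highest q-value.

    Returns None if nothing acceptable is supported, in which case a server
    would typically answer 406 Not Acceptable.
    """
    if not accept.strip():
        return supported[0]  # no preference stated: use the default rendering
    best, best_q = None, 0.0
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media == "*/*":
            candidate = supported[0]
        elif media in supported:
            candidate = media
        else:
            continue  # unsupported type (type/* wildcards are not handled)
        if q > best_q:
            best, best_q = candidate, q
    return best
```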
While investigating the scenario used in this article, we found several topics that merit further investigation:
This article has documented observations and action items, from the OCCI working group’s perspective, resulting from the collaborative SDO sessions at the DMTF Alliance Partner Technical Symposium (APTS). DMTF hosted this face-to-face event to investigate the possibility of integrating the open cloud computing interoperability and portability standards efforts of OVF, CDMI and OCCI.
The scenario described in the first sections was used to step through different processes during the service’s lifetime, in particular its migration and scaling. It is intended to reflect a real-world, albeit basic, service offering and demonstrates that such a setup can be realised with today’s standards.
Later sections discussed how the three standards can be integrated and where open issues remain, identifying those that require further investigation or improvement. Using the versions of these specifications available today, the overall integration is sufficient to achieve the complete deployment, migration and scaling of the service described in the scenario section.
Bringing in a fourth, currently available specification might be desirable: CSA’s CloudAudit has gained industry acceptance and would add important features to the scenario in this document. Auditing clouds and conformance levels was out of scope for this document but would be worth investigating further.
Although this document focuses on the infrastructure level of the service deployment, future revisions could describe a more PaaS-based scenario (such as a Node.js or Google App Engine service) using cloud standards.
The authors would like to thank Mark Carlson (Oracle) and David Slik (NetApp) for inputs on the integration of CDMI and OCCI.
We would like to thank Winston Bumpus (VMware) and Jeff Wheeler (Huawei) for their help on OVF-related topics.
The work of the following reviewers was greatly appreciated during the creation, editing and review of this document:
Andy Edmonds is a researcher at Intel.
Thijs Metsch is a Senior Software Engineer at Platform Computing, focusing on Grid and Cloud Technology.
Eugene Luster is a Cloud Advocate at R2AD, focusing on Cloud Computing open standards development.
[1] The semantic meaning here is to bind a disk image to a Virtual Machine instance.
This effort in making clouds open and interoperable is great, but these standards work for "IaaS". Making a similar effort for SaaS would probably be more worthwhile.
setandbma.wordpress.com/2011/07/15/forrester-sa...
Hey Udayan,
That's the original intent of the three discussed standards; however, you will find that those standards are being used today all the way up the "traditional" *aaS stack. For example, all three have been used in exploratory PaaS work. SaaS would of course be valuable; however, the challenge there is the very domain-specific nature of SaaS.
Andy
The use of an interoperability scenario to show how the standards can work together and where the gaps remain is a really instructive format--it is challenging to see how standards work together without this kind of format... so thanks for a nice article. Has anything changed/improved with the standards relevant to this integration scenario since this was published? What is the best way to track progress on the interoperability of the standards (which is different than just following changes to the standards)? Thanks.