GigaSpaces XAP 6.0: Simplified, Spring-based API for Space-Based Architecture

Posted by Ryan Slobojan on Sep 25, 2007 01:00 AM


GigaSpaces recently released version 6.0 of its eXtreme Application Platform (XAP), an infrastructure software platform for scaling out applications in distributed environments. InfoQ spoke with Geva Perry and Nati Shalom of GigaSpaces to learn more about this release and what has changed in this version.

First, Perry and Shalom were asked to describe the major changes in 6.0:

  • OpenSpaces - this is the primary development platform for 6.0, and uses the Spring Framework to provide a POJO-driven development model as well as components for highly scalable, event-driven and service-oriented architectures. It provides components such as an in-memory data grid, remoting, declarative event containers and transactions, and an OSGi-like deployment model
  • Persistence as a Service (PaaS) - also known as the Mirror Service, it provides reliable asynchronous persistence of the contents of an in-memory data grid to a back-end database. This is done without modifying application code or configuration; the Mirror Service handles everything transparently
  • JMS 1.1 interoperability - this enables feeds to be sent directly through the JMS API, where they are immediately transformed into entries in the space. This greatly reduces latency and, because fewer components are involved, simplifies development and deployment of event-driven applications
  • SLA-driven Container - The Service Grid, which dynamically manages instances of the cluster through Service-Level Agreements (SLAs), is now greatly simplified by using Spring, and is integrated into all editions of the product (including the free Community edition)
  • Enhanced .NET Support - Performance has been increased, .NET is supported natively, and a new set of APIs provides seamless .NET and Java interoperability by allowing both .NET and Java to run in embedded mode. GigaSpaces has also partnered with Microsoft to provide more packaged solutions for Microsoft technologies like Excel, SharePoint, Visual Studio, and Windows Compute Cluster Services
  • Amazon Elastic Compute Cloud (EC2) support - 6.0 is now available for use on the EC2 service at a cost of $0.10 per hour per server, so that users can experiment with it and see how it scales to multiple servers
  • Integrated Space-Based Architecture (SBA) support - In previous releases, implementing SBA required some development and configuration effort; in 6.0 this is simplified, with SBA components like Processing Units made an explicit part of the API
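
The Mirror Service item above describes asynchronous persistence from an in-memory data grid to a back-end database. The general technique is often called write-behind, and a minimal single-writer sketch of it in plain Java might look like the following. This is not the GigaSpaces API: `WriteBehindStore`, `flush` and the map standing in for the database are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical write-behind store: writes hit memory first and are mirrored
// to a slower backing store later. Not the GigaSpaces Mirror Service API.
final class WriteBehindStore {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<String[]> pending = new ConcurrentLinkedQueue<>();
    private final Map<String, String> database; // stand-in for a real database

    WriteBehindStore(Map<String, String> database) {
        this.database = database;
    }

    // Fast path: update memory and queue the change; the caller never waits on the database.
    void put(String key, String value) {
        memory.put(key, value);
        pending.add(new String[] {key, value});
    }

    String get(String key) {
        return memory.get(key);
    }

    // Slow path: drain queued changes, in order, into the backing store.
    // A real system would run this in the background and handle failures.
    int flush() {
        int written = 0;
        String[] change;
        while ((change = pending.poll()) != null) {
            database.put(change[0], change[1]);
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        Map<String, String> db = new ConcurrentHashMap<>();
        WriteBehindStore store = new WriteBehindStore(db);
        store.put("order-1", "NEW");
        store.put("order-1", "FILLED");
        System.out.println("in memory: " + store.get("order-1"));
        System.out.println("in database before flush: " + db.get("order-1"));
        System.out.println("changes mirrored: " + store.flush());
        System.out.println("in database after flush: " + db.get("order-1"));
    }
}
```

The application code only calls `put` and `get`; when and how the queued changes reach the database is decided elsewhere, which is the property the bullet attributes to the Mirror Service.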

When asked to describe further what Processing Units were, Perry and Shalom said:

In Space-Based Architecture, a Processing Unit represents the unit of scale and fail-over of an application. It normally contains all of the application services and middleware components that have a tight latency/runtime dependency. It encapsulates those services under a single container (Processing Unit) and maintains consistent scaling and fail-over semantics in a generic fashion to all of those components. A failure event, for example, will automatically trigger a recovery process of both the middleware components (messaging, data grid) and the business logic associated with it. In this way we avoid partial failure or inconsistent behavior resulting from the fact that a failure event happened, messaging system started to deliver events but the application service is not yet ready to process them. From a latency perspective, the encapsulation of all those components in the same run-time container reduces network overhead, because they interact purely in memory. Scalability becomes as simple as adding more processing units. In other words, there is no need to separately scale the data, business logic and/or messaging tiers.
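
To make the co-location idea concrete, here is a rough sketch in plain Java, assuming a fixed number of units and simple hash-based routing. The class and method names (`ProcessingUnitSketch`, `Unit`, `unitFor`) are hypothetical and do not come from the GigaSpaces API; fail-over and rebalancing are deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each unit holds its own slice of the data together with
// the logic that processes it. Names are illustrative only.
final class ProcessingUnitSketch {

    static final class Unit {
        private final Map<String, Integer> balances = new HashMap<>(); // co-located data

        // Business logic runs next to the data it touches; no remote call is involved.
        int deposit(String account, int amount) {
            return balances.merge(account, amount, Integer::sum);
        }
    }

    private final List<Unit> units = new ArrayList<>();

    ProcessingUnitSketch(int unitCount) {
        for (int i = 0; i < unitCount; i++) {
            units.add(new Unit());
        }
    }

    // The same key always maps to the same unit, so a request and its data meet in one place.
    int unitFor(String account) {
        return Math.floorMod(account.hashCode(), units.size());
    }

    int deposit(String account, int amount) {
        return units.get(unitFor(account)).deposit(account, amount);
    }

    public static void main(String[] args) {
        ProcessingUnitSketch cluster = new ProcessingUnitSketch(4);
        cluster.deposit("alice", 100);
        System.out.println("alice -> unit " + cluster.unitFor("alice")
                + ", balance " + cluster.deposit("alice", 50));
    }
}
```

In this toy model, scaling out means constructing the cluster with more units. A real platform would also have to move existing data when the unit count changes, which the modulo routing here does not attempt.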

They also described where they saw this release fitting into the bigger picture:

6.0 is yet another important step forward in fulfilling the vision we have been promoting for a while. At the core of this vision is the realization that the days of building applications using n-tier architectures and the J2EE stack are coming to an end! These architectures and the middleware technologies that rely on them have hit a wall in terms of their ability to support the scaling, reliability and performance required of today's business applications.

Among other things, the newly emerging architectures will support horizontal scalability based on low-cost hardware as opposed to vertical scalability; they will leverage in-memory data grids as the real-time, on-line transactional system of record instead of the RDBMS; they will enable the dynamic co-location of data and services to create self-sufficient "Processing Units" with a single "always-available" fault-tolerance clustering.

Perry and Shalom also commented that other major sites such as eBay, Google, MySpace and Amazon have come up with similar ideas, and pointed at MapReduce, Hadoop and memcached as examples. However, they made it clear that they are careful to maintain interoperability with the J2EE world and with existing developer skills through support for JDBC, JMS and Spring. They also indicated that products like XAP Community Edition and OpenSpaces are meant to target mainstream developers more than their enterprise offerings, which are primarily used by large companies.

They were also asked how GigaSpaces compared to GridGain, Coherence and Terracotta, and they first stated that all of these vendors are trying to educate the community about better ways of building and deploying applications. However, GigaSpaces is designed to be a comprehensive application platform that addresses all aspects of scaling, performance and high availability; they said that other vendors focus more on specific aspects of distributed computing, such as distributed caching. They also mentioned that GigaSpaces is not a batch application-focused Enterprise Grid computing solution; to provide this, GigaSpaces partners with companies such as GridGain, DataSynapse and Platform Computing. GigaSpaces also focuses more on the intra-application grid, for running a single application in a distributed environment, rather than on managing multiple applications on shared resources in an inter-application grid.

Perry and Shalom were also asked to comment on Cameron Purdy's recent thoughts on the future of grid computing:

Cameron's claim has always been that Tangosol is the best Data Caching technology because it has focused on doing just that: Data Caching. We believe it is now self-evident that solving the scalability, performance and availability of an entire application, end-to-end, not just the data bottleneck of that application, will require a lot more than just distributed caching functionality (even if it is called a Data Grid). Oracle have realized it, that's why they bought Tangosol and are attempting to add caching to their Fusion Middleware stack that will also include their legacy application server and messaging technologies.

However, no matter how good the integration of different technologies is (and it is yet to be seen that it will be a good one), it will always be fundamentally handicapped by the inherent lack of a common dynamic clustering model for scalability, performance and continuous-availability for all of the participating components. It is simply the wrong approach. Compare that to the holistic SBA approach that's available with GigaSpaces 6.0 XAP.

Finally, they were asked what the future held for XAP:

In addition to [EC2] and our Community Edition, we are also going to make a special Start-Up Offering, which allows start-ups, open source projects and non-profits to use the product in production for free. This is the first place we have mentioned this upcoming offering publicly. It will launch in the coming weeks, so stay tuned.

11 comments

GigaSpaces XAP 6.0: Simplified, Spring-based API for Space-Based ... by Julian Browne Posted Sep 25, 2007 8:42 AM
Re: GigaSpaces XAP 6.0: Simplified, Spring-based API for Space-Based ... by Nati Shalom Posted Sep 26, 2007 10:19 AM
Oracle Coherence Data Grid by Cameron Purdy Posted Sep 25, 2007 3:57 PM
Xtreme Transaction Processing by Nati Shalom Posted Sep 25, 2007 7:31 PM
Re: Xtreme Transaction Processing by Cameron Purdy Posted Sep 27, 2007 10:23 AM
Re: Xtreme Transaction Processing by Nati Shalom Posted Sep 27, 2007 6:18 PM
Compute grid vs. Data grid by John Davies Posted Sep 27, 2007 10:51 AM
Re: Compute grid vs. Data grid by Nati Shalom Posted Sep 27, 2007 6:42 PM
Two very different approaches by Nikita Ivanov Posted Sep 25, 2007 7:15 PM
Re: Two very different approaches by Nati Shalom Posted Sep 25, 2007 7:56 PM
Interview with GigaSpaces by Jesse Chan Posted Sep 26, 2007 5:25 PM
  1. GigaSpaces XAP 6.0: Simplified, Spring-based API for Space-Based ...

    Sep 25, 2007 8:42 AM by Julian Browne

    I was lucky enough to get some early access to XAP 6.0, and have to say I've been very impressed. Congratulations to the Gigaspaces team on their achievement. The list above barely scratches the surface of what you can do with this technology. Think of processing units in the SBA as domain-specific entities and there's some rather elegant synergy with the DSL/LOP topics recently posted on InfoQ, with NFRs included. One area where I think all in this space (pardon the pun) would agree with Cameron, is that success is when grid-think becomes an implicit part of the architecture toolbox. It's time to shed the 'specialist' and 'high-end' tags the approach has historically had, and to see it as just a good way to provide a bit of capability back to the business.

  2. Oracle Coherence Data Grid

    Sep 25, 2007 3:57 PM by Cameron Purdy

    Obviously I don't want to take anything away from Gigaspaces' announcement, and congratulations to them on their recent product release. Despite our differences, there's no doubt that competition in this space has pushed all of us far beyond our own initial expectations. So please permit me to disagree a little ;-)

    "Cameron's claim has always been that Tangosol is the best Data Caching technology because it has focused on doing just that: Data Caching. We believe it is now self-evident that solving the scalability, performance and availability of an entire application, end-to-end, not just the data bottleneck of that application, will require a lot more than just distributed caching functionality (even if it is called a Data Grid)."
    I have a slightly different point of view, which is hardly surprising ;-) The truth is that Tangosol, a little company with no financial backing, was able to build a very successful product that solved a lot of important problems for a lot of customers, from small companies to many of the largest companies in the world. Back in 2001, we created the first coherent clustered cache product, Coherence, and it was extremely popular. In fact, it's so popular that even the Gigaspaces web site runs on it .. ;-) By 2003, we were working with some of the largest and most successful web sites in the world to use Java to achieve continuous availability by clustering the living state and transactional data of applications. Our customers pioneered the notion that information in memory could be of higher reliability and availability than any of the traditional data management choices that were available to them. Predictable scalability with low latency was a hallmark of these applications, from stock markets and banking systems to telco applications and ecommerce websites. It's clever to attempt to label it as "only" distributed caching, and we are definitely very successful in that space. We're also just as successful in the Data Grid space, which adds transactional data management to the grid environment, and when it comes to Event Driven Architectures (EDA) and eXtreme Transaction Processing (XTP), we still see no viable competition to Coherence.
    "Oracle have realized it, that's why they bought Tangosol and are attempting to add caching to their Fusion Middleware stack that will also include their legacy application server and messaging technologies."
    If Oracle's plans for Coherence were to "attempt to add caching to their Fusion Middleware stack", we wouldn't have even considered it. It's true that Oracle is already extremely successful with Coherence in the marketplace -- it doesn't hurt to have an account manager dedicated to every major organization in the world! However, that isn't why Oracle selected Coherence as the technology to own in this space. The qualities of service (QoS) that Coherence provides are going to be the required building blocks for every new piece of infrastructure, for every service and for every major application from this point forward. The levels of availability and reliability that Coherence provides are simply unparalleled.
    ".. it will always be fundamentally handicapped by the inherent lack of a common dynamic clustering model for scalability, performance and continuous-availability for all of the participating components."
    Just let me know when you catch up to the clustering model that we introduced in 2001 ;-)

    Peace,
    Cameron Purdy
    Oracle Coherence: The Java Data Grid

  3. Two very different approaches

    Sep 25, 2007 7:15 PM by Nikita Ivanov

    Having worked with these two products rather closely (our project integrates natively with both of them) I have a strange feeling. First of all, I truly respect them both. It’s not a b/s statement; both products are established, very complex and proven in the marketplace. Working in a similar space and business trade I can only appreciate what it took to get there… Now, on the surface they both solve a somewhat similar problem: you can hear a lot about distributed heap, distributed caching, data grid, spaces, etc. However, what I found startling is that the technological approaches are so different in these two products that even though they solve similar (if not the same) problems they do it VERY differently – thus driving very different reactions from different customers. Applicability and usage of these two products also vary dramatically. You can, by the way, safely add Terracotta to this mix. It is yet another product that solves the same type of problem (sort of) in a VERY different way again. We all have our biases based on our past experiences and preferences. I personally view these two products as very different (orthogonally different) approaches to a similar problem domain. No less – no more…

    My 2 cents,
    Nikita Ivanov.
    GridGain - Grid Computing Made Simple

  4. Xtreme Transaction Processing

    Sep 25, 2007 7:31 PM by Nati Shalom

    "Obviously I don't want to take anything away from Gigaspaces' announcement, and congratulations to them on their recent product release."
    Thanks, Cameron.
    "Despite our differences, there's no doubt that competition in this space has pushed all of us far beyond our own initial expectations. So please permit me to disagree a little ;-)"
    We're in agreement on that point ☺
    "I have a slightly different point of view, which is hardly surprising ;-)"
    Rather than speaking on our own behalf, I'd rather have others who used the product comment about it. For a start, look at the comment made by Julian Browne above:
    "The list above barely scratches the surface of what you can do with this technology. Think of processing units in the SBA as domain-specific entities and there's some rather elegant synergy with the DSL/LOP topics recently posted on InfoQ, with NFRs included."
    While I don't expect that we will share the same view on each other's products, I think that some clarity on what XTP (eXtreme Transaction Processing) is would be in order. As an illustration, let's look at a typical transaction processing application, such as an order management system. It is typically built using the following components in a J2EE environment:

      1. Data Feeds - typically from a JMS provider
      2. Message-Driven Bean - where the business logic resides
      3. Database - used for maintaining state to ensure recoverability as well as durability

    Building highly-available transaction processing with this model requires:

      1. JMS + JMS Cluster to maintain high availability. In some cases, JMS high-availability is achieved by writing the state of the messaging system to disk
      2. Application Server clustering to ensure high-availability
      3. Database clustering
      4. XA transactions to ensure ACID properties are kept among these different components

    From what I hear from our customers and your own words, when you refer to XTP you're talking about enhancing (or partially replacing) the database with caching, or an extended version of caching called Data Grid. The equivalent of that in our product is the Enterprise Data Grid edition. When we at GigaSpaces talk about XTP we’re referring to the complete application stack, including the messaging feed, the container and the data. By doing that we’re targeting not just the data bottleneck, but the end-to-end application scalability, latency and complexity challenges. We do that by providing a JMS façade on top of the same cluster used for processing the business logic and the data. This way, we remove many of the moving parts and the multiple clustering models required with the previous approach, which is essentially tier-based. Needless to say, there is a huge difference between solving the data I/O bottleneck and the scalability of the entire application.

    It is, therefore, not surprising that users who recently evaluated the two technologies to address their XTP requirements realized quickly that on the XTP front, our products are really not comparable. The difference is so obvious that we are getting requests from customers who are already using various caching alternatives, including Coherence, to support these caches so that they will be able to benefit from the SLA-driven container, Space-Based JMS and the full GigaSpaces platform. This is something we’re seriously considering. For more information on what XTP really means refer to the following blog post on that topic.

    Nati S.
    GigaSpaces
    Write Once Scale Anywhere

  5. Re: Two very different approaches

    Sep 25, 2007 7:56 PM by Nati Shalom

    Hi Nikita

    "We all have our biases based on our past experiences and preferences. I personally view these two products as very different (orthogonally different) approaches to the similar problem domain. No less – no more…"
    Interestingly enough, we share the same view. I think that you have done a great job integrating the different products with GridGain despite those differences, well done!

    Nati S.

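  6. Re: GigaSpaces XAP 6.0: Simplified, Spring-based API for Space-Based ...

    Sep 26, 2007 10:19 AM by Nati Shalom

    Julian - thanks for the kind words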

    "Think of processing units in the SBA as domain-specific entities and there's some rather elegant synergy with the DSL/LOP topics recently posted on InfoQ, with NFRs included."
    I gave a presentation at the latest SpringONE event in Brussels which covers some of the topics that Julian is referring to. You can find the online presentation here. That presentation triggered an interesting discussion on the Spring forum, End of App Servers?, which highlights some of the points I was trying to make in this presentation:
    "Given the availability of all the various commodity services which an app server typically provides (connection pools, emailing, security, transactions, etc), there doesn't seem like much value-add provided by a war/ear/rar centric app servers any more. Being able to just start up another OSGi container on another box, and dynamically register OSGi components to it seems much more attractive, in terms of manageability, scale-out/fault-tolerance."
    If you're interested in playing with it, I'd recommend starting by looking at the following webcasts

  7. Interview with GigaSpaces

    Sep 26, 2007 5:25 PM by Jesse Chan

    Since this press release, I have managed to get an exclusive interview with Geva Perry, Chief Marketing Officer at GigaSpaces. He talks more about the technology and what makes it different than other technologies such as caching, messaging, MapReduce, and so forth. GigaSpaces is not only powerful, but complementary to a lot of other technologies. You can read the interview here.

  8. Re: Xtreme Transaction Processing

    Sep 27, 2007 10:23 AM by Cameron Purdy

    Nati, as you know, scaling the stateless parts of the application has never been the problem, but I'm glad to hear you can solve it nonetheless .. ;-) In almost every real world case, the latency and throughput limiters on the stateful side -- including the transactional side -- are the obstacle to scale. These are the problems that Coherence solves, and despite your protestations to the contrary, these are the same problems that you are working to solve. Once again, a sincere congratulations on the release, and I hope you don't take our disagreements personally; they are not intended to be so.

    Peace,
    Cameron Purdy
    Oracle Coherence: The Java Data Grid

  9. Compute grid vs. Data grid

    Sep 27, 2007 10:51 AM by John Davies

    It's always healthy to see competition. I know both Nati and Cameron extremely well, and I know they have mutual respect for each other; the respect doesn't quite run as deep for each other's companies though. I have worked with both products, worked with their clients and even done talks on both products. Although there is a notable overlap, these are two different and distinct products fundamentally aimed, originally at least, at two different problems. One, Tangosol, who recently merged with Oracle :-), is a data grid and the other, GigaSpaces, is a compute grid. There are a number of situations where I can envisage both products working together, but it still comes down to the fact that each could do most of what the other does by extending the scope of what it was originally designed for. Still, at the end of the day I can still see clear advantages in both products. I'm a huge fan of JavaSpaces; it has a beautifully simple API, but for some bizarre reason it never really made it into the mainstream despite Sun trying to help it succeed by introducing EJBs. Tangosol adopted the HashMap as an API, which made it easy to use, but to be honest JavaSpaces' four methods with identical parameters aren't exactly difficult to master. The use of Spring in OpenSpaces brings JavaSpaces into the "standards" world where the programmer only has to learn one API. Despite Spring being an order or two more complex than JavaSpaces, it is pretty neat the way they've integrated it into 6.0 and it's easy to knock up demos. This is a huge market and increasing by the day; it's great to see GigaSpaces where they are now with over 100 customers, compared to a few years ago when I used to know them all by name and what version they were on. Congratulations GigaSpaces, a great achievement, and just keep up the fight with Tangosol, they're worthy advocates.

    -John-
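
For readers who have not met the space style of API that this comment refers to, the following is a toy in-memory sketch of write/read/take with template matching, in plain Java. It only imitates the flavour of the model: it is not the real JavaSpaces interface, which also deals with transactions, leases, timeouts and notifications. `TinySpace` and `Order` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Hypothetical in-memory tuple space. It mimics the write/read/take style only;
// it is not the real JavaSpaces interface (no transactions, leases or blocking).
final class TinySpace {

    // An entry with two matchable fields; a null field in a template is a wildcard.
    static final class Order {
        final String symbol;
        final Integer quantity;

        Order(String symbol, Integer quantity) {
            this.symbol = symbol;
            this.quantity = quantity;
        }

        boolean matches(Order template) {
            return (template.symbol == null || Objects.equals(template.symbol, symbol))
                && (template.quantity == null || Objects.equals(template.quantity, quantity));
        }
    }

    private final List<Order> entries = new ArrayList<>();

    synchronized void write(Order entry) {
        entries.add(entry);
    }

    // Returns a matching entry and leaves it in the space, or null if none matches.
    synchronized Order read(Order template) {
        for (Order e : entries) {
            if (e.matches(template)) {
                return e;
            }
        }
        return null;
    }

    // Returns a matching entry and removes it from the space, or null if none matches.
    synchronized Order take(Order template) {
        Iterator<Order> it = entries.iterator();
        while (it.hasNext()) {
            Order e = it.next();
            if (e.matches(template)) {
                it.remove();
                return e;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinySpace space = new TinySpace();
        space.write(new Order("ACME", 100));
        space.write(new Order("INIT", 5));
        Order anyAcme = new Order("ACME", null); // template: symbol fixed, quantity wildcard
        System.out.println("read:  " + space.read(anyAcme).quantity);
        System.out.println("take:  " + space.take(anyAcme).quantity);
        System.out.println("again: " + space.take(anyAcme)); // already taken
    }
}
```

Every operation takes the same kind of argument, an entry or a template shaped like one, which is roughly what makes the model quick to learn.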

  10. Re: Xtreme Transaction Processing

    Sep 27, 2007 6:18 PM by Nati Shalom

    Hi Cameron

    "As you know, scaling the stateless parts of the application has never been the problem, but I'm glad to hear you can solve it nonetheless .. ;-)"
    This is probably the area where we differ most. My view is that there is no such thing as a *stateless* tier when it comes to transaction processing, or almost any distributed application that needs high availability and reliability. For example, if you send a message through JMS, someone needs to consume it at some other point in time; during that period of time the message becomes the *state* of the transaction. In addition to that, there could be failure scenarios which require that you store that intermediate state to ensure full recovery of the message from the exact point it failed. The same applies to the transaction coordinator and your session information.

    Tests that were conducted by one of our partners proved that point beyond any doubt. In those tests they compared the tier-based approach, where we used JMS+XA+Caching, with an alternative that used our JMS facade, Spring as the abstraction layer and Caching, all running on our virtualized XAP middleware. Those tests showed that with the tier-based approach the JMS tier, with its own clustering overhead, as well as the transaction coordinator had a huge impact on the end-to-end latency, scalability and complexity. One of the reasons is pretty obvious: to ensure high availability, both had to maintain their state in the file system to ensure full recoverability. With this architecture we ended up with the following message flow with the tier-based alternative:

      1. Send message to the JMS
      2. The JMS writes to disk (for its own high availability)
      3. MDB takes the message under transaction (with the additional transaction overhead)
      4. That message is written to the cache (under transaction); the cache replicates that data to a backup instance (again, for its own recoverability)

    Any access to disk during the critical path of the business transaction limits the throughput and latency, as you know, so no matter how fast you are with the data tier, your end-to-end scaling, latency and complexity are going to be determined by your weakest link. In our case you simply send the message to the JMS facade and that translates immediately into an entry in our DataGrid. See details here.

    Another area of complexity is affinity. With the tier-based approach you need to make sure that messages are routed to the appropriate queue, which contains the appropriate caching partition. You need to do that explicitly, since the messaging and data clusters use different clustering and load-balancing models. In our case there is a single clustering/fail-over semantics for the entire application, as well as a single development and deployment model. This makes that challenge pretty much irrelevant. You can imagine what will happen if you add scaling to the picture with the tier-based approach.

    Bottom line: my point is that if you need true linear scaling (as opposed to performance optimization) you can't assume anything about the other tiers. You need to be able to handle the end-to-end transaction flow and not just part of it. IMO this is the only way to achieve true linear scalability. The really good news is that with our latest release we added specific abstractions at the messaging (JMS, Event Containers) and data (DAO, Declarative Transactions) levels that make the transition from the non-scalable tier-based model to a linearly scaling scale-out model relatively seamless. If you happen to use Spring, this will fit in as a very native extension to your existing development and runtime environment. In this way you can scale your entire application seamlessly just by changing the runtime platform. If you're already using another caching alternative, you can benefit from our messaging and SLA-driven container to scale out your entire application.

    Nati S.
    GigaSpaces
    Write Once Scale Anywhere

  11. Re: Compute grid vs. Data grid

    Sep 27, 2007 6:42 PM by Nati Shalom

    Hi John, thanks for the feedback.

    "One, Tangosol, who recently merged with Oracle :-), is a data grid and the other, GigaSpaces, is a compute grid."
    To be more accurate, we position ourselves as a platform for scaling out stateful applications. We're positioning the parallel processing part that comes natively with JavaSpaces, through the Master/Worker pattern, for parallel transaction processing as opposed to parallelization of batch processing. There is a fundamental difference between the two: the former tends to be low-latency and stateful in nature and often requires a very simple parallel processing pattern, whereas batch processing is more stateless in nature and requires more sophisticated parallel processing and resource scheduling, based on different policies, to improve utilization and reduce compute time in the data center.
    "There are a number of situations where I can envisage both products working together but it still comes down to the fact that you could do most of what the other does by extending the scope of what it was originally designed for. Still, at the end of the day I can still see clear advantages in both products"
    I can share with you that one of the leading investment banks chose to use GigaSpaces and Coherence partially because of the reasons I mentioned above. These are interesting days :)
    "Congratulations GigaSpaces, a great achievement.."
    Thanks John, stay tuned, there is more to come.
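
The Master/Worker pattern mentioned in this last reply can be sketched in plain Java as follows. Two `java.util.concurrent` queues stand in for the shared space here, which is an assumption made purely for illustration; `MasterWorkerSketch` and `sumOfSquares` are hypothetical names, and the squaring task is a placeholder for real business logic.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical Master/Worker sketch: the master writes tasks to a shared
// structure, workers take tasks and write results, the master collects them.
final class MasterWorkerSketch {

    static long sumOfSquares(int[] inputs, int workerCount) {
        BlockingQueue<Integer> tasks = new LinkedBlockingQueue<>();
        BlockingQueue<Long> results = new LinkedBlockingQueue<>();
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        try {
            for (int w = 0; w < workerCount; w++) {
                workers.execute(() -> {
                    try {
                        while (true) {
                            int n = tasks.take();        // worker: take a task
                            results.put((long) n * n);   // worker: write a result
                        }
                    } catch (InterruptedException stop) {
                        Thread.currentThread().interrupt(); // shut down quietly
                    }
                });
            }
            for (int n : inputs) {
                tasks.put(n);                            // master: write tasks
            }
            long total = 0;
            for (int i = 0; i < inputs.length; i++) {
                total += results.take();                 // master: collect one result per task
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        } finally {
            workers.shutdownNow(); // interrupts workers blocked in take()
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(new int[] {1, 2, 3, 4}, 3));
    }
}
```

Workers never talk to each other or to the master directly; they only take from and write to the shared structure, so adding workers does not change the master's code.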
