Coherence 3.2 Enhances Clustered Data Client and Grid Computing Support
- Coherence Data Client: Enterprise-wide access to services provided by the data grid.
- Coherence Real Time Client: Real-time access to data feeds, including near-caching of data on the client as well as continuous query caching.
- Coherence Compute Client: Optimized for data-intensive compute grid nodes as well as transaction intensive application servers.
Among the other features version 3.2 adds:
- Increased functionality through grouping and composite parallel aggregators, conditional data grid processors, and Once And Only Once guarantees;
- Extended TCP/IP support with the Coherence Real Time Client and a path to direct access and interoperability for .NET and C++;
- Improved performance and resiliency through a network layer hyper-optimized for multi-core systems, Gigabit Ethernet and InfiniBand, with remote GC detection and full peer-to-peer flow control;
- Write-Behind caching improvements that ensure no database impact during failover and load-balancing; and
- WebSphere 6.0 support, plus up-to-date certification on all IBM platforms.
InfoQ sat down with Tangosol President Cameron Purdy to discuss the new release. Tangosol describes its product as not just a caching engine but a "Data Grid Solution Set". Purdy explained that customers have always used Coherence for more than just caching. Common activities include real-time analytics, large-scale Transaction Processing (TP), and Complex Event Processing (CEP). The different client connectors were created to address these varying needs, so customers can apply the same data grid to different problems by choosing the appropriate client.
Comparing the different client options in detail, Purdy explained:
The three clients are the Data Client, Real-Time Client and Compute Client. The Data Client provides stateless desktop and server access to the data grid; with a single connection, it can have full access to all of the data and services on the data grid. The Real-Time Client provides a stateful connection, with instantaneous events and projection of data-of-interest from the Data Grid directly onto the desktop in real time. Desktops can subscribe to real time data feeds, ensuring first-class access to live data managed by the Data Grid. Both of these clients connect to the data grid via TCP/IP or JMS. The third client connectivity option is the Compute Client, which provides real-time, server-class access to the Data Grid for application servers and compute grid nodes. It uses the same server-mesh connectivity as the Data Grid itself, giving it massively scalable network throughput, bare-metal latency and full management and monitoring capabilities.
Previously, InfoQ interviewed Jonas Bonér of Terracotta about their technique of clustering the JVM, during which he responded:
“We are looking at solving developer problems first by simplifying the programming model. In data grid solutions developers must first retrieve a bean from the grid in order to modify it. They must then set it back into the grid to propagate the changes. Maintaining these relational maps between the objects perturbs the domain model by forcing you to layer some sort of primary-foreign-key relationships on top of your domain model. This forces Java developers to start thinking like relational database designers, something that is very unnatural to most developers. They simply want to be able to change a bean and the system worries about the rest. We preserve common Java programming characteristics such as pass-by-reference and garbage collection across the cluster.”
Purdy was asked if he thought clustering/caching could be handled simply as a transparent JVM option or if there was a need to create additional programming patterns to provide grid enablement:
We’ve definitely been down that path. I’d say that those comments do reflect many of the use cases that our customers were using Coherence for, four or five years ago.
Today, a Coherence Data Grid never has to move the data in order to work on it; in fact, with a single line of code, a developer can execute a query, scalar or aggregate functions, or even a custom processing agent, and those operations will be executed in parallel across a thousand servers against terabytes of data, yet without any of that data being moved – and without a single disk access!
To be clear, Terracotta provides neither clustered caching features nor Data Grid features – their product replicates pre-defined object graphs, using byte-code manipulation and a central socket server to manage all the locks and data.
Coherence has no central server. Coherence has no predefined object graph. Coherence does not rely on replication. Coherence is used in production applications with high transaction rates – many thousands of transactions per second – completely precluding the use of distributed pessimistic locking, let alone centralized locking! These applications also require high levels of concurrency, in which having the same “Java object identity” across two parallel transactions would defeat transactional isolation. Coherence supports multiple computer languages and programming platforms, so it is not possible to pretend that the data are all just Java objects with Java semantics. In short, Coherence must deal with massive concurrent read/write loads on data that is collected from and distributed across many applications of various origins across an enterprise. Other solutions do not and cannot solve the enterprise-class problems that Coherence is designed to solve.
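The contrast drawn in this exchange, between fetching an object, changing it and putting it back versus sending the operation to wherever the data lives, can be illustrated with a toy sketch. This is not Coherence's API: the class `PartitionedGridSketch` and its methods are hypothetical, the "servers" are just in-memory maps inside one JVM, and a thread pool stands in for parallel execution across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.DoubleUnaryOperator;

// Hypothetical stand-in for a partitioned data grid (not a real product API).
public class PartitionedGridSketch {
    // Each map plays the role of one server's share of the data.
    private final List<Map<String, Double>> partitions = new ArrayList<>();

    public PartitionedGridSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ConcurrentHashMap<>());
        }
    }

    // Every key is owned by exactly one partition, chosen by hash.
    private Map<String, Double> ownerOf(String key) {
        return partitions.get(Math.floorMod(key.hashCode(), partitions.size()));
    }

    public void put(String key, double value) {
        ownerOf(key).put(key, value);
    }

    public Double get(String key) {
        return ownerOf(key).get(key);
    }

    // "Send the operation to the data": the update runs atomically on the
    // owning partition, so the caller never fetches or writes back the value.
    public void process(String key, DoubleUnaryOperator operation) {
        ownerOf(key).computeIfPresent(key, (k, v) -> operation.applyAsDouble(v));
    }

    // Parallel aggregation: each partition sums its own entries, and only the
    // small partial results are combined by the caller.
    public double sum() throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            List<Future<Double>> partials = new ArrayList<>();
            for (Map<String, Double> partition : partitions) {
                Callable<Double> task = () ->
                        partition.values().stream().mapToDouble(Double::doubleValue).sum();
                partials.add(pool.submit(task));
            }
            double total = 0.0;
            for (Future<Double> partial : partials) {
                total += partial.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        PartitionedGridSketch grid = new PartitionedGridSketch(4);
        grid.put("a", 10.0);
        grid.put("b", 20.0);
        grid.put("c", 12.5);
        grid.process("a", v -> v * 2); // in-place update on the owning partition
        System.out.println(grid.sum());
    }
}
```

In the retrieve-modify-set style Bonér describes, the caller would instead call `get`, change the value locally and call `put`, moving the data twice. The sketch leaves out everything that makes a real grid hard: networking, serialization, failover and transactions.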
The conversation shifted to asking Purdy about the new features in 3.2. He explained that the company is constantly refining their solutions and processes. As a result they will analyze, review and refine existing subsystems while continuing new development. In a typical release, the result can be the replacement of entire subsystems while maintaining backwards compatibility. In version 3.2, major portions of the clustering technology were replaced, adding the ability to detect and react to remote garbage collections within milliseconds. The system will achieve the same theoretical maximum throughput on a gigabit network as on a ten megabit network through automatic tuning that maximizes throughput and minimizes packet loss. The TCP/IP client/server technology is also new in this release, with benchmarks supporting over 65,000 concurrent connections on a single commodity blade server.
Purdy noted that one of the coolest new features in his opinion is the Portable Invocation Format (PIF) and Portable Object Format (POF) transparent cross-platform and cross-language support:
[This] will ultimately enable C++, C#, Ruby, Python, VB, Excel, PHP, Perl and just about any other language to access and manipulate data in a language-specific manner, yet have that live data shared across components, applications and services built in different languages. Imagine “new-ing” a C++ or C# object, injecting it into a Data Grid, and having Java clients automatically receive an event in real time containing the Java form of that new object! Already, a number of our customers are working with our Coherence.NET product that supports the Microsoft .NET platform and languages, and we will be announcing general availability for it this quarter.
Next Purdy commented on the continued development of the grid computing aspect of Coherence:
Tangosol Coherence is the Data Grid. I have often explained that when we first released Coherence, there were no terms to describe it, so we made up the term “clustered caching,” and that’s the term we used to describe what it did. Basically, when an application is running on many servers, Coherence makes the application’s live, in-memory data available to the application on every server. Sometimes this is called a Single System Image, sometimes it’s referred to as an in-memory clustered database, and sometimes it’s called an information fabric. At the end of the day, Coherence just manages and makes available live data to lots of servers. Sometimes Coherence is used for caching, sometimes for On-Line Transaction Processing (OLTP), sometimes for analysis and sometimes for Complex Event Processing (CEP). Yes, it’s true that Coherence is by far the most successful distributed caching product ever, but it’s also the most scalable Java Transaction Processing system and the most wide-spread analytical and event-based Data Grid solution.
When we think about “grid” and “virtualization” technology and the movement towards Service Oriented Architectures (SOA), we have a single, clear and coherent vision. Applications, services and components all need to be dynamically deployable, relocatable and scalable. Their functionality and the data that they depend on must all be continuously available, surviving failures of servers, networks and even data centers. Their capacity needs to be dynamically manageable, growing to meet the demands of the business. When technologists talk about grid and virtualization and service oriented architectures, they are really talking about the same thing: Resilient, highly-available, completely reliable, easily manageable and serviceable IT infrastructure. That is the vision that we are delivering on today, but it’s also the same vision that originally inspired us to provide the industry’s first solution for clustered caching.
If you read the various press journals and magazines, you’ve probably seen a number of stories of major companies announcing SOA, grid and virtualization efforts, and publicly naming Tangosol Coherence as a central part of that strategy. Just off the top of my head, I can think of stories with Wachovia, Wells Fargo, Putnam Investments, Starwood Hotels, GEICO Insurance, Atlassian and others that illustrate the realization of this vision, today.
Purdy went on to mention they will continue to work with partners such as Spring (which supported Coherence in their 2.0 release), DataSynapse, BEA (integration with their JEE servers and OpenJPA), and IBM to test and integrate their products in an effort to make them as easy to use as possible. Work has also been done with Azul to support their network attached processing appliance and with Mellanox to optimize network latency and throughput at up to twenty gigabits per second.
Finally, Purdy was asked where he sees Coherence going in future releases. Similar to how data access services were opened up as an enterprise-wide service in this release, he said that the grid in general will become an enterprise-wide service with support for dozens of different languages and platforms in the next year or so. Tangosol will also continue to focus on simplifying the development model for large-scale, dynamic, continuously available systems:
We need to constantly be thinking about how to make software more understandable, more reliable, more maintainable, more easily changeable. ... [we need to make a cluster] as easy to manage as a two server Data Grid, as flexible to develop with as Eclipse and IDEA, as reliable as a mainframe, as secure as a bank vault and as accessible as a web site.
One more thing ...
As an industry, we all need to keep pushing and encouraging the major vendors to address energy usage and other environmental factors. Having spent more than a few hours stuck inside a data center or two, I do hope for a future with whisper-quiet data centers and passively cooled servers.