SOA Agents: Grid Computing meets SOA
SOA has made huge progress in recent years. It has moved from experimental implementations by software enthusiasts into the mainstream of today's IT. One of the main drivers behind this progress is the ability to rationalize and virtualize existing enterprise IT assets behind service interfaces that are well aligned with the enterprise business model and with current and future enterprise processes. Further progress was achieved through the introduction of the Enterprise Service Bus (ESB) - a pattern for virtualization of the services infrastructure. By leveraging mediations, service location resolution, service level agreement (SLA) support, etc., an ESB allows software architects to significantly simplify the services infrastructure. The last piece missing in the overall SOA implementation is enterprise data access. A possible solution for this problem, the Enterprise Data Bus (EDB), a pattern for ubiquitous access to enterprise data, was introduced in [1,2]. The EDB adds a third dimension to SOA virtualization, which can be broken down as follows:
- Services - virtualization of the IT assets;
- ESB - virtualization of the enterprise services access;
- EDB - virtualization of the enterprise data access.
In other SOA developments, several publications [3,4,5] suggested the use of Grid technology for improving scalability, high availability and throughput in SOA implementations. In this article, I will discuss how Grid can be used in the overall SOA architecture and introduce a programming model for Grid utilization in services implementation. I will also discuss an experimental Grid implementation that can support this proposed architecture.
SOA with EDB - overall architecture
Following [1,2] the overall architecture of the SOA with EDB is presented in Figure 1.
Figure 1: SOA architecture with the Enterprise Data Bus
Here, the ESB is responsible for proper invocation of services, which are implemented by using the EDB to access any enterprise data that those services require. This architecture provides the following advantages:
- Explicit separation of concerns between implementation of the service functionality (business logic) and enterprise data access logic.
Enterprise Data Bus effectively creates an abstraction layer, encapsulating details of enterprise data access and providing "standardized interfaces" to the services implementations.
- The EDB provides a single place for all of the transformations between the enterprise semantic data models used by services and the data models of enterprise applications by encapsulating all access to enterprise data.
As a result, service implementations can access enterprise data using a SOA semantic model, thus significantly simplifying the design and implementation of enterprise services.
- Because service implementations can obtain the required enterprise data through the EDB, service interfaces can be significantly simplified and the coupling between service consumer and provider becomes looser:
- Because the service and its consumer can both access data directly, a service invocation does not need to carry the actual values of its input/output parameters. As a result, the service interface can be expressed in terms of data references (keys) instead of actual data.
- While the enterprise service model will evolve as the SOA implementation matures, the data reference definitions will rarely change. As a result, service interfaces based on key data are more stable.
- Extending service implementations to use additional data can be implemented without impacting its consumers.
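The key-based interface described in the list above can be illustrated with a minimal Java sketch. The names `EnterpriseDataBus`, `InMemoryEdb` and `DiscountService`, the key format and the 10% discount are all hypothetical, and an in-memory map stands in for a real EDB.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a service whose interface carries only data references (keys).
final class KeyBasedServiceSketch {

    /** Assumed EDB abstraction: services read and write enterprise data by key. */
    interface EnterpriseDataBus {
        double read(String key);
        void write(String key, double value);
    }

    /** In-memory stand-in for a real EDB implementation. */
    static final class InMemoryEdb implements EnterpriseDataBus {
        private final Map<String, Double> data = new ConcurrentHashMap<>();
        public double read(String key) { return data.getOrDefault(key, 0.0); }
        public void write(String key, double value) { data.put(key, value); }
    }

    /** The interface exposes keys only; the order amount itself never crosses it. */
    static final class DiscountService {
        private final EnterpriseDataBus edb;
        DiscountService(EnterpriseDataBus edb) { this.edb = edb; }

        /** Reads the amount by key, applies a 10% discount, returns the result key. */
        String applyDiscount(String orderAmountKey) {
            double amount = edb.read(orderAmountKey);
            String resultKey = orderAmountKey + ":discounted";
            edb.write(resultKey, amount * 0.9);
            return resultKey;
        }
    }
}
```

If the service later needs additional data (for example, a customer tier), it can read it from the EDB by another key without any change to the `applyDiscount` signature seen by consumers.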
Adding a Grid
One possible implementation of the EDB is a data grid such as WebSphere eXtreme Scale, Oracle Coherence Data Grid, GigaSpaces Data and Application Grid or NCache Distributed Data Grid.
A data grid is a software system designed for building solutions ranging from simple in-memory databases to powerful distributed coherent caches that scale to thousands of servers. A typical data grid implementation partitions data into non-overlapping chunks that are stored in memory across several machines. As a result, extremely high levels of performance and scalability can be achieved using standard processes. Performance is achieved through parallel execution of updates and queries (different parts of the data can be accessed simultaneously on different machines), while scalability and fault tolerance can be achieved by replicating the same data on multiple machines.
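The partitioning and replication idea can be sketched in a few lines of Java. This is only an illustration under simple assumptions (hash-based placement, backups on the next nodes in order); the class and method names are hypothetical, and the products named above use their own placement schemes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of hash partitioning with replicas.
final class PartitioningSketch {

    /** Maps a key to exactly one of partitionCount non-overlapping partitions. */
    static int partitionOf(String key, int partitionCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Primary node first, followed by up to replicaCount backups on the next nodes. */
    static List<Integer> nodesFor(int partition, int nodeCount, int replicaCount) {
        List<Integer> nodes = new ArrayList<>();
        for (int i = 0; i <= replicaCount && i < nodeCount; i++) {
            nodes.add((partition + i) % nodeCount);
        }
        return nodes;
    }
}
```

Because every key maps to exactly one partition, queries for different keys can be served by different machines in parallel, while the backup nodes allow a partition to survive the loss of its primary.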
Figure 2 shows the use of a Grid as an EDB implementation. The Grid maintains an in-memory copy of the enterprise data, which represents the state of enterprise databases and applications.
Figure 2: Grid as an EDB
The introduction of Grid allows the repartitioning of data that exists in multiple databases and applications so that it adheres to the enterprise semantic model. This entails bringing together logically related data from different applications/databases in the enterprise into one cohesive data representation along with rationalizing the duplicate data which will inevitably exist in the enterprise.
Grid implementations are typically supported by a publish/subscribe mechanism, allowing data changes to be synchronized between Grid memory and existing enterprise applications and data. A Grid-based intermediary allows for very fast access to the enterprise data using a data model optimized for such a service usage.
Although Grid-based EDB (Figure 2) simplifies high speed access to the enterprise data, it still requires potentially significant data exchange between the EDB and the service implementation. A service must load all the required data, execute its processing and then store results back to the Grid.
A better architecture is to bring the execution of services closer to the enterprise data: implement services as coordinators of agents, which are executed in the memory space containing the enterprise data (Figure 3). In this case, a service implementation receives a request and starts one or more agents, which execute in the context of Grid nodes and return their results to the service implementation. The service implementation then combines the agents' results and returns the service's execution result.
Figure 3: Service as an agent's coordinator
This approach provides the following advantages over the Publish/Subscribe data exchange model:
- It allows for local data manipulation that can significantly improve overall service execution performance, especially when dealing with large amounts of data (megabytes or even gigabytes of data).
- Similar to the data partitioning, the actual execution is partitioned between multiple Grid nodes, thus further improving performance, scalability and availability of such an architecture.
- Because all services can access the same data, when service execution involves purely the manipulation of data with minimal request/replies, there is no need to pass data at all.
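A minimal in-process sketch of a service acting as a coordinator of agents is shown here. It assumes that each Grid node holds a partition of integers and that the agent simply sums its local data; `GridNode`, `runSumAgent` and `totalAcrossGrid` are hypothetical names, and a thread pool stands in for remote execution on Grid nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: threads stand in for agents running on remote Grid nodes.
final class AgentCoordinatorSketch {

    /** A Grid node holding its own partition of the data (here: plain integers). */
    static final class GridNode {
        private final List<Integer> localData;
        GridNode(List<Integer> localData) { this.localData = localData; }

        /** The agent runs next to the data; only the small result leaves the node. */
        long runSumAgent() {
            long sum = 0;
            for (int value : localData) sum += value;
            return sum;
        }
    }

    /** The service starts one agent per node and combines the partial results. */
    static long totalAcrossGrid(List<GridNode> nodes) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodes.size()));
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (GridNode node : nodes) {
                Callable<Long> agent = node::runSumAgent;
                partials.add(pool.submit(agent));
            }
            long total = 0;
            for (Future<Long> partial : partials) total += partial.get();
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("agent execution failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Only one `long` per node travels back to the coordinator, regardless of how much data each node holds.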
The concept of an agent can be traced back to the early days of research into Distributed Artificial Intelligence (DAI), which introduced the concept of a self-contained, interactive and concurrently-executing object. This object had some encapsulated internal state and could respond to messages from other similar objects. According to [7], "an agent is a component of software and/or hardware which is capable of acting exactingly in order to accomplish tasks on behalf of its user."
There are several types of agents, as identified in [7]:
- Collaborative agents
- Interface agents
- Mobile agents
- Information/Internet agents
- Reactive agents
- Smart Agents
Based on the architecture for the service implementation (Figure 3), we are talking about agents that belong to several of these categories:
- Collaborative - one or more agents together implement the service functionality.
- Mobile - agents are executed on the Grid nodes outside of the service context.
- Information - agents' execution directly leverages data located in the Grid nodes.
In the rest of the article we will discuss a simple implementation of Grid and a programming model that can be used for building a Grid-based EDB and an agent-based service implementation.
Among the most difficult challenges in implementing a Grid are high availability, scalability, and data/execution partitioning mechanisms.
One of the simplest ways to ensure Grid's High Availability and Scalability is the use of messaging for the internal Grid communications. Grid implementations can benefit from both point-to-point and publish-subscribe messaging:
- Usage of messaging in point-to-point communications supports decoupling of Consumers from Providers. The request is not sent directly to the Provider, but rather to the queues monitored by the Provider(s). As a result, queuing provides for:
- Transparently increasing the overall throughput by increasing the number of Grid nodes listening on the same queue.
- Simple throttling of Grid nodes' load through controlling the number of threads listening on a queue.
- Simplification of the load balancing. Instead of the consumer deciding which provider to invoke, it writes the request to the queue. Providers pick up requests as threads become available for request processing.
- Transparent failover support. If some of the processes listening on the same queue terminate, the rest will continue to pick up and process messages.
- Usage of publish/subscribe messaging allows for the simplification of "broadcast" implementations within the Grid infrastructure. Such support can be extremely useful when synchronizing changes within a Grid configuration.
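The point-to-point behavior described above can be sketched with an in-process queue. This is an illustration only: a `BlockingQueue` stands in for a message queue, threads stand in for Grid node listeners, and the names are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: an in-process BlockingQueue stands in for a message queue.
final class CompetingConsumersSketch {

    /** Processes all requests with listenerCount threads; returns how many were handled. */
    static int process(int requestCount, int listenerCount) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < requestCount; i++) queue.add(i);

        AtomicInteger handled = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(listenerCount);
        for (int t = 0; t < listenerCount; t++) {
            Thread listener = new Thread(() -> {
                // Each listener pulls work when it is free; the sender does no routing.
                while (queue.poll() != null) handled.incrementAndGet();
                done.countDown();
            });
            listener.start();
        }
        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }
}
```

The `listenerCount` parameter plays the role of the throttle: raising it increases throughput, and if one listener stops, the remaining ones keep draining the same queue.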
Depending on the Grid implementation, data/execution partitioning approaches can range from pure load-balancing policies (in the case of identical nodes) to dynamic indexing of Grid data. This mechanism can be either hard-coded in the Grid implementation or externalized in a specialized Grid service, the partition manager. The role of the partition manager is to partition Grid data among nodes and to serve as a "registry" for locating nodes (node queues) when routing requests. Externalizing the partition manager into a separate service introduces additional flexibility into the overall architecture through the use of a "pluggable" partition manager implementation, or even multiple partition managers implementing different routing mechanisms for different types of requests.
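A "pluggable" partition manager could look roughly like the following Java sketch. The `PartitionManager` interface and both implementations are assumptions made for illustration: one is a pure load-balancing policy for identical nodes, the other a simple index that assigns key ranges to node queues.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of two interchangeable routing policies.
final class PartitionManagerSketch {

    /** Assumed contract: resolve the node queue names that should receive a request. */
    interface PartitionManager {
        List<String> queuesFor(String requestKey);
    }

    /** Load-balancing policy for identical nodes: round-robin over all queues. */
    static final class RoundRobinManager implements PartitionManager {
        private final List<String> queues;
        private int next = 0;
        RoundRobinManager(List<String> queues) { this.queues = queues; }

        public synchronized List<String> queuesFor(String requestKey) {
            String queue = queues.get(next);
            next = (next + 1) % queues.size();
            return Collections.singletonList(queue);
        }
    }

    /** Index-based policy: each node queue owns the keys from its lower bound upward. */
    static final class KeyRangeManager implements PartitionManager {
        private final TreeMap<String, String> lowerBoundToQueue = new TreeMap<>();

        void assign(String lowerBound, String queue) {
            lowerBoundToQueue.put(lowerBound, queue);
        }

        public List<String> queuesFor(String requestKey) {
            Map.Entry<String, String> owner = lowerBoundToQueue.floorEntry(requestKey);
            if (owner == null) {
                return Collections.emptyList();   // no node owns this key
            }
            return Collections.singletonList(owner.getValue());
        }
    }
}
```

Because callers depend only on the interface, a job can select a different manager per request type without changes to the routing code.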
The overall Grid infrastructure, including the partition manager and Grid node communication, can be either directly exposed to the Grid consumer in the form of APIs used as part of Grid request submission, or encapsulated in a set of specialized Grid nodes - Grid masters (controllers). In the first case, a specialized Grid library responsible for request distribution and (optionally) combination of replies has to be linked into the Grid consumer implementation. Although this option can, theoretically, provide the best possible overall performance, it typically creates a tighter coupling between the Grid implementation and its consumers. In the second case, the Grid master implements a façade pattern for the Grid with all the advantages of this pattern - complete encapsulation of Grid functionality (and infrastructure) from the Grid consumer. Although a Grid master adds an additional network hop (and consequently some performance overhead), the loose coupling achieved is typically more important.
The overall high-level Grid architecture, supporting a two-level master/nodes implementation, is presented in Figure 4.
Figure 4: Grid High Level Architecture
In addition to the components described above, the proposed architecture (Figure 4) contains two additional ones - the Grid administrator and the code repository.
The Grid administrator provides a graphical interface showing the currently running nodes, their load, memory utilization, supported data, etc.
Because restarting Grid nodes/master can be fairly expensive, we need to be able to introduce new code into the Grid master/nodes without restarting them. This is done through a code repository, currently implemented as a Web-accessible collection of jars. As developers implement new code that they want to run in the Grid environment, they can store it in the repository and dynamically load/invoke it (using the Java URLClassLoader) as part of their execution (see below).
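Dynamic loading from a jar repository can be sketched with the standard `java.net.URLClassLoader`. The helper below is hypothetical and is not the article's implementation. It assumes the loaded class has a public no-argument constructor, and it treats an empty location as local, statically linked code, following the convention this article uses for the job container.

```java
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of loading job/item code from a Web-accessible jar.
final class CodeRepositoryLoaderSketch {

    /**
     * Loads className from the jar at jarUrl and instantiates it.
     * An empty jarUrl means "local, statically linked code".
     */
    static Object instantiate(String jarUrl, String className) {
        try {
            ClassLoader parent = CodeRepositoryLoaderSketch.class.getClassLoader();
            // A real runtime would likely cache and eventually close these loaders.
            ClassLoader loader = jarUrl.isEmpty()
                    ? parent
                    : new URLClassLoader(new URL[] { URI.create(jarUrl).toURL() }, parent);
            Class<?> type = Class.forName(className, true, loader);
            return type.getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot load " + className, e);
        }
    }
}
```

A call such as `instantiate("", "java.util.ArrayList")` resolves the class locally, while a non-empty jar URL makes code available that was not on the classpath when the node started.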
To simplify the creation of applications running on the Grid, we have designed a job-items programming model (Figure 5) for execution on the Grid. This model is a variation of the Map/Reduce [9] pattern and works as follows:
Figure 5: Job Items model
- The Grid consumer submits a job request (in the form of a job container) to the Grid master. The job container provides all of the information necessary for the master environment to instantiate the job. This includes the job's code location (the location of the Java jar, or an empty string, interpreted as local, statically linked code), the job's starting class, the job's input data and the job's context type, which allows a choice between multiple partition managers for splitting job execution.
- The Grid master's runtime instantiates the job's class, passing to it the appropriate job context (partition manager) and a replier object that supports replies back to the consumer. Once the job object is created, the runtime starts its execution.
- The job's start execution method uses the partition manager to split the job into items, each of which is sent to a particular Grid node for execution - the map step.
- Each destination Grid node receives an item execution request (in the form of an item container). The item container is similar to the job container and provides sufficient information for the Grid node to instantiate and execute the item. This includes the item's code location, the item's starting class, the item's input data and the item's context type.
- The Grid node's runtime instantiates the item's class, passing to it the appropriate item context and a replier object that supports replies back to the job. Once the item object is created, the runtime starts its execution.
- The item's execution uses the replier object to send partial results back to the job. This allows the job implementation to start processing an item's partial results (reduce steps) as soon as they become available. If necessary, additional items can be created and sent to the Grid nodes during this processing.
- The job can use its replier to send partial results to the consumer as they become available.
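The steps above can be sketched as a small in-process Java example. All type names (`Replier`, `WordCountItem`, `WordCountJob`) are illustrative assumptions rather than the API of the implementation described here, and direct method calls stand in for messages sent to Grid nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, in-process sketch of the job-items flow.
final class JobItemsSketch {

    /** Reply channel: items reply to the job, the job replies to the consumer. */
    interface Replier<T> {
        void reply(T partialResult);
    }

    /** An item counts the words in its own slice of the input (work done on a node). */
    static final class WordCountItem {
        void execute(String slice, Replier<Integer> toJob) {
            String trimmed = slice.trim();
            toJob.reply(trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length);
        }
    }

    /** The job splits its input into items (map) and folds their replies (reduce). */
    static final class WordCountJob {
        private int total = 0;

        void startExecution(List<String> slices, Replier<Integer> toConsumer) {
            for (String slice : slices) {
                // In a real Grid each item would be sent to a node queue instead.
                new WordCountItem().execute(slice, partial -> {
                    total += partial;           // reduce step
                    toConsumer.reply(total);    // partial result for the consumer
                });
            }
        }
    }

    /** Runs the job and returns every partial result the consumer received. */
    static List<Integer> run(List<String> slices) {
        List<Integer> seen = new ArrayList<>();
        new WordCountJob().startExecution(slices, seen::add);
        return seen;
    }
}
```

The consumer receives a running total after each item reply instead of waiting for the whole job, which is the partial-reply behavior the model is designed to support.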
The overall execution looks as follows (Figure 6).
Figure 6: Job Items execution
Detailed execution for both the Grid master and the Grid node is presented in Figure 7.
Figure 7: Execution details
In addition to implementing the Map/Reduce pattern, this programming model supports fully asynchronous data delivery on all levels. This not only significantly improves overall performance when job consumers can use partial replies (for example, delivering partial information to the browser), but also improves the scalability and throughput of the overall system by limiting the size of the messages (message chunking).
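Message chunking itself is simple to illustrate. The following sketch splits a list of results into bounded chunks, each of which could then be delivered as a separate partial reply. The `chunk` helper is hypothetical and not part of the described implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of message chunking for partial replies.
final class ChunkingSketch {

    /** Splits results into consecutive chunks of at most chunkSize elements. */
    static <T> List<List<T>> chunk(List<T> results, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < results.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, results.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(results.subList(start, end)));
        }
        return chunks;
    }
}
```

Bounding the chunk size caps the memory needed to process any single message, at the cost of sending more messages.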
Use of a job container as the mechanism for job invocation also allows a standardized interface for submitting jobs to the Grid (Figure 8). We provide two functionally identical methods in this Web service interface - invokeJobRaw and invokeJobXML.
Figure 8: GridJobService WSDL
Both methods allow invocation of a job on the Grid. The first uses MTOM to pass a binary-serialized JobContainer class, while the second supports XML marshalling of all elements of the JobContainer (Figure 5). In addition to the JobContainer, both methods pass two additional parameters to the Grid:
- Request handle - uniquely identifies the request and is used by the consumer to match replies to the request (see later).
- Reply URL - a URL at which the consumer is listening for replies. This URL should expose the GridJobServiceReplies service (Figure 9).
Figure 9: Grid Job Service Reply WSDL
Implementation of Grid master
The class diagram for the Grid master is presented in Figure 10. In addition to implementing the basic job runtime described above, the master's software also implements basic framework features including threading, request/response matching, request timeouts, etc.
To support the request/multiple replies paradigm for item execution, instead of using "get replies with wait" (a common request/reply pattern when using messaging), we decided to use a single listener and build our own reply matching mechanism. Finally, we have implemented a timeout mechanism, ensuring that the job receives the "first" reply from every item within a predefined time interval (defined in the job container).
Figure 10: Grid master implementation
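A reply matching mechanism with a first-reply timeout, as described for the Grid master, might be sketched as follows. This is an assumption-laden illustration, not the actual master code: `ReplyMatcherSketch` and its methods are hypothetical, replies are plain strings, and the single listener is represented by whoever calls `onReply`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: one listener dispatches many replies per request handle.
final class ReplyMatcherSketch {

    private static final class Pending {
        final Consumer<String> handler;
        final CountDownLatch firstReply = new CountDownLatch(1);
        Pending(Consumer<String> handler) { this.handler = handler; }
    }

    private final Map<String, Pending> pendingByHandle = new ConcurrentHashMap<>();

    /** Registers the handler that should receive every reply for this handle. */
    void register(String handle, Consumer<String> handler) {
        pendingByHandle.put(handle, new Pending(handler));
    }

    /** Called by the single listener for every incoming reply message. */
    boolean onReply(String handle, String payload) {
        Pending pending = pendingByHandle.get(handle);
        if (pending == null) return false;   // unknown or already timed-out request
        pending.firstReply.countDown();
        pending.handler.accept(payload);
        return true;
    }

    /** Returns false (and drops the request) if no reply arrived within the interval. */
    boolean awaitFirstReply(String handle, long timeoutMillis) {
        Pending pending = pendingByHandle.get(handle);
        if (pending == null) return false;
        try {
            if (pending.firstReply.await(timeoutMillis, TimeUnit.MILLISECONDS)) return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pendingByHandle.remove(handle);
        return false;
    }
}
```

Keeping the pending entry after the first reply is what allows several replies per request, while removing it on timeout causes late replies to be discarded.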
Implementation of Grid node
The class diagram for the Grid node is presented in Figure 11. Similar to the master runtime, here we complement basic item execution with framework support including threading, execution timeout, etc.
Figure 11: Grid node implementation
To avoid stranding node resources by items that run forever, we have implemented an item eviction policy based on the item's execution time. An item that runs longer than the time it advertised (in the item container) is terminated, and a timeout exception is sent back to the job.
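One way to approximate such an eviction policy in plain Java is a `Future` with a timed `get`. The sketch below is hypothetical: it returns the string "TIMEOUT" where the described implementation would send a timeout exception back to the job, and it relies on thread interruption, which only stops items that respond to interrupts.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of evicting an item that exceeds its advertised time.
final class ItemEvictionSketch {

    /** Runs the item; returns its result, or "TIMEOUT" if it ran past advertisedMillis. */
    static String runWithEviction(Callable<String> item, long advertisedMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<String> running = worker.submit(item);
        try {
            return running.get(advertisedMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            running.cancel(true);      // interrupts the item's thread
            return "TIMEOUT";          // stands in for the timeout exception sent to the job
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        } catch (ExecutionException e) {
            return "FAILED: " + e.getCause();
        } finally {
            worker.shutdownNow();
        }
    }
}
```

For example, an item that sleeps for several seconds but advertised 50 milliseconds is cancelled and reported as timed out, freeing the worker thread for other items.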
Grid consumer framework
We have also developed a consumer implementation that wraps the Web services (Figure 8, Figure 9) with simple Java APIs (Figure 12). It leverages an embedded Jetty Web server and allows the consumer to submit a job request to the Grid and register a callback for receiving replies.
Figure 12: Grid consumer
Introduction of the EDB allows architects to further simplify SOA implementation by introducing "standardized" access from service implementations to the enterprise data. It simplifies both service invocation and execution models and provides for further decoupling of services. The use of Grid for EDB implementations supports the EDB's scalability and high availability. Finally, the use of service agents executing directly in the Grid further improves scalability and performance. The Grid's high-level architecture and programming model described in this article provide a simple yet robust foundation for such implementations.
Many thanks to my coworkers at Navteq, especially Michael Frey, Daniel Rolf and Jeffrey Herr, for discussions and help with the implementation of the Grid and its programming model.
1. B. Lublinsky. Incorporating Enterprise Data into SOA. November 2006, InfoQ. http://www.infoq.com/articles/SOA-enterprise-data
2. Mike Rosen, Boris Lublinsky, Kevin Smith, Mark Balcer. Applied SOA: Service-Oriented Architecture and Design Strategies. Wiley 2008, ISBN: 978-0-470-22365-9.
3. Art Sedighi. Enterprise SOA Meets Grid. June 2006.
4. David Chappell and David Berry. SOA - Ready for Primetime: The Next-Generation, Grid-Enabled Service-Oriented Architecture. SOA Magazine, September 2007.
5. David Chappell. Next Generation Grid Enabled SOA.
6. Data grid
7. Hyacinth S. Nwana. Software Agents: An Overview
9. Map Reduce.
A source of the enterprise data (Figure 1) can be either a database or an existing enterprise application. Consequently, an EDB can be implemented both as a database access layer and as an integration layer synchronizing services data with existing enterprise applications/systems.
An exception to passing data references instead of actual data might be the ultimate service client, for example a servlet, delivering the results of service execution to the user's browser.
The higher level of coupling when Grid APIs are exposed directly to the consumer has two major causes. First, parts of the Grid-specific code run directly in the Grid consumer implementation, which requires rebuilding/restarting consumers for every change to this code. Second, the consumer is required to directly support the network protocols used for Grid node communications.
Restarting Grid nodes/master is expensive because of the amount of cached data.
Message chunking is a "standard" technique for dealing with large messages. It reduces both the amount of memory required for message processing and network latency by avoiding sending extra-large messages.
Although the Web service interface presented in Figure 8 is synchronous, the job invocation is asynchronous. The result is returned to the job consumer after the job's startExecution method has been invoked, not after the job has received its results. Using a synchronous (request/reply) invocation in this case allows any exception occurring during job initiation and startup to be delivered to the job consumer.
Many messaging solutions directly support threading. Typically, every message listener starts (and executes) in its own thread. As a result, controlling the number of message listeners often allows control of job execution threading. Unfortunately, this support differs between messaging implementations. As a result, we decided not to leverage messaging threading, but to use a single listener and implement our own thread pool.