
Parallel Processing Framework JPPF offers Load Balancing, Failover and J2EE Integration

The Java Parallel Processing Framework (JPPF) project team announced the first Release Candidate (RC1) of Version 1.0 of their product last week. JPPF is an open source grid computing framework that can be used to run multiple Java applications in parallel in a distributed execution environment.

The JPPF architecture consists of three main components: clients, servers and nodes. The framework takes in a number of tasks from a client, distributes their execution over several nodes and, once all of the tasks have run, recomposes the results and sends them back to the client.
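The split, execute and recompose cycle described above can be illustrated on a single JVM. The following is a minimal sketch, not the JPPF API: a local thread pool stands in for the grid nodes, and the class and method names (`ScatterGatherSketch`, `runAll`) are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: local threads stand in for JPPF nodes; this is not the JPPF API.
final class ScatterGatherSketch {

    // Runs every task on a pool of workers and returns the results in submission order.
    static <T> List<T> runAll(List<Callable<T>> tasks, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // invokeAll blocks until all tasks have finished and preserves input order.
            List<Future<T>> futures = pool.invokeAll(tasks);
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get()); // a task failure surfaces as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            final int n = i;
            tasks.add(() -> n * n);
        }
        System.out.println(runAll(tasks, 3)); // prints [1, 4, 9, 16, 25]
    }
}
```

In a real grid the workers would be remote nodes reached through a server, but the contract seen by the client is the same: a list of tasks goes in and an ordered list of results comes back.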

JPPF also provides services such as load balancing, failover and recovery. A JMX-based administration console allows monitoring of the nodes as well as the executed tasks. Tasks can be cancelled and restarted remotely, or they can be configured to time out at a given date or after a given elapsed time.
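The elapsed-time variant of a task timeout can be approximated with the standard `java.util.concurrent` classes. This is a sketch under that assumption only; it says nothing about how JPPF configures or enforces timeouts, and `TaskTimeoutSketch` and `runWithTimeout` are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative only: a plain-JDK timeout, not the JPPF mechanism.
final class TaskTimeoutSketch {

    // Returns the task's result, or the fallback if the task runs longer than timeoutMillis.
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis, T fallback) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the task that is still running
            return fallback;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runWithTimeout(() -> "done", 1000, "timed out"));
        System.out.println(runWithTimeout(() -> {
            Thread.sleep(5000); // simulates a task that runs too long
            return "late";
        }, 100, "timed out"));
    }
}
```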

The framework integrates with J2EE application servers through a JCA 1.5 compliant Resource Adapter, which gives the servers access to native grid services. The Resource Adapter implements asynchronous task submission to eliminate any risk of JTA transaction timeouts. JPPF supports the following application servers:

InfoQ spoke with Laurent Cohen, founder of the framework, about JPPF's parallel processing capabilities and the project's roadmap. Laurent said his team is planning the Version 1.0 GA release for next month.

Asked how the JPPF framework compares with the java.util.concurrent API introduced in JDK 1.5, Laurent said that on a single computer with many processors, JPPF will be measurably slower than using the java.util.concurrent classes directly. On a network of machines, however, JPPF is a good solution compared to the JDK concurrency classes. JPPF uses the java.util.concurrent APIs internally in every component of its architecture, and its nodes are configured to process tasks on multiple threads using the ExecutorService interface.

A key feature that will be introduced in Java 7 as part of the Java concurrency API is the fork-join framework, which targets fine-grained parallel processing. InfoQ asked whether JPPF offers any similar features. Here is his response:

This is what JPPF has been all about from the start. JPPF is designed to take in a number of tasks, distribute their execution over the compute nodes, then recompose the results into the appropriate format. In this regard, JPPF can be viewed as an extended, distributed fork-join framework.
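For comparison, the single-JVM fork-join style that the question refers to looks roughly like this with the `java.util.concurrent` fork/join classes. The example is a sketch of the JDK feature, not of JPPF; the class name `ForkJoinSumSketch` and the threshold value are arbitrary choices made for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative only: sums an array by recursively splitting it into halves.
final class ForkJoinSumSketch extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this size, sum sequentially

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ForkJoinSumSketch(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSumSketch left = new ForkJoinSumSketch(data, from, mid);
        ForkJoinSumSketch right = new ForkJoinSumSketch(data, mid, to);
        left.fork();                     // fork: run the left half asynchronously
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // join: recompose the partial results
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSumSketch(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // prints 50005000
    }
}
```

The fork and join steps here stay inside one process, whereas the quote describes the same split and recompose pattern spread across machines.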

Responding to a question on how JPPF compares with other open source parallel computing frameworks such as GigaSpaces, Terracotta and GridGain, Laurent said:

GridGain is the closest open source framework to JPPF in terms of scope and functionalities. What differentiates them is their implementation architectures: GridGain uses a peer-to-peer topology whereas JPPF uses a multi-tiered architecture to achieve the distributed processing.

Terracotta has a very different philosophy. Their implementation of a distributed JVM is an extraordinary achievement, however that doesn't make it a grid computing framework per se. Terracotta is great at clustering and provides vital features such as distributed caching, transaction management, replication etc.

JPPF fits well alongside these frameworks: JPPF would implement the local topology within single organizations, with GigaSpaces or Globus Toolkit managing the larger picture.

Regarding the implementation details of load balancing in JPPF, he explained that:

The applications submit tasks to a centralized JPPF server, grouped as a "task bundle". Based on the nodes' performance profile - computed dynamically - the bundle is then partitioned into several sub-bundles that are sent to each node. The size of each sub-bundle is computed as a function of the past performance of the node. The performance profiles are constantly re-evaluated, causing the framework to automatically adapt to new and changing conditions, including the type and number of tasks sent by the applications, the number of nodes actually registered with the server, etc.
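One simple way to picture this kind of performance-based partitioning is to size each sub-bundle in proportion to a node's observed speed. The sketch below assumes the performance profile is a mean execution time per task for each node; that representation, the proportional rule and the names `SubBundleSizer` and `split` are assumptions made for illustration, not JPPF's actual algorithm.

```java
import java.util.Arrays;

// Illustrative only: proportional sizing is an assumed rule, not JPPF's algorithm.
final class SubBundleSizer {

    // meanMillisPerTask[i] is a hypothetical performance profile of node i (lower is faster).
    // Returns how many tasks of the bundle each node receives.
    static int[] split(int bundleSize, double[] meanMillisPerTask) {
        int n = meanMillisPerTask.length;
        double[] speed = new double[n];
        double totalSpeed = 0;
        for (int i = 0; i < n; i++) {
            speed[i] = 1.0 / meanMillisPerTask[i]; // tasks per millisecond
            totalSpeed += speed[i];
        }
        int[] sizes = new int[n];
        int assigned = 0;
        for (int i = 0; i < n; i++) {
            sizes[i] = (int) Math.floor(bundleSize * speed[i] / totalSpeed);
            assigned += sizes[i];
        }
        // Hand out the tasks lost to rounding one at a time, fastest nodes first.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(speed[b], speed[a]));
        for (int k = 0; assigned < bundleSize; k++, assigned++) {
            sizes[order[k % n]]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Three nodes averaging 10 ms, 20 ms and 40 ms per task.
        int[] sizes = split(100, new double[] {10, 20, 40});
        System.out.println(Arrays.toString(sizes)); // prints [58, 28, 14]
    }
}
```

Re-running `split` with refreshed timings after each bundle would give the adaptive behaviour described in the quote: a node that slows down receives a smaller share of the next bundle.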

And regarding the failover capability:

Failover is implemented in all JPPF architecture components and relies on 3 major mechanisms: dynamic topology, failure detection and automatic resubmission. JPPF components can be added to, or removed from the network at any time and in any order.

Laurent talked about 2 typical cases of failure and how JPPF fails over to a different node in these scenarios.

In the first case, a node crashes or suddenly gets disconnected from the server. The server will detect the failure and automatically submit the incomplete work to another node.
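The detect-and-resubmit behaviour in this first case can be sketched as a loop over candidate nodes. Everything here is hypothetical: the `Node` interface, the `dispatch` method and the use of an exception to signal a crashed node are stand-ins for whatever failure detection the server really performs.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Illustrative only: an exception stands in for the server detecting a node failure.
final class ResubmitSketch {

    // A hypothetical node: executes a unit of work, or throws if the node has failed.
    interface Node {
        String execute(String work) throws Exception;
    }

    // Tries each node in turn and resubmits the work after every detected failure.
    static String dispatch(String work, List<Node> nodes) throws Exception {
        Exception lastFailure = null;
        for (Node node : nodes) {
            try {
                return node.execute(work);
            } catch (Exception e) {
                lastFailure = e; // failure detected: resubmit to the next node
            }
        }
        throw new IllegalStateException("no node could complete the work", lastFailure);
    }

    public static void main(String[] args) throws Exception {
        Node crashed = work -> { throw new IOException("connection reset"); };
        Node healthy = work -> "done: " + work;
        System.out.println(dispatch("job-42", Arrays.asList(crashed, healthy))); // prints done: job-42
    }
}
```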

In the second case, the client is disconnected from the server. The client will automatically attempt to reconnect to the server, until it succeeds or an optionally specified timeout expires. In the meantime, a client can be configured to connect to many servers, organized in a hierarchy of server connection pools that defines an effective failover strategy. The client will then resubmit the work to the next server in the hierarchy.
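The client-side behaviour can be pictured as a retry loop with a per-server timeout, followed by a move to the next server in priority order. This is a sketch under stated assumptions: a flat priority list replaces the hierarchy of connection pools, each server is modelled as a `BooleanSupplier` whose result says whether a connection attempt succeeded, and the 10 ms pause between attempts is arbitrary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Illustrative only: a flat priority list stands in for JPPF's server connection pools.
final class ClientFailoverSketch {

    // Returns the index of the first server that accepts a connection, or -1 if none does.
    // Each server is retried until it succeeds or its timeout expires.
    static int connect(List<BooleanSupplier> serversByPriority, long timeoutMillisPerServer)
            throws InterruptedException {
        for (int i = 0; i < serversByPriority.size(); i++) {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillisPerServer);
            do {
                if (serversByPriority.get(i).getAsBoolean()) {
                    return i; // connected: work can be resubmitted to this server
                }
                Thread.sleep(10); // short pause before the next reconnection attempt
            } while (System.nanoTime() - deadline < 0);
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        BooleanSupplier unreachable = () -> false;
        BooleanSupplier reachable = () -> true;
        System.out.println(connect(Arrays.asList(unreachable, reachable), 50)); // prints 1
    }
}
```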

Asked how the JPPF framework supports application-level security, he said JPPF will transparently use whatever security framework the application already uses. JPPF nodes also have a configurable security policy that defines what client code can and cannot do on the node's host (such as reading from or writing to the file system, or opening connections to other servers).

Finally, speaking of what's coming in future releases of JPPF, Laurent said that there will be integration with Business Rules Engines (such as ILOG Inc. and JBoss Rules) and Web Services. It will also integrate with tools in the areas of ETL, Business Intelligence (BI) and Data Mining, where distributed processing plays a critical role in querying large data sets stored in a data warehouse.

The JPPF project was started 2 years ago on SourceForge.net. It is currently licensed under the Apache License Version 2.0, and the latest version of JPPF can be downloaded from the SourceForge project website.
