Spring Batch 2.0 Supports Job Partitioning and Annotation Based Configuration

The latest version of the Spring Batch framework supports job partitioning, remote chunking, and annotation-based configuration. The Spring Batch development team recently released version 2.0 of the batch framework. The new version also adds features such as Java 5 support and non-sequential execution.

The new features in this version fall into four main categories: Java 5 support, non-sequential execution, scalability enhancements, and annotations.

Java 5 Support:
Spring Batch 2.0 supports Java 5 generics and parameterized types, so type safety can be checked at compile time. For example, the ItemReader interface now has a type-safe read method.
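
To illustrate what a parameterized reader buys, here is a minimal, self-contained sketch. The ItemReader interface below is a local stand-in that mirrors the general shape of Spring Batch's generic reader (a read method that returns the next item of type T, or null once the input is exhausted); it is not the framework type itself, and InMemoryItemReader and GenericReaderDemo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Local stand-in mirroring the shape of a generic ItemReader<T>;
// NOT the org.springframework.batch.item.ItemReader type itself.
interface ItemReader<T> {
    // Returns the next item, or null once the input is exhausted.
    T read() throws Exception;
}

// Hypothetical in-memory reader used only for illustration.
class InMemoryItemReader<T> implements ItemReader<T> {
    private final Iterator<T> items;

    InMemoryItemReader(List<T> items) {
        this.items = new ArrayList<>(items).iterator();
    }

    @Override
    public T read() {
        return items.hasNext() ? items.next() : null;
    }
}

class GenericReaderDemo {
    // Drains any reader; the element type is checked by the compiler, so no casts are needed.
    static <T> List<T> readAll(ItemReader<T> reader) throws Exception {
        List<T> result = new ArrayList<>();
        for (T item = reader.read(); item != null; item = reader.read()) {
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        ItemReader<String> reader = new InMemoryItemReader<>(Arrays.asList("alpha", "beta"));
        String first = reader.read(); // statically typed as String: no cast required
        System.out.println(first.toUpperCase()); // ALPHA
        System.out.println(readAll(reader));     // [beta]
    }
}
```

With a raw, non-generic reader the caller would have to cast the returned Object, and a wrong cast would only fail at run time.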

Non-Sequential Execution:
This category covers three new features (conditional, pause, and parallel execution) that allow a non-linear sequence of steps, so a job can finish even if one of its steps fails. Conditional execution branches the job to a different step based on the ExitStatus of the previous step. This includes the ability to branch on a FAILED status, which means a step failure is no longer necessarily fatal for the job. Pause execution stops the job and waits for an explicit instruction to proceed. This is useful, for instance, when a business rule requires manual intervention to check the validity of business-critical data. Parallel execution runs multiple steps concurrently when they are independent of each other; the user specifies which branches can be executed in parallel.
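
The branching rule for conditional execution can be pictured with a small toy model, assuming steps report their exit status as plain strings. ConditionalFlow, step, on, and run are hypothetical names invented for this sketch; it is not Spring Batch's job or flow API, only an illustration of routing to the next step based on the previous step's exit status, including a FAILED branch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Toy model of conditional flow: each step returns an exit status, and the next step is
// looked up in a per-step transition table. The class and its methods are hypothetical;
// this is not Spring Batch's job or flow API.
class ConditionalFlow {
    private final Map<String, Supplier<String>> steps = new HashMap<>();
    private final Map<String, Map<String, String>> transitions = new HashMap<>();

    // Registers a step whose body returns an exit status such as "COMPLETED" or "FAILED".
    ConditionalFlow step(String name, Supplier<String> body) {
        steps.put(name, body);
        return this;
    }

    // Routes from 'from' to 'to' when 'from' ends with 'status'; "*" matches any status.
    ConditionalFlow on(String from, String status, String to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(status, to);
        return this;
    }

    // Executes from 'start' until no transition applies; returns "step:status" entries in order.
    List<String> run(String start) {
        List<String> executed = new ArrayList<>();
        String current = start;
        while (current != null) {
            String status = steps.get(current).get();
            executed.add(current + ":" + status);
            Map<String, String> routes = transitions.getOrDefault(current, Collections.emptyMap());
            // An exact status match takes precedence over the "*" wildcard.
            current = routes.containsKey(status) ? routes.get(status) : routes.get("*");
        }
        return executed;
    }

    public static void main(String[] args) {
        ConditionalFlow job = new ConditionalFlow()
            .step("load", () -> "FAILED")          // simulate a failing step
            .step("report", () -> "COMPLETED")
            .step("cleanup", () -> "COMPLETED")
            .on("load", "*", "report")
            .on("load", "FAILED", "cleanup");      // a failure branches instead of aborting the job
        System.out.println(job.run("load"));       // [load:FAILED, cleanup:COMPLETED]
    }
}
```

If "load" had completed normally, the wildcard route would have sent the job to "report" instead.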

Scalability Enhancements:
The new version supports parallel execution across multiple processes with two approaches: remote chunking and partitioning. Remote chunking is a technique for dividing up the work of a step without any explicit knowledge of the structure of the data. Any input source can be split up dynamically by reading it in a single process and sending the items, in chunks, to a remote worker process. The remote process implements a listener pattern: it responds to the request, processes the data, and sends an asynchronous reply. The transport for the request and reply has to be durable, with guaranteed delivery and a single consumer; those features are readily available in any JMS implementation. Spring Batch is building the remote chunking feature on top of Spring Integration, so it is agnostic to the actual message middleware implementation.
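
The remote chunking message flow can be sketched in a single JVM, assuming two in-memory BlockingQueues stand in for the durable request and reply channels (JMS or similar in a real deployment) and a thread stands in for the remote worker. RemoteChunkingSketch and its methods are hypothetical; this is not Spring Batch or Spring Integration code, only the shape of the pattern: one process reads all items, ships them in chunks, and waits for one asynchronous reply per chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-process sketch of remote chunking. Two BlockingQueues stand in for the durable,
// single-consumer request and reply channels, and a thread stands in for the remote
// worker. All names here are hypothetical.
class RemoteChunkingSketch {
    // Sentinel chunk compared by reference: tells the worker there is no more input.
    private static final List<String> END_OF_INPUT = new ArrayList<>();

    // Worker: listens for chunk requests, processes each item, replies with a count.
    static Thread startWorker(BlockingQueue<List<String>> requests, BlockingQueue<Integer> replies) {
        Thread worker = new Thread(() -> {
            try {
                List<String> chunk;
                while ((chunk = requests.take()) != END_OF_INPUT) {
                    int processed = 0;
                    for (String item : chunk) {
                        System.out.println("worker processed " + item.toUpperCase());
                        processed++;
                    }
                    replies.put(processed); // asynchronous reply to the master
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        return worker;
    }

    // Master: reads all input in one place, sends it in chunks, then collects one reply
    // per chunk. Returns the total number of items the worker reported as processed.
    static int run(List<String> input, int chunkSize) {
        BlockingQueue<List<String>> requests = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> replies = new LinkedBlockingQueue<>();
        Thread worker = startWorker(requests, replies);
        try {
            int chunks = 0;
            for (int i = 0; i < input.size(); i += chunkSize) {
                int end = Math.min(i + chunkSize, input.size());
                requests.put(new ArrayList<>(input.subList(i, end)));
                chunks++;
            }
            requests.put(END_OF_INPUT);
            int total = 0;
            for (int i = 0; i < chunks; i++) {
                total += replies.take();
            }
            worker.join();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for replies", e);
        }
    }

    public static void main(String[] args) {
        int total = run(Arrays.asList("a", "b", "c", "d", "e"), 2); // three chunks: 2 + 2 + 1
        System.out.println("items processed remotely: " + total);  // 5
    }
}
```

Note that the master never needs to know anything about the structure of the items; it only reads them and groups them into chunks.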

Partitioning is an alternative approach which, in contrast, depends on having some knowledge of the structure of the input data, such as a range of primary keys or the name of a file to process. The advantage of this model is that the processors of each element in a partition can act as if they were a single step in a normal Spring Batch job. They don't have to implement any special patterns, which makes them easy to configure and test. Partitioning is more scalable than remote chunking because there is no serialization bottleneck from reading all the input data in one place. In Spring Batch 2.0, partitioning is supported by two interfaces: PartitionHandler and StepExecutionSplitter.
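
A minimal sketch of partitioning by primary-key range follows, assuming the keys are contiguous longs. The splitRange method plays roughly the role a splitter (StepExecutionSplitter in Spring Batch) plays, and the thread pool plays roughly the role of a PartitionHandler; PartitioningSketch and its methods are hypothetical and are not the Spring Batch API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of partitioning by primary-key range. splitRange does the job of a splitter
// and the thread pool does the job of a partition handler; these methods are
// hypothetical, not the Spring Batch API.
class PartitioningSketch {
    // Splits the inclusive key range [min, max] into at most gridSize contiguous ranges.
    static List<long[]> splitRange(long min, long max, int gridSize) {
        long count = max - min + 1;
        long width = (count + gridSize - 1) / gridSize; // ceil(count / gridSize)
        List<long[]> ranges = new ArrayList<>();
        for (long start = min; start <= max; start += width) {
            ranges.add(new long[] {start, Math.min(start + width - 1, max)});
        }
        return ranges;
    }

    // Stand-in for an ordinary step that only touches keys inside its own range
    // (here it just sums them); it needs no special remote-processing pattern.
    static long processRange(long[] range) {
        long sum = 0;
        for (long key = range[0]; key <= range[1]; key++) {
            sum += key;
        }
        return sum;
    }

    // Runs each partition concurrently and combines the partial results.
    static long runPartitioned(long min, long max, int gridSize) {
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (long[] range : splitRange(min, max, gridSize)) {
                partials.add(pool.submit(() -> processRange(range)));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (long[] r : splitRange(1, 100, 4)) {
            System.out.println("partition " + r[0] + ".." + r[1]); // 1..25, 26..50, 51..75, 76..100
        }
        System.out.println("sum of keys: " + runPartitioned(1, 100, 4)); // 5050
    }
}
```

Because each partition reads its own slice of the data, no single process has to read and serialize the whole input, which is the scalability advantage described above.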

Annotation-Based Configuration:
Spring Batch components such as readers, writers, processors, and listeners can now be configured using annotations and plugged into a step. This is done through the Spring Batch XML namespace.
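
The general mechanism behind annotation-driven callbacks can be sketched with plain Java reflection. The @BeforeStepHook annotation, AuditingWriter, and AnnotatedStep below are hypothetical, locally defined stand-ins for the framework's own annotations and step machinery; the sketch only shows how a component with an annotated method, and no listener interface, can be discovered and called back by a step.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical marker annotation defined locally for this sketch.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface BeforeStepHook {}

// A plain component: it implements no listener interface, it just has an annotated method.
class AuditingWriter {
    final List<String> log = new ArrayList<>();

    @BeforeStepHook
    public void openAudit() {
        log.add("before step");
    }

    public void write(List<String> items) {
        log.add("wrote " + items.size());
    }
}

// Minimal "step" that discovers annotated callbacks by reflection before doing any work.
class AnnotatedStep {
    static void invokeBeforeStepHooks(Object component) {
        for (Method m : component.getClass().getMethods()) {
            if (m.isAnnotationPresent(BeforeStepHook.class) && m.getParameterCount() == 0) {
                try {
                    m.invoke(component);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }

    public static void main(String[] args) {
        AuditingWriter writer = new AuditingWriter();
        invokeBeforeStepHooks(writer);
        writer.write(Arrays.asList("a", "b"));
        System.out.println(writer.log); // [before step, wrote 2]
    }
}
```

The design benefit is that the component stays a plain object; the annotation, not an inherited interface, marks which method is a callback.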

There are also some changes in the area of application monitoring. These include statistics for counting and accounting of items executed and skipped, with separate counts for the total items read, processed, and written at each stage. For steps (or tasklets) that do not split their execution into read, process, and write phases, this is more detail than needed, but for the majority use case it is more useful than storing a single overall item count.
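
A small sketch of per-stage accounting, assuming a processor that signals a bad item by throwing. StepCounts and CountingStep, and their field names, are hypothetical and do not correspond to Spring Batch's own step execution class; the point is only that read, process, write, and skip counts are tracked separately rather than as one overall total.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical per-stage counters; not a Spring Batch class.
class StepCounts {
    int readCount, processCount, writeCount, processSkipCount;

    @Override
    public String toString() {
        return "read=" + readCount + " processed=" + processCount
            + " written=" + writeCount + " skippedInProcess=" + processSkipCount;
    }
}

class CountingStep {
    // Reads each item, processes it, and writes it one at a time; an item whose
    // processing throws a RuntimeException is counted as skipped and not written.
    static StepCounts run(List<String> input, Function<String, String> processor, List<String> output) {
        StepCounts counts = new StepCounts();
        for (String item : input) {
            counts.readCount++;
            String result;
            try {
                result = processor.apply(item);
            } catch (RuntimeException e) {
                counts.processSkipCount++; // skipped at the process stage
                continue;
            }
            counts.processCount++;
            output.add(result);
            counts.writeCount++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        StepCounts c = run(Arrays.asList("1", "x", "3"), s -> String.valueOf(Integer.parseInt(s) * 2), out);
        System.out.println(c);   // read=3 processed=2 written=2 skippedInProcess=1
        System.out.println(out); // [2, 6]
    }
}
```

With only an overall count, the run above would report two items and hide the fact that three were read and one was skipped during processing.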

SpringSource is planning an Enterprise Batch product that will provide a full runtime solution for partitioning and remote chunking, as well as administration and scheduling. The roadmap includes adding a Spring 3.0 dependency for Spring Batch 2.1 (while keeping the option of running on Spring 2.5.6). Spring 3.0 brings new features, in particular for configuring jobs and steps using late binding with the Spring Expression Language (EL). EL uses the same syntax as late binding in Spring Batch 2.0, but it has more features and is more flexible than the current implementation.
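
To make "late binding" concrete, here is a toy sketch: a placeholder written into the configuration is resolved only when the step starts, against that particular run's job parameters. The placeholder syntax is modelled loosely on the #{jobParameters[...]} style, and the resolution is a plain regex substitution, not the Spring Expression Language; LateBindingSketch, resolve, and the example file path are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration of late binding via regex substitution; not SpEL and not Spring Batch code.
class LateBindingSketch {
    // Matches #{jobParameters[key]} or #{jobParameters['key']} and captures the key.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("#\\{jobParameters\\['?([^'\\]]+)'?\\]\\}");

    // Replaces every placeholder with the matching job parameter; fails on a missing key.
    static String resolve(String template, Map<String, String> jobParameters) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = jobParameters.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("missing job parameter: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The configured value is fixed; only the job parameters change from run to run.
        String configured = "file:#{jobParameters['input.file.name']}";
        Map<String, String> params = new HashMap<>();
        params.put("input.file.name", "/data/orders.csv"); // hypothetical path
        System.out.println(resolve(configured, params)); // file:/data/orders.csv
    }
}
```

A real expression language goes further (method calls, operators, access to other context objects), which is the extra flexibility the roadmap refers to.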