Spring Batch: Simplified Development of Batch and Offline Processes
The Spring Batch project, a lightweight and comprehensive Spring-based batch framework, released version 1.0 recently. InfoQ spoke with project lead David Syer to learn more about this release and what it provides for the Spring community.
Syer described Spring Batch as a framework which manages batch and offline processing concerns so that an application developer can focus on business logic. Syer identified two new ideas that Spring Batch brings to the batch processing space -- the ability to write lightweight application code that can be tested in isolation, and a powerful framework to execute, manage and monitor the results of offline processing.
Syer identified the major features of this release as:
- Infrastructure - This provides reusable, low-level support for repeat and retry, transaction synchronization, reading from and writing to flat files, XML and databases
- Core - This is a thin API which allows launching and management of batch jobs, and provides all features which are needed operationally
- Execution - This is an implementation of the core API, which provides a runtime/execution environment for batch processing
- Comprehensive set of sample applications - Spring Batch 1.0 provides a samples module which contains several example applications which show all of the main features of Spring Batch in operation
Syer also indicated that a detailed explanation of all of the features in version 1.0 was available.
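To make the "repeat and retry" idea from the Infrastructure module more concrete, here is a minimal plain-Java sketch. It does not use the Spring Batch API: the class `RetrySketch` and the methods `retry` and `processAll` are hypothetical names invented for illustration, and the sketch assumes Java 9 or later. It shows only the general pattern of repeating an operation over a stream of items and retrying each write a bounded number of times when a transient failure occurs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names come from Spring Batch.
public class RetrySketch {

    /** Retry: calls the action until it succeeds or maxAttempts is exhausted. */
    static <T> T retry(int maxAttempts, Callable<T> action) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed, so surface the last error
    }

    /** Repeat: pulls items until the reader is exhausted, writing each one with retry. */
    static <T> int processAll(Iterator<T> reader, Consumer<T> writer, int maxAttempts)
            throws Exception {
        int count = 0;
        while (reader.hasNext()) {
            T item = reader.next();
            retry(maxAttempts, () -> {
                writer.accept(item);
                return null;
            });
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        List<String> written = new ArrayList<>();
        int[] failuresLeft = {1}; // simulate one transient failure

        Consumer<String> flakyWriter = line -> {
            if (failuresLeft[0] > 0) {
                failuresLeft[0]--;
                throw new IllegalStateException("transient failure");
            }
            written.add(line.toUpperCase());
        };

        int n = processAll(List.of("a,1", "b,2").iterator(), flakyWriter, 3);
        System.out.println(n + " items written: " + written);
    }
}
```

In this sketch the business logic (the writer) knows nothing about looping or error recovery, which is the separation of concerns Syer describes: the framework-like code owns the repeat and retry behaviour, and the application code can be tested on its own.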
One of the things which sets Spring Batch apart from other projects in the Spring Portfolio is that Spring Batch is the result of a partnership between SpringSource and Accenture. Syer told InfoQ that the partnership has resulted in a significant benefit in terms of the number and depth of contributions to the codebase, and that the partnership had been "extremely successful" with Accenture having assigned some of their best resources to the project. Syer was also careful to note that Spring Batch is run in the same manner and held to the same standard as every project in the Spring Portfolio.
Spring Batch has had a relatively long development process, with dates as early as January 2007 noted in the Subversion repository. Syer explained the reasons for this:
Of course it would have been nice to get to a final release a bit quicker, but actually we always had a plan with a hard stop in March 2008 to coincide with the release of the rest of the Spring Portfolio and the availability of the SpringSource subscription products. We have to be very careful with our public API design, in particular so that we do not need to make changes where we have plans for future development. General product quality is also a major driver - we are perfectionists.
That being said, the maturity and richness of functionality can be shown by the handful of projects that have gone into production using milestone releases.
When asked to describe some of these production projects in more detail, Syer said:
- A large European health care provider has migrated a number of their mainframe batch processes to Spring Batch as part of an overall Application Renewal project. This is quite a common pattern and request because the skills available in today's job market make it easy to find good Spring developers, and more difficult to find good COBOL programmers. This client uses Spring Batch XML streaming and mapping capabilities, along with Hibernate for business object persistence, where they are able to get some level of re-use from the work done as part of their online processing
- A large sports organisation updates scores and statistics live as games happen for real-time tracking by users. They developed a system where file reading only required configuration, allowing for fast development. The modular approach also allowed for jobs to be run faster, launched every 5 seconds
- A large state government in the US has an IT renewal project replacing mainframe batch jobs with Java. The objective is to process unemployment claims. Challenges here include legacy mainframe and government-specific data formats, and strict rules about partial job failures. In this case, the batch job development is only one piece of a much larger programme which is ahead of schedule
Syer also listed the three categories that implementing applications tend to fall into:
- Close-of-business processing such as reporting, order processing, and account reconciliation
- Import and Export handling such as form processing, inventory import, and allocation export
- Large-scale output jobs such as email campaigns and financial statements
When asked about plans for the future of Spring Batch, Syer said:
We are providing an excellent platform for single process (possibly multi-threaded) execution in 1.0. The future is full of possibilities for moving to numerous multi-process models on a variety of platforms, and we have been very careful to anticipate those changes in the 1.0 codebase. Building on the platform we have, which already provides most of the hooks and data structures we need, we are also going to be thinking very hard about the usability and deployability of batch applications. Monitoring and managing batch applications is very important in real-life production situations, and we see a number of ways we can add additional value in this area. Along with the rest of the Spring portfolio projects, we see OSGi as a key part of our strategy for the future (1.0 will be packaged as OSGi bundles, but that is really only the start).
Syer also thanked all those who had contributed to Spring Batch through forums, bug reports, discussions and code, describing both the quality and amount of feedback as "truly impressive".