Spring XD Being Re-architected and Re-branded as Spring Cloud Data Flow

Pivotal announced a complete redesign of Spring XD, its big data offering, during last week’s SpringOne2GX conference, along with a corresponding rebrand from Spring XD to Spring Cloud Data Flow. The new product uses executable applications as the foundation for modules and focuses on orchestrating them. Whilst the REST API, shell and UI have survived from Spring XD at the top level, maintaining backwards compatibility, underneath the two products are very different.

Spring XD’s ZooKeeper-based runtime is gone, replaced by a service provider interface (SPI) that delegates the launching, scaling and monitoring of microservice applications to other systems such as Pivotal Cloud Foundry, Lattice and YARN. For example, the Lattice implementation of the SPI uses the Receptor API to launch modules, while on Cloud Foundry the Cloud Controller API is used. There is also a local implementation that runs modules in process, analogous to single-node mode in Spring XD.
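To illustrate the shape of such an SPI, here is a minimal sketch in plain Java. The names ModuleDeployer, LocalModuleDeployer and StreamDeployer are hypothetical and far simpler than the real Spring Cloud Data Flow interfaces; the point is only that the orchestration layer codes against one interface while each platform supplies its own implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, much-simplified deployer SPI: the orchestration layer calls these
// methods, and each target platform (Cloud Foundry, Lattice, YARN, local) implements them.
interface ModuleDeployer {
    String deploy(String moduleName, Map<String, String> properties);

    String status(String deploymentId);

    void undeploy(String deploymentId);
}

// Hypothetical in-process implementation, in the spirit of the local SPI.
// It only records state; a real one would start the module's application.
class LocalModuleDeployer implements ModuleDeployer {
    private final Map<String, String> states = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    @Override
    public String deploy(String moduleName, Map<String, String> properties) {
        // The properties would be handed to the launched module; ignored in this sketch.
        String id = moduleName + "-" + nextId.incrementAndGet();
        states.put(id, "deployed");
        return id;
    }

    @Override
    public String status(String deploymentId) {
        return states.getOrDefault(deploymentId, "unknown");
    }

    @Override
    public void undeploy(String deploymentId) {
        states.replace(deploymentId, "undeployed"); // no-op for unknown ids
    }
}

// Platform-agnostic caller: it depends only on the SPI, not on any platform API.
class StreamDeployer {
    private final ModuleDeployer deployer;

    StreamDeployer(ModuleDeployer deployer) {
        this.deployer = deployer;
    }

    List<String> deployAll(List<String> moduleNames) {
        List<String> ids = new ArrayList<>();
        for (String name : moduleNames) {
            ids.add(deployer.deploy(name, Map.of()));
        }
        return ids;
    }
}
```

Swapping LocalModuleDeployer for a Cloud Foundry or Lattice implementation would leave StreamDeployer untouched, which is the benefit of delegating through an SPI.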

Spring XD 1.x to Spring Cloud Data Flow

“The essential idea is that we’ve kept a lot of the higher level APIs”, Pivotal’s Mark Pollack told the conference, “but underneath we’ve massively re-architected it to overcome basic limitations that we’ve found.”

These limitations involved capabilities that the existing architecture didn’t support, such as scaling, canary deployments, per-module resource allocation (giving different modules different amounts of memory, for example) and distributed tracing. Other limitations stemmed from the classic parent-child classloader hierarchy, as opposed to the flat classloader that an isolated microservice application architecture makes possible.
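The sketch below illustrates the parent-child problem in plain Java, using only the JDK's URLClassLoader. With classic parent-first delegation, a module loader whose parent is the container's loader resolves any class the container can already see from the container, so a module cannot bring its own version of that class. A loader with no application parent sees only core JDK classes plus its own URLs. ContainerLibrary and ClassLoaderDemo are illustrative names, and this is not Spring XD's actual classloader code.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Stands in for a library class already on the container's classpath.
class ContainerLibrary {}

class ClassLoaderDemo {
    // Parent-child hierarchy: the module's loader delegates to the container's loader
    // first, so any class the container can see is resolved from the container.
    static ClassLoader childOfContainer() {
        return new URLClassLoader(new URL[0], ContainerLibrary.class.getClassLoader());
    }

    // A null parent means only the bootstrap (core JDK) classes are visible,
    // plus whatever URLs this loader is given (none in this sketch).
    static ClassLoader isolated() {
        return new URLClassLoader(new URL[0], null);
    }

    static boolean canLoad(ClassLoader loader, String className) {
        try {
            loader.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Returns the loader that actually defined the class, showing where delegation ended.
    static ClassLoader loadedBy(ClassLoader loader, String className) {
        try {
            return loader.loadClass(className).getClassLoader();
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(className + " is not visible", e);
        }
    }
}
```

Even if a module's own jar contained a different ContainerLibrary, the child loader would still hand back the container's copy, which is the kind of conflict a single flat classloader per application avoids.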

To solve the classloader problem, the existing integration and batch modules have been refactored into Spring Boot applications, each with its own isolated, flat classloader. In effect, the redesign allows stream and batch applications to run as data microservices that can evolve independently. The modules can be run without Spring Cloud Data Flow at all (java -jar will do the job), but the data flow layer takes away much of the tedious work, such as configuring properties. Amongst other things, it should be much more straightforward to write unit tests against the stand-alone modules than it was when they ran inside ZooKeeper-based XD containers, which may in turn encourage more community contributions.
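Run by hand, a module takes its configuration as command-line arguments; Spring Boot turns arguments of the form --name=value into properties. The following is a simplified stand-in for that conversion, not Boot's own parser, and the greeting property and the CommandLineProperties and GreeterModule classes are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical stand-in for how Spring Boot treats "--name=value"
// command-line arguments as properties; Boot's real parser covers more cases.
class CommandLineProperties {
    static Map<String, String> parse(String... args) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("--")) {
                continue; // non-option (positional) arguments are not properties
            }
            String body = arg.substring(2);
            int eq = body.indexOf('=');
            if (eq > 0) {
                // Split at the first '=' so values may themselves contain '='.
                props.put(body.substring(0, eq), body.substring(eq + 1));
            } else if (eq < 0 && !body.isEmpty()) {
                props.put(body, ""); // bare flag such as --debug gets an empty value
            }
        }
        return props;
    }
}

// Hypothetical module logic, launched e.g. as: java -jar greeter.jar --greeting=hi
class GreeterModule {
    static String greeting(Map<String, String> props) {
        return props.getOrDefault("greeting", "hello world");
    }
}
```

Computing and passing such arguments for every module in a stream is the kind of work the data flow layer does on your behalf, while the module logic itself stays trivially testable.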

Below the Boot modules are two new projects, Spring Cloud Stream and Spring Cloud Task, which have been created to provide auto-configuration capabilities for Spring Integration and Spring Batch respectively.

Modules to Microservices

To get a sense of the programming model, consider the following code from a second presentation, given by Mark Fisher and Dave Syer. It is an inbound channel adapter (using a standard Spring Integration annotation) that Spring Integration will, by default, call every second:

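// Package names per Spring Cloud Stream 1.x GA releases
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Source;
import org.springframework.integration.annotation.InboundChannelAdapter;

@EnableBinding(Source.class)
public class Greeter {
	// Polled by Spring Integration; each return value is sent to the "output" channel
	@InboundChannelAdapter(Source.OUTPUT)
	public String greet() {
		return "hello world";
	}
}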
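What the poller does can be modelled in plain Java as a scheduler that calls the method at a fixed delay and hands each return value to the output channel. In this sketch a BlockingQueue stands in for the bound channel, and PolledSource and PlainGreeter are hypothetical names; Spring Integration's actual poller does much more. Because greet() is an ordinary method, it can also be exercised directly in a unit test without any middleware.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Simplified model of a polled inbound channel adapter (not Spring Integration's code):
// a scheduler invokes the method at a fixed delay and queues each result.
class PolledSource<T> implements AutoCloseable {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final BlockingQueue<T> output = new LinkedBlockingQueue<>(); // stands in for the output channel

    PolledSource(Supplier<T> method, long delayMillis) {
        scheduler.scheduleWithFixedDelay(() -> output.add(method.get()), 0, delayMillis, TimeUnit.MILLISECONDS);
    }

    // Waits up to timeoutMillis for the next message; returns null on timeout or interrupt.
    T next(long timeoutMillis) {
        try {
            return output.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow(); // stop polling
    }
}

// The Greeter's logic without annotations: a plain, directly testable method.
class PlainGreeter {
    String greet() {
        return "hello world";
    }
}
```

Constructing a PolledSource with new PlainGreeter()::greet and a delay of 1000 ms would approximate the one-second default described above.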

The @EnableBinding(Source.class) annotation detects which binder implementation is on your classpath and uses that binder to create the channel adapters. It is parameterized by an interface: Source, Sink and Processor are provided for you, and you can also define your own. In this case, Source is simply an interface declaring a single output message channel:

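import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;

public interface Source {
	String OUTPUT = "output"; // referenced by Greeter as Source.OUTPUT
	@Output(OUTPUT)
	MessageChannel output();
}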

The @Output annotation identifies output channels (messages leaving the module) and @Input identifies input channels (messages entering the module). Both optionally take a channel name; if no name is provided, the method name is used instead.
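The naming rule can be sketched with plain Java reflection. Input and Output here are local stand-in annotations (not the Spring Cloud Stream ones), and Channel, Relay and ChannelNames are hypothetical; the sketch resolves each annotated method to the annotation's value if one is given, and to the method name otherwise:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Local stand-ins for @Output/@Input so the sketch is self-contained.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Output {
    String value() default "";
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Input {
    String value() default "";
}

// Placeholder for a channel type (MessageChannel in Spring).
interface Channel {}

// Hypothetical binding interface: one unnamed input, one explicitly named output.
interface Relay {
    @Input
    Channel incoming();

    @Output("relayed")
    Channel outgoing();
}

class ChannelNames {
    // Maps resolved channel name -> direction ("input" or "output").
    static Map<String, String> resolve(Class<?> bindingInterface) {
        Map<String, String> names = new TreeMap<>();
        for (Method m : bindingInterface.getMethods()) {
            Output out = m.getAnnotation(Output.class);
            Input in = m.getAnnotation(Input.class);
            if (out != null) {
                names.put(out.value().isEmpty() ? m.getName() : out.value(), "output");
            } else if (in != null) {
                names.put(in.value().isEmpty() ? m.getName() : in.value(), "input");
            }
        }
        return names;
    }
}
```

For Relay, the input falls back to its method name, incoming, while the output uses the explicit name relayed.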

The corresponding Sink runs as a separate process, so we could, for example, have 10 instances of it running. It listens on another channel bound to the middleware and is activated whenever a message comes in:

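// Package names per Spring Cloud Stream 1.x GA releases
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.messaging.Sink;
import org.springframework.integration.annotation.ServiceActivator;

@EnableBinding(Sink.class)
public class Logger {
	// Invoked for each message arriving on the "input" channel
	@ServiceActivator(inputChannel = Sink.INPUT)
	public void log(String message) {
		System.out.println(message);
	}
}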
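Assuming the binder delivers each message from the shared destination to just one of the running sink instances (competing consumers, as with a shared queue; the actual behaviour depends on the binder and its configuration), the effect of running 10 instances can be modelled in memory. In this sketch, threads stand in for separate processes, a BlockingQueue stands in for the destination, and CompetingSinks is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicIntegerArray;

// In-memory model of competing consumers: each thread is one sink instance, all
// taking from one shared queue, so each message is handled by exactly one instance.
class CompetingSinks {
    // Sentinel telling an instance to stop; one is queued per instance after the messages.
    private static final String STOP = "\u0000stop";

    // Returns how many messages each instance handled.
    static int[] run(int instances, List<String> messages) {
        BlockingQueue<String> destination = new LinkedBlockingQueue<>(messages);
        for (int i = 0; i < instances; i++) {
            destination.add(STOP);
        }
        AtomicIntegerArray handled = new AtomicIntegerArray(instances);
        List<Thread> sinks = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            final int id = i;
            Thread sink = new Thread(() -> {
                try {
                    while (true) {
                        String message = destination.take();
                        if (STOP.equals(message)) {
                            return;
                        }
                        handled.incrementAndGet(id); // the Logger would print the message here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            sinks.add(sink);
            sink.start();
        }
        for (Thread sink : sinks) {
            try {
                sink.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for sinks", e);
            }
        }
        int[] counts = new int[instances];
        for (int i = 0; i < instances; i++) {
            counts[i] = handled.get(i);
        }
        return counts;
    }
}
```

How the messages are spread across instances varies from run to run, but under this assumption the counts always add up to the number of messages sent.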

Spring Cloud Data Flow is the glue that takes care of wiring these pieces together. A milestone release is currently available.
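As a rough sketch of that wiring, the following computes, for a linear stream definition such as greeter | logger, the command-line arguments that would point one module's output and the next module's input at the same middleware destination. The spring.cloud.stream.bindings.<channel>.destination keys follow Spring Cloud Stream's binding-property convention; the <stream>.<index> destination naming, the StreamWiring class and the assumption of unique module names are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wiring sketch: module name -> arguments it would be launched with.
// Assumes module names are unique within the stream.
class StreamWiring {
    static Map<String, List<String>> wire(String streamName, String definition) {
        String[] modules = definition.split("\\|");
        Map<String, List<String>> args = new LinkedHashMap<>();
        for (int i = 0; i < modules.length; i++) {
            List<String> moduleArgs = new ArrayList<>();
            if (i > 0) {
                // Input side: read from the destination the previous module writes to.
                moduleArgs.add("--spring.cloud.stream.bindings.input.destination=" + streamName + "." + (i - 1));
            }
            if (i < modules.length - 1) {
                // Output side: hypothetical destination named after stream and position.
                moduleArgs.add("--spring.cloud.stream.bindings.output.destination=" + streamName + "." + i);
            }
            args.put(modules[i].trim(), moduleArgs);
        }
        return args;
    }
}
```

For a stream called ticktock, greeter would write to ticktock.0 and logger would read from it, with neither module needing to know about the other.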
