High-Performance Data Processing with Spring Cloud Data Flow and Geode

Cahlen Humphreys and Tiffany Chang spoke recently at the SpringOne Platform 2019 Conference about data processing with Spring Cloud Data Flow, Pivotal Cloud Cache and Apache Geode frameworks.

Humphreys explained the difference between the Spring Cloud Data Flow and Spring Cloud Stream frameworks. Spring Cloud Stream is the programming model used to develop the applications that are deployed with Spring Cloud Data Flow. Because Spring Cloud Stream abstracts the middleware behind Binders, you can switch middleware components without rewriting the application code.

He also described which types of projects are good candidates for Geode. If you have large volumes of data and need high throughput with low latency, then Geode may be a good choice for data processing. Apache Geode, which entered the Apache Incubator in 2015 and became a top-level Apache project in 2016, provides a database-like consistency model, reliable transaction processing and a shared-nothing architecture.

Chang discussed horizontal scalability and how to configure a Geode cluster with Locators and Servers using the GemFire shell (gfsh) tool. For high availability, you should have at least three Locators configured. The Geode data store cluster scales independently of the application's scaling needs.

Geode supports fault tolerance using partitioned and replicated regions. The region is the core building block of an Apache Geode cluster, and all cached data is organized into data regions. Regions are part of the Cache, which is the entry point to Geode data management.
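The difference between the two region types can be illustrated with a toy placement model. The sketch below is plain Java and does not use Geode's API; the class name `RegionPlacementSketch`, the hashing scheme, and the round-robin choice of backup servers are all assumptions made for illustration. It only shows the general idea: a partitioned region spreads entries across servers and keeps a configurable number of redundant copies, while a replicated region keeps every entry on every server.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: illustrates partitioned vs. replicated data placement.
// It is NOT Geode's implementation or API; all names here are hypothetical.
class RegionPlacementSketch {

    // Map a key to one of totalBuckets buckets (assumed hashing scheme).
    static int bucketFor(Object key, int totalBuckets) {
        return Math.floorMod(key.hashCode(), totalBuckets);
    }

    // Partitioned placement: one primary plus `redundantCopies` backups,
    // chosen here by walking the server list from the bucket index.
    static List<String> partitionedOwners(Object key, List<String> servers,
                                          int totalBuckets, int redundantCopies) {
        int bucket = bucketFor(key, totalBuckets);
        // Never place two copies of the same entry on the same server.
        int copies = Math.min(redundantCopies + 1, servers.size());
        List<String> owners = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            owners.add(servers.get((bucket + i) % servers.size()));
        }
        return owners;
    }

    // Replicated placement: every server holds every entry.
    static List<String> replicatedOwners(List<String> servers) {
        return new ArrayList<>(servers);
    }

    public static void main(String[] args) {
        List<String> servers = List.of("server1", "server2", "server3");
        // With one redundant copy, each entry lives on two distinct servers,
        // so the loss of a single server leaves a surviving copy.
        System.out.println(partitionedOwners("order-42", servers, 113, 1));
        System.out.println(replicatedOwners(servers));
    }
}
```

In this model, the partitioned layout trades some redundancy for capacity (each server stores only part of the data), whereas the replicated layout favors read locality at the cost of storing the full data set everywhere.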

For developers using Spring Boot, Spring Data for Apache Geode offers several annotations out of the box to leverage data caching, including @ClientCacheApplication, @Region, @EnableEntityDefinedRegions, @EnableGemfireRepositories, and @EnablePdx.

She also showed a demo application with a data pipeline built on Apache Kafka, Geode, Prometheus, and Grafana. The demo app ran on a local minikube Kubernetes cluster and deployed a pipeline that extracted data from a file source and enriched the payload with data from Geode.

The app, which is based on Spring Boot and the Spring Geode Starter, also uses Micrometer to calculate throughput and count metrics that are sent to the metrics server. The data pipeline architecture includes a Source, a Processor and a Sink. The sample data pipeline uses Spring Cloud Stream, which makes it easy to switch between different messaging infrastructures like RabbitMQ or Kafka.
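The shape of such a pipeline can be sketched with plain JDK functional interfaces. This is not the demo code from the talk: the class `PipelineSketch`, the `"id,amount"` record format, and the lookup data are hypothetical, the `Map` merely stands in for a Geode region, and the `AtomicLong` stands in for a Micrometer counter. In Spring Cloud Stream's functional programming model, stages like these would typically be exposed as `Supplier`, `Function`, and `Consumer` beans and bound to the chosen messaging middleware.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Plain-JDK sketch of a Source -> Processor -> Sink pipeline.
// The Map stands in for a Geode region and the AtomicLong for a metrics
// counter; neither is the real Geode or Micrometer API.
class PipelineSketch {

    static final AtomicLong processedCount = new AtomicLong();

    // Processor: enrich an "id,amount" record with a name looked up by id.
    static Function<String, String> enricher(Map<String, String> lookup) {
        return line -> {
            String id = line.split(",")[0];
            processedCount.incrementAndGet();
            return line + "," + lookup.getOrDefault(id, "unknown");
        };
    }

    // Wire the three stages together; drain the source until it returns null.
    static void run(Supplier<String> source,
                    Function<String, String> processor,
                    Consumer<String> sink) {
        for (String msg = source.get(); msg != null; msg = source.get()) {
            sink.accept(processor.apply(msg));
        }
    }

    public static void main(String[] args) {
        // Source: an in-memory list standing in for lines read from a file.
        Iterator<String> lines = List.of("1,9.99", "7,4.50").iterator();
        List<String> out = new ArrayList<>();

        long start = System.nanoTime();
        run(() -> lines.hasNext() ? lines.next() : null,
            enricher(Map.of("1", "alice")),
            out::add);
        double seconds = (System.nanoTime() - start) / 1e9;

        System.out.println(out);
        System.out.printf("processed=%d throughput=%.0f msg/s%n",
                processedCount.get(), processedCount.get() / seconds);
    }
}
```

Because the processor only depends on `Function<String, String>`, the enrichment logic stays the same regardless of which transport delivers the messages, which is the property that makes swapping RabbitMQ for Kafka a configuration change rather than a code change.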

Chang showed some sample metrics from Geode versus a relational database like Postgres.
