Data Pipelines Content on InfoQ
Articles
Apache DolphinScheduler in MLOps: Create Machine Learning Workflows Quickly
In this article, the author discusses the data pipeline and workflow scheduler Apache DolphinScheduler and how machine learning tasks can be run in DolphinScheduler using its Jupyter and MLflow task components.
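As a rough illustration of the MLflow side, here is a minimal, self-contained tracking sketch of the kind of training script a scheduler's MLflow task would typically wrap. The experiment name and model are hypothetical, and this is generic MLflow usage, not the DolphinScheduler component API itself.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Hypothetical experiment name; a scheduler task would typically wrap
# a script like this and point it at a shared MLflow tracking server.
mlflow.set_experiment("dolphinscheduler-demo")

with mlflow.start_run():
    X, y = load_iris(return_X_y=True)
    model = RandomForestClassifier(n_estimators=100).fit(X, y)
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")  # stored as a run artifact
```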
Building End-to-End Field Level Lineage for Modern Data Systems
In this article, the authors discuss data lineage as a critical component of the data pipeline root cause and impact analysis workflow, and how automating lineage creation and abstracting metadata down to the field level helps with root cause analysis efforts.
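To make the idea concrete, here is a minimal sketch (all names are hypothetical, and this is not the authors' system) of how field-level lineage can be modeled as a graph mapping each downstream field to the upstream fields it derives from, so that root cause analysis becomes an upstream traversal:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FieldRef:
    table: str
    column: str

@dataclass
class LineageGraph:
    # downstream field -> set of upstream fields it is derived from
    edges: dict[FieldRef, set[FieldRef]] = field(default_factory=dict)

    def add_edge(self, downstream: FieldRef, upstream: FieldRef) -> None:
        self.edges.setdefault(downstream, set()).add(upstream)

    def upstream_of(self, start: FieldRef) -> set[FieldRef]:
        """Transitively collect upstream fields for root cause analysis."""
        seen: set[FieldRef] = set()
        stack = [start]
        while stack:
            for up in self.edges.get(stack.pop(), ()):
                if up not in seen:
                    seen.add(up)
                    stack.append(up)
        return seen

g = LineageGraph()
g.add_edge(FieldRef("reports", "revenue"), FieldRef("orders", "amount"))
g.add_edge(FieldRef("orders", "amount"), FieldRef("raw_events", "price"))
print(g.upstream_of(FieldRef("reports", "revenue")))  # both upstream fields
```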
Implementing Pipeline Microservicilities with Tekton
“Microservicilities” is a list of cross-cutting concerns that a service must implement apart from the business logic. These concerns include invocation, elasticity, and resiliency, among others. This article describes how Tekton may be used to implement these concerns in delivery pipelines.
Accelerating Deep Learning on the JVM with Apache Spark and NVIDIA GPUs
In this article, the authors discuss how to use the combination of the Deep Java Library (DJL), Apache Spark v3, and NVIDIA GPU computing to simplify deep learning pipelines while improving performance and reducing costs. They also compare the performance of this solution on GPU versus CPU hardware, using Amazon EMR and the NVIDIA RAPIDS Accelerator.
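The article's stack is JVM-based; as a rough sketch of the Spark-plus-RAPIDS half (shown in PySpark for consistency with the other examples here), enabling the accelerator is largely a matter of session configuration. The jar deployment, resource amounts, and cluster setup are assumptions that depend on the environment, e.g., Amazon EMR.

```python
from pyspark.sql import SparkSession

# Assumes the RAPIDS Accelerator jar is on the Spark classpath and
# GPUs are available to executors; values here are illustrative.
spark = (
    SparkSession.builder
    .appName("gpu-etl-sketch")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .config("spark.rapids.sql.enabled", "true")
    .config("spark.executor.resource.gpu.amount", "1")
    .getOrCreate()
)

# DataFrame/SQL operations below run on GPUs where the plugin supports them.
df = spark.range(0, 10_000_000).selectExpr("id", "id % 100 AS bucket")
df.groupBy("bucket").count().show(5)
```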
The Future of Data Engineering
Chris Riccomini examines the current and future states of the art in data pipelines, data streaming, and data warehousing. He presents a six-stage evolution that data ecosystems follow, from a simple monolith to a complex data-microwarehouse architecture, as the data engineers who manage them solve problems and clarify their role as infrastructure engineers rather than data stewards.
Scalable Cloud Environment for Distributed Data Pipelines with Apache Airflow
In this article, author Lena Hall discusses how to use Apache Airflow to define and execute distributed data pipelines, with an example of the workflow framework running on Kubernetes on the Azure cloud platform.
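For readers new to Airflow, a pipeline is defined as a DAG of tasks in ordinary Python. This minimal sketch (the DAG id, schedule, and task bodies are hypothetical) shows the shape such a definition takes, independent of where the workers run:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from the source system")

def transform():
    print("clean and aggregate the extracted data")

# Hypothetical pipeline; on Kubernetes, each task can run in its own pod.
with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # named `schedule` in newer Airflow releases
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task  # transform runs after extract succeeds
```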
Rethinking Flink’s APIs for a Unified Data Processing Framework
Since its very early days, Apache Flink has followed the philosophy of taking a unified approach to batch and streaming. The core building block is the “continuous processing of unbounded data streams, with batch as a special, bounded set of those streams.” Recent updates to the Flink APIs include architectural designs by the community to support this batch and streaming unification.
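A small sketch of what that unification looks like in practice, using PyFlink's Table API (the inline data and query are hypothetical): the same program runs as a streaming or a batch job depending only on the environment settings.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Switch to EnvironmentSettings.in_batch_mode() and the same program
# runs as a bounded batch job instead of a streaming one.
settings = EnvironmentSettings.in_streaming_mode()
t_env = TableEnvironment.create(settings)

events = t_env.from_elements(
    [(1, "click"), (2, "view"), (3, "click")],
    ["id", "event"],
)
t_env.create_temporary_view("events", events)

# In streaming mode this prints a changelog of updates;
# in batch mode it prints only the final aggregates.
t_env.execute_sql(
    "SELECT event, COUNT(id) AS cnt FROM events GROUP BY event"
).print()
```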