Apache Iceberg Content on InfoQ

Articles

Java

The Schema Proliferation Problem in Kafka and Flink Pipelines: How to Solve It

Schema proliferation builds slowly and gets expensive fast. One schema per event type feels right until there are ten tables, union queries spanning all of them, and a single field rename touching every schema. Discriminator-based schema consolidation collapses that to two tables, turning multi-table unions into a single query, while new variants are additive and don't break existing consumers.

Spoorthi Basu
on May 25, 2026
AI, ML & Data Engineering

Lakehouse Tower of Babel: Handling Identifier Resolution Rules across Database Engines

Lakehouse architectures enable multiple engines to operate on shared data using open table formats such as Apache Iceberg. However, differences in SQL identifier resolution and catalog naming rules create interoperability failures. This article examines these behaviors and explains why enforcing consistent naming conventions and cross-engine validation is critical.

Maninder Parmar
on Apr 17, 2026
AI, ML & Data Engineering

Building Reproducible ML Systems with Apache Iceberg and SparkSQL: Open Source Foundations

Traditional data lakes are good at storing massive volumes of data, but they lack the transactional guarantees and versioning that ML workloads need. Apache Iceberg and SparkSQL bring database-like reliability to the data lake: time travel, schema evolution, and ACID transactions help support reproducible machine learning experiments.

Anant Kumar
on Jul 31, 2025
