Apache Iceberg Content on InfoQ
Articles
-
The Schema Proliferation Problem in Kafka and Flink Pipelines: How to Solve It
Schema proliferation builds slowly and gets expensive fast. One schema per event type feels right until there are ten tables, union queries spanning all of them, and a single field rename touching every schema. Discriminator-based schema consolidation collapses that to two tables, turning multi-table unions into a single query, while new variants are additive and don't break existing consumers.
-
Lakehouse Tower of Babel: Handling Identifier Resolution Rules across Database Engines
Lakehouse architectures enable multiple engines to operate on shared data using open table formats such as Apache Iceberg. However, differences in SQL identifier resolution and catalog naming rules create interoperability failures. This article examines these behaviors and explains why enforcing consistent naming conventions and cross-engine validation is critical.
-
Building Reproducible ML Systems with Apache Iceberg and SparkSQL: Open Source Foundations
Traditional data lakes are great for storing massive amounts of data, but they lack the transactional guarantees and versioning that ML workloads need. Apache Iceberg and SparkSQL bring database-like reliability to your data lake. Time travel, schema evolution, and ACID transactions help support reproducible machine learning experiments.