InfoQ Homepage Database Content on InfoQ
-
Time-Series Storage: Design Choices That Shape Cost and Performance
Every time-series database makes a set of storage design decisions: how to lay out rows, when to compress, what to partition on. These decisions determine cost and query performance more than the choice of database itself. This article works through those fundamentals from first principles, using widely available tools like PostgreSQL and Apache Parquet to make each trade-off measurable.
-
From Batch to Micro-Batch Streaming: Lessons Learned the Hard Way in a Delta Index Pipeline
This article describes how a production delta-index pipeline migrated from scheduled batch to micro-batch Spark Structured Streaming. It covers why record-level streaming was rejected, how partition-based watermarks replaced fragile S3 completion markers, overlap-window correctness, and restart-as-design strategies for better predictability in object-store–based ingestion systems.
-
Lakehouse Tower of Babel: Handling Identifier Resolution Rules across Database Engines
Lakehouse architectures enable multiple engines to operate on shared data using open table formats such as Apache Iceberg. However, differences in SQL identifier resolution and catalog naming rules create interoperability failures. This article examines these behaviors and explains why enforcing consistent naming conventions and cross-engine validation is critical.
-
Replacing Database Sequences at Scale without Breaking 100+ Services
The article discusses the challenges faced during a migration from a relational database to NoSQL, focusing on the importance of database sequences for unique identifiers. It outlines the development of a new sequence service using DynamoDB and a two-tier caching architecture.
-
Jakarta EE 12 Milestone 2: Advent of the Data Age along with Consistency and Configuration
Jakarta EE 12 Milestone 2 marks the beginning of the next generation of enterprise Java. It introduces Jakarta Query, a unified query language across Persistence, Data, and NoSQL, while aligning the platform with Java 21. This milestone focuses on integration, modernization, and improving developer productivity for cloud-native enterprise applications.
-
Autonomous Big Data Optimization: Multi-Agent Reinforcement Learning to Achieve Self-Tuning Apache Spark
This article introduces a reinforcement learning (RL) approach grounded in Apache Spark that enables distributed computing systems to learn optimal configurations autonomously, much like an apprentice engineer who learns by doing. The author also implements a lightweight agent as a driver-side component that uses RL to choose configuration settings before a job runs.
-
NextGen Search - Where AI Meets OpenSearch through MCP
In this article, authors Srikanth Daggumalli and Arun Lakshmanan discuss next-generation context-aware conversational search using OpenSearch and AI agents powered by Large Language Models (LLMs) and Model Context Protocol (MCP).
-
Reducing False Positives in Retrieval-Augmented Generation (RAG) Semantic Caching: a Banking Case Study
In this article, author Elakkiya Daivam discusses why Retrieval Augmented Generation (RAG) and semantic caching techniques are powerful levers for reducing false positives in AI powered applications. She shares the insights from a production-grade evaluation with 1,000 query variations tested across seven bi-encoder models.
-
Building a RAG Application with Spring Boot, Spring AI, MongoDB Atlas Vector Search, and OpenAI
The RAG paradigm redefines AI: it combines generative models and business data for accurate, contextualised responses. The article shows how to integrate Spring Boot, Spring AI, MongoDB Atlas and OpenAI into a powerful and flexible pipeline capable of transforming the way businesses access and create value from data, with applications ranging from finance and healthcare to customer service.
-
InfoQ AI, ML and Data Engineering Trends Report - 2025
This InfoQ Trends Report offers readers a comprehensive overview of emerging trends and technologies in the areas of AI, ML, and Data Engineering. This report summarizes the InfoQ editorial team’s and external guests' view on the current trends in AI and ML technologies and what to look out for in the next 12 months.
-
Engineering a Time Series Database Using Open Source: Rebuilding InfluxDB 3 in Apache Arrow and Rust
At times, to evolve your product, you need to rebuild it from scratch. The article provides the story behind the rewrite of InfluxDB from scratch using a different programming language - Rust - and stack - Apache Flight, Data Fusion, Apache Arrow and Parquet (FDAP). It emphasises the benefits, as well as the mechanics behind its operation and the different versions of the product.
-
Jakarta EE 11 Overview: Virtual Threads, Records, and the Future of Persistence
Jakarta EE 11 delivers enhancements that include support for Java 17 and 21, integration with Java records and virtual threads, and the introduction of the Jakarta Data specification for unified SQL and NoSQL persistence. This release simplifies enterprise Java and establishes the groundwork for Jakarta EE 12, which will advance capabilities in data management.