Databricks Introduces Lakebase, a PostgreSQL Database for AI Workloads

Databricks has announced the general availability of Lakebase, a serverless, PostgreSQL-based OLTP database that scales compute and storage independently. Lakebase is designed to integrate with the Databricks platform, combining transactional and analytical capabilities in a single service.

According to Databricks, the goal of the new serverless service is to simplify real-time apps and AI workloads by integrating database, analytics, and governance on a single platform. Lakebase provides instant data branching, point-in-time recovery, and unified access controls, designed to speed up development, improve reliability, and keep operational and analytical data in sync.

Arguing that operational databases were not designed for current AI-driven applications, Databricks claims that Lakebase is "a new category of operational database architecture with lightweight, ephemeral compute on top of durable, data lake storage." The team behind the project describes the problem with traditional databases:

Because every query competes for the same fixed CPU and memory resources, a single query can affect all live operations. These constraints slow teams down and make it risky to work against live data. As applications become more automated and systems act on data in real time, this kind of shared, fragile infrastructure becomes an even bigger liability (...) To remove this architectural bottleneck, we created the lakebase category, a new architecture for operational databases that separates compute from storage.

Databricks Lakebase is a managed Postgres database service built into the Databricks Data Intelligence Platform, offering automatic scaling, branching, and integration with other Databricks services. Best known for its data analytics and AI platform built around Apache Spark, Databricks is adding a new option to its popular Lakehouse solution. Matei Zaharia, CTO & co-founder of Databricks, writes on LinkedIn:

We believe this is going to make it radically simpler and more reliable to work with operational databases. You can instantly branch your database, take snapshots, roll back to a point in time, or create another copy for offline analysis, whether it's humans doing the operations or agents (...) All while keeping the standard Postgres interface and extensions.

The new managed option supports up to 8 TB per instance and runs Postgres 17, including the pgvector extension for AI-driven search. Use cases highlighted in the GA announcement include real-time feature serving for machine learning, persistent memory for AI agents, and embedded analytics.
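To make the pgvector use case concrete, the following is a minimal sketch of a nearest-neighbor lookup. It is an illustration under stated assumptions rather than anything taken from the announcement: the `documents` table, its `embedding` column, and the sample vectors are hypothetical. The `<=>` operator is pgvector's cosine distance operator, and the pure-Python function computes the same quantity in memory so the example runs without a database.

```python
import math

# Hypothetical table and column names. `<=>` is pgvector's cosine distance
# operator; the placeholders follow the psycopg parameter style.
NEAREST_SQL = """
SELECT id, content
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def cosine_distance(a, b):
    """Return 1 - cosine similarity, the value the `<=>` operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(rows, query, k):
    """In-memory stand-in for the ORDER BY ... LIMIT query above."""
    ranked = sorted(rows, key=lambda row: cosine_distance(row[1], query))
    return [row[0] for row in ranked[:k]]


# Illustrative two-dimensional embeddings; real embeddings have many more dimensions.
rows = [
    ("doc-a", [1.0, 0.0]),
    ("doc-b", [0.0, 1.0]),
    ("doc-c", [0.7, 0.7]),
]
top = nearest(rows, [1.0, 0.1], 2)
print(top)
```

Against a real database, the SQL string would be executed through a standard Postgres driver, since Lakebase is described as keeping the standard Postgres interface.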

Lakebase has been in development since June 2025 and is built on technology Databricks acquired from the PostgreSQL company Neon. It was later strengthened by the acquisition of Mooncake in October 2025, which improved the integration of PostgreSQL databases with lakehouse data.

Lakebase is now offered in two versions: Autoscaling and Provisioned. Autoscaling is the newer option and is where new features are being built, while Databricks continues to bring to it the capabilities already available in the Provisioned version. Jeremy Daly, co-founder of Ampt and AWS Serverless Hero, comments in his newsletter:

Databricks is turning some heads with its new Lakebase serverless database. Separating storage and compute isn't anything new, but using a Postgres interface to write directly to lakehouse storage in formats that Spark, Databricks SQL, and other analytics engines can immediately query without ETL is huge.

For the Autoscaling version, billing is usage-based, with charges calculated in Databricks Units (DBUs) based on the number of Capacity Unit hours consumed by the workload. Customers can set a minimum and maximum autoscaling range, along with a "scale to zero" timeout. Storage is billed separately.
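The arithmetic behind this billing model can be sketched as follows. This is a simplified illustration, not Databricks' actual metering: the hourly demand figures, the DBU-per-Capacity-Unit-hour rate, and the price per DBU are all hypothetical values chosen for the example. The sketch also assumes that an hour with no demand is fully scaled to zero, which ignores the configurable timeout, and it leaves out storage because that is billed separately.

```python
def capacity_unit_hours(demand_per_hour, min_cu, max_cu):
    """Sum billable Capacity Unit hours for a series of hourly demand values.

    Demand is clamped to the configured [min_cu, max_cu] autoscaling range.
    Hours with zero demand are assumed to be scaled to zero and bill nothing
    (a simplification that ignores the scale-to-zero timeout).
    """
    total = 0.0
    for demand in demand_per_hour:
        if demand == 0:
            continue  # compute assumed suspended for the whole hour
        total += min(max(demand, min_cu), max_cu)
    return total


def compute_cost(cu_hours, dbu_per_cu_hour, price_per_dbu):
    """Convert Capacity Unit hours to DBUs, then to a currency amount."""
    return cu_hours * dbu_per_cu_hour * price_per_dbu


# Hypothetical workload: requested Capacity Units for eight consecutive hours.
demand = [0, 0, 0.25, 2, 6, 3, 0, 0]
cu_hours = capacity_unit_hours(demand, min_cu=0.5, max_cu=4)

# Hypothetical rates, not published Databricks prices.
cost = compute_cost(cu_hours, dbu_per_cu_hour=1.0, price_per_dbu=0.5)
print(cu_hours, cost)
```

In this example the 0.25 demand is raised to the 0.5 minimum and the 6 demand is capped at the 4 maximum, which shows how the configured range bounds both the floor and the ceiling of the compute bill.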

Lakebase is now generally available for production use on AWS, with Azure currently in public preview. Full Azure support is expected in the coming months, followed by Google Cloud later this year. According to the announcement, SOC2 and HIPAA certifications are planned for early 2026. High availability (readable secondaries) is currently available only in the Provisioned version.

 
