
Debezium Releases Version 2.0 of Its Change Data Capture Tool


Debezium, an open-source distributed platform for change data capture (CDC), converts records from existing databases into event streams, enabling applications to detect and respond to row-level database changes. The version 2.0 release introduces many changes intended to make Debezium a "comprehensive open-source low-latency platform for (CDC)".

This latest release requires Java 11 and provides the following: incremental snapshots gain stop and pause/resume logic; transaction metadata gains a new field, ts_ms, containing the transaction timestamp; and multi-tenancy databases are now supported out of the box.

Index handling has also been improved: when no primary key is defined, Debezium may refer to columns that the database generates automatically, such as CTID in PostgreSQL or ROWID in Oracle. The release also introduces debezium-storage for file- and Kafka-based database history and offset storage.

Debezium 2.0 has been in development for the three years since version 1.0 was released in 2019. One of the main improvements in Debezium, initially introduced in version 1.6, is support for incremental snapshots. Normally, Debezium captures existing data in a snapshot phase that runs once, when the connector first starts. Problems arise, however, when the configuration must later be adjusted to add tables that were not initially part of CDC.

With incremental snapshots, it is possible to use the signaling mechanism to send a snapshot signal and thus trigger a snapshot of just a set of tables. Version 2.0 adds the ability to stop an ongoing snapshot, to pause and resume it, and to filter it with a SQL-based predicate that controls which subset of records is included in the incremental snapshot.
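As a rough illustration of the signaling mechanism, the sketch below builds the rows an operator might insert into a Debezium signaling table. The signal types and the `data-collections` and `additional-condition` field names reflect our reading of the Debezium documentation and should be checked against the release in use; the `build_signal` helper, the `public.orders` table, and the predicate are hypothetical, and the empty payloads for pause and resume are an assumption.

```python
import json
import uuid


def build_signal(signal_type, tables=None, condition=None):
    """Return (id, type, data) for one row of a signaling table.

    Hypothetical helper: it only assembles values and does not talk to a database.
    """
    data = {}
    if tables is not None:
        data["data-collections"] = list(tables)
        data["type"] = "incremental"
    if condition is not None:
        # SQL predicate limiting which rows the snapshot includes (assumed field name).
        data["additional-condition"] = condition
    return str(uuid.uuid4()), signal_type, json.dumps(data)


# Snapshot only open orders from one table, then pause, resume, and stop it.
start = build_signal("execute-snapshot", ["public.orders"], "status = 'OPEN'")
pause = build_signal("pause-snapshot")
resume = build_signal("resume-snapshot")
stop = build_signal("stop-snapshot", ["public.orders"])
```

Each tuple would then be written with an ordinary `INSERT` into the table configured as the connector's signal source.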

The image below shows the architecture of Debezium:

Debezium is built on top of Apache Kafka and provides a set of Kafka Connect compatible connectors for different databases. If the application that reads from Debezium fails or crashes, changes are not missed because they are stored in a Kafka topic; once the application is restored, it resumes reading from the point where it left off.
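The resume behaviour comes from the consumer's committed position in the topic. The toy model below is not Kafka client code: `TopicLog` is a hypothetical in-memory stand-in for a single topic partition, used only to show why events written during an outage are still delivered afterwards.

```python
class TopicLog:
    """Toy stand-in for one Kafka topic partition: an append-only list."""

    def __init__(self):
        self.events = []
        self.committed = {}  # consumer group -> next offset to read

    def append(self, event):
        self.events.append(event)

    def poll(self, group):
        """Return the group's unread events and commit the new position."""
        start = self.committed.get(group, 0)
        batch = self.events[start:]
        self.committed[group] = len(self.events)
        return batch


log = TopicLog()
log.append("insert order 1")
first = log.poll("billing")          # the consumer reads one event, then crashes

log.append("update order 1")         # changes captured while the consumer is down
log.append("delete order 1")

after_restart = log.poll("billing")  # the restarted consumer sees only what it missed
```

A real consumer commits offsets to Kafka rather than to an in-process dictionary, but the principle is the same: the log keeps the events, and the stored offset marks where reading continues.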

Debezium performs log-based CDC, which ensures that all data changes are captured. It delivers change events with very low delay, requires no changes to the data model, and can capture "delete" changes. Additional features include snapshots: an initial snapshot of a database’s current state can be taken if a connector is started and not all logs still exist; filters: schemas, tables, and columns can be included in or excluded from CDC; masking: if a column contains sensitive data, it can be masked; and message transformations: ready-to-use transformations such as topic routing, content-based routing, and message filtering.
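To make the filter, masking, and transformation options more concrete, here is a minimal sketch of a connector registration payload of the kind posted to the Kafka Connect REST API. The property names follow the Debezium PostgreSQL connector documentation as we understand it; the connector name, topic prefix, schema, tables, and columns are invented for illustration, and connection settings are omitted, so this is not a complete configuration.

```python
import json

connector = {
    "name": "inventory-connector",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "topic.prefix": "shop",  # hypothetical prefix for emitted topics
        # Filters: capture one schema and two of its tables, drop one column.
        "schema.include.list": "inventory",
        "table.include.list": "inventory.customers,inventory.orders",
        "column.exclude.list": "inventory.customers.notes",
        # Masking: replace the email value with 12 asterisks.
        "column.mask.with.12.chars": "inventory.customers.email",
        # Message transformation: reroute topics with the topic routing SMT.
        "transforms": "route",
        "transforms.route.type": "io.debezium.transforms.ByLogicalTableRouter",
        "transforms.route.topic.regex": "shop\\.inventory\\.(.*)",
        "transforms.route.topic.replacement": "shop.all.$1",
    },
}

payload = json.dumps(connector, indent=2)
```

Keeping the configuration as plain data makes it easy to review which tables and columns leave the database before the connector is registered.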

More details may be found in the Debezium 2.0 release notes, and the roadmap describes plans for future releases.
