
Data Engineering Innovations eMag

Today’s data architecture stacks look significantly different from the models of only a few years ago.

Data streaming and stream processing have become core components of modern data architecture. Real-time data streams are being managed as first-class citizens in data processing and analytics solutions. Some companies are even shifting their architecture and technology thinking from “everything’s at rest” to “everything’s in motion.”

Change data capture (CDC) has become a critical design pattern in data engineering use cases. CDC can be combined with data streaming to implement robust solutions in event-driven, microservices-based applications.
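As a hedged illustration of the CDC pattern, the sketch below applies a stream of change events to a local replica. The event shape (`op`, `key`, `value`) is a simplified, Debezium-style convention chosen for this example and is not tied to any particular CDC tool.

```python
# Simplified sketch: applying CDC events to a local key-value replica.
# Event fields are illustrative, not a real CDC tool's wire format.
replica = {}

def apply_cdc_event(event):
    op = event["op"]
    if op in ("c", "u"):              # create or update
        replica[event["key"]] = event["value"]
    elif op == "d":                   # delete
        replica.pop(event["key"], None)

events = [
    {"op": "c", "key": 1, "value": {"name": "alice"}},
    {"op": "u", "key": 1, "value": {"name": "alice-updated"}},
    {"op": "c", "key": 2, "value": {"name": "bob"}},
    {"op": "d", "key": 2, "value": None},
]
for e in events:
    apply_cdc_event(e)

print(replica)  # {1: {'name': 'alice-updated'}}
```

Replaying the full ordered event stream from an empty replica reproduces the source state, which is what makes CDC useful for keeping downstream systems in sync.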

The emphasis on data streams is also driving innovations in the data governance space, such as the stream catalog and stream lineage.

Data mesh architecture, which has been getting a lot of attention recently, is built on four principles: domain ownership, data as a product, self-serve data infrastructure platform, and federated governance. Data mesh is expected to have a significant impact on overall data management programs and initiatives in organizations.

Similar to many compute services on cloud platforms, data storage services and databases now support serverless models where you only pay for what you use.

On the security and regulatory compliance side, data residency and data sovereignty are getting a lot of attention to ensure that consumers’ data is protected and privacy is maintained throughout the life of the data.

Next-generation data engineering innovations will build on these trends to provide even more robust, secure, highly available, and resilient data solutions to the development community.

In the InfoQ “Data Engineering Innovations” eMag, you’ll find up-to-date case studies and real-world data engineering solutions from subject-matter experts and leading data practitioners in the industry.


The Data Engineering Innovations eMag includes:

  • In-Process Analytical Data Management with DuckDB - DuckDB is an open-source OLAP database for analytical data management that operates as an in-process database, avoiding data transfer overhead. Leveraging vectorized query processing and Morsel-Driven parallelism, the database optimizes performance and multi-core utilization for analytical data processing.
  • Create Your Distributed Database on Kubernetes with Existing Monolithic Databases - The next challenge for databases is to run them on Kubernetes to become cloud neutral. However, they are more difficult to manage than the application layer, since Kubernetes is designed for stateless applications. Apache ShardingSphere is an ecosystem that transforms any database into a distributed database system and enhances it with sharding, elastic scaling, encryption features, and more.
  • Design Pattern Proposal for Autoscaling Stateful Systems - In this article, Rogerio Robetti discusses the challenges in auto-scaling stateful storage systems and proposes an opinionated design solution to automatically scale up (vertical) and scale out (horizontal) from a single node to several nodes in a cluster, with minimal configuration and operator intervention.
  • DynamoDB Data Transformation Safety: from Manual Toil to Automated and Open Source - Data transformation remains a continuous engineering challenge, often built on manual toil. The open-source utility Dynamo Data Transform was built to simplify data transformation and add safety and guardrails for DynamoDB-based systems. It started as a robust manual framework that was then automated and open sourced. This article discusses the challenges of data transformation.
  • Understanding and Applying Correspondence Analysis - Customer segments, personality profiles, social classes, and age generations are examples of effective references to larger groups of people sharing similar characteristics. Correspondence analysis (CA) is a multivariate analysis technique that projects categorical data into a numeric feature space, capturing most of the variability in the data with fewer dimensions.

InfoQ eMags are professionally designed, downloadable collections of popular InfoQ content - articles, interviews, presentations, and research - covering the latest software development technologies, trends, and topics.
