AI, ML & Data Engineering Content on InfoQ
-
Create Your Distributed Database on Kubernetes with Existing Monolithic Databases
The next challenge for databases is to run them on Kubernetes to become cloud neutral. However, they are more difficult to manage than the application layer, since Kubernetes is designed for stateless applications. Apache ShardingSphere is an ecosystem that transforms any database into a distributed database system and enhances it with sharding, elastic scaling, encryption features, and more.
-
Apache DolphinScheduler in MLOps: Create Machine Learning Workflows Quickly
In this article, the author discusses the data pipeline and workflow scheduler Apache DolphinScheduler and how it performs ML tasks using Jupyter and MLflow components.
-
Migrating Netflix's Viewing History from Synchronous Request-Response to Async Events
In a web-based service, a slowdown in request processing can eventually make your service unavailable. Chances are, not all requests need to be processed right away. Some of them just need an acknowledgement of receipt. Have you ever asked yourself: “Would I benefit from asynchronous processing of requests? If so, how would I make such a change in a live, large-scale, mission-critical system?”
-
How to Migrate an Oracle Database to MySQL Using AWS Database Migration Service
Data migration efforts are typically taken up for database consolidation, cost considerations, or migrating on-prem databases to a cloud platform. In this article, author Deepak Vohra discusses the details of migrating a local database to a MySQL database in the cloud using AWS Database Migration Service.
-
AutoML: the Promise vs. Reality According to Practitioners
Automating machine learning projects is a noble goal, but true end-to-end automation is not available yet. As a collection of tools, AutoML capabilities have proven value but need to be vetted more thoroughly. Findings from a qualitative study of AutoML users suggest the future of automation for ML and AI rests on our ability to realize the potential of AutoMLOps.
-
Business Systems Integration is about to Get a Whole Lot Easier
A new breed of integration software is arising that syncs business data into a simplified data hub and then syncs that data to the destination system. The benefit of this integration pattern is that it reduces the number of manual transformations required (often to zero) and makes it easier to write manual transformations when you have to.
-
Streaming-First Infrastructure for Real-Time Machine Learning
This article covers the benefits of streaming-first infrastructure for two scenarios of real-time ML: online prediction, where a model receives a request and makes predictions as soon as the request arrives, and continual learning, where machine learning models continually adapt to changes in data distributions in production.
-
Creating a Secure Distributed Database Cluster Leveraging Your Existing Database Management System
The emergence of Big Data and data lakes doesn't necessarily mean the disappearance of the trusted relational database. The two can coexist; relational databases just need to adjust. For the transition we propose Database Plus, a new technology and concept applicable to any database that answers these challenges and eliminates switching costs and vendor lock-in.
-
Debezium and Quarkus: Change Data Capture Patterns to Avoid Dual-Writes Problems
It’s common in microservices to write data in two places: first to a database, and then to another microservice. One approach to this problem is dual writes, but you may lose data because of concurrent writes. Debezium is an open-source project for change data capture that uses the log scanner approach to avoid dual writes and communicate persisted data correctly between services.
-
AI, ML, and Data Engineering InfoQ Trends Report—August 2022
In this annual report, the InfoQ editors discuss the current state of AI, ML, and data engineering and what emerging trends you as a software engineer, architect, or data scientist should watch. We curate our discussions into a technology adoption curve with supporting commentary to help you understand how things are evolving.
-
Building Neural Networks with TensorFlow.NET
TensorFlow is an open-source framework developed by Google scientists and engineers for numerical computing. TensorFlow.NET is a library that provides a .NET Standard binding for TensorFlow. In this article, the author explains how to use TensorFlow.NET to build a neural network.
-
API Friction Complicates Hunting for Cloud Vulnerabilities. SQL Makes it Simple
APIs can tell you everything about your cloud infrastructure, but they're hard to use and work in different ways. What if you could write simple SQL queries that call APIs for you and put results into a database? Steampipe, an open-source project that maps APIs to Postgres foreign tables, makes that dream come true. It's hard enough to reason over data. Acquiring it should be easy, and now it is.