AI, ML & Data Engineering Content on InfoQ

Articles

AI, ML & Data Engineering

Streaming-First Infrastructure for Real-Time Machine Learning

This article covers the benefits of streaming-first infrastructure for two scenarios of real-time ML: online prediction, where a model receives a request and makes predictions as soon as the request arrives, and continual learning, where machine learning models continually adapt to changes in data distributions in production.

Chip Huyen
on Aug 22, 2022
Architecture & Design

Creating a Secure Distributed Database Cluster Leveraging Your Existing Database Management System

The emergence of Big Data and data lakes doesn't necessarily mean the disappearance of the trusted relational database. The two can coexist; relational databases just need to adjust. For the transition we propose Database Plus, a new technology and concept applicable to any database, which answers these challenges and eliminates switching costs and vendor lock-in.

Trista Pan
on Aug 17, 2022
Java

Debezium and Quarkus: Change Data Capture Patterns to Avoid Dual-Writes Problems

It’s common in microservices to write data in two places: persisting it to a database and then sending the content to another microservice. One approach to tackle this problem is dual writes, but you may lose data because of concurrent writes. Debezium is an open-source project for change data capture that uses the log scanner approach to avoid dual writes and communicate persisted data correctly between services.

Alex Soto
on Aug 15, 2022
AI, ML & Data Engineering

AI, ML, and Data Engineering InfoQ Trends Report—August 2022

In this annual report, the InfoQ editors discuss the current state of AI, ML, and data engineering and what emerging trends you as a software engineer, architect, or data scientist should watch. We curate our discussions into a technology adoption curve with supporting commentary to help you understand how things are evolving.

Srini Penchikala, Dr Einat Orr, Rags Srinivas, Roland Meertens, Anthony Alford, Daniel Dominguez
on Aug 02, 2022
.NET

Building Neural Networks with TensorFlow.NET

TensorFlow is an open-source framework developed by Google scientists and engineers for numerical computing. TensorFlow.NET is a library that provides a .NET Standard binding for TensorFlow. In this article, the author explains how to use TensorFlow.NET to build a neural network.

Robert Krzaczyński
on Jul 11, 2022
AI, ML & Data Engineering

API Friction Complicates Hunting for Cloud Vulnerabilities. SQL Makes it Simple

APIs can tell you everything about your cloud infrastructure, but they're hard to use and work in different ways. What if you could write simple SQL queries that call APIs for you and put results into a database? Steampipe, an open-source project that maps APIs to Postgres foreign tables, makes that dream come true. It's hard enough to reason over data. Acquiring it should be easy, and now it is.

Jon Udell
on Jul 06, 2022
DevOps

Embracing Cloud-Native for Apache DolphinScheduler with Kubernetes: a Case Study

This article shares how Apache DolphinScheduler was updated to use a more modern, cloud-native architecture. This includes moving to Kubernetes and integrating with Argo CD and Prometheus. These changes substantially improve the user experience of deploying, operating, and monitoring DolphinScheduler.

Yang Dian
on Jun 24, 2022
AI, ML & Data Engineering

What You Should Know before Deploying ML in Production

What should you know before deploying machine learning projects to production? There are four aspects of Machine Learning Operations, or MLOps, that everyone should be aware of first. These can help data scientists and engineers overcome limitations in the machine learning lifecycle and actually see them as opportunities.

Francesca Lazzeri
on Jun 09, 2022
AI, ML & Data Engineering

AI for Software Developers: a Future or a New Reality?

In this article, author Nikita Povarov discusses the role AI/ML plays in software development and how tasks like code completion, code search, and bug detection can be powered by machine learning. He also explains why a complete replacement of programmers by algorithms isn't going to happen any time soon.

Nikita Povarov
on May 20, 2022
AI, ML & Data Engineering

Raft Engine: a Log-Structured Embedded Storage Engine for Multi-Raft Logs in TiKV

In this article, the authors discuss the design and implementation of Raft Engine, a log-structured embedded storage engine introduced in version 5.4 of TiDB, a distributed NewSQL database. They also discuss the performance benefits of the engine compared to the previous implementation based on RocksDB.

Xinye Tao, Chenhao Huang
on May 12, 2022
AI, ML & Data Engineering

Building End-to-End Field Level Lineage for Modern Data Systems

In this article, the authors discuss data lineage as a critical component of root cause and impact analysis workflows for data pipelines, and how automating lineage creation and abstracting metadata to the field level helps with root cause analysis.

Mei Tao, Xuanzi Han, Helena Muñoz
on Mar 03, 2022
Culture & Methods

Using Machine Learning for Fast Test Feedback to Developers and Test Suite Optimization

Software testing, especially in large-scale projects, is a time-intensive process. Test suites may be computationally expensive, compete with each other for available hardware, or simply be so large that their results are considerably delayed. The article explores optimizing test execution, saving machine resources, and reducing feedback time to developers.

Gregor Endler, Marco Achtziger
on Feb 22, 2022
