AI, ML & Data Engineering Content on InfoQ
-
Understanding and Applying Correspondence Analysis
Customer segments, personality profiles, social classes, and age generations are examples of effective references to larger groups of people who share similar characteristics. Correspondence analysis (CA) is a multivariate analysis technique that projects categorical data into a numeric feature space, capturing most of the variability in the data with fewer dimensions.
-
How I Contributed as a Tester to a Machine Learning System: Opportunities, Challenges and Learnings
Have you ever wondered how machine learning-based systems are tested? In these systems, testing often takes a back seat, and when it is done, it is mostly done by the developers themselves. The tester's role is not clearly defined, so testers often struggle to understand ML-based systems and to work out what they can contribute. This is one tester's journey of assuring the quality of ML-based systems.
-
Understanding and Debugging Deep Learning Models: Exploring AI Interpretability Methods
ML interpretability refers to a user's ability to explain the decisions made by an ML system. Interpretability increases confidence in the model, helps reduce bias, and helps ensure that the model is compliant and ethical. In this article, author Andrew Hoblitzell discusses several methods of ML interpretability and dives deep into Local Interpretable Model-Agnostic Explanations (LIME) and Shapley values.
-
Design Pattern Proposal for Autoscaling Stateful Systems
In this article, Rogerio Robetti discusses the challenges of autoscaling stateful storage systems and proposes an opinionated design that automatically scales up (vertically) and scales out (horizontally), from a single node to several nodes in a cluster, with minimal configuration and operator intervention.
-
InfoQ Software Trends Report: Major Trends in 2022 and What to Watch for in 2023
2022 was another year of significant technological innovations and trends in the software industry and communities. The InfoQ podcast co-hosts met last month to discuss the major trends from 2022, and what to watch for in 2023. This article is a summary of the 2022 software trends podcast.
-
DynamoDB Data Transformation Safety: from Manual Toil to Automated and Open Source
Data transformation remains an ongoing engineering challenge that often relies on manual toil. The open source utility Dynamo Data Transform was built to simplify data transformation for DynamoDB-based systems and to build safety and guardrails into it. The tool grew out of a robust manual framework that was then automated and open sourced. This article discusses the challenges of data transformation.
-
Create Your Distributed Database on Kubernetes with Existing Monolithic Databases
The next challenge for databases is running on Kubernetes in order to become cloud-neutral. However, databases are more difficult to manage on Kubernetes than the application layer, since Kubernetes was designed for stateless applications. Apache ShardingSphere is an ecosystem that transforms any database into a distributed database system and enhances it with sharding, elastic scaling, encryption, and more.
-
Apache DolphinScheduler in MLOps: Create Machine Learning Workflows Quickly
In this article, the author discusses the data pipeline and workflow scheduler Apache DolphinScheduler and how it performs ML tasks using its Jupyter and MLflow components.
-
Migrating Netflix's Viewing History from Synchronous Request-Response to Async Events
In a web-based service, a slowdown in request processing can eventually make your service unavailable. Chances are, not all requests need to be processed right away; some just need an acknowledgement of receipt. Have you ever asked yourself: "Would I benefit from asynchronous processing of requests? If so, how would I make such a change in a live, large-scale, mission-critical system?"
-
How to Migrate an Oracle Database to MySQL Using AWS Database Migration Service
Data migration efforts are typically undertaken for database consolidation, for cost reasons, or to move on-premises databases to a cloud platform. In this article, author Deepak Vohra discusses the details of migrating a local Oracle database to a MySQL database in the cloud using AWS Database Migration Service.
-
AutoML: the Promise vs. Reality According to Practitioners
Automating machine learning projects is a noble goal, but true end-to-end automation is not yet available. As a collection of tools, AutoML has proven its value, but its capabilities need to be vetted more thoroughly. Findings from a qualitative study of AutoML users suggest that the future of automation for ML and AI rests on our ability to realize the potential of AutoMLOps.
-
Business Systems Integration is about to Get a Whole Lot Easier
A new breed of integration software is emerging that syncs business data into a simplified data hub and then syncs that data to the destination system. The benefit of this integration pattern is that it reduces the number of manual transformations required (often to zero) and makes the remaining manual transformations easier to write.