AI, ML & Data Engineering Content on InfoQ
-
Embracing Cloud-Native for Apache DolphinScheduler with Kubernetes: a Case Study
This article shares how Apache DolphinScheduler was updated to use a more modern, cloud-native architecture. This includes moving to Kubernetes and integrating with Argo CD and Prometheus. This substantially improves the user experience of deploying, operating, and monitoring DolphinScheduler.
-
What You Should Know before Deploying ML in Production
What should you know before deploying machine learning projects to production? There are four aspects of Machine Learning Operations, or MLOps, that everyone should be aware of first. These can help data scientists and engineers overcome limitations in the machine learning lifecycle and actually see them as opportunities.
-
AI for Software Developers: a Future or a New Reality?
In this article, author Nikita Povarov discusses the role AI/ML plays in software development and how tasks like code completion, code search, and bug detection can be powered by machine learning. But he also explains why a complete replacement of programmers by algorithms isn't going to happen any time soon.
-
Raft Engine: a Log-Structured Embedded Storage Engine for Multi-Raft Logs in TiKV
In this article, the authors discuss the design and implementation of Raft Engine, a log-structured embedded storage engine introduced in version 5.4 of TiDB, a distributed NewSQL database. They also discuss the performance benefits of the engine compared to the previous implementation based on RocksDB.
-
Building End-to-End Field Level Lineage for Modern Data Systems
In this article, the authors discuss data lineage as a critical component of the root cause and impact analysis workflow for data pipelines, and how automating lineage creation and abstracting metadata to the field level helps with root cause analysis.
-
Using Machine Learning for Fast Test Feedback to Developers and Test Suite Optimization
Software testing, especially in large-scale projects, is a time-intensive process. Test suites may be computationally expensive, compete with each other for available hardware, or simply be so large as to cause considerable delay until their results are available. The article explores optimizing test execution, saving machine resources, and reducing feedback time to developers.
-
Federated Machine Learning and Edge Systems
At QCon Plus 2021, Katharine Jarmul spoke about machine learning on edge devices using federated machine learning. Some key takeaways were: federated machine learning is useful for edge devices with limited network bandwidth and can improve data privacy; and learning on edge devices can improve data diversity and allow for predictions even when the device is no longer connected.
-
The Major Software Industry Trends from 2021 and What to Watch in 2022
In this podcast summary, Thomas Betts, Wes Reisz, Shane Hastie, Charles Humble, Srini Penchikala, and Daniel Bryant discuss what they have seen in 2021 and speculate a little on what they hope to see in 2022. Topics explored include hybrid working and the importance of ethics and sustainability within technology.
-
The Next Evolution of the Database Sharding Architecture
In this article, author Juan Pan discusses data sharding architecture patterns in a distributed database system. She explains how the Apache ShardingSphere project solves data sharding challenges. Also discussed are two practical examples of how to create a distributed database and an encrypted table with DistSQL.
-
Getting Rid of Wastes and Impediments in Software Development Using Data Science
This article shows how to use data science to detect waste and impediments, and presents concepts and related information that help teams figure out the root cause of impediments they struggle to get rid of. The knowledge discovered during research includes an expanded waste classification and the use of trends to uncover undesired situations such as hidden delayed backlog items and defect trends.
-
Developing Deep Learning Systems Using Institutional Incremental Learning
Institutional incremental learning promises to achieve collaborative learning. This form of learning can address data sharing and security issues without bringing in the complexities of federated learning. This article discusses practical approaches that help in building an object detection system.
-
Anomaly Detection Using ML.NET
In this article, the author introduces the concepts of anomaly detection using the Randomized PCA method. The theory behind the concepts is explained and illustrated with examples. The method is demonstrated with a real-world scenario implemented using C# and ML.NET.