InfoQ Homepage AI, ML & Data Engineering Content on InfoQ
-
Apache Kafka: Ten Best Practices to Optimize Your Deployment
Author Ben Bromhead discusses the latest Kafka best practices for developers to manage the data streaming platform more effectively. Best practices include log configuration, proper hardware usage, Zookeeper configuration, replication factor, and partition count.
-
Natural Language Processing with Java - Second Edition: Book Review and Interview
Natural Language Processing with Java - Second Edition book covers the Natural Language Processing (NLP) topic and various tools developers can use in their applications. Technologies discussed in the book include Apache OpenNLP and Stanford NLP. InfoQ spoke with co-author Richard Reese about the book and how NLP can be used in enterprise applications.
-
The 2018 InfoQ Editors’ Recommended Reading List: Part One
As part of our core values of sharing knowledge, the InfoQ editor team has listed and commented on their most recent recommended reading.
-
Democratizing Stream Processing with Apache Kafka® and KSQL - Part 2
In this article, author Robin Moffatt shows how to use Apache Kafka and KSQL to build data integration and processing applications with the help of an e-commerce sample application. Three use cases discussed: customer operations, operational dashboard, and ad-hoc analytics.
-
How to Choose a Stream Processor for Your App
Choosing a stream processor for your app can be challenging with many options to choose from. The best choice depends on individual use cases. In this article, the authors discuss a stream processor reference architecture, key features required by most streaming applications and optional features that can be selected based on specific use cases.
-
Analyzing and Preventing Unconscious Bias in Machine Learning
This article is based on Rachel Thomas’s keynote presentation, “Analyzing & Preventing Unconscious Bias in Machine Learning” at QCon.ai 2018. Thomas talks about the pitfalls and risk the bias in machine learning brings to the decision-making process. She discusses three use cases of machine learning bias.
-
Key Takeaway Points and Lessons Learned from QCon New York 2018
This year, at the seventh annual QCon New York, we had in total 143 speakers across the 117 sessions, workshops, AMAs, Open Spaces and mini-workshops. Topics included containers and orchestration, machine learning, ethics, modern user interfaces, microservices, blockchain, empowered teams, modern Java, DevEX, Serverless, chaos and resilience, Go, Rust, Elixir, and security.
-
Understanding Software System Behaviour with ML and Time Series Data
David Andrzejewski presented "Understanding Software System Behaviour with ML and Time Series Data". This article is a summary of his presentation and an overview on what to look out for. Know about the traditional approaches to time series, how to handle missing values, and know about possibly occurring seasonality in your data. Be careful about what threshold you set for anomaly detection.
-
Data Citizens: Why We All Care about Data Ethics
Data citizens are impacted by the models, methods, and algorithms created by data scientists, but they have limited agency to affect the tools which are acting on them.
-
Can People Trust the Automated Decisions Made by Algorithms?
The use of automated decision making is increasing. These algorithms can produce results that are incomprehensible, or socially undesirable. How can we determine the safety of algorithms in devices if we cannot understand them? Public fears about the inability to foresee adverse consequences has impeded technologies such as nuclear energy and genetically modified crops.
-
Democratizing Stream Processing with Apache Kafka and KSQL - Part 1
In this article, author Michael Noll discusses the stream processing with KSQL, the streaming SQL engine for Apache Kafka. Topics covered include challenges of stateful stream processing and how KSQL addresses them, and how KSQL helps to bridge the world of streams and databases through streams and tables.
-
Columnar Databases and Vectorization
In this article, author Siddharth Teotia discusses the Dremio database which is based on Apache Arrow with vectorization capabilities.