InfoQ Homepage Database Content on InfoQ
-
Apache Kafka: Ten Best Practices to Optimize Your Deployment
Author Ben Bromhead discusses the latest Kafka best practices for developers to manage the data streaming platform more effectively. Best practices include log configuration, proper hardware usage, Zookeeper configuration, replication factor, and partition count.
-
Natural Language Processing with Java - Second Edition: Book Review and Interview
Natural Language Processing with Java - Second Edition book covers the Natural Language Processing (NLP) topic and various tools developers can use in their applications. Technologies discussed in the book include Apache OpenNLP and Stanford NLP. InfoQ spoke with co-author Richard Reese about the book and how NLP can be used in enterprise applications.
-
14 Things I Wish I’d Known When Starting with MongoDB
I’ve been a database person for an embarrassing length of time, but I only started working with MongoDB recently. When I was starting out with MongoDB, there are a few things that I wish I’d known about. With general experience, there will always be preconceptions of what databases are and what they do. In hopes of making it easier for other people, here is a list of common mistakes.
-
Democratizing Stream Processing with Apache Kafka® and KSQL - Part 2
In this article, author Robin Moffatt shows how to use Apache Kafka and KSQL to build data integration and processing applications with the help of an e-commerce sample application. Three use cases discussed: customer operations, operational dashboard, and ad-hoc analytics.
-
A Critique of Resizable Hash Tables: Riak Core & Random Slicing
This fall, Wallaroo Labs will be releasing a large new feature set to our distributed data stream processing framework, Wallaroo. One of the new features requires a size-adjustable, distributed data structure to support growing & shrinking of compute clusters. It might be a good idea to use a distributed hash table to support the new feature, but what distributed hash algorithm should we choose?
-
How to Choose a Stream Processor for Your App
Choosing a stream processor for your app can be challenging with many options to choose from. The best choice depends on individual use cases. In this article, the authors discuss a stream processor reference architecture, key features required by most streaming applications and optional features that can be selected based on specific use cases.
-
Analyzing and Preventing Unconscious Bias in Machine Learning
This article is based on Rachel Thomas’s keynote presentation, “Analyzing & Preventing Unconscious Bias in Machine Learning” at QCon.ai 2018. Thomas talks about the pitfalls and risk the bias in machine learning brings to the decision-making process. She discusses three use cases of machine learning bias.
-
Q&A on the Book Testing in the Digital Age
The Book Testing in the Digital Age by Tom van de Ven, Rik Marselis, and Humayun Shaukat, explains the impact that developments like robotics, artificial intelligence, internet of things, and big data are having in testing. It explores the challenges and possibilities that the digital age brings us when it comes to testing software systems.
-
Democratizing Stream Processing with Apache Kafka and KSQL - Part 1
In this article, author Michael Noll discusses the stream processing with KSQL, the streaming SQL engine for Apache Kafka. Topics covered include challenges of stateful stream processing and how KSQL addresses them, and how KSQL helps to bridge the world of streams and databases through streams and tables.
-
Picking an Active-Active Geo Distribution Strategy: Comparing Merge Replication and CRDT
Modern distributed applications are fuelling the growing demand for distributed active-active, multi-master databases. While most popular databases support multi-master deployment, different databases employ different techniques. LWW, MVCC, merge replication and CRDTs deliver eventual consistency, offering read and write access with local latency and remaining available during network partitions.
-
Columnar Databases and Vectorization
In this article, author Siddharth Teotia discusses the Dremio database which is based on Apache Arrow with vectorization capabilities.
-
Q&A on the Book Software Wasteland
Almost all Enterprise Information Systems now cost vastly more to implement than they should. When you have hundreds or thousands of complex applications, you are stuck in the Application Centric Quagmire. In the book Software Wasteland Dave McComb explores what is causing application development waste and how visualizing the cost of change and becoming data-centric can help to reduce the waste.