Database Content on InfoQ
-
Analytics Zoo: Unified Analytics + AI Platform for Distributed Tensorflow, and BigDL on Apache Spark
This article describes how Analytics Zoo can help real-world users build end-to-end deep learning pipelines for big data, including unified pipelines for distributed TensorFlow and Keras on Apache Spark, easy-to-use abstractions such as transfer learning and Spark ML pipeline support, and built-in deep learning models and reference use cases.
-
Back to the Future with Relational NoSQL
This article outlines some of the consistency issues NoSQL databases have with distributed transactions, showing how FaunaDB has solved the problems using the Calvin protocol and a virtual clock.
-
Sentiment Analysis: What's with the Tone?
Sentiment analysis is widely applied in voice of the customer (VOC) applications. In this article, the authors discuss NLP-based sentiment analysis built on machine learning (ML) and lexicon-based approaches, using KNIME data analysis tools.
-
Spark Application Performance Monitoring Using Uber JVM Profiler, InfluxDB and Grafana
In this article, author Amit Baghel discusses how to monitor the performance of Apache Spark based applications using technologies like Uber JVM Profiler, the InfluxDB database and the Grafana data visualization tool.
-
How to Source Control Your Databases for DevOps
A robust DevOps environment requires having continuous integration for every component of the system. But far too often, the database is omitted from the equation. In this article, we discuss the unique aspects of databases, both relational and NoSQL, in a successful continuous integration environment.
-
Challenges of Building a Reliable Realtime Chat Service
Realtime chat has become a common feature of modern applications. These days, it is not only messaging apps and social networks that let users talk with each other over the Internet; chat is also crucial in healthcare, e-commerce, gaming and many other industries.
-
Seth James Nielson on Blockchain Technology for Data Governance
Seth James Nielson recently hosted a tutorial workshop at the Data Architecture Summit 2018 conference about blockchain technology and its impact on data architecture and data governance.
-
Apache Kafka: Ten Best Practices to Optimize Your Deployment
Author Ben Bromhead discusses the latest Kafka best practices for developers to manage the data streaming platform more effectively. Best practices include log configuration, proper hardware usage, ZooKeeper configuration, replication factor, and partition count.
-
Natural Language Processing with Java - Second Edition: Book Review and Interview
The book Natural Language Processing with Java - Second Edition covers natural language processing (NLP) and various tools developers can use in their applications. Technologies discussed in the book include Apache OpenNLP and Stanford NLP. InfoQ spoke with co-author Richard Reese about the book and how NLP can be used in enterprise applications.
-
14 Things I Wish I’d Known When Starting with MongoDB
I’ve been a database person for an embarrassing length of time, but I only started working with MongoDB recently. When I was starting out with MongoDB, there were a few things that I wish I’d known about. With general experience, there will always be preconceptions of what databases are and what they do. In hopes of making it easier for other people, here is a list of common mistakes.
-
Democratizing Stream Processing with Apache Kafka® and KSQL - Part 2
In this article, author Robin Moffatt shows how to use Apache Kafka and KSQL to build data integration and processing applications with the help of an e-commerce sample application. Three use cases are discussed: customer operations, operational dashboard, and ad-hoc analytics.
-
A Critique of Resizable Hash Tables: Riak Core & Random Slicing
This fall, Wallaroo Labs will be releasing a large new feature set to our distributed data stream processing framework, Wallaroo. One of the new features requires a size-adjustable, distributed data structure to support growing & shrinking of compute clusters. It might be a good idea to use a distributed hash table to support the new feature, but what distributed hash algorithm should we choose?