In the "Spark in Action" book, authors Petar Zecevic and Marko Bonaci discuss the Apache Spark framework for data processing, covering both batch and streaming use cases. They introduce the architecture of Spark and core concepts such as Resilient Distributed Datasets (RDDs). InfoQ spoke with them about Apache Spark, developer tools, and the features and enhancements planned for future releases.
Current enterprise data architectures include NoSQL databases co-existing with relational databases. However, NoSQL data management still lacks mature methods and tools. In this article, the author discusses a solution for managing both NoSQL and relational databases using Unified Data Modeling techniques.
Lana Gibson gave a talk at the AgileNZ conference on using analytics data to design web content, based on her experiences as Content Performance Lead working on the GOV.UK whole-of-government website.
Java performance issues are often attributable to bad database access patterns. In this article, a top performance field engineer demonstrates his patterns for diagnosing database-related issues.
Our physical world is about to become digitally enabled, and according to various predictions, many billions of IoT devices will go online and collect data in the coming years.
In this article, the third installment of the Apache Spark series, the author discusses the Apache Spark Streaming framework for processing real-time streaming data, using a log analytics sample application.
In this article, Dr. Josiah Carlson, author of the book “Redis in Action”, explains how to use Redis and sorted sets with hashes for time series analysis.
In this article, the author discusses survival prediction for colorectal cancer as a multi-class classification problem and how to solve that problem using Apache Spark's MLlib Java API.
Data Lake-as-a-Service provides big data processing in the cloud for business outcomes in a cost-effective way. InfoQ spoke with Lovan Chetty and Hannah Smalltree from Cazena about how these solutions work.
Neo Technology, the company behind the graph NoSQL database Neo4j, recently released version 2.3 of the database and also announced the openCypher initiative. InfoQ spoke with Philip Rathle about it.
In this article, author Dan Macklin discusses the transition to a Riak NoSQL and Erlang-based architecture coupled with Convergent Replicated Data Types (CRDTs), and the lessons learned from that transition.
In this article, the author discusses a bioinformatics software-as-a-service (SaaS) product built as a public data warehousing and analytical platform for mass spectrometry data.