In this article, author Carlos Bueno describes a method for analyzing constraints on the shape and flow of data in systems. He talks about the factors useful for system analysis like working set & average transaction sizes, request & update rates, consistency, locality, computation, and latency. He also discusses big data architecture details of two use cases, movie streaming and face recognition.
Spark SQL, part of Apache Spark big data framework, is used for structured data processing and allows running SQL like queries on Spark data. In this article, Srini Penchikala discusses Spark SQL module and how it simplifies running data analytics using SQL interface. He also talks about the new features in Spark SQL, like DataFrames and JDBC data sources.
Synchronization of data across systems is expensive and impractical when running systems at scale. Traditional approaches for performing computations or information dissemination are not viable. In this article Basho Sr. Software Engineer Chris Meiklejohn explores the basic building blocks for crafting deterministic applications that guarantee convergence of data without synchronization.
Apache Spark is an open source big data framework built around speed, ease of use, and sophisticated analytics. In this article, Srini Penchikala discusses how Spark helps with big data processing. 2
GridGain announced that the In-Memory Data Fabric has been accepted into Apache Incubator program as Apache Ignite. InfoQ spoke with Nikita Ivanov about their product becoming part of Apache.
The new “Hadoop in Practice. 2 Edition" book by Alex Holmes covers a lot of topics building Hadoop code and organizing data to support code simplicity and execution speed.
Datameer, a big data analytics application for Hadoop, introduced Datameer 5.0 with Smart Execution to enhance the data analytics. InfoQ spoke with Matt Schumpert from Datameer about the new product.
The article describes the general outline of the Stats Anomalies Detector developed at MyHeritage and provides a detailed explanation of how to enhance the code to meet your company’s needs.
"Analytics Across the Enterprise" book is a collection of experiences by analytics practitioners in IBM. InfoQ spoke with authors about lessons learned and IBM technologies in the Big Data area.
This article discusses what stream processing is, how it fits into a big data architecture with Hadoop and a data warehouse (DWH), and what technologies and products you can choose from. 6
GridGain announced In-Memory Accelerator for Hadoop, offering benefits of in-memory computing to Hadoop applications. InfoQ spoke with Nikita Ivanov from GridGain about the product's architecture.
Spring XD (eXtreme Data) is Pivotal’s Big Data play. It joins Spring Boot and Grails as part of the execution portion of the Spring IO platform. 1