In this article, author Carlos Bueno describes a method for analyzing constraints on the shape and flow of data in systems. He talks about the factors useful for system analysis like working set & average transaction sizes, request & update rates, consistency, locality, computation, and latency. He also discusses big data architecture details of two use cases, movie streaming and face recognition.
Spark SQL, part of the Apache Spark big data framework, is used for structured data processing and allows running SQL-like queries on Spark data. In this article, Srini Penchikala discusses the Spark SQL module and how it simplifies running data analytics using a SQL interface. He also talks about the new features in Spark SQL, like DataFrames and JDBC data sources.
Bulk data is commonly accessed via files & FTP. As the world moves toward APIs to facilitate collaboration, what are the requirements for data APIs? This article describes a meta-data driven architecture for bulk data ingestion. Two APIs operate in parallel to provide data changes as well as the data records themselves. An example demonstrates how API responses are parameterized using meta-data.
In this article, Basho Sr. Software Engineer Chris Meiklejohn explores the basic building blocks for crafting deterministic applications that guarantee convergence of data without synchronization.
The book "R for Everyone: Advanced Analytics and Graphics", authored by Jared Lander, covers the R language and how to use it for data analytics and visualizations. InfoQ spoke with Jared about the book.
Apache Spark is an open source big data framework built around speed, ease of use, and sophisticated analytics. In this article, Srini Penchikala discusses how Spark helps with big data processing.
This article shows how to use Amazon DynamoDB to create a Mars Rover application. You can use the same concepts described in this post to build your own web application.
When it comes to database change, agility through automation, the ability to rapidly accelerate delivery, is what differentiates world-class enterprises from the rest of the crowd.
GridGain announced that its In-Memory Data Fabric has been accepted into the Apache Incubator program as Apache Ignite. InfoQ spoke with Nikita Ivanov about the product becoming part of Apache.
Application Lifecycle Management has traditionally been difficult for databases. Ben Rees explains why the road ahead is now clear for Database Lifecycle Management.
The new book "Hadoop in Practice, Second Edition" by Alex Holmes covers many topics on building Hadoop code and organizing data to support code simplicity and execution speed.
Datameer, a big data analytics application for Hadoop, introduced Datameer 5.0 with Smart Execution to enhance data analytics. InfoQ spoke with Matt Schumpert from Datameer about the new product.