Big Data Content on InfoQ

Articles

Highly Distributed Computations Without Synchronization

Synchronizing data across systems is expensive and impractical when running systems at scale, and traditional approaches to performing computations or disseminating information are not viable. In this article, Basho Sr. Software Engineer Chris Meiklejohn explores the basic building blocks for crafting deterministic applications that guarantee convergence of data without synchronization.

Christopher Meiklejohn
on Feb 17, 2015
AI, ML & Data Engineering

Big Data Processing with Apache Spark – Part 1: Introduction

Apache Spark is an open source big data processing framework built around speed, ease of use, and sophisticated analytics. In this article, Srini Penchikala talks about how the Apache Spark framework helps with big data processing and analytics through its standard API. He also discusses how Spark compares with traditional MapReduce implementations such as Apache Hadoop.

Srini Penchikala
on Jan 30, 2015
Apache Ignite GridGain Incubator Project - Q&A Interview with Nikita Ivanov

GridGain announced that its In-Memory Data Fabric has been accepted into the Apache Incubator program as Apache Ignite. InfoQ spoke with Nikita Ivanov about the product becoming part of Apache.

Srini Penchikala
on Dec 03, 2014
Interview with Alex Holmes, author of “Hadoop in Practice. Second Edition”

The new “Hadoop in Practice. Second Edition” book by Alex Holmes provides deep insight into the Hadoop ecosystem, covering a wide spectrum of topics: data organization, layouts, and serialization; data processing, including MapReduce and big data patterns; special structures and their use in simplifying big data processing; and SQL on Hadoop data.

Boris Lublinsky
on Nov 20, 2014
Matt Schumpert on Datameer Smart Execution

Datameer, a big data analytics application for Hadoop, introduced Datameer 5.0 with Smart Execution, which dynamically selects the optimal compute framework at each step of the big data analytics process. InfoQ spoke with Matt Schumpert from the Datameer team about the new product and how it helps with big data analytics needs.

Srini Penchikala
on Nov 13, 2014
Stats Anomalies Detector

This article describes the general outline of the Stats Anomalies Detector developed at MyHeritage and explains in detail how to enhance the code (which will be available soon on the MyHeritage GitHub) to meet your company’s needs.

Yonatan Harel, Ran Levy
on Nov 07, 2014
Analytics Across the Enterprise: How IBM Realizes Business Value from Big Data and Analytics

The book Analytics Across the Enterprise: How IBM Realizes Business Value from Big Data and Analytics, by Brenda L. Dietrich, Emily C. Plachy, and Maureen F. Norton, is a collection of experiences from analytics practitioners at IBM. InfoQ spoke with the authors about the lessons learned from the book, IBM's arsenal of Big Data technologies, and the future of analytics.

Alex Giamas
on Oct 27, 2014
AI, ML & Data Engineering

Real-Time Stream Processing as Game Changer in a Big Data World with Hadoop and Data Warehouse

This article discusses what stream processing is, how it fits into a big data architecture with Hadoop and a data warehouse (DWH), when stream processing makes sense, and what technologies and products you can choose from.

Kai Wähner
on Sep 10, 2014
Nikita Ivanov on GridGain’s In-Memory Accelerator for Hadoop

GridGain recently announced the In-Memory Accelerator for Hadoop, offering the benefits of in-memory computing to Hadoop-based applications. It includes two components: an in-memory file system and a MapReduce implementation. InfoQ spoke with Nikita Ivanov, CTO of GridGain, about the architecture of the product.

Srini Penchikala
on Sep 08, 2014
Java

Introducing Spring XD, a Runtime Environment for Big Data Applications

Spring XD (eXtreme Data) is Pivotal’s Big Data play. It joins Spring Boot and Grails as part of the execution portion of the Spring IO platform. Whilst Spring XD makes use of a number of existing Spring projects, it is a runtime environment rather than a library or framework: it comprises a bin directory with servers that you start up and interact with via a shell.

Charles Humble
on Jul 23, 2014
MLConf NYC 2014 Highlights

MLConf took place in NYC on April 11th: a full day packed with talks on Machine Learning and Big Data, featuring speakers from many prominent companies.

Charles Menguy
on Apr 17, 2014
Lambda Architecture: Design Simpler, Resilient, Maintainable and Scalable Big Data Solutions

Lambda Architecture proposes a simpler, elegant paradigm designed to store and process large amounts of data. In this article, author Daniel Jebaraj presents the motivation behind the Lambda Architecture and reviews its structure with the help of a sample Java application.

Daniel Jebaraj
on Mar 12, 2014
