DataFu Enters Incubation Status at Apache

by Charles Menguy on  Feb 04, 2014

LinkedIn’s DataFu project, a collection of libraries for Hadoop, officially entered incubation at the Apache Software Foundation (ASF) in the first week of January.

Google Acquires Nest: Big Data Comes to Energy

by Michael Hausenblas on  Feb 04, 2014

Google has acquired Nest, a maker of smart thermostats and smoke detectors, for $3.2 billion in cash, gaining another major data source that will help it understand how people live.

Spark, Storm and Real Time Analytics

by Alex Giamas on  Jan 31, 2014

Hadoop is definitely the platform of choice for Big Data analysis and computation. As data Volume, Variety and Velocity increase, however, Hadoop, as a batch processing framework, cannot cope with the requirement for real-time analytics. Spark, Storm and the Lambda Architecture can help bridge the gap between batch and event-based processing.

Presto-as-a-Service: Interactive SQL Queries on AWS

by Charles Menguy on  Jan 24, 2014

Presto, a technology from Facebook enabling interactive SQL queries on petabytes of data, has now taken a first step toward mainstream adoption. Big Data startup Qubole has launched its Presto-as-a-Service alpha, integrated with Amazon Web Services.

Big Data: Do Languages Really Matter?

by Charles Menguy on  Jan 20, 2014

Big Data is a field where a loss of even a single millisecond can be significant over billions of events. Yet languages often regarded as slow, such as Python, have gained a lot of popularity in the past year. Recent articles and discussions in the Big Data community have reignited the debate around the choice of a programming language for data science and Big Data.

Big Data Revolution and Genomics Analysis

by Alex Giamas on  Jan 17, 2014

Curoverse and Tute Genomics each secured $1.5 million in seed funding in the past month, aiming to bring gene sequencing to the masses. Illumina, Seven Bridges Genomics, Complete Genomics and others are offering researchers and private parties the opportunity to map the full genome sequence for a four-figure price. Illumina recently announced HiSeq X Ten, promising the long-awaited $1,000 genome.

Twitter Open-Sources its MapReduce Streaming Framework Summingbird

by Michael Hausenblas on  Jan 16, 2014

Twitter has open-sourced its MapReduce streaming framework, called Summingbird. Available under the Apache 2 license, Summingbird is a large-scale data processing system that lets developers uniformly execute code in batch mode (Hadoop/MapReduce-based), in stream mode (Storm-based), or in a combination of the two, called hybrid mode.

New Education Opportunities for Data Scientists

by Charles Menguy on  Jan 14, 2014

The year 2013 was rich in announcements of new programs, degrees and grants for aspiring data scientists and Big Data practitioners.

Carrier IQ's Magnolia Mansourkia Mobley Sets the Record Straight About Mobile Analytic Products

by Martin Monroe on  Jan 09, 2014

In 2011 Trevor Eckhart found logs on his device that he believed were associated with Carrier iQ data. Our response at the time, which has since been confirmed by a detailed FTC investigation, is that the data collection logs were associated with and used by the manufacturer of the device, not Carrier iQ. They were not Carrier iQ logs.

Trifacta Seeks to Simplify Data Wrangling-as-a-Service

by Alex Giamas on  Dec 30, 2013

Trifacta, a data analysis services platform, recently received VC investment to advance its efforts to make data wrangling easier for data analysts. The goal is to collect, cleanse and munge data in a fraction of the time and effort it currently takes.

Hadoop-as-a-Service Provider Qubole Now Runs on Google Compute Engine

by Michael Hausenblas on  Dec 28, 2013

Qubole, a managed Hadoop-as-a-Service offering, is now available on Google Compute Engine (GCE). Until now Qubole was available only on Amazon's AWS, and this announcement comes just a few days after Google released GCE into general availability.

Hadoop Jobs on GPU with ParallelX

by Charles Menguy on  Dec 26, 2013

The MapReduce paradigm is not always ideal when dealing with large, computationally intensive algorithms. A small team of entrepreneurs is building a product called ParallelX to address that bottleneck by harnessing the power of GPUs to give Hadoop jobs a significant boost.

Elastic Mesos Service Automates Mesos Cluster Deployment in EC2

by Charles Menguy on  Dec 17, 2013

EC2 users can now automate the deployment of Apache Mesos at scale through Elastic Mesos, a new web service provided by Big Data startup Mesosphere. Mesos is an open-source tool for sharing cluster resources between multiple data processing frameworks.

Martin Fowler on Data Austerity

by Jonathan Allen on  Dec 17, 2013

Martin Fowler writes about the opposite of Big Data: Datensparsamkeit. This German word roughly translates to “data austerity” or simply “not storing more than you need”.

A Survey and Interview on How Hadoop Is Used Today

by Boris Lublinsky on  Dec 12, 2013

This post presents the results of a Hortonworks survey of over 500 Hadoop Summit 2013 attendees on how they use Hadoop, and an interview with David McJannet on Hadoop trends today.

InfoQ.com and all content copyright © 2006-2013 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.