Big Data Content on InfoQ
-
IBM Launches Contest for Cognitive Mobile Apps using Watson
At Mobile World Congress, IBM announced a contest inviting developers to create consumer and business mobile apps powered by the IBM Watson cognitive computing platform. Winners of the IBM Watson Mobile Developer Challenge will receive design consulting and support from IBM to help bring their apps to market.
-
Spark Officially Graduates From Apache Incubator
Spark recently graduated from the Apache Incubator. Spark claims speedups of up to 100x over Apache Hadoop for in-memory datasets, falling back gracefully to about 10x for on-disk workloads. Written in Scala, it can run SQL queries and can be used directly from R. It also provides machine learning, graph processing and other capabilities discussed further in the article.
-
Elasticsearch 1.0.0 released
Elasticsearch has released version 1.0.0 of its eponymous open-source analytics tool. Elasticsearch is a distributed search engine that enables real-time data analysis in Big Data environments. The new version brings a range of functional enhancements and API changes intended to make Elasticsearch more intuitive and more powerful to use.
-
Running Spark on R with SparkR
UC Berkeley’s AMPLab announced a developer preview of SparkR, a new project that lets developers use Apache Spark natively from R.
-
Interactive SQL in Apache Hadoop with Impala and Hive
In the race for interactive SQL in Big Data environments, there are two open-source front-runners: Impala, and Hive with the Stinger project. Cloudera recently announced that Impala is up to 69 times faster than Hive 0.12 and can outperform a traditional analytic DBMS. Beyond raw speed, we look at other considerations in choosing a SQL engine for Hadoop, as well as at Tez, an application framework for YARN.
-
DataFu Enters Incubation Status at Apache
LinkedIn’s DataFu project, a collection of libraries for Hadoop, officially entered incubation at the Apache Software Foundation (ASF) in the first week of January.
-
Google Acquires Nest: Big Data Comes to Energy
Google has acquired Nest, maker of smart thermostats and smoke detectors, for $3.2 billion in cash, gaining another major data source that will help Google understand how people live.
-
Spark, Storm and Real Time Analytics
Hadoop is widely regarded as the platform of choice for Big Data analysis and computation. However, as data Volume, Variety and Velocity increase, Hadoop, being a batch-processing framework, cannot meet the requirements of real-time analytics. Spark, Storm and the Lambda Architecture can help bridge the gap between batch and event-based processing.
-
Presto-as-a-Service: Interactive SQL Queries on AWS
Presto, a technology from Facebook that enables interactive SQL queries on petabytes of data, has taken a first step toward mainstream adoption: Big Data startup Qubole has launched an alpha of its Presto-as-a-Service offering, integrated with Amazon Web Services.
-
Big Data: Do Languages Really Matter?
Big Data is a field where even a single millisecond lost per event can become significant over billions of events. Yet languages often regarded as slow, such as Python, have gained considerable popularity over the past year. Recent articles and discussions in the Big Data community have reignited the debate over the choice of programming language for data science and Big Data.
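To make the millisecond argument concrete, assume for illustration a fixed overhead of 1 ms per event and a workload of one billion events. Both figures are assumptions, not measurements. The cumulative cost is then:

```latex
10^{9}\ \text{events} \times 10^{-3}\ \frac{\text{s}}{\text{event}} = 10^{6}\ \text{s} \approx 11.6\ \text{days}
```

In practice that extra processing time would be spread across many parallel workers, but it still has to be paid for in hardware.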
-
Big Data Revolution and Genomics Analysis
Curoverse and Tute Genomics each secured $1.5 million in seed funding in the past month, aiming to bring gene sequencing to the masses. Illumina, Seven Bridges Genomics, Complete Genomics and others offer researchers and private parties the opportunity to map a full genome sequence for a four-figure price. Illumina recently announced the HiSeq X Ten, promising the long-awaited $1,000 genome.
-
Twitter Open-Sources its MapReduce Streaming Framework Summingbird
Twitter has open-sourced its MapReduce streaming framework, Summingbird. Available under the Apache 2 license, Summingbird is a large-scale data processing system that lets developers run the same code in batch mode (Hadoop/MapReduce-based), in stream mode (Storm-based), or in a combination of the two, called hybrid mode.
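The batch/stream/hybrid split can be illustrated with a toy word count. This is a minimal Python sketch, not Summingbird's API. Summingbird itself is written in Scala and targets Hadoop and Storm. The functions `batch_counts`, `stream_update` and `hybrid_query` are hypothetical names chosen for illustration. The same per-event logic feeds two paths: a batch view computed over historical events, and a real-time view updated one event at a time. A hybrid query merges the two views. The merge is correct because counts combine by addition with zero as the identity (a monoid), which is the kind of algebraic property Summingbird relies on when combining batch and real-time results.

```python
from collections import Counter
from typing import Iterable, List


# Shared per-event logic: both paths call the same function.
def extract_words(event: str) -> List[str]:
    return event.lower().split()


# Batch path (hypothetical): recompute counts over all historical events.
def batch_counts(events: Iterable[str]) -> Counter:
    counts = Counter()
    for event in events:
        counts.update(extract_words(event))
    return counts


# Stream path (hypothetical): fold one new event into a running view.
def stream_update(view: Counter, event: str) -> Counter:
    view.update(extract_words(event))
    return view


# Hybrid query: because counts merge by addition, batch view (events up to
# the last batch run) plus real-time view (events since) equals a full
# recomputation over all events.
def hybrid_query(batch_view: Counter, realtime_view: Counter, word: str) -> int:
    return batch_view[word] + realtime_view[word]


if __name__ == "__main__":
    historical = ["spark storm hadoop", "storm summingbird"]
    recent = ["storm spark"]

    batch_view = batch_counts(historical)
    realtime_view = Counter()
    for event in recent:
        stream_update(realtime_view, event)

    print(hybrid_query(batch_view, realtime_view, "storm"))  # prints 3
```

A real deployment would periodically rerun the batch job and discard the real-time events it has absorbed. This sketch leaves that step out.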
-
New Education Opportunities for Data Scientists
2013 was rich in announcements of new programs, degrees and grants for aspiring data scientists and Big Data practitioners.
-
Carrier IQ's Magnolia Mansourkia Mobley Sets the Record Straight About Mobile Analytic Products
In 2011 Trevor Eckhart found logs on his device that he believed were associated with Carrier IQ data. Our response at the time, which has since been confirmed by a detailed FTC investigation, is that the data collection logs were associated with and used by the device manufacturer, not Carrier IQ. They were not Carrier IQ logs.
-
Trifacta Seeks to Simplify Data Wrangling-as-a-Service
Trifacta, a data analysis services platform, recently received venture capital investment to advance its efforts to make data wrangling easier for data analysts. The goal is to let analysts collect, cleanse and munge data in a fraction of the time and effort it currently takes.