AI, ML & Data Engineering Content on InfoQ
-
Twitter Open-Sources its MapReduce Streaming Framework Summingbird
Twitter has open-sourced its MapReduce streaming framework, Summingbird. Available under the Apache 2 license, Summingbird is a large-scale data processing system that lets developers run the same code in batch mode (Hadoop/MapReduce-based), in stream mode (Storm-based), or in a combination of the two, called hybrid mode.
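To make the batch/stream/hybrid idea concrete, here is a minimal sketch in plain Python. It is not Summingbird's API (Summingbird is a Scala library); the names `tokenize`, `merge`, `run_batch`, and `run_stream` are hypothetical and exist only for illustration. The sketch assumes the job is a word count whose partial results can be combined with an associative merge, so the same logic can be applied to a stored data set, to a live feed, or to both.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def tokenize(line: str) -> List[str]:
    """Job logic shared by both modes: split a line into lowercase words."""
    return line.lower().split()


def merge(left: Counter, right: Counter) -> Counter:
    """Associative merge of two partial word counts."""
    return left + right


def run_batch(lines: Iterable[str]) -> Counter:
    """Batch mode (illustrative): process a complete data set at once."""
    total = Counter()
    for line in lines:
        total = merge(total, Counter(tokenize(line)))
    return total


def run_stream(lines: Iterable[str]) -> Iterator[Counter]:
    """Stream mode (illustrative): emit an updated count after every event."""
    state = Counter()
    for line in lines:
        state = merge(state, Counter(tokenize(line)))
        yield state


# Hypothetical inputs: older data already stored, newer data still arriving.
historical = ["storm hadoop", "hadoop"]
live = ["storm", "summingbird storm"]

batch_view = run_batch(historical)
*_, realtime_view = run_stream(live)  # keep only the latest streamed state

# Hybrid mode (illustrative): combine the batch and realtime views.
hybrid_view = merge(batch_view, realtime_view)
```

Because `merge` is associative, the hybrid view in this sketch equals what a single batch run over all the data would produce, which is the property that makes mixing the two modes safe here.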
-
New Education Opportunities for Data Scientists
The year 2013 has been rich in announcements of new programs, degrees and grants for aspiring data scientists and Big Data practitioners.
-
Carrier IQ's Magnolia Mansourkia Mobley Sets the Record Straight About Mobile Analytic Products
In 2011, Trevor Eckhart found logs on his device that he believed were associated with Carrier IQ data. Our response at the time, which has since been confirmed by a detailed FTC investigation, is that the data collection logs were associated with and used by the manufacturer of the device, not Carrier IQ. They were not Carrier IQ logs.
-
ORM Tool Hibernate 4.3 Released, Implementing JPA 2.1 Specification
The final version of Hibernate 4.3, the Object-Relational Mapping (ORM) framework, was recently released and is now a certified implementation of the JPA 2.1 specification (JSR 338), which was released in May 2013.
-
Trifacta Seeks to Simplify Data Wrangling-as-a-Service
Trifacta, a data analysis services platform, recently received VC investment to advance its efforts to make data wrangling easier for data analysts. The goal is to collect, cleanse and munge data with a fraction of the time and effort it currently takes.
-
Hadoop-as-a-Service Provider Qubole Now Runs on Google Compute Engine
Qubole, a managed Hadoop-as-a-Service offering, is now available on Google Compute Engine (GCE). Until now, Qubole was available only on Amazon's AWS; the announcement comes just a few days after Google released GCE into general availability.
-
Martin Fowler on Data Austerity
Martin Fowler writes about the opposite of Big Data, Datensparsamkeit. This German word roughly translates to “data austerity” or simply “not storing more than you need”.
-
A Survey and Interview on How Hadoop Is Used Today
This post presents the results of a Hortonworks survey of over 500 Hadoop Summit 2013 attendees on how they use Hadoop, and an interview with David McJannet on Hadoop trends today.
-
Big Data at Netflix Drives Business Decisions
Jeff Magnusson of Netflix gave a presentation at the QCon SF 2013 conference about the company's Data Platform as a Service. Following up on this presentation, we look at the technology stack and how it helps Netflix make important business decisions.
-
Open Source SQL-in-Hadoop Solutions: Where Are We?
With Facebook recently releasing Presto as open source, the already crowded SQL-in-Hadoop market just became a tad more intricate. A number of open source tools are competing for the attention of developers: the Hortonworks Stinger initiative around Hive, Apache Drill, Apache Tajo, Cloudera’s Impala, Salesforce’s Phoenix (for HBase) and now Facebook’s Presto.
-
Amazon re:Invent Roundup
Amazon announced a number of new services at the recent re:Invent conference in Las Vegas: Amazon WorkSpaces - Desktop Computing in the Cloud; Identity and Access Management using SAML; Amazon AppStream - Delivering Streaming Applications from the Cloud; Amazon Kinesis - Streaming Big Data; CloudTrail - Capturing AWS API Activity; Postgres support in RDS; and new EC2 instance types.
-
Increasing Pace of Change Drives Agile In Enterprise Applications
The pace of organizational change and technology adoption is increasing, which means that enterprise software development needs to find ways to keep up. The rise of big data is also driving the need to run many experiments and adapt rapidly. Blogger Matt Asay recently wrote about this in a post titled "Hey, Enterprise Developers! Get Agile Or Get Steamrollered".
-
Streaming Big Data With Amazon Kinesis
Amazon recently announced Kinesis, a service that allows developers to stream large amounts of data from different sources and process it. The service is currently in limited preview.
-
Cascading 2.5 Supports Hadoop 2
The new version of Cascading, released this week, incorporates Hadoop 2 support and includes Cascading Lingual, an open source project that provides a comprehensive ANSI SQL interface for accessing Hadoop-based data.
-
Presto: Facebook’s Distributed SQL Query Engine
Facebook has open-sourced Presto, its distributed SQL query engine. Presto uses a pipelined architecture rather than the Map/Reduce design found elsewhere. Presto has been in production at Facebook since early this year, and the company reports that it has been “deployed in multiple geographical regions and [they] have successfully scaled a single cluster to 1,000 nodes”.