Hadoop Content on InfoQ

News

Hortonworks Announces Hive 0.13 with Vectorized Query Execution and Hive on Tez

Hortonworks announced the release of Hive 0.13, which marks the completion of the Stinger initiative. The new release includes performance improvements as well as some new SQL features. Hive is an open source SQL engine built on top of Hadoop that lets users query big data warehouses by writing SQL queries instead of MapReduce jobs.

Matt Kapilevich on May 13, 2014

Introducing Microsoft Avro

Microsoft has announced its implementation of the Apache Avro wire protocol. Avro is described as a “compact binary data serialization format similar to Thrift or Protocol Buffers” with additional features needed for distributed processing environments such as Hadoop.

Jonathan Allen on May 08, 2014

Cloudera Partners with MongoDB to Store Hadoop Data on Their NoSQL DB

Starting from the premise that today “80 percent of enterprise data is unstructured and growing at twice the rate of structured data”, Cloudera and MongoDB have announced a “strategic” partnership meant to provide customers the option to combine Cloudera’s Apache-based Big Data platform with MongoDB’s NoSQL solution.

Abel Avram on Apr 29, 2014

A Roundup of Cloudera Distribution Containing Apache Hadoop 5

Cloudera recently released the latest version of its software distribution, CDH5. The almost 20 months since the last major version, CDH4, seem like ages in the Big Data world. We take a look at the new features this release brings and at the future direction of Cloudera after the latest round of investment from Intel and Google Ventures.

Alex Giamas on Apr 18, 2014

Hadoop Gets Better Security, Several Operational Improvements

Hadoop 2.4.0 was recently released with several enhancements to both HDFS and YARN. These include support for Access Control Lists, native support for rolling upgrades, full HTTPS support for HDFS, automatic failover for YARN, and other operational improvements.

Roopesh Shenoy on Apr 18, 2014

Rebecca Parsons on the ThoughtWorks Technology Radar

In January, ThoughtWorks released the latest version of its Technology Radar, in which it tracks what's interesting in the software development ecosystem. The big themes this year are (1) early warning systems and recovery in production, (2) the tension between privacy and big data, (3) the JavaScript ecosystem and (4) the blurring of the line between the physical and virtual worlds.

Shane Hastie on Mar 28, 2014

Big Data Hadoop Solutions, State of Affairs in Q1/2014

According to a new Forrester report, Hadoop’s momentum is unstoppable. Its usage in the enterprise is continuously growing due to its ability to offer companies new ways to store, process, analyze, and share big data. The report takes a look at Hadoop vendors and ranks them.

Boris Lublinsky on Mar 04, 2014

Spark Officially Graduates From Apache Incubator

Recently, Spark graduated from the Apache Incubator. Spark claims up to 100x speed improvements over Apache Hadoop on in-memory datasets, gracefully falling back to a 10x speed improvement for on-disk workloads. Based on Scala, it can run SQL queries and be used directly in R. It provides machine learning, graph database capabilities and other features discussed further in the article.

Alex Giamas on Feb 28, 2014

Elasticsearch 1.0.0 released

Elasticsearch released version 1.0.0 of its self-titled, open-source analytics tool. Elasticsearch is a distributed search engine which allows for real-time data analysis in big-data environments. The new version comes with various functional enhancements and changes to the API to make Elasticsearch more intuitive and powerful to use.

Ralph Winzinger on Feb 14, 2014

Google Improves Hadoop Performance with New Cloud Storage Connector

With a new connector, it is now possible for Hadoop to run directly against Google Cloud Storage instead of using the default distributed file system. This results in lower storage costs, fewer data replication activities, and a simpler overall process.

Richard Seroter on Jan 20, 2014

New Education Opportunities for Data Scientists

2013 has been rich in announcements of new programs, degrees and grants for aspiring data scientists and Big Data practitioners.

Charles Menguy on Jan 14, 2014

Hadoop-as-a-Service Provider Qubole Now Runs on Google Compute Engine

Qubole, a managed Hadoop-as-a-Service offering, is now available on Google Compute Engine (GCE). Until now, Qubole was available only on Amazon's AWS; the announcement comes just a few days after Google released GCE into general availability.

Michael Hausenblas on Dec 28, 2013

Hadoop Jobs on GPU with ParallelX

The MapReduce paradigm is not always ideal when dealing with large computationally intensive algorithms. A small team of entrepreneurs is building a product called ParallelX to solve that bottleneck by harnessing the power of GPUs to give Hadoop jobs a significant boost.

Charles Menguy on Dec 26, 2013

A Survey and Interview on How Hadoop Is Used Today

This post presents the results of a Hortonworks survey of over 500 Hadoop Summit 2013 attendees on how they use Hadoop, and an interview with David McJannet on Hadoop trends today.

Boris Lublinsky on Dec 12, 2013

Open Source SQL-in-Hadoop Solutions: Where Are We?

With Facebook recently releasing Presto as open source, the already crowded SQL-in-Hadoop market just became a tad more intricate. A number of open source tools are competing for the attention of developers: Hortonworks' Stinger initiative around Hive, Apache Drill, Apache Tajo, Cloudera’s Impala, Salesforce’s Phoenix (for HBase) and now Facebook’s Presto.

Michael Hausenblas on Dec 10, 2013
