A Roundup of Cloudera Distribution Containing Apache Hadoop 5

by Alex Giamas on  Apr 18, 2014

Cloudera recently released CDH5, the latest version of its software distribution. It arrives almost 20 months after the last major version, CDH4, which is ages in the Big Data world. We take a look at the new features this release brings and at Cloudera's future direction after the latest round of investment from Intel and Google Ventures.

Hadoop Gets Better Security, Several Operational Improvements

by Roopesh Shenoy on  Apr 18, 2014

Hadoop 2.4.0 was recently released with several enhancements to both HDFS and YARN. These include support for Access Control Lists, native support for rolling upgrades, full HTTPS support for HDFS, automatic failover for YARN, and other operational improvements.

Hydra Takes On Hadoop

by Rags Srinivas on  Apr 11, 2014

The social-networking company AddThis recently announced that it has open-sourced Hydra under the Apache License, Version 2.0. Hydra grew from an in-house platform created to process semi-structured social data as live streams and to run efficient queries on those data sets.

Rebecca Parsons on the ThoughtWorks Technology Radar

by Shane Hastie on  Mar 28, 2014

In January, ThoughtWorks released the latest version of its Technology Radar, in which it tracks what's interesting in the software development ecosystem. The big themes this year are (1) early warning systems and recovery in production, (2) the tension between privacy and big data, (3) the JavaScript ecosystem and (4) the blurring of the line between the physical and virtual worlds.

Big Data Hadoop Solutions, State of Affairs in Q1/2014

by Boris Lublinsky on  Mar 04, 2014

According to a new Forrester report, Hadoop’s momentum is unstoppable. Its usage in the enterprise keeps growing because it offers companies new ways to store, process, analyze, and share big data. The report takes a look at Hadoop vendors and ranks them.

Spark Officially Graduates From Apache Incubator

by Alex Giamas on  Feb 28, 2014

Spark recently graduated from the Apache Incubator. Spark claims speed improvements of up to 100x over Apache Hadoop on in-memory datasets, falling back gracefully to a 10x improvement on disk. Based on Scala, it can run SQL queries and be used directly from R. It provides machine learning and graph capabilities, among others discussed further in the article.

Elasticsearch 1.0.0 released

by Ralph Winzinger on  Feb 14, 2014

Elasticsearch released version 1.0.0 of its self-titled, open-source analytics tool. Elasticsearch is a distributed search engine which allows for real-time data analysis in big-data environments. The new version comes with various functional enhancements and changes to the API to make Elasticsearch more intuitive and powerful to use.

Interactive SQL in Apache Hadoop with Impala and Hive

by Alex Giamas on  Feb 07, 2014

In the race for interactive SQL in Big Data environments, there are two open-source front-runners: Impala, and Hive with the Stinger project. Cloudera recently announced that Impala is up to 69 times faster than Hive 0.12 and can outperform a DBMS. Beyond raw speed, we look at other considerations in choosing a SQL engine for Hadoop, as well as Tez, an application framework for YARN.

Google Improves Hadoop Performance with New Cloud Storage Connector

by Richard Seroter on  Jan 20, 2014

With a new connector, Hadoop can now run directly against Google Cloud Storage instead of the default distributed file system (HDFS). This results in lower storage costs, fewer data replication activities, and a simpler overall process.

New Education Opportunities for Data Scientists

by Charles Menguy on  Jan 14, 2014

The year 2013 was rich in announcements of new programs, degrees, and grants for aspiring data scientists and Big Data practitioners.

Hadoop-as-a-Service Provider Qubole Now Runs on Google Compute Engine

by Michael Hausenblas on  Dec 28, 2013

Qubole, a managed Hadoop-as-a-Service offering, is now available on Google Compute Engine (GCE). Until now, Qubole was available only on Amazon's AWS. The announcement comes just a few days after Google released GCE into general availability.

Hadoop Jobs on GPU with ParallelX

by Charles Menguy on  Dec 26, 2013

The MapReduce paradigm is not always ideal for large, computationally intensive algorithms. A small team of entrepreneurs is building a product called ParallelX to remove that bottleneck by harnessing the power of GPUs to give Hadoop jobs a significant boost.

A Survey and Interview on How Hadoop Is Used Today

by Boris Lublinsky on  Dec 12, 2013

This post presents the results of a Hortonworks survey of over 500 Hadoop Summit 2013 attendees on how they use Hadoop, and an interview with David McJannet on Hadoop trends today.

Open Source SQL-in-Hadoop Solutions: Where Are We?

by Michael Hausenblas on  Dec 10, 2013

With Facebook recently releasing Presto as open source, the already crowded SQL-in-Hadoop market just became a tad more intricate. A number of open source tools are competing for the attention of developers: Hortonworks’ Stinger initiative around Hive, Apache Drill, Apache Tajo, Cloudera’s Impala, Salesforce’s Phoenix (for HBase) and now Facebook’s Presto.

A Few Highlights from QConSF2013 - Part 1 of 2

by Martin Monroe on  Nov 30, 2013

On each day of the 3-day conference, held in the inviting environs of the Hyatt, a jam-packed schedule of speakers, exhibits, and activities made for some difficult decisions about which tracks and events to attend.

InfoQ.com and all content copyright © 2006-2013 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.