Hadoop Content on InfoQ

News

AI, ML & Data Engineering

Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow

Julien Le Dem, the PMC chair of the Apache Arrow project, presented at Data Eng Conf NY on the future of column-oriented data processing. Apache Arrow is an open-source standard for columnar in-memory execution. InfoQ interviewed Le Dem to learn how Arrow differs from Parquet.

Alexandre Rodrigues
on Dec 08, 2016
AI, ML & Data Engineering

Combine SQL Server with Hadoop Using PolyBase

With the recently released SQL Server 2016, you can now use SQL queries against Hadoop and Azure blob storage. Not only do you no longer need to write map/reduce operations, you can also join relational and non-relational data with a single query.

Jonathan Allen
on Jun 02, 2016
AI, ML & Data Engineering

Elephant in the Cloud - Hadoop as a Service

Hadoop and other big data technologies revolutionized the way organizations run data analytics, but organizations still face challenges with the operating costs of using these technologies for on-premises data processing. Ashish Thusoo recently spoke at the Enterprise Data World Conference about a Hadoop-as-a-service offering that helps organizations bridge these gaps.

Srini Penchikala
on May 02, 2016
AI, ML & Data Engineering

Google Cloud Machine Learning and TensorFlow Alpha Release

Late last month Google released an alpha version of their TensorFlow (TF) integrated cloud machine learning service in response to a growing need to run their TensorFlow library at scale on the Google Cloud Platform (GCP). Google describes several new feature sets for scaling TF usage by integrating several pieces of GCP, such as Dataproc, a managed Hadoop and Spark service.

Dylan Raithel
on Apr 18, 2016
AI, ML & Data Engineering

Apache Flink 1.0.0 is Released

InfoQ's Rags Srinivas caught up with Stephan Ewen, a project committer for Apache Flink, about the 1.0.0 release and the roadmap.

Rags Srinivas
on Mar 24, 2016
AI, ML & Data Engineering

Hunk/Hadoop: Performance Best Practices

When working with Hadoop, with or without Hunk, there are a number of ways you can accidentally kill performance. While some of the fixes require more hardware, sometimes the problems can be solved simply by changing the way you name your files.

Jonathan Allen
on Sep 23, 2015
AI, ML & Data Engineering

Using Hunk+Hadoop as a Backend for Splunk

Splunk can now store archived indexes on Hadoop. At the cost of performance, this offers a 75% reduction in storage costs without losing the ability to search the data. And with the new adapters, Hadoop tools such as Hive and Pig can process the Splunk-formatted data.

Jonathan Allen
on Sep 22, 2015
AI, ML & Data Engineering

Splunk .conf 2015 Keynote

Splunk opened their big data conference with an emphasis on “making machine data accessible, usable, and valuable to everyone”. This is a shift from their original focus: indexing arbitrary big data sources. Reasonably happy with their ability to process data, they want to ensure that developers, IT staff, and normal people have a way to actually use all of the data their company is collecting.

Jonathan Allen
on Sep 22, 2015
Parquet Becomes Top-Level Apache Project

Apache Parquet, the open-source columnar storage format for Hadoop, recently graduated from the Apache Software Foundation Incubator and became a top-level project. Initially created by Cloudera and Twitter in 2012 to speed up analytical processing, Parquet is now openly available for Apache Spark, Apache Hive, Apache Pig, Impala, native MapReduce, and other key components of the Hadoop ecosystem.

Jérôme Serrano
on Jun 11, 2015
MemSQL 4 Database Supports Community Edition, Geospatial Intelligence and Spark Integration

The latest version of MemSQL, an in-memory database with support for transactions and analytics, includes a new Community Edition for free use by organizations. MemSQL 4, released last week, also supports integration with Apache Spark, Hadoop Distributed File System (HDFS), and Amazon S3.

Srini Penchikala
on May 30, 2015
Hortonworks, IBM and Pivotal to Support Open Data Platform in Their Big Data Solutions

Big data vendors Hortonworks, IBM, and Pivotal recently announced that their Hadoop-based platform products will use the common Open Data Platform (ODP). They made the announcement at the recent Hadoop Summit Europe conference; the open platform includes Apache Hadoop 2.6 (HDFS, YARN, and MapReduce) and Apache Ambari software.

Srini Penchikala
on Apr 24, 2015
Apache HBase Hits 1.0

After three developer previews, six release candidates, and over 1,500 closed tickets, the Apache Software Foundation has announced version 1.0 of Apache HBase, a NoSQL database in the Hadoop ecosystem. After more than 7 years of active development, the team behind HBase felt that the project had matured and stabilized enough to warrant a 1.0 version.

Benjamin Darfler
on Apr 07, 2015
Spring XD 1.1: Simplifying Big Data like Spring Did for Java EE

Pivotal recently released Spring XD 1.1 GA with new features including stream processing with Reactor, RxJava, Spark Streaming, and Python. The release also adds support for Kafka, batching and compression with RabbitMQ, and container group management when running on YARN.

Matt Raible
on Mar 05, 2015
Google Open Sources MapReduce Framework for C to Run Native Code in Hadoop

Google announced last week the release of an open-source MapReduce framework for C, called MR4C, that allows developers to run native code in the Hadoop framework. The MR4C framework brings together the performance and flexibility of natively developed algorithms with the scalability and throughput provided by the Hadoop execution framework.

Srini Penchikala
on Feb 25, 2015
Project Pachyderm Aims to Build a "Modern" Hadoop on Docker

Project Pachyderm aims to build a "modern" Hadoop using Docker and CoreOS.

Matt Kapilevich
on Feb 17, 2015