Hadoop Content on InfoQ

News

Cloud

Microsoft Announces New Azure Analytics Services ADLS, ADX and More

Microsoft has announced the general availability of two new Azure analytics services: Azure Data Lake Storage Gen2 (ADLS) and Azure Data Explorer (ADX). Microsoft also announced a preview of Azure Data Factory Mapping Data Flow.

Steef-Jan Wiggers
on Feb 17, 2019
Cloud

Cloudera and Hortonworks Merge with Goal to Increase Competition with Cloud Offerings

Earlier this month, Cloudera and Hortonworks announced an all-stock merger with a combined value of around $5.2 billion. Analysts have argued that the merger is aimed at countering the increasing competition both companies face from cloud vendors such as Amazon, Google and Microsoft. This article collects reactions from analysts and the industry and looks at the implications for current customers.

Alex Giamas
on Oct 31, 2018
AI, ML & Data Engineering

Dataiku's Latest Release Integrates Deep-Learning for Computer Vision

The latest release of Dataiku's Data Science Studio (DSS), a collaborative data science platform, includes pre-trained deep learning models for image processing. DSS covers each step of a data science project, from data sourcing and visualization to production deployment. Its machine learning module supports standard libraries, and the platform integrates with Hadoop and multiple Spark engines.

Alexis Perrier
on Apr 11, 2018
DevOps

DevOps Workbench Launched by ZeroStack

Private cloud provider ZeroStack has announced a self-service capability that lets developers create their own workbenches. Forty developer tools from a mix of open source and commercial providers are available to users of the DevOps Workbench through ZeroStack's Intelligent Cloud Platform.

Helen Beal
on Jan 12, 2018
AI, ML & Data Engineering

Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow

Julien Le Dem, the PMC chair of the Apache Arrow project, presented at Data Eng Conf NY on the future of column-oriented data processing. Apache Arrow is an open-source standard for columnar in-memory execution. InfoQ interviewed Le Dem to learn how Arrow differs from Parquet.

Alexandre Rodrigues
on Dec 08, 2016
AI, ML & Data Engineering

Combine SQL Server with Hadoop Using PolyBase

With the recently released SQL Server 2016, you can now run SQL queries against Hadoop and Azure Blob Storage. You no longer need to write MapReduce jobs, and you can join relational and non-relational data in a single query.

Jonathan Allen
on Jun 02, 2016
AI, ML & Data Engineering

Elephant in the Cloud - Hadoop as a Service

Hadoop and other big data technologies have revolutionized the way organizations run data analytics, but organizations still struggle with the operating costs of running these technologies for on-premises data processing. Ashish Thusoo recently spoke at the Enterprise Data World conference about Hadoop-as-a-service offerings that help organizations close these gaps.

Srini Penchikala
on May 02, 2016
AI, ML & Data Engineering

Google Cloud Machine Learning and TensorFlow Alpha Release

Late last month, Google released an alpha version of its cloud machine learning service with integrated TensorFlow (TF), in response to a growing need to run the TensorFlow library at scale on Google Cloud Platform (GCP). Google describes several new feature sets that help TF workloads scale by integrating GCP components such as Dataproc, a managed Hadoop and Spark service.

Dylan Raithel
on Apr 18, 2016
AI, ML & Data Engineering

Apache Flink 1.0.0 is Released

InfoQ's Rags Srinivas caught up with Stephan Ewen, a committer on Apache Flink, to discuss the 1.0.0 release and the project roadmap.

Rags Srinivas
on Mar 24, 2016
AI, ML & Data Engineering

Hunk/Hadoop: Performance Best Practices

When working with Hadoop, with or without Hunk, there are a number of ways you can accidentally kill performance. While some of the fixes require more hardware, sometimes the problems can be solved simply by changing the way you name your files.

Jonathan Allen
on Sep 23, 2015
AI, ML & Data Engineering

Using Hunk+Hadoop as a Backend for Splunk

Splunk can now store archived indexes on Hadoop. At some cost in performance, this offers a 75% reduction in storage costs while keeping the data searchable. With the new adapters, Hadoop tools such as Hive and Pig can also process the Splunk-formatted data.

Jonathan Allen
on Sep 22, 2015
AI, ML & Data Engineering

Splunk .conf 2015 Keynote

Splunk opened their big data conference with an emphasis on “making machine data accessible, usable, and valuable to everyone”. This is a shift from their original focus, which was indexing arbitrary big data sources. Now reasonably satisfied with their ability to process data, they want to ensure that developers, IT staff, and non-technical users can actually use all of the data their company is collecting.

Jonathan Allen
on Sep 22, 2015

Parquet Becomes Top-Level Apache Project

Apache Parquet, the open-source columnar storage format for Hadoop, recently graduated from the Apache Software Foundation Incubator and became a top-level project. Initially created by Cloudera and Twitter in 2012 to speed up analytical processing, Parquet is now openly available for Apache Spark, Apache Hive, Apache Pig, Impala, native MapReduce, and other key components of the Hadoop ecosystem.

Jérôme Serrano
on Jun 11, 2015

MemSQL 4 Database Supports Community Edition, Geospatial Intelligence and Spark Integration

The latest version of MemSQL, an in-memory database that supports both transactions and analytics, includes a new Community Edition that organizations can use for free. MemSQL 4, released last week, also supports integration with Apache Spark, Hadoop Distributed File System (HDFS), and Amazon S3.

Srini Penchikala
on May 30, 2015

Hortonworks, IBM and Pivotal to Support Open Data Platform in Their Big Data Solutions

Big data vendors Hortonworks, IBM, and Pivotal recently announced that their Hadoop-based platform products will use the common Open Data Platform (ODP). They made the announcement at the recent Hadoop Summit Europe conference. The open platform includes Apache Hadoop 2.6 (HDFS, YARN, and MapReduce) and Apache Ambari.

Srini Penchikala
on Apr 24, 2015