AI, ML & Data Engineering

Q&A with Saumitra Buragohain on Hortonworks Data Platform 3.0

by Rags Srinivas on Jul 19, 2018

InfoQ caught up with Saumitra Buragohain, senior director of Product Management at Hortonworks, regarding Hadoop in general and HDP 3.0 in particular.

AI, ML & Data Engineering

Dataiku's Latest Release Integrates Deep-Learning for Computer Vision

by Alexis Perrier on Apr 11, 2018

The latest release of Data Science Studio (DSS), from collaborative data science platform vendor Dataiku, includes pre-trained deep learning models for image processing. DSS covers each step of a data science project, from data sourcing and visualization to production deployment. Its machine learning module supports standard libraries, and it integrates with Hadoop and multiple Spark engines.

DevOps

DevOps Workbench Launched by ZeroStack

by Helen Beal on Jan 12, 2018

Private cloud provider ZeroStack has announced a self-service capability that lets developers create their own workbenches. Forty developer tools from a mix of open source and commercial providers are available to users of the DevOps Workbench through ZeroStack's Intelligent Cloud Platform.

AI, ML & Data Engineering

Apache HBase 1.3 Ships with Multiple Performance Improvements

by Alexandre Rodrigues on Jan 30, 2017

Apache HBase 1.3.0 was released in mid-January 2017. It ships with support for date-based tiered compaction, improvements in multiple areas such as the write-ahead log (WAL), and a new RPC scheduler, among other changes. In total, the release resolves almost 1,700 issues.

AI, ML & Data Engineering

Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow

by Alexandre Rodrigues on Dec 08, 2016

Julien Le Dem, PMC chair of the Apache Arrow project, spoke at Data Eng Conf NY about the future of column-oriented data processing. Apache Arrow is an open-source standard for columnar in-memory execution. InfoQ interviewed Le Dem to learn how Arrow differs from Parquet.
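To make the row-versus-column distinction concrete, here is a minimal sketch in plain Python. It does not use the Arrow libraries or Arrow's actual memory format; the functions `to_columnar` and `mean_column` and the sample records are hypothetical, chosen only to show why a columnar layout lets an aggregate read a single field instead of whole records.

```python
from typing import Any, Dict, List

# Hypothetical sample records in a row-oriented layout:
# every field of one record is stored together.
ROWS: List[Dict[str, Any]] = [
    {"id": 1, "city": "NYC", "temp": 21.5},
    {"id": 2, "city": "SF", "temp": 17.0},
    {"id": 3, "city": "NYC", "temp": 23.0},
]


def to_columnar(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row records into one list per field (a column-oriented layout)."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def mean_column(columns: Dict[str, List[Any]], field: str) -> float:
    """Average one field; only that field's list is read, not whole records."""
    values = columns[field]
    return sum(values) / len(values)


if __name__ == "__main__":
    cols = to_columnar(ROWS)
    print(cols["city"])
    print(mean_column(cols, "temp"))
```

In a real columnar engine each column would be a contiguous, typed buffer rather than a Python list, which is what makes scans over one column cache-friendly and amenable to vectorized execution.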

AI, ML & Data Engineering

Combine SQL Server with Hadoop Using PolyBase

by Jonathan Allen on Jun 02, 2016

With the recently released SQL Server 2016, you can now run SQL queries against Hadoop and Azure blob storage. Not only do you no longer need to write MapReduce jobs, but you can also join relational and non-relational data in a single query.

AI, ML & Data Engineering

Elephant in the Cloud - Hadoop as a Service

by Srini Penchikala on May 02, 2016

Hadoop and other big data technologies have revolutionized the way organizations run data analytics, but organizations still face challenges with the operating costs of running these technologies for on-premises data processing. Ashish Thusoo recently spoke at the Enterprise Data World Conference about a Hadoop-as-a-service offering that helps organizations bridge these gaps.

AI, ML & Data Engineering

Google Cloud Machine Learning and TensorFlow Alpha Release

by Dylan Raithel on Apr 18, 2016

Late last month Google released an alpha version of its TensorFlow (TF)-integrated cloud machine learning service, in response to a growing need to run the TensorFlow library at scale on Google Cloud Platform (GCP). Google describes several new features that help TF workloads scale by integrating with other parts of GCP, such as Dataproc, a managed Hadoop and Spark service.

AI, ML & Data Engineering

Apache Flink 1.0.0 is Released

by Rags Srinivas on Mar 24, 2016

InfoQ's Rags Srinivas caught up with Stephan Ewen, a committer on Apache Flink, about the 1.0.0 release and the project's roadmap.

Big Data

Hunk/Hadoop: Performance Best Practices

by Jonathan Allen on Sep 23, 2015

When working with Hadoop, with or without Hunk, there are a number of ways you can accidentally kill performance. While some of the fixes require more hardware, sometimes the problems can be solved simply by changing the way you name your files.

Big Data

Using Hunk+Hadoop as a Backend for Splunk

by Jonathan Allen on Sep 22, 2015

Splunk can now store archived indexes on Hadoop. At the cost of performance, this offers a 75% reduction in storage costs without losing the ability to search the data. And with the new adapters, Hadoop tools such as Hive and Pig can process the Splunk-formatted data.

Big Data

Splunk .conf 2015 Keynote

by Jonathan Allen on Sep 22, 2015

Splunk opened their big data conference with an emphasis on “making machine data accessible, usable, and valuable to everyone”. This is a shift from their original focus: indexing arbitrary big data sources. Reasonably happy with their ability to process data, they want to ensure that developers, IT staff, and normal people have a way to actually use all of the data their company is collecting.

Parquet Becomes Top-Level Apache Project

by Jérôme Serrano on Jun 11, 2015

Apache Parquet, the open-source columnar storage format for Hadoop, recently graduated from the Apache Software Foundation Incubator and became a top-level project. Initially created by Cloudera and Twitter in 2012 to speed up analytical processing, Parquet is now openly available for Apache Spark, Apache Hive, Apache Pig, Impala, native MapReduce, and other key components of the Hadoop ecosystem.

MemSQL 4 Database Supports Community Edition, Geospatial Intelligence and Spark Integration

by Srini Penchikala on May 30, 2015

The latest version of MemSQL, an in-memory database with support for transactions and analytics, includes a new Community Edition that organizations can use for free. MemSQL 4, released last week, also supports integration with Apache Spark, the Hadoop Distributed File System (HDFS), and Amazon S3.

Glenn Tamkin on Applying Apache Hadoop to NASA's Big Climate Data

by Srini Penchikala on May 06, 2015

The NASA Center for Climate Simulation (NCCS) is using Apache Hadoop for high-performance data analytics. Glenn Tamkin of the NASA team recently spoke at ApacheCon and shared details of the platform they built for climate data analysis with Hadoop.
