AI, ML & Data Engineering

The Evolution of Uber’s 100+ Petabyte Big Data Platform

by Hrishikesh Barua on Nov 10, 2018

Uber’s engineering team wrote about how their big data platform evolved from traditional ETL jobs with relational databases to one based on Hadoop and Spark. A scalable ingestion model, a standard transfer format and a custom library for incremental updates are the key components of the platform.

Cloud

Cloudera and Hortonworks Merge with Goal to Increase Competition with Cloud Offerings

by Alex Giamas on Oct 31, 2018

Earlier this month, Cloudera and Hortonworks announced an all-stock merger at a combined value of around $5.2 billion. Analysts have argued that the merger is a response to the increased competition both companies face from cloud vendors like Amazon, Google and Microsoft. In this article we log reactions from analysts and the industry, and the implications for current customers.

AI, ML & Data Engineering

Q&A with Microsoft's Arindam Chatterjee Discussing Azure HDInsight 4.0

by Rags Srinivas on Oct 23, 2018

InfoQ caught up with Arindam Chatterjee, principal group manager at Microsoft, regarding the announcements about HDInsight at Microsoft Ignite.

AI, ML & Data Engineering

Q&A with Saumitra Buragohain on Hortonworks Data Platform 3.0

by Rags Srinivas on Jul 19, 2018

InfoQ caught up with Saumitra Buragohain, senior director of Product Management at Hortonworks, regarding Hadoop in general and HDP 3.0 in particular.

AI, ML & Data Engineering

Apache HBase 1.3 Ships with Multiple Performance Improvements

by Alexandre Rodrigues on Jan 30, 2017

Apache HBase 1.3.0 was released in mid-January 2017 and ships with support for date-based tiered compaction, improvements to the write-ahead log (WAL), and a new RPC scheduler, among other changes. The release includes almost 1,700 resolved issues in total.

Glenn Tamkin on Applying Apache Hadoop to NASA's Big Climate Data

by Srini Penchikala on May 06, 2015

The NASA Center for Climate Simulation (NCCS) is using Apache Hadoop for high-performance data analytics. Glenn Tamkin of the NASA team recently spoke at ApacheCon and shared details of the platform they built for climate data analysis with Hadoop.

Pivotal Open Sources Their Big Data Suite

by Abel Avram on Feb 19, 2015

Pivotal has decided to open source core components of their Big Data Suite and has announced the Open Data Platform, an initiative promoting open source and standardization for Big Data.

Project Myriad: Mesos and YARN Working Together

by Boris Lublinsky on Feb 14, 2015

An article by Jim Scott, "A tale of two clusters: Mesos and YARN", describes the hardware silos created by running different resource managers on separate hardware clusters, the most popular being Mesos and YARN, and introduces Myriad, a solution for running a YARN cluster on Mesos.

EMRFS Brings Consistency to Amazon S3

by Jérôme Serrano on Jan 27, 2015

Amazon recently announced EMRFS, an implementation of HDFS that allows EMR clusters to use S3 with a stronger consistency model. When enabled, this new feature keeps track of operations performed on S3 and provides list consistency, delete consistency and read-after-write consistency for any cluster created with Amazon Machine Image (AMI) version 3.2.1 or greater.

LinkedIn Open Sources Cubert With an Eye To Big Data Analytics

by Alex Giamas on Dec 17, 2014

LinkedIn recently open sourced Cubert, its High Performance Computation Engine for Complex Big Data Analytics. Cubert is a framework written with analysts and data scientists in mind. Developed completely in Java and expressed as a scripting language, Cubert is designed for the complex joins and aggregations that frequently arise in the reporting world.

Stripe Open Sources Tools For Apache Hadoop

by Alex Giamas on Dec 09, 2014

Stripe, the internet payments infrastructure company, recently announced that it is open sourcing a set of internally developed tools based on Apache Hadoop. Timberlake, Brushfire, Sequins and Herringbone all enrich the set of tools available for building an Apache Hadoop stack.

Microsoft Expands Azure Machine Learning and Real Time Analytics Offering

by Alex Giamas on Oct 31, 2014

Microsoft recently announced new machine learning capabilities for the Microsoft Azure platform. Developers can also create their own web services and publish them to Azure Marketplace. Microsoft also announced the availability of Apache Storm for Azure. Azure Stream Analytics, Data Factory and Event Hubs for Azure were all announced by Microsoft in the past few weeks. In this article we explore more about

Hortonworks Announces Stinger.next Roadmap to Deliver Hadoop Scale SQL with Apache Hive

by Adam Berry on Sep 25, 2014

Following on from the Stinger initiative delivered in Apache Hive 0.13, Hortonworks has laid out the Stinger.next roadmap to provide full ACID transactions, a sub-second query engine, and more complete SQL 2011 analytics support, all driving towards the goal of “enhancing the speed, scale and breadth of SQL support” in Hive.

Hadoop Summit 2014 Day One - On the Path to Enterprise Grade Hadoop

by Jeevak Kasarkod on Jun 04, 2014

This Hadoop Summit Day One report covers the important trends and changes since last year's summit, along with the day's key announcements as they relate to this year's trending topics. It focuses on platform-specific innovations and announcements rather than the broader partner ecosystem, which will be covered in the next few days.

Community the Focus at ApacheCON NA 2014

by Carlos Sanchez on May 15, 2014

This year's ApacheCON North America conference saw key speakers focus on open source and its community. With more than 400 attendees, over 70 projects represented and 180 conference sessions, it covered topics as diverse as the Apache Software Foundation's projects themselves.
