Pivotal Open Sources Their Big Data Suite

by Abel Avram on  Feb 19, 2015

Pivotal has decided to open source core components of their Big Data Suite and has announced the Open Data Platform, an initiative promoting open source and standardization for Big Data.

Project Myriad: Mesos and YARN Working Together

by Boris Lublinsky on  Feb 14, 2015

An article by Jin Scott, "A tale of two clusters: Mesos and YARN", describes the hardware silos created by running different resource managers on separate hardware clusters, the most popular being Mesos and YARN, and introduces Myriad, a solution that allows a YARN cluster to run on Mesos.

EMRFS Brings Consistency to Amazon S3

by Jérôme Serrano on  Jan 27, 2015

Amazon recently announced EMRFS, an implementation of HDFS that allows EMR clusters to use S3 with a stronger consistency model. When enabled, this new feature keeps track of operations performed on S3 and provides list consistency, delete consistency and read-after-write consistency for any cluster created with Amazon Machine Image (AMI) version 3.2.1 or greater.

Apache Spark 1.2.0 Supports Netty-based Implementation, High Availability and Machine Learning APIs

by Rags Srinivas on  Jan 07, 2015

Apache Spark 1.2.0 was released with a Netty-based implementation, High Availability and Machine Learning APIs. It represents the work of 172 contributors from over 60 institutions and comprises more than 1000 patches. InfoQ talks with Patrick Wendell, a Spark committer and PMC member.

LinkedIn Open Sources Cubert With an Eye To Big Data Analytics

by Alex Giamas on  Dec 17, 2014

LinkedIn recently open sourced Cubert, its High Performance Computation Engine for Complex Big Data Analytics. Cubert is a framework written with analysts and data scientists in mind. Developed completely in Java and expressed as a scripting language, Cubert is designed for the complex joins and aggregations that frequently arise in the reporting world.

Gobblin, LinkedIn's Unified Data Ingestion Platform

by Mikio Braun on  Dec 15, 2014

At the 2014 QCon San Francisco conference, LinkedIn's Lin Qiao gave a talk on the Gobblin project (also summarized in a blog post), a unified data ingestion system for LinkedIn's internal and external data sources.

Stripe Open Sources Tools For Apache Hadoop

by Alex Giamas on  Dec 09, 2014

Stripe, the internet payments infrastructure company, recently announced that it is open sourcing a set of internally developed tools based on Apache Hadoop. Timberlake, Brushfire, Sequins and Herringbone all enrich the set of tools available for building an Apache Hadoop stack.

Microsoft Expands Azure Machine Learning and Real Time Analytics Offering

by Alex Giamas on  Oct 31, 2014

Microsoft recently announced new machine learning capabilities for the Microsoft Azure platform. Developers can also create their own web services and publish them to Azure Marketplace. Microsoft also announced availability of Apache Storm for Azure. Azure Stream Analytics, Data Factory and Event Hubs for Azure were all announced in the past few weeks by Microsoft. In this article we explore more about

Hortonworks Announces Stinger.next Roadmap to Deliver Hadoop Scale SQL with Apache Hive

by Adam Berry on  Sep 25, 2014

Following on from the Stinger initiative delivered in Apache Hive 0.13, Hortonworks has laid out the Stinger.next roadmap to provide fully ACID transactions, a sub-second query engine, and more complete SQL 2011 analytics support, all driving towards the goal of “enhancing the speed, scale and breadth of SQL support” in Hive.

Hadoop Summit 2014 Day One - On the Path to Enterprise Grade Hadoop

by Jeevak Kasarkod on  Jun 04, 2014

The Hadoop Summit Day One report covers the important trends and changes since last year's summit, along with the day's major announcements as they relate to this year's trending topics. The report focuses on platform-specific innovations and announcements rather than the broader partner ecosystem, which will be covered in the next few days.

DataTorrent 1.0 Handles >1B Real-time Events/sec

by Abel Avram on  Jun 03, 2014

DataTorrent is a real-time streaming and analytics platform that can process over 1 billion real-time events per second.

Community the Focus at ApacheCON NA 2014

by Carlos Sanchez on  May 15, 2014

This year's ApacheCON North America conference saw key speakers focus on open source and its community. With more than 400 attendees, over 70 projects represented and 180 conference sessions, it covered topics as diverse as the Apache Software Foundation's projects themselves.

Introducing Microsoft Avro

by Jonathan Allen on  May 08, 2014

Microsoft has announced its implementation of the Apache Avro wire protocol. Avro is described as a “compact binary data serialization format similar to Thrift or Protocol Buffers” with additional features needed for distributed processing environments such as Hadoop.

Coverity Scan Gets Better with Java, Apache Hadoop, HBase and Cassandra Support

by Anand Narayanaswamy on  May 02, 2014

According to the recently released open source scan report by Coverity, the scan mainly detected resource leaks, null pointer issues and control flow issues, among several others, which were then fixed. It also scanned the source code of Linux, where several bugs were fixed.

Cloudera Partners with MongoDB to Store Hadoop Data on Their NoSQL DB

by Abel Avram on  Apr 29, 2014

Starting from the premise that today “80 percent of enterprise data is unstructured and growing at twice the rate of structured data”, Cloudera and MongoDB have announced a “strategic” partnership meant to provide customers the option to combine Cloudera’s Apache-based Big Data platform with MongoDB’s NoSQL solution.

InfoQ.com and all content copyright © 2006-2015 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.