Hadoop Content on InfoQ

  • Hortonworks, IBM and Pivotal to Support Open Data Platform in Their Big Data Solutions

    Big data vendors Hortonworks, IBM, and Pivotal recently announced that their Hadoop-based platform products will use the common Open Data Platform (ODP). The announcement, made at the recent Hadoop Summit Europe conference, covers an open platform that includes Apache Hadoop 2.6 (HDFS, YARN, and MapReduce) and Apache Ambari.

  • Apache HBase Hits 1.0

    After three developer previews, six release candidates, and over 1500 closed tickets, the Apache Software Foundation has announced version 1.0 of Apache HBase, a NoSQL database in the Hadoop ecosystem. After more than 7 years of active development, the team behind HBase felt that the project had matured and stabilized enough to warrant a 1.0 version.

  • Spring XD 1.1: Simplifying Big Data like Spring Did for Java EE

    Pivotal recently released Spring XD 1.1 GA with new features including stream processing with Reactor, RxJava, Spark Streaming, and Python. The release also adds support for Kafka, batching and compression with RabbitMQ, and container group management when running on YARN.

  • Google Open Sources MapReduce Framework for C to Run Native Code in Hadoop

    Google announced last week the release of an open source MapReduce framework for C, called MR4C, that allows developers to run native code in the Hadoop framework. MR4C brings together the performance and flexibility of natively developed algorithms with the scalability and throughput provided by the Hadoop execution framework.

  • Pivotal Open Sources Their Big Data Suite

    Pivotal has decided to open source core components of their Big Data Suite and has announced the Open Data Platform, an initiative promoting open source and standardization for Big Data.

  • Project Pachyderm Aims to Build a "Modern" Hadoop on Docker

    Project Pachyderm aims to build a "modern" Hadoop using Docker and CoreOS.

  • Project Myriad: Mesos and YARN Working Together

    An article by Jin Scott, "A tale of two clusters: Mesos and YARN", describes the hardware silos created by running different resource managers on separate hardware clusters, the most popular being Mesos and YARN. It introduces Myriad, a solution that allows a YARN cluster to run on Mesos.

  • Apache Hive 1.0 Released, HiveServer2 Becomes Main Engine, Stable API Defined

    Apache Hive released version 1.0 on February 6th, 2015. The release was originally planned as version 0.14.1, but the community voted to change the version number to 1.0.0 to reflect the maturity the project has reached.

  • EMRFS Brings Consistency to Amazon S3

    Amazon recently announced EMRFS, an implementation of HDFS that allows EMR clusters to use S3 with a stronger consistency model. When enabled, this new feature keeps track of operations performed on S3 and provides list consistency, delete consistency, and read-after-write consistency for any cluster created with Amazon Machine Image (AMI) version 3.2.1 or greater.

  • Splice Machine Version 1.0 Supports Integration with Hadoop and Analytic Window Functions

    Splice Machine version 1.0 supports analytic window functions and integration with the Hadoop ecosystem. The Splice Machine team recently released this Hadoop-based RDBMS, a data management solution that can be used for transactional workloads on Hadoop.

  • LinkedIn Open Sources Cubert With an Eye To Big Data Analytics

    LinkedIn recently open sourced Cubert, its High Performance Computation Engine for Complex Big Data Analytics. Cubert is a framework written with analysts and data scientists in mind. Developed completely in Java and expressed as a scripting language, Cubert is designed for the complex joins and aggregations that frequently arise in the reporting world.

  • Gobblin, LinkedIn's Unified Data Ingestion Platform

    At the 2014 QCon San Francisco conference, LinkedIn's Lin Qiao gave a talk (also summarized in a blog post) on Gobblin, the company's unified data ingestion system for internal and external data sources.

  • Stripe Open Sources Tools For Apache Hadoop

    Stripe, the internet payments infrastructure company, recently announced that it is open sourcing a set of internally developed tools based on Apache Hadoop. Timberlake, Brushfire, Sequins, and Herringbone all add to the tools available for building an Apache Hadoop stack.

  • Spark Sets New Record in Sort Performance

    Databricks recently announced a new record in the Daytona GraySort contest using the Spark processing engine. The Daytona GraySort contest is a third-party benchmark measuring how fast a system can sort 100 terabytes of data. Databricks posted a throughput of 4.27 TB/min over a cluster of 206 machines for its official run.

  • Hortonworks Data Platform Makes an Enterprise Push

    Hortonworks Data Platform (HDP) version 2.2, with features based around Hadoop and YARN, improves support for enterprise requirements such as security and compliance.
