Google announced last week the release of MR4C, an open source MapReduce framework for C that allows developers to run native code within the Hadoop framework. MR4C combines the performance and flexibility of natively developed algorithms with the scalability and throughput of the Hadoop execution framework.
Pivotal has decided to open source core components of their Big Data Suite and has announced the Open Data Platform, an initiative promoting open source and standardization for Big Data.
Project Pachyderm aims to build a "modern" Hadoop using Docker and CoreOS.
An article by Jim Scott, "A tale of two clusters: Mesos and YARN", describes the hardware silos created by running different resource managers on separate hardware clusters, the most popular being Mesos and YARN, and introduces Myriad, a solution for running a YARN cluster on Mesos.
Apache Hive released version 1.0 on February 6th, 2015. Originally planned as version 0.14.1, the release was renumbered to 1.0.0 after a community vote, to reflect the maturity the project has reached.
CoreOS announced the availability of etcd 2.0, the first stable version of the open source distributed key-value store.
When adopting a microservices architecture, using an external architect to create the design of a service instead of helping a team make their own decisions about design and implementation is one of several traps or bad practices that Vladimir Khorikov has experienced in his work.
Amazon recently announced EMRFS, an implementation of HDFS that allows EMR clusters to use S3 with a stronger consistency model. When enabled, this new feature keeps track of operations performed on S3 and provides list consistency, delete consistency and read-after-write consistency for any cluster created with Amazon Machine Image (AMI) version 3.2.1 or greater.
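The idea of tracking operations to get a consistent listing can be illustrated with a small Python sketch. This is a toy model only, not how EMRFS is implemented: `LaggyStore` and `TrackedView` are hypothetical names, and the "store" is an in-memory stand-in whose listing lags behind writes and deletes until `sync()` is called.

```python
class LaggyStore:
    """Toy object store whose key listing lags behind writes and deletes."""

    def __init__(self):
        self._objects = {}     # actual contents
        self._listing = set()  # stale view returned by list_keys()

    def put(self, key, data):
        self._objects[key] = data  # listing is not refreshed until sync()

    def delete(self, key):
        self._objects.pop(key, None)

    def list_keys(self):
        return set(self._listing)

    def sync(self):
        """Simulate the listing eventually catching up."""
        self._listing = set(self._objects)


class TrackedView:
    """Hypothetical wrapper that records its own operations and uses them
    to correct the stale listing (list and delete consistency)."""

    def __init__(self, store):
        self.store = store
        self.written = set()
        self.deleted = set()

    def put(self, key, data):
        self.store.put(key, data)
        self.written.add(key)
        self.deleted.discard(key)

    def delete(self, key):
        self.store.delete(key)
        self.deleted.add(key)
        self.written.discard(key)

    def list_keys(self):
        # Add keys we know were written, drop keys we know were deleted.
        return (self.store.list_keys() | self.written) - self.deleted


store = LaggyStore()
view = TrackedView(store)
view.put("part-0000", b"rows")
stale = store.list_keys()      # raw listing has not caught up yet
corrected = view.list_keys()   # tracked view already includes the new key
```

In this sketch the raw listing misses the freshly written key while the tracked view reports it, which is the kind of gap a consistency layer of this sort is meant to close.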
Reasons for building microservices are often about using isolation as a means to handle change. Sharing code between services couples them to each other, reducing the effectiveness of the isolation and the ability to handle change, David Dawson writes in a series of blog posts questioning the Don’t Repeat Yourself (DRY) principle in connection with microservices.
Five years ago many NoSQL databases were pre-version 1.0 and, when it came to the CAP tradeoff, choosing availability over consistency was in vogue. Fast forward to today, and distributed, fault-tolerant transactions are coming to the fore as a new round of NoSQL databases seeks to redefine our NoSQL expectations.
Splice Machine version 1.0 supports analytic window functions and integration with the Hadoop ecosystem. The Splice Machine team recently released its Hadoop-based RDBMS, a data management solution that can be used for transactional workloads on Hadoop.
There is a strong trend toward microservice-based architectures, with frequent discussions comparing them to monoliths. Robert Annett explains and defines a monolith as an architectural style or pattern, using three basic viewtypes for characterization.
LinkedIn recently open sourced Cubert, its High Performance Computation Engine for Complex Big Data Analytics. Cubert is a framework written with analysts and data scientists in mind. Developed completely in Java and expressed as a scripting language, Cubert is designed for the complex joins and aggregations that frequently arise in the reporting world.
At the 2014 QCon San Francisco conference, LinkedIn's Lin Qiao gave a talk (also summarized in a blog post) on their Gobblin project, a unified data ingestion system for their internal and external data sources.
Stripe, the internet payments infrastructure company, recently announced that it is open sourcing a set of internally developed tools based on Apache Hadoop. Timberlake, Brushfire, Sequins and Herringbone all enrich the set of tools available for building an Apache Hadoop stack.