Apache Parquet, the open-source columnar storage format for Hadoop, recently graduated from the Apache Software Foundation Incubator and became a top-level project. Initially created by Cloudera and Twitter in 2012 to speed up analytical processing, Parquet is now openly available for Apache Spark, Apache Hive, Apache Pig, Impala, native MapReduce, and other key components of the Hadoop ecosystem.
Capgemini are currently working on Apollo, an open source application platform built on top of the Apache Mesos cluster manager and Docker, which is designed to power next generation web services, microservices and big data platforms running at scale.
Latest version of MemSQL, in-memory database with support for transactions and analytics, includes a new Community Edition for free use by organizations. MemSQL 4, released last week, also supports integration with Apache Spark, Hadoop Distributed File System (HDFS), and Amazon S3.
Flipboard recently reported on an in-house application of deep learning to scale up low-resolution images that illustrates the power and flexibility of this class of learning algorithms.
NASA Center for Climate Simulation (NCCS) is using Apache Hadoop for high-performance data analytics. Glenn Tamkin from NASA team, recently spoke at ApacheCon Conference and shared the details of the platform they built for climate data analysis with Hadoop.
Google is making available to customers Cloud Bigtable, their own database used for more than a decade for services such as Search, GMail, Maps or YouTube. While they are not open sourcing Bigtable as they did with other products, the new cloud service is accessible through an open source interface, the Apache HBase 1.0.1 API.
Adatao recently announced the general availability of its Data Intelligence platform. Its platform aims to make data analysis and predictive analytics available to everyone in large organizations. Adatao had secured an investment of $13 million last year from a group of investors including Bloomberg Beta, Lightspeed Venture Partners and Andreessen Horowitz.
Apache Flink is a distributed data flow processing system for performing analytics on large data sets. It can be used for real time data streams as well as batch data processing. It supports APIs in Java and Scala programming languages. Fabian Hueske, PMC member of Apache Flink, spoke about the data processing framework at the recent ApacheCon Conference.
Big data vendors Hortonworks, IBM, and Pivotal recently announced that their Hadoop based platform products will use the common Open Data Platform (ODP). They made the announcement at the recent HadoopSummit Europe Conference of the open platform which includes Apache Hadoop 2.6 (HDFS, YARN, and MapReduce) and Apache Ambari software.
Google announced the general availability of Cloud DNS, expanded locations for load balancing, additional carrier providers for peering, beta availability of Cloud Dataflow and VPN services
Amazon Web Services have recently launched their Amazon Machine Learning service that allows users to learn predictive models in the cloud. After Google with Prediction API, and Microsoft with Azure Machine Learning, Amazon is the latest major cloud service provider to launch a similar service.
Twitter recently announced that it has cut-off their firehose data distributor DataSift. This move echoes Twitter's controversial 2012 API changes which restricted the Twitter client ecosystem. There is much speculation as to whether this latest announcement is an attempt to control the tweet analytics space and whether or not this is behaviour fitting of a platform provider.
After three developer previews, six release candidates and over 1500 closed tickets the Apache foundation has announced version 1.0 of Apache HBase, a NoSQL database in the Hadoop ecosystem. After more than 7 years of active development, the team behind HBase felt that the project had matured and stabilized enough to warrant a 1.0 version.
Pinterest, the company behind the visual bookmarking tool that helps you discover and save creative ideas, is using real-time data analytics for data-driven decision making purposes. It’s experimenting with MemSQL and Spark technologies for real-time user engagement across the globe.
Microsoft increased its foothold in the data science community last winter by acquiring Revolution Analytics, a major provider of software and services based on the open-source R project for computational statistics. The deal is expected to bring R capabilities to the Microsoft suite of products and facilitate the adoption of R-based solutions in the enterprise environment.