AI, ML & Data Engineering Content on InfoQ

  • Microsoft Acquires Revolution Analytics

    Microsoft increased its foothold in the data science community last winter by acquiring Revolution Analytics, a major provider of software and services based on the open-source R project for computational statistics. The deal is expected to bring R capabilities to the Microsoft suite of products and facilitate the adoption of R-based solutions in the enterprise environment.

  • Apache Spark 1.3 Released, Data Frames, Spark SQL, and MLlib Improvements

    The Apache Spark project has released version 1.3. The main improvements are the new DataFrames API, a more mature Spark SQL, a number of new methods in the machine learning library MLlib, and better integration of Spark Streaming with Apache Kafka.

  • MongoDB 3.0 - WiredTiger Storage Engine and Updated MMS

    Some time ago, when MongoDB 2.6 was released, Kelly Stirman, Director of Products at MongoDB, answered our questions about that release. Now, with MongoDB 3.0 announced for March and MongoDB 3.0 RC-8 already available, it’s time to look in more detail at what the WiredTiger storage engine, the new and improved MMS, and storage compression can bring to NoSQL users.

  • Advancing The Realtime Web With RethinkDB

    RethinkDB is an open-source distributed database built to store JSON documents and scale with very little effort. Compared with MongoDB, RethinkDB aims to be developer-friendly while keeping an operations-oriented focus on high availability and high scale. … a way to subscribe to change notifications from the database. A client can subscribe to changes in a table and get notified …

  • Pivotal Open Sources Their Big Data Suite

    Pivotal has decided to open source core components of its Big Data Suite and has announced the Open Data Platform, an initiative promoting open source and standardization for Big Data.

  • Project Pachyderm Aims to Build a "Modern" Hadoop on Docker

    Project Pachyderm aims to build a "modern" Hadoop using Docker and CoreOS.

  • Apache Hive 1.0 Released, HiveServer2 Becomes Main Engine, Stable API Defined

    The Apache Hive project released version 1.0 on February 6th, 2015. The release was originally planned as version 0.14.1, but the community voted to renumber it 1.0.0 to reflect the maturity the project has reached.

  • Azure DocumentDB is Available in More Regions with Increased Account Limits

    Azure DocumentDB, Microsoft’s NoSQL cloud database service, is now available in new regions in Asia and the US. Account limits have also been raised to support more capacity units and larger documents.

  • EMRFS Brings Consistency to Amazon S3

    Amazon recently announced EMRFS, an HDFS-compatible file system implementation that allows EMR clusters to use S3 with a stronger consistency model. When enabled, this new feature keeps track of operations performed on S3 and provides list consistency, delete consistency, and read-after-write consistency for any cluster created with Amazon Machine Image (AMI) version 3.2.1 or later.

  • Apache Flink 0.8.0 Released, Roadmap for 2015 Published

    The Apache Flink project has released version 0.8.0. Besides the usual performance, compatibility, and stability improvements, it adds a Scala API for streaming, which had so far been missing. Flink was also recently promoted to a top-level Apache project, roughly nine months after joining the incubator.

  • Facebook Open Sources Modules for Faster Deep Learning on Torch

    Facebook has open sourced a number of modules for faster training of neural networks on Torch.

  • Google on the Technical Debt of Machine Learning

    A number of Google researchers and engineers presented their view on the technical debt of machine learning at a NIPS workshop. They identified different forms of technical debt and concluded that, without proper care, using machine learning or complex data analysis in a company can introduce new kinds of technical debt that differ from those of classical software engineering.

  • Distributed, Fault Tolerant Transactions in NoSQL

    Five years ago, many NoSQL databases were pre-1.0 and, when it came to the CAP trade-off, choosing availability over consistency was in vogue. Fast forward to today, and distributed, fault-tolerant transactions are coming to the fore as a new round of NoSQL databases seeks to redefine our expectations of NoSQL.

  • Apache Spark 1.2.0 Supports Netty-based Implementation, High Availability and Machine Learning APIs

    Apache Spark 1.2.0 was released with a Netty-based implementation, high availability, and machine learning APIs. The release represents the work of 172 contributors from over 60 institutions and comprises more than 1000 patches. InfoQ talks with Patrick Wendell, a Spark committer and PMC member.

  • Alex Bordei on Scaling NoSQL Databases

    Network performance, virtualization, and testing are among the considerations when addressing performance and scalability issues with NoSQL databases. Alex Bordei wrote about scaling NoSQL databases and shared tips for increasing performance when using these data stores.
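
The RethinkDB entry describes clients subscribing to changes in a table and being notified. The following is a minimal in-memory sketch of that push model in Python, under stated assumptions: it is not the RethinkDB driver API, and the class ToyTable with its methods upsert, delete, and changes are hypothetical names chosen for illustration. The only detail borrowed from RethinkDB is the shape of a change document, which carries old_val and new_val fields (old_val is None for an insert, new_val is None for a delete).

```python
from typing import Any, Callable, Dict, List, Optional

# A change document in the shape RethinkDB changefeeds use: {"old_val": ..., "new_val": ...}
Change = Dict[str, Optional[dict]]


class ToyTable:
    """Hypothetical in-memory table that pushes change documents to subscribers.

    Illustrates the changefeed idea only; this is not the RethinkDB driver API.
    """

    def __init__(self) -> None:
        self._rows: Dict[Any, dict] = {}
        self._subscribers: List[Callable[[Change], None]] = []

    def changes(self, callback: Callable[[Change], None]) -> None:
        """Register a callback that receives every subsequent change."""
        self._subscribers.append(callback)

    def _notify(self, old: Optional[dict], new: Optional[dict]) -> None:
        change: Change = {"old_val": old, "new_val": new}
        for callback in self._subscribers:
            callback(change)

    def upsert(self, row: dict) -> None:
        """Insert a row, or replace the existing row with the same "id"."""
        key = row["id"]
        old = self._rows.get(key)
        self._rows[key] = dict(row)  # store a copy so callers cannot mutate it
        self._notify(old, dict(row))

    def delete(self, key: Any) -> None:
        """Remove a row; subscribers see new_val=None. Missing keys are ignored."""
        old = self._rows.pop(key, None)
        if old is not None:
            self._notify(old, None)


# Usage: collect every change pushed to one subscriber.
events: List[Change] = []
users = ToyTable()
users.changes(events.append)
users.upsert({"id": 1, "name": "Ada"})
users.upsert({"id": 1, "name": "Ada L."})
users.delete(1)
for event in events:
    print(event)
```

In RethinkDB itself, a changefeed delivers these documents to the client over the network as a cursor; here a plain Python callback stands in for that stream, which keeps the sketch self-contained.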
