Netflix's goal is to predict what you want to watch before you watch it. They do this by running a number of machine learning (ML) workflows every day. Meson is a workflow orchestration and scheduling framework that manages the lifecycle of all these machine learning pipelines that build, train and validate personalization algorithms to help with the video recommendations.
A full snapshot of more than 2.8 million open source projects hosted on GitHub is now available in Google’s BigQuery, Google and GitHub announced. This will make it possible to query almost 2 billion source files hosted on GitHub using SQL.
Shark is a new open-source ORM framework for iOS that aims to be an easy-to-use replacement for Core Data by providing high performance and thread safety. InfoQ has spoken with Adrian Herridge, creator of Shark.
TinkerPop, a graph computing framework for OLTP graph databases and OLAP graph analytics processing, has graduated to a top-level project at the Apache Software Foundation.
In her presentation "Large-Scale Stream Processing with Apache Kafka" at QCon New York 2016, Neha Narkhede introduces Kafka Streams, a new feature of Kafka for processing streaming data. According to Narkhede, stream processing has become popular because unbounded datasets can be found in many places. It is no longer a niche problem like, for example, machine learning.
LinkedIn’s Joel Koshy details the company's Kafka usage and walks through the debugging and monitoring of two production incidents, using core Kafka infrastructure concepts, semantics and behavioral patterns to plan for and detect similar problems in the future.
Jamie Grier recently spoke at the OSCON 2016 conference about data streaming architecture using Apache Flink. He talked about the building blocks of data streaming applications and stateful stream processing, with code examples of Flink applications and monitoring.
LinkedIn recently detailed its open-sourced Kafka Monitor service, which it uses to monitor production Kafka clusters and to run extensive testing automation. This has led the company to identify bugs in the main Kafka trunk and contribute solutions to the open-source community.
The Confluent Platform 3.0 messaging system from Confluent, the company behind the Apache Kafka messaging framework, supports Kafka Streams for real-time data processing. The company announced last week the general availability of the latest version of the open source Confluent platform.
While the last two versions of SQL Server focused on improving performance by offering new features, SQL Server 2016 looks inwards towards improving existing functionality.
Cloudera announced its partnership with the Broad Institute of MIT and Harvard and detailed some of its experience with the Genome Analysis Toolkit (GATK) pipeline.
In conjunction with the release of SQL Server 2016, Microsoft has announced that the Developer Edition of SQL Server will be free.
Two years after the first release of Apache Spark, Databricks announced the technical preview of Apache Spark 2.0, based on upstream branch 2.0.0-preview. The preview is not ready for production, in terms of either stability or API, but is a release intended to gather feedback from the community ahead of general availability.
Realm, the open-source, object-oriented database, has launched version 1.0 for iOS and Android. Realm's technical team told InfoQ that among the noted changes in the mobile database's latest release are an improved query language with support for partial string matches, relationship traversal, multi-field sorting, and distinct matches.
The NOLOCK directive was broken in Cumulative Update #6 for SQL Server 2014 SP1. As a result, databases that relied on that directive may experience unexpected blocking and/or deadlocks.