
Basho Open Sources Time Series Database Riak TS 1.3

by on  Jul 15, 2016

InfoQ's Rags Srinivas talks to Basho's CTO Dave McCrory about the open-sourcing of Riak TS 1.3, a database geared to handling time series data.

Meson Workflow Orchestration and Scheduling Framework for Netflix Recommendations

by on  Jul 10, 2016

Netflix's goal is to predict what you want to watch before you watch it. To do this, it runs a number of machine learning (ML) workflows every day. Meson is a workflow orchestration and scheduling framework that manages the lifecycle of the ML pipelines that build, train and validate the personalization algorithms behind its video recommendations.

Google BigQuery Now Allows Users to Query All Open-Source Projects on GitHub

by on  Jul 08, 2016

Google and GitHub announced that a full snapshot of more than 2.8 million open source projects hosted on GitHub is now available in Google's BigQuery. This makes it possible to query almost 2 billion source files hosted on GitHub using SQL.

Neha Narkhede: Large-Scale Stream Processing with Apache Kafka

by on  Jun 19, 2016

In her presentation "Large-Scale Stream Processing with Apache Kafka" at QCon New York 2016, Neha Narkhede introduces Kafka Streams, a new feature of Kafka for processing streaming data. According to Narkhede, stream processing has become popular because unbounded datasets can be found in many places. Much like machine learning, it is no longer a niche problem.
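Processing an unbounded dataset means results must be produced incrementally instead of after "all" the data has arrived. The sketch below illustrates that idea with a tumbling-window count in plain Python. It is a generic, hypothetical example and does not use the Kafka Streams API; it assumes events arrive in timestamp order as `(timestamp, key)` pairs.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, key); hypothetical event shape


def tumbling_window_counts(
    events: Iterable[Event], window_size: int
) -> Iterator[Tuple[int, Counter]]:
    """Yield (window_start, per-key counts) each time a window closes.

    Assumes events arrive in timestamp order. Because this is a generator,
    the input may be an unbounded iterator: each window is emitted as soon
    as an event from a later window shows up.
    """
    current_start = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % window_size  # align the timestamp to its window
        if current_start is None:
            current_start = start
        elif start != current_start:
            # A later window has begun, so the previous one is complete.
            yield current_start, counts
            current_start, counts = start, Counter()
        counts[key] += 1
    # Only reached if the input happens to be finite: flush the last window.
    if current_start is not None:
        yield current_start, counts
```

For example, `list(tumbling_window_counts([(0, "play"), (3, "pause"), (7, "play"), (12, "play")], 10))` yields one window starting at 0 with two `play` and one `pause` event, then a window starting at 10 with one `play` event. Windowing is what makes an unbounded input tractable: state is kept only for the currently open window.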

LinkedIn Details Production Kafka Debugging and Best Practices

by on  Jun 16, 2016

LinkedIn's Joel Koshy details how LinkedIn uses Kafka and walks through the debugging and monitoring of two production incidents, using core Kafka infrastructure concepts, semantics and behavioral patterns to plan for and detect similar problems in the future.

LinkedIn Details Open-Sourced Kafka Monitor

by on  Jun 08, 2016

LinkedIn recently detailed Kafka Monitor, an open-sourced service it uses to monitor production Kafka clusters and to run extensive automated tests. The service has helped LinkedIn identify bugs in the main Kafka trunk and contribute fixes back to the open-source community.

Confluent Platform 3.0 Supports Kafka Streams for Real-Time Data Processing

by on  Jun 03, 2016

Confluent Platform 3.0, the messaging system from Confluent, the company behind the Apache Kafka messaging framework, supports Kafka Streams for real-time data processing. Last week the company announced the general availability of this latest version of the open source Confluent Platform.

Cloudera Announces Partnership with the Broad Institute

by on  Jun 02, 2016

Cloudera announced its partnership with MIT and Harvard's Broad Institute and detailed some of its experience with the Genome Analysis Toolkit (GATK) pipeline.

Apache Spark 2.0 Technical Preview

by on  May 31, 2016

Two years after the first release of Apache Spark, Databricks announced the technical preview of Apache Spark 2.0, based on the upstream branch 2.0.0-preview. The preview is not ready for production in terms of either stability or API; it is intended to gather feedback from the community ahead of general availability.

Amazon Releases Kinesis Service Update

by on  May 23, 2016

Amazon recently announced an update to its Amazon Kinesis service. The update adds three new features to Amazon Kinesis Streams and Amazon Kinesis Firehose: Amazon Elasticsearch Service integration, shard-level metrics and time-based iterators.
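A time-based iterator lets a consumer start reading a stream from a point in time rather than from a sequence number. The sketch below models the idea only: it assumes a hypothetical in-memory "shard" of `(arrival_timestamp, payload)` records appended in arrival order, and it is not the Kinesis API.

```python
import bisect
from typing import List, Tuple

Record = Tuple[float, str]  # hypothetical (arrival timestamp, payload) record


def read_from_timestamp(shard: List[Record], start_ts: float) -> List[str]:
    """Return payloads of records that arrived at or after start_ts.

    Records are appended in arrival order, so timestamps never decrease.
    Comparing each record against the 1-tuple (start_ts,) is equivalent to
    testing `ts < start_ts` (a shorter tuple with an equal prefix sorts
    first), which is monotone along the shard, so bisect_left finds the
    first matching position with a binary search instead of a full scan.
    """
    start = bisect.bisect_left(shard, (start_ts,))
    return [payload for _, payload in shard[start:]]
```

With `shard = [(1.0, "a"), (2.0, "b"), (2.0, "c"), (5.0, "d")]`, reading from `2.0` returns `["b", "c", "d"]`, and reading from any time after `5.0` returns an empty list.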

Precision Medicine Modeling Demonstration with Spark on EMR, ADAM, and the 1000 Genomes Project

by on  May 19, 2016

AWS engineers Christopher Crosbie and Ujjwal Ratan detail how to use Spark on Amazon EMR for precision medicine data analysis on the ADAM platform, using data from the 1000 Genomes Project.

The Broad Institute Migrates Genome Sequencing Pipeline to Google Cloud Platform

by on  May 13, 2016

Genome sequencing and the subsequent analysis face large data volume challenges, which several organizations are addressing with cloud services. Last month the Broad Institute described its experience with petabyte-scale sequencing pipelines on the Google Research Blog; InfoQ covers the details here.

DeepMind Discloses Details to InfoQ about NHS Partnership amid Reports of Vast Patient Data Access

by on  May 05, 2016

After months of waiting for details about the partnership between the NHS and Google DeepMind, InfoQ gains insight into recent claims of widespread patient data access.

Elephant in the Cloud - Hadoop as a Service

by on  May 02, 2016

Hadoop and other big data technologies have revolutionized the way organizations run data analytics, but organizations still face high operating costs when using these technologies for on-premise data processing. Ashish Thusoo recently spoke at the Enterprise Data World conference about a Hadoop-as-a-service offering that helps organizations bridge these gaps.

Airflow Joins Apache Incubator

by on  Apr 29, 2016

Airflow recently joined the Apache Incubator program. Airflow is a workflow and scheduling system designed to manage data pipelines. Developed by Airbnb for internal use, it was open sourced last September, as previously reported by InfoQ.
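At their core, workflow schedulers for data pipelines (Airflow here, and Meson at Netflix) run each task only after the tasks it depends on have finished. The sketch below shows that dependency ordering with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). It does not use Airflow's own API; the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    depends_on: Dict[str, Iterable[str]],
) -> List[str]:
    """Run every task after all of its upstream tasks; return the run order.

    Hypothetical helper: `tasks` maps a task name to a callable and
    `depends_on` maps a task name to the names it must wait for.
    A dependency cycle raises graphlib.CycleError before any task runs.
    """
    sorter: TopologicalSorter = TopologicalSorter()
    for name in tasks:
        # Register every task, including ones with no upstream dependencies.
        sorter.add(name, *depends_on.get(name, ()))
    order: List[str] = []
    for name in sorter.static_order():
        tasks[name]()  # all predecessors of `name` have already run
        order.append(name)
    return order
```

For a linear pipeline such as `extract -> transform -> train -> validate`, the run order is exactly that chain. A real scheduler adds what this sketch leaves out: time-based triggering, retries, and running independent tasks in parallel.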

InfoQ.com and all content copyright © 2006-2016 C4Media Inc.