Intel Open-Sources BigDL, Distributed Deep Learning Library for Apache Spark

by Alexandre Rodrigues on  Jan 13, 2017

Intel has open-sourced BigDL, a distributed deep learning library that runs on Apache Spark. It leverages existing Spark clusters to run deep learning computations and simplifies loading data from big datasets stored in Hadoop.

Multiple DNS Providers to Mitigate DDoS Attacks

by Hrishikesh Barua on  Jan 07, 2017

Distributed Denial of Service (DDoS) attacks against Domain Name System (DNS) providers are increasing in number and scale with the proliferation of insecure IoT devices. While DNS providers have various methods of protecting themselves against such attacks, one of the ways for a website to protect itself is to use multiple DNS providers.

Mathieu Ripert on Instacart's Machine Learning Optimizations

by Alexandre Rodrigues on  Jan 05, 2017

Instacart is an online service that delivers groceries in under one hour. Customers order items on the website or through the mobile app, and a group of Instacart’s shoppers go to local stores, purchase the items and deliver them to the customer. InfoQ interviewed Mathieu Ripert, data scientist at Instacart, to find out how machine learning is leveraged to guarantee a better customer experience.

Google BigQuery Adds New Public Datasets

by Alex Giamas on  Jan 05, 2017

Stack Overflow recently announced that its dataset is available through Google’s BigQuery. Using regular SQL statements, developers can query the full set of Stack Overflow data including posts, votes, tags, and badges. In this article we explore datasets that are available through Google's BigQuery platform.

Neo4j 3.1 Supports Causal Clustering and Security Enhancements

by Srini Penchikala on  Dec 31, 2016

The latest version of the graph NoSQL database Neo4j introduces causal clustering and a new security architecture. The Neo4j team recently released version 3.1 of the graph database. Other new features include database kernel improvements and a Schema Viewer.

Netflix Conductor, an Orchestration Engine for Microservices

by Abel Avram on  Dec 20, 2016

Netflix has developed an orchestration engine called “Conductor”, and has used it internally in production for the last year. During this time they executed some 2.6 million process workflows, starting with linear ones and ending with dynamic ones running over multiple days. Now they have open-sourced Conductor, making it available to all those interested in workflow orchestration.

Julien Nioche on StormCrawler, Open-Source Crawler Pipelines Backed by Apache Storm

by Alexandre Rodrigues on  Dec 15, 2016

Julien Nioche, director of DigitalPebble, PMC member and committer of the Apache Nutch web crawler project, talks about StormCrawler, a collection of reusable components for building distributed web crawlers based on the streaming framework Apache Storm. InfoQ interviewed Nioche, the main contributor to the project, to find out more about StormCrawler and how it compares to similar technologies.

Q&A with Drew Koszewnik on a Disseminated Cache, Netflix Hollow

by Rags Srinivas on  Dec 14, 2016

Drew Koszewnik of Netflix talks to Rags Srinivas about a disseminated cache called Hollow.

Google Pushing for HTTPS

by Manuel Pais on  Dec 11, 2016

Google wants to push for HTTPS everywhere by combining the deprecation of existing Chrome features on non-secure sites with new features that are supported only over HTTPS.

Facebook's Comparison of Apache Giraph and Spark GraphX for Graph Data Processing

by Srini Penchikala on  Dec 09, 2016

A Facebook team has recently published a comparison of the performance of their existing Giraph-based graph processing system with the newer GraphX, which is part of the popular Spark framework. Their conclusion is that GraphX is neither sufficiently scalable nor performant enough to support their graph processing workloads.

Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow

by Alexandre Rodrigues on  Dec 08, 2016

Julien Le Dem, the PMC chair of the Apache Arrow project, presented at Data Eng Conf NY on the future of column-oriented data processing. Apache Arrow is an open-source standard for columnar in-memory execution. InfoQ interviewed Le Dem to find out the differences between Arrow and Parquet.

Couchbase 4.6 Developer Preview Released, Adds Real-Time Connectors for Apache Spark 2.0 and Kafka

by Alexandre Rodrigues on  Nov 28, 2016

Couchbase 4.6 Developer Preview features full-text search improvements, cross data center replication with globally-ordered conflict resolution, and connectors for real-time analytics technologies: one for Spark 2.0 and the other for Kafka.

Spark Summit EU Highlights: TensorFlow, Structured Streaming and GPU Hardware Acceleration

by Alexandre Rodrigues on  Nov 13, 2016

Apache Spark integration with the deep learning library TensorFlow, online learning using Structured Streaming, and GPU hardware acceleration were the highlights of Spark Summit EU 2016, held last week in Brussels.

Microsoft Releases Data Science Tools for Interactive Data Exploration and Modeling

by Srini Penchikala on  Nov 07, 2016

Microsoft recently released two new data science tools for interactive data exploration, modeling and reporting. These tools can be reused by data science teams for data-specific tasks in their projects. The goal is to ensure consistency and completeness of data science tasks across different projects in the organization.

Microservices and Stream Processing Architecture at Zalando Using Apache Flink

by Srini Penchikala on  Oct 31, 2016

Javier Lopez and Mihail Vieru spoke at the Reactive Summit 2016 conference about a cloud-based data integration and distribution platform used for stream processing in business intelligence use cases. Their solution is based on technologies such as Flink, Kafka and Elasticsearch.

InfoQ.com and all content copyright © 2006-2016 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.