Data Science

Apache Eagle, Originally from eBay, Graduates to Top-Level Project

by Alexandre Rodrigues on Jan 24, 2017

Apache Eagle, an open-source solution for identifying security and performance issues on big data platforms, graduated to an Apache top-level project on January 10, 2017. First open-sourced by eBay in October 2015, Eagle was created to instantly detect access to sensitive data or malicious activities and to take action in a timely fashion.

Cloud

Improving Azure SQL Database Performance Using In-Memory Technologies

by Kent Weare on Jan 21, 2017

In late 2016, Microsoft announced the general availability of Azure SQL Database In-Memory technologies. In-Memory processing is available only in the Azure Premium database tiers and provides performance improvements through In-Memory Online Transaction Processing (OLTP), Clustered Columnstore Indexes, and Non-clustered Columnstore Indexes for Hybrid Transactional and Analytical Processing (HTAP) scenarios.

Data Science

Mathieu Ripert on Instacart's Machine Learning Optimizations

by Alexandre Rodrigues on Jan 05, 2017

Instacart is an online grocery delivery service that delivers orders in under one hour. Customers order items on the website or through the mobile app, and Instacart's shoppers go to local stores, purchase the items and deliver them to the customer. InfoQ interviewed Mathieu Ripert, data scientist at Instacart, to find out how machine learning is leveraged to deliver a better customer experience.

Data Science

Google BigQuery Adds New Public Datasets

by Alex Giamas on Jan 05, 2017

Stack Overflow recently announced that its dataset is available through Google's BigQuery. Using regular SQL statements, developers can query the full set of Stack Overflow data, including posts, votes, tags, and badges. In this article we explore the datasets that are available through Google's BigQuery platform.

Data Science

Julien Nioche on StormCrawler, Open-Source Crawler Pipelines Backed by Apache Storm

by Alexandre Rodrigues on Dec 15, 2016

Julien Nioche, director of DigitalPebble and a PMC member and committer of the Apache Nutch web crawler project, talks about StormCrawler, a collection of reusable components for building distributed web crawlers on the streaming framework Apache Storm. InfoQ interviewed Nioche, the project's main contributor, to find out more about StormCrawler and how it compares to similar technologies.

Data Science

Facebook's Comparison of Apache Giraph and Spark GraphX for Graph Data Processing

by Srini Penchikala on Dec 09, 2016

A Facebook team has recently published a comparison of the performance of their existing Giraph-based graph processing system with GraphX, the newer graph engine that is part of the popular Spark framework. Their conclusion is that GraphX is neither sufficiently scalable nor sufficiently performant to support their graph processing workloads.

Data Science

Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow

by Alexandre Rodrigues on Dec 08, 2016

Julien Le Dem, the PMC chair of the Apache Arrow project, presented at Data Eng Conf NY on the future of column-oriented data processing. Apache Arrow is an open-source standard for columnar in-memory execution. InfoQ interviewed Le Dem to find out the differences between Arrow and Parquet.
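To make "column-oriented" concrete, here is a minimal, hypothetical sketch in plain Python. It does not use Arrow or its `pyarrow` bindings; it only transposes row records into one buffer per column. Fixed-width values go into contiguous typed arrays, and strings are packed into a single byte buffer plus an offsets array, loosely modeled on how columnar formats lay out variable-length data. The helper names and sample records are illustrative assumptions.

```python
from array import array

# Row-oriented input: each record keeps all of its fields together.
rows = [
    ("alice", 34, 72.5),
    ("bob", 29, 80.1),
    ("carol", 41, 65.0),
]


def encode_strings(values):
    """Pack strings into one contiguous bytes buffer plus an offsets array.

    Value i occupies data[offsets[i]:offsets[i + 1]].
    """
    offsets = array("i", [0])
    data = bytearray()
    for value in values:
        data += value.encode("utf-8")
        offsets.append(len(data))
    return offsets, bytes(data)


def to_columns(records):
    """Transpose row records into one buffer per column (hypothetical helper)."""
    names, ages, weights = zip(*records)
    return {
        "name": encode_strings(names),
        "age": array("q", ages),        # contiguous 64-bit signed integers
        "weight": array("d", weights),  # contiguous 64-bit floats
    }


columns = to_columns(rows)

# An analytic query over one field scans only that field's buffer.
ages = columns["age"]
mean_age = sum(ages) / len(ages)

# Reading a single string back out of the packed buffer.
offsets, data = columns["name"]
second_name = data[offsets[1]:offsets[2]].decode("utf-8")
print(mean_age, second_name)
```

Computing the mean age never touches the names or weights, which is the access pattern columnar layouts are designed to make cheap.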

Data Science

Couchbase 4.6 Developer Preview Released, Adds Real-Time Connectors for Apache Spark 2.0 and Kafka

by Alexandre Rodrigues on Nov 28, 2016

Couchbase 4.6 Developer Preview features full-text search improvements, cross-data-center replication with globally ordered conflict resolution, and connectors for real-time analytics technologies: one for Spark 2.0 and another for Kafka.

Data Science

Spark Summit EU Highlights: TensorFlow, Structured Streaming and GPU Hardware Acceleration

by Alexandre Rodrigues on Nov 13, 2016

Apache Spark integration with the deep learning library TensorFlow, online learning using Structured Streaming, and GPU hardware acceleration were the highlights of Spark Summit EU 2016, held last week in Brussels.

Data Science

Microsoft Releases Data Science Tools for Interactive Data Exploration and Modeling

by Srini Penchikala on Nov 07, 2016

Microsoft recently released two new data science tools for interactive data exploration, modeling and reporting. Data science teams can reuse these tools for data-specific tasks in their projects. The goal is to ensure the consistency and completeness of data science tasks across different projects in the organization.

Data Science

Microservices and Stream Processing Architecture at Zalando Using Apache Flink

by Srini Penchikala on Oct 31, 2016

Javier Lopez and Mihail Vieru spoke at the Reactive Summit 2016 Conference about the cloud-based data integration and distribution platform Zalando uses for stream processing in business intelligence use cases. Their solution is based on technologies such as Flink, Kafka and Elasticsearch.

Cloud

Wolfram Wants to Deliver “Computation Everywhere” with New Private Cloud

by Richard Seroter on Oct 26, 2016

Wolfram, the software company behind computation-centric products like Mathematica and Wolfram|Alpha, shipped a new private cloud appliance targeting companies that want to centralize their computational efforts.

Data Science

Stream Processing and Lambda Architecture Challenges

by Alexandre Rodrigues on Oct 19, 2016

Lambda architecture has been a popular solution for combining batch and stream processing. Kartik Paramasivam of LinkedIn wrote about how his team addressed stream processing and Lambda architecture challenges using Apache Samza for data processing. The challenges described are the late arrival of events and the processing of duplicate messages.
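The two stream processing challenges named above can be illustrated with a minimal, hypothetical sketch in Python. It is not Samza's API and not necessarily the approach LinkedIn took: a windowed counter drops redelivered messages by remembering event IDs, and accepts out-of-order events as long as their event time falls within an assumed allowed-lateness bound of the newest event time seen so far. The `Event` fields, window size, and lateness bound are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str    # hypothetical unique ID carried by every message
    key: str
    event_time: int  # seconds, assigned where the event originated
    value: int


class WindowedCounter:
    """Sums values per (key, tumbling window), skipping duplicates and
    events that arrive later than the allowed lateness (hypothetical)."""

    def __init__(self, window_size: int, allowed_lateness: int):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        # Simplification: IDs are kept forever; a real job would expire them.
        self.seen_ids: set[str] = set()
        self.max_event_time = 0
        self.totals: dict[tuple[str, int], int] = defaultdict(int)

    def process(self, event: Event) -> bool:
        """Return True if the event was counted, False if it was dropped."""
        # Duplicate messages: at-least-once delivery may redeliver an event.
        if event.event_id in self.seen_ids:
            return False
        # Late arrivals: tolerate out-of-order events within the lateness bound.
        self.max_event_time = max(self.max_event_time, event.event_time)
        if event.event_time < self.max_event_time - self.allowed_lateness:
            return False
        self.seen_ids.add(event.event_id)
        window_start = event.event_time - event.event_time % self.window_size
        self.totals[(event.key, window_start)] += event.value
        return True


counter = WindowedCounter(window_size=60, allowed_lateness=30)
events = [
    Event("a", "page", 100, 1),
    Event("b", "page", 130, 1),
    Event("a", "page", 100, 1),  # redelivered duplicate of "a"
    Event("c", "page", 105, 1),  # out of order, but within 30 s of 130
    Event("d", "page", 50, 1),   # too late: 50 < 130 - 30
]
results = [counter.process(e) for e in events]
print(results)               # [True, True, False, True, False]
print(dict(counter.totals))  # {('page', 60): 2, ('page', 120): 1}
```

Dropping events past the lateness bound is only one policy; a system could instead route them to a correction path, which is the kind of trade-off the batch layer of a Lambda architecture traditionally absorbs.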

Data Science

Jay Kreps on Distributed Stream Processing with Apache Kafka and Kafka Streams

by Srini Penchikala on Oct 16, 2016

Apache Kafka and the Kafka Streams framework help with developing stream-centric architectures and distributed stream processing applications. Jay Kreps, CEO of Confluent, gave the keynote presentation on stream processing and microservices at the Reactive Summit 2016 Conference last week.

Data Science

Reactive Summit 2016 Conference: Reactive Microservices and Staging Data Pipelines

by Srini Penchikala on Oct 08, 2016

Reactive microservices, the data center-scale operating system (DCOS), and staging reactive data pipelines were the highlighted topics at the Reactive Summit 2016 Conference held this week. The InfoQ team attended the conference, and this post summarizes the first day's events.
