
Data Science

Confluent Releases KSQL, a Distributed Streaming SQL Engine for Apache Kafka

by Srini Penchikala on Oct 25, 2017

Confluent released KSQL, an interactive, distributed streaming SQL engine for Apache Kafka. KSQL supports stream processing operations such as aggregations, joins, windowing, and sessionization on topics in Apache Kafka. Confluent announced the open source streaming SQL engine at the recent Kafka Summit conference.
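KSQL expresses these operations as SQL statements over Kafka topics. The Python sketch below gives a rough, language-neutral picture of what a windowed aggregation computes. It is not KSQL's syntax or API. It counts events per key in fixed, non-overlapping (tumbling) windows. The `(key, timestamp_ms)` record shape and the `tumbling_window_counts` function are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size_ms):
    """Count events per (key, window_start_ms) using tumbling windows.

    events: iterable of (key, timestamp_ms) pairs (hypothetical record shape).
    """
    counts = defaultdict(int)
    for key, ts in events:
        # Each timestamp falls in exactly one window: [start, start + size)
        window_start = ts - (ts % window_size_ms)
        counts[(key, window_start)] += 1
    return dict(counts)


events = [("user1", 1_000), ("user2", 1_500), ("user1", 59_999), ("user1", 60_000)]
print(tumbling_window_counts(events, 60_000))
# {('user1', 0): 2, ('user2', 0): 1, ('user1', 60000): 1}
```

A streaming engine keeps these counts as continuously updated state rather than computing them over a finite list. The window arithmetic is the same.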

Cloud

Microsoft Updates AI Services and Tools for Data Scientists and Developers

by Kent Weare on Sep 30, 2017

At the recent Ignite conference, Microsoft released several updates related to its Artificial Intelligence (AI) services and tools. These updates include the release of the Azure ML Experimentation service, Azure ML Model Management service, Azure ML Workbench and the general availability of Microsoft Cognitive Services.

Data Science

Q&A with Andrew Brust of Datameer Regarding Big Data's Role in AI

by Rags Srinivas on Jul 31, 2017

Rags Srinivas talks to Datameer's Andrew Brust about the larger role of Big Data in AI and how it's operationalized with SmartAI.

Cloud

Microsoft Updates Azure IoT Platform: Adds Connectivity, Time Series Insights and Edge Analytics

by Kent Weare on Apr 29, 2017

Microsoft recently made several announcements regarding its Internet of Things (IoT) capabilities within Azure. The news includes a new service called Azure Time Series Insights, additional connectivity platform support for OPC UA/DA, and Azure Stream Analytics support on edge devices. In addition, Microsoft announced a new SaaS-based IoT solution called Azure IoT Central.

Data Science

Data Preparation Pipelines: Strategy, Options and Tools

by Srini Penchikala on Apr 16, 2017

Data preparation is an important aspect of data processing and analytics use cases. Business analysts and data scientists spend about 80% of their time gathering and preparing data rather than analyzing it or developing machine learning models. Kelly Stirman spoke last week at the Enterprise Data World 2017 conference about data preparation best practices.

DevOps

How 3rd Party Tools Nearly Killed Performance (and Culture) at Adidas

by Manuel Pais on Jan 20, 2017

The IT department at the footwear and apparel giant tamed an out-of-control proliferation of third-party tools on its global websites. The tools were killing performance and had fostered a blame culture between business and IT. A new third-party governance process focused on performance data and user experience validation was key to stopping the bleeding.

Data Science

Mathieu Ripert on Instacart's Machine Learning Optimizations

by Alexandre Rodrigues on Jan 05, 2017

Instacart is an online grocery delivery service that delivers orders in under one hour. Customers order items on the website or in the mobile app, and Instacart's shoppers go to local stores, purchase the items, and deliver them to the customer. InfoQ interviewed Mathieu Ripert, a data scientist at Instacart, to find out how machine learning is used to deliver a better customer experience.

Data Science

AFK-MC² Algorithm Speeds up k-Means Clustering Algorithm Seeding

by Alexandre Rodrigues on Dec 23, 2016

“Fast and Provably Good Seedings for k-Means” by Olivier Bachem et al. was presented at the 2016 Neural Information Processing Systems (NIPS) conference. It describes AFK-MC², an alternative method for generating the initial seeding of the k-Means clustering algorithm. The method is several orders of magnitude faster than the state-of-the-art k-Means++ method.
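k-Means++ picks each new center with probability proportional to d(x, C)², the squared distance to the nearest chosen center. That needs a full pass over the data for every center. AFK-MC² instead approximates this sampling with a short Metropolis-Hastings chain of length m. The chain draws from a proposal distribution q that is computed in a single pass after the first center is chosen. The pure-Python sketch below is our reading of that idea, not the authors' code. The function names and the `seed` parameter are illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_sq_dist(p, centers):
    """Squared distance from p to its nearest already-chosen center."""
    return min(sq_dist(p, c) for c in centers)


def afk_mc2_seeding(points, k, m, seed=0):
    """Pick k initial centers with AFK-MC^2-style sampling (chain length m >= 1)."""
    rng = random.Random(seed)
    n = len(points)

    # 1. First center chosen uniformly at random, as in k-means++.
    c1 = points[rng.randrange(n)]
    centers = [c1]

    # 2. One pass over the data builds the proposal distribution
    #    q(x) = 1/2 * d(x, c1)^2 / sum_x' d(x', c1)^2 + 1/(2n).
    d1 = [sq_dist(p, c1) for p in points]
    total = sum(d1)
    if total == 0:  # every point coincides with c1
        q = [1.0 / n] * n
    else:
        q = [0.5 * d / total + 0.5 / n for d in d1]

    # 3. Each remaining center comes from a Metropolis-Hastings chain of length m
    #    whose target is the k-means++ distribution proportional to d(x, C)^2.
    for _ in range(k - 1):
        chain = rng.choices(range(n), weights=q, k=m)
        x = chain[0]
        dx = min_sq_dist(points[x], centers)
        for y in chain[1:]:
            dy = min_sq_dist(points[y], centers)
            # Accept y with probability min(1, (dy * q[x]) / (dx * q[y])),
            # written without division so dx == 0 is handled safely.
            if dy * q[x] > rng.random() * dx * q[y]:
                x, dx = y, dy
        centers.append(points[x])
    return centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0), (20, 1)]
print(afk_mc2_seeding(points, k=3, m=50))
```

The trade-off is controlled by m. Longer chains approximate k-Means++ sampling more closely, and shorter chains are cheaper. The per-center cost depends on m rather than on the dataset size.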

Data Science

Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow

by Alexandre Rodrigues on Dec 08, 2016

Julien Le Dem, the PMC chair of the Apache Arrow project, spoke at Data Eng Conf NY about the future of column-oriented data processing. Apache Arrow is an open source standard for columnar in-memory execution. InfoQ interviewed Le Dem to find out the differences between Arrow and Parquet.
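"Columnar in-memory" means each column is stored in contiguous buffers rather than row by row. The simplified Python sketch below mimics the layout the Arrow format specifies for a variable-length string column. That layout has three parts:

- a validity bitmap with one bit per slot, least-significant bit first
- an offsets buffer with n + 1 entries
- a single data buffer

The sketch ignores Arrow's alignment and padding rules and is not the `pyarrow` API. The function names are hypothetical.

```python
from array import array


def to_arrow_like_string_column(values):
    """Encode a list of optional strings as (validity, offsets, data) buffers."""
    n = len(values)
    validity = bytearray((n + 7) // 8)  # 1 bit per slot, LSB-first within each byte
    offsets = array("i", [0])           # 'i' is a C int, typically 32-bit like Arrow's offsets
    data = bytearray()
    for i, v in enumerate(values):
        if v is not None:
            validity[i // 8] |= 1 << (i % 8)
            data += v.encode("utf-8")
        # Null slots get a zero-length entry: offsets[i + 1] == offsets[i]
        offsets.append(len(data))
    return bytes(validity), offsets, bytes(data)


def get_value(column, i):
    """Read slot i back: check its validity bit, then slice the data buffer."""
    validity, offsets, data = column
    if not (validity[i // 8] >> (i % 8)) & 1:
        return None
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


col = to_arrow_like_string_column(["arrow", None, "parquet"])
validity, offsets, data = col
print(validity, list(offsets), data)
# b'\x05' [0, 5, 5, 12] b'arrowparquet'
print([get_value(col, i) for i in range(3)])
# ['arrow', None, 'parquet']
```

Because all values of a column sit in one buffer, a scan over that column touches contiguous memory. Different systems can also share the buffers without converting row objects.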

Data Science

Microservices and Stream Processing Architecture at Zalando Using Apache Flink

by Srini Penchikala on Oct 31, 2016

Javier Lopez and Mihail Vieru spoke at the Reactive Summit 2016 conference about a cloud-based data integration and distribution platform used for stream processing in business intelligence use cases. Their solution is based on technologies such as Flink, Kafka and Elasticsearch.

Data Science

Stream Processing and Lambda Architecture Challenges

by Alexandre Rodrigues on Oct 19, 2016

Lambda architecture has been a popular solution that combines batch and stream processing. Kartik Paramasivam at LinkedIn wrote about how his team addressed stream processing and Lambda architecture challenges by using Apache Samza for data processing. The challenges described are the late arrival of events and the processing of duplicate messages.
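The Python sketch below shows one common way to handle both problems. It is built on stated assumptions and is not taken from Samza or from LinkedIn's implementation:

- Deduplicate on a unique event ID.
- Keep each event-time window open for a bounded "allowed lateness" after the highest event time seen so far.

The `WindowedCounter` class, the `(event_id, key, event_time_ms)` record shape, and both parameters are hypothetical.

```python
class WindowedCounter:
    """Tumbling-window counts that skip duplicate IDs and drop events that arrive too late.

    Hypothetical sketch: events are (event_id, key, event_time_ms) tuples.
    """

    def __init__(self, window_ms, allowed_lateness_ms):
        self.window_ms = window_ms
        self.allowed_lateness_ms = allowed_lateness_ms
        self.max_event_time = float("-inf")  # simple watermark: highest event time counted
        self.seen_ids = set()                 # unbounded here; a real system would expire IDs
        self.counts = {}                      # (key, window_start_ms) -> count

    def process(self, event_id, key, event_time_ms):
        # Duplicate delivery (e.g. a retried send): count each ID at most once.
        if event_id in self.seen_ids:
            return "duplicate"
        window_start = event_time_ms - (event_time_ms % self.window_ms)
        window_end = window_start + self.window_ms
        # Late arrival: the window is closed once the watermark reaches end + allowed lateness.
        if self.max_event_time >= window_end + self.allowed_lateness_ms:
            return "late"
        self.seen_ids.add(event_id)
        self.max_event_time = max(self.max_event_time, event_time_ms)
        slot = (key, window_start)
        self.counts[slot] = self.counts.get(slot, 0) + 1
        return "counted"


counter = WindowedCounter(window_ms=60_000, allowed_lateness_ms=10_000)
events = [("a", "k", 5_000), ("a", "k", 5_000), ("b", "k", 65_000),
          ("c", "k", 55_000), ("d", "k", 80_000), ("e", "k", 30_000)]
print([counter.process(*e) for e in events])
# ['counted', 'duplicate', 'counted', 'counted', 'counted', 'late']
print(counter.counts)
# {('k', 0): 2, ('k', 60000): 2}
```

Event "c" arrives out of order but still counts, because its window is within the allowed lateness. Event "e" is dropped. A production system would also need to persist this state and bound the set of seen IDs.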

Data Science

Reactive Summit 2016 Conference: Reactive Microservices and Staging Data Pipelines

by Srini Penchikala on Oct 08, 2016

Reactive microservices, the data center scale operating system (DCOS), and staging reactive data pipelines were the highlighted topics at the Reactive Summit 2016 conference held this week. The InfoQ team attended the conference, and this post summarizes the first day's events.

Data Science

Data Streaming Architecture with Apache Flink

by Srini Penchikala on Jun 09, 2016

Jamie Grier recently spoke at the OSCON 2016 conference about data streaming architecture using Apache Flink. He talked about the building blocks of data streaming applications and about stateful stream processing, with code examples of Flink applications and monitoring.

Data Science

Precision Medicine Modeling Demonstration with Spark on EMR, ADAM, and the 1000 Genomes Project

by Dylan Raithel on May 19, 2016

AWS engineers Christopher Crosbie and Ujjwal Ratan describe how to use Spark on EMR for precision medicine data analysis on the ADAM platform with data from the 1000 Genomes Project.

Data Science

Elephant in the Cloud - Hadoop as a Service

by Srini Penchikala on May 02, 2016

Hadoop and other big data technologies revolutionized the way organizations run data analytics. However, organizations still face challenges with the operating costs of using these technologies for on-premises data processing. Ashish Thusoo recently spoke at the Enterprise Data World conference about a Hadoop-as-a-Service offering that helps organizations bridge these gaps.
