50:48

Big Data Infrastructure @ LinkedIn

Posted by Shirshanka Das on Apr 02, 2017

Shirshanka Das describes LinkedIn’s Big Data Infrastructure and its evolution through the years, including details on the motivation and architecture of Gobblin, Pinot and WhereHows.

46:03

Scaling up Near Real-Time Analytics @Uber & LinkedIn

Posted by Chinmay Soman and Yi Pan on Mar 30, 2017

Chinmay Soman and Yi Pan discuss how Uber and LinkedIn use Apache Samza, Calcite and Pinot, along with the analytics platform AthenaX, to transform data and make it available for querying within minutes.

47:03

Real-Time Recommendations Using Spark Streaming

Posted by Elliot Chow on Mar 30, 2017

Elliot Chow discusses the data pipeline that they built with Kafka, Spark Streaming, and Cassandra to process Netflix user activities in real time for the Trending Now row.

47:47

Stream Processing & Analytics with Flink @Uber

Posted by Danny Yuan on Mar 25, 2017

Danny Yuan discusses how Uber is building its next-generation stream processing system to support real-time analytics as well as complex event processing.

39:21

Demystifying DynamoDB Streams

Posted by Akshat Vig and Khawaja Shams on Mar 25, 2017

Akshat Vig and Khawaja Shams discuss DynamoDB Streams and what it takes to build an ordered, highly available, durable, performant, and scalable replicated log stream.

49:06

Building a Data Science Capability from Scratch

Posted by Victor Hu on Mar 23, 2017

Victor Hu covers the challenges, both technical and cultural, of building a data science team and capability in a large, global company.

49:31

Data Cleansing and Understanding Best Practices

Posted by Casey Stella on Mar 23, 2017

Casey Stella talks about discovering missing values, values with skewed distributions, and likely errors within data, as well as a novel approach to finding data interconnectedness.

44:45

SQL Server on Linux: Will it Perform or Not?

Posted by Slava Oks on Mar 22, 2017

Slava Oks talks about SQL Server's history and high-level architecture, and dives into the core of the I/O Manager, Memory Manager, and Scheduler. Topics include lessons learned and behind-the-scenes experiences.

51:31

Practical Data Synchronization Using CRDTs

Posted by Dmitry Ivanov on Mar 10, 2017

Dmitry Ivanov discusses basic CRDT implementations in Scala, explaining the advantages of these data structures for solving many synchronization problems, as well as their limitations.

54:36

ScyllaDB: Achieving No-Compromise Performance

Posted by Avi Kivity on Mar 07, 2017

Avi Kivity discusses ScyllaDB and the many design decisions behind it, from the programming language and programming model through low-level details and up to the advanced cache design.

40:48

Data Science in the Cloud @StitchFix

Posted by Stefan Krawczyk on Feb 17, 2017

Stefan Krawczyk discusses how StitchFix used the cloud to enable over 80 data scientists to be productive and have easy access, covering prototyping, algorithms used, keeping schema in sync, etc.

43:06

Elastic Data Analytics Platform @Datadog

Posted by Doug Daniels on Feb 17, 2017

Doug Daniels discusses the cloud-based platform built at Datadog, how it differs from a traditional datacenter-based analytics stack, its pros and cons, and the tooling built around it.
