
Streaming: Danny Yuan on Real-Time, Time Series Forecasting @Uber

Podcast with Danny Yuan by Wesley Reisz on Mar 31, 2018

On this week’s podcast, Danny Yuan, Uber’s Real-time Streaming/Forecasting Lead, lays out a thorough recipe book for building a real-time streaming platform with a major focus on forecasting. Danny discusses everything from the scale Uber operates at to the major steps for training and deploying models in an iterative (almost Darwinistic) fashion, and wraps up with his advice for software engineers who want to begin applying machine learning in their day-to-day jobs.

Key Takeaways

  • Uber processes 850,000 - 1.3 million messages per second in their streaming platform with about 12 TB of growth per day. The system’s queries scan 100 million to 4 billion documents per second.
  • Uber’s frontend is mobile. The frontend talks to an API layer. All services generate events that are shuffled into Kafka. The real-time forecasting pipeline taps into Kafka to process events and stores the data in Elasticsearch. There is a federated query layer in front of Elasticsearch to provide OLAP query capabilities.
  • Apache Flink’s advanced windowing features, programming model, and checkpointing convinced Uber to move away from the simplicity of Apache Samza.
  • The forecasting system allows Uber to remove the notion of delay by using recent signals plus historical data to project what is happening now and what will happen in the future.
  • Uber’s pipeline for deploying ML models: start from data in HDFS, apply feature engineering, organize the results into data structures (similar to data frames), deploy mostly offline training jobs, train the models, and store them in a container-based model manager.
  • A model serving layer picks which model to use, and forecasting results are stored in an OLAP data store. A validation layer compares real results against forecast results to verify that the model is working as desired, and a rollback feature enables poorly performing models to be automatically replaced by the previous one.
  • “Without output, you don’t have input.” Developers who want to start leveraging machine learning just need to start doing. Start with intuition and practice. Over time, ask questions and learn what you need, then apply a laser focus to gain that knowledge.
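
The serving, validation, and rollback steps in the takeaways above can be sketched in a few lines of Python. This is a minimal illustration, not Uber's implementation: the `ModelManager` class, the `max_error` threshold, the choice of mean absolute percentage error as the comparison metric, and the two toy models are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Assumption: a "model" is any function from recent observations to one forecast.
Model = Callable[[Sequence[float]], float]


def mean_absolute_percentage_error(
    actual: Sequence[float], forecast: Sequence[float]
) -> float:
    """Average of |actual - forecast| / |actual|, skipping zero actuals."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        return 0.0
    return sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


@dataclass
class ModelManager:
    """Hypothetical stand-in for a model manager with a rollback feature."""

    max_error: float  # assumed threshold above which a model counts as poor
    versions: List[Model] = field(default_factory=list)

    def deploy(self, model: Model) -> None:
        # The newest deployed model becomes the active one.
        self.versions.append(model)

    def active(self) -> Model:
        return self.versions[-1]

    def validate(self, actual: Sequence[float], forecast: Sequence[float]) -> bool:
        """Compare real results with forecasts; roll back if the error is too high.

        Returns True if the active model is kept, False if it was rolled back.
        """
        error = mean_absolute_percentage_error(actual, forecast)
        if error > self.max_error and len(self.versions) > 1:
            self.versions.pop()  # fall back to the previous model
            return False
        return True


# Two toy models, purely for illustration.
def last_value(history: Sequence[float]) -> float:
    return history[-1]


def doubled(history: Sequence[float]) -> float:
    return 2 * history[-1]


manager = ModelManager(max_error=0.2)
manager.deploy(last_value)
manager.deploy(doubled)  # the newer, badly behaved model

history = [100.0, 110.0, 120.0]
actual = [125.0]

forecast = [manager.active()(history)]
kept = manager.validate(actual, forecast)
print("kept newest model:", kept)
print("active model is now:", manager.active().__name__)
```

Keeping deployed versions in an ordered list makes rollback a single pop. A real validation layer would presumably compare forecasts and actuals over a window of observations rather than a single point.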


About QCon

QCon is a practitioner-driven conference designed for technical team leads, architects, and project managers who influence software innovation in their teams. QCon takes place 7 times per year in London, New York, San Francisco, São Paulo, Beijing & Shanghai. QCon San Francisco is in its 12th edition and will take place Nov 5-9, 2018. 140+ expert practitioner speakers, 1300+ attendees and 18 tracks will cover topics driving the evolution of software development today. Visit qconsf.com to get more details.

More about our podcasts

You can keep up-to-date with the podcasts via our RSS feed, and they are available via SoundCloud and iTunes. From this page you also have access to our recorded show notes. They all have clickable links that will take you directly to that part of the audio.
