
Sachin Kulkarni Describes the Architecture behind Facebook Live

Podcast with Sachin Kulkarni, by Wesley Reisz, on May 28, 2017


Wesley Reisz talks to Sachin Kulkarni, Director of Engineering at Facebook, about the engineering challenges of building Facebook Live, and how it compares to the video upload platform at Facebook.

Key Takeaways

  • The Facebook Infrastructure teams provide the platform that powers the broad family of apps including the Facebook app itself, Messenger and Instagram. It is largely a C++ shop.
  • The video infra team at Facebook builds the video infrastructure across the whole company. Projects include a distributed video encoding platform which results in low latency video encoding, video upload and ingest.
  • Encoding for Facebook Live is done on both the client and the server. The trade-off between encoding on the client side and the server side is mostly around the quality of the video vs. latency and reliability.
  • Facebook gets around a 10x speed-up by encoding data in parallel when compared to serial processing.
  • They also have an AI-based encoding system which results in 20% smaller files than raw H.264.

Facebook Infrastructure

  • 1:48 - Facebook infra powers the broad family of apps including the Facebook app, Messenger and Instagram. The group is responsible for storage, caching, pub/sub, monitoring, streaming and so on.
  • 2:30 - The video infra team builds the video infrastructure across the whole company. Projects include a distributed video encoding platform which results in low latency video encoding, video upload and ingest. Ingesting is about moving the bytes from the client apps to the Facebook data centre, while encoding is about server side processing to reduce size while keeping the quality high.
  • 2:58 - Another angle is video clustering, where similar videos can be clustered together for better search ranking.

Facebook Live encoding

  • 3:35 - Facebook Live does encoding on both the client and the server.
  • 4:03 - The trade-off between encoding on the client-side and the server-side is mostly around video quality vs. latency and reliability. Since encoding is typically lossy, if the network is good the Facebook app will keep the quality as high as possible and compress lightly or not at all. Conversely, if the network is poor, more aggressive encoding is done on the phone to reduce the amount of data to be transferred.
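
The network-dependent trade-off above can be sketched as a simple settings chooser. This is a minimal illustration with hypothetical function names and thresholds, not Facebook's actual logic: given a bandwidth estimate, a fast network gets near-original quality while a slow one gets heavier on-device compression.

```python
def choose_encoder_settings(estimated_kbps):
    """Pick a target video bitrate and compression level from a bandwidth estimate.

    Hypothetical thresholds: on a fast network, send near-original quality;
    on a slow network, compress harder on the phone to reduce bytes sent.
    """
    if estimated_kbps >= 4000:      # good network: favour quality
        return {"video_kbps": 4000, "preset": "light"}
    elif estimated_kbps >= 1000:    # medium network: balance quality and size
        return {"video_kbps": estimated_kbps * 3 // 4, "preset": "medium"}
    else:                           # poor network: compress aggressively
        return {"video_kbps": max(200, estimated_kbps // 2), "preset": "heavy"}
```

A real client would re-run this as conditions change, which is what adaptive bitrate (discussed later) generalises.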

Video at Facebook Scale

  • 4:55 - There are 1.28 billion daily users of Facebook. A good example of where this scale causes problems is comparatively rare situations such as race conditions: cases that are normally rare get hit frequently with that volume of users. Consequently, avoiding race conditions needs to be thought about at design time.
  • 6:15 - Facebook launches everything on an experimental basis starting with internal users. Then roll-out to the wider public is gradual - 0.1% of users, then 1%, 5% and so on to expose race conditions and other issues early.
  • 6:56 - For back-end systems the release is typically done weekly so the release goes from just a handful of users to 1.28 billion users in a week.
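
The gradual ramp-up described above (0.1%, 1%, 5%, …) is commonly implemented as a deterministic percentage gate. The sketch below is an illustration of the general technique, not Facebook's gating system; all names are hypothetical:

```python
import hashlib

def in_rollout(user_id: int, feature: str, percent: float) -> bool:
    """Deterministically bucket a user into [0, 100) for a feature flag.

    Hashing the user ID together with the feature name gives each feature
    an independent bucketing, and ramping a feature from 0.1% -> 1% -> 5%
    keeps the same early users in the treatment group.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF * 100
    return bucket < percent
```

Because the bucket is a pure function of the user and feature, no rollout state needs to be stored per user, and widening the percentage only ever adds users.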

Facebook Live

  • 8:28 - Facebook Live is a live streaming platform open to all users. The latency depends a lot on where the broadcaster and viewer are, and the network conditions in both of those places. The aim though is to keep the latency in single digit seconds.
    [NOTE: Originally, Wes used the incorrect video latency units for streaming with Facebook Live. Latency should be measured in single digit seconds. The original recording will be edited to indicate the correction.]
  • 9:44 - The product started at a hackathon in 2015. A small team built a working prototype infrastructure in just a few days. The first thing they streamed was a clock to measure the latency of the system!
  • 11:30 - It took around 4 months to get from the prototype to launching Live to public profiles in August of 2015. By December 2015 the platform was scaled to all users on iOS, Android and browsers.
  • 12:04 - It was possible to build the product that quickly because a lot of infrastructure already existed - the Everstore BLOB storage system is solid, and they could also rely on open-source software like NGINX for doing the encoding and processing.
  • 13:14 - Facebook Infrastructure is largely a C++ shop. There is some Java and Python, and the business logic is all done in PHP. The iOS apps are written in Objective C and the Android apps are in Java.

Facebook Live Architecture

  • 13:54 - It all starts with the broadcast client - this could be an Android or iOS app, the Mentions app, or via the Live API. In the client app there are libraries which do packaging, encoding and so on.
  • 14:13 - The stream is sent via RTMPS (Real-Time Messaging Protocol over TLS) to a geographically local PoP (point of presence). The connection is then forwarded over an internal Facebook network to a Facebook data-centre.
  • 14:34 - At the data-centre the stream hits an encoding host which authenticates the stream, encodes it into multiple formats at different bitrates, and packages it into RTMP or DASH (Dynamic Adaptive Streaming over HTTP). The stream is then cached in a CDN before it hits the player.
  • 15:27 - Users can be broadcasting from anywhere in the world so the geographically local PoP reduces round-trip time.
  • 15:44 - A key thing for load balancing is hashing based on the stream ID. When you make a request to go live you get a stream ID and a URI. Facebook does a hash based on the stream ID and maps the stream to different data centres based on the hash.
  • 16:13 - The client libraries run a speed test on iOS and Android to figure out the video bitrate to be used for encoding. They then encode the uncompressed bitstreams from the phone using the H.264 and AAC codecs for video and audio, before wrapping the compressed frames in an RTMP-compatible format and sending the packets to the server.
  • 17:03 - Network bandwidth is not a static thing and can change during the broadcast. Facebook Live uses Adaptive Bitrate to cope with this.
  • 19:57 - To stream the live stream out they use MPEG-DASH, an adaptive bitrate streaming format that enables streaming data over HTTP. It comprises a manifest file, essentially an index which points to media files, and individual media files, for example for each second of the live stream.
  • 20:43 - When you see a live stream in your feed and you click on it the player requests the manifest. If it isn't already on your local PoP the request goes to the data centre to get the manifest, and then fetches the media files. As they get sent back they are cached on the PoP if they aren’t there already.
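
The stream-ID hashing described at 15:44 can be sketched in a few lines. The data-centre names and modulo scheme here are illustrative assumptions (a production system would more likely use consistent hashing to tolerate adding and removing sites), but the core idea is the same: any PoP can compute the stream's home independently.

```python
import hashlib

DATA_CENTERS = ["dc-a", "dc-b", "dc-c"]  # hypothetical site names

def data_center_for(stream_id: str) -> str:
    """Map a live stream to a data centre by hashing its stream ID.

    Every PoP computing the same hash forwards packets for a given stream
    to the same encoding hosts, with no central lookup on the hot path.
    """
    digest = hashlib.md5(stream_id.encode()).hexdigest()
    return DATA_CENTERS[int(digest, 16) % len(DATA_CENTERS)]
```

The same property also helps the CDN: all viewers of one stream resolve to one origin, so the manifest and media segments are fetched from the data centre once and then served from the PoP cache.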

Video Upload

  • 24:10 - One of the key challenges with Live is that it has to happen in real-time. For video upload you can batch the workload and hence latency is less key.
  • 25:08 - The key requirements for building a stable and scalable video encoding platform are that it needs to be fast, flexible, able to cope with spikes, and very efficient.
  • 25:28 - At a high level the client library takes the video and breaks it up into smaller chunks corresponding to GOPs (Groups of Pictures), roughly equivalent to a scene in a video, which are sent to the server. On the server-side a pre-processor receives the chunks, writes them to a cache, and starts encoding them in parallel as they arrive. Facebook gets around a 10x speed-up by encoding in parallel compared to doing this serially.
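
The fan-out works because GOP-aligned chunks each begin at a keyframe, so they can be encoded independently. A minimal sketch of the pattern (with a placeholder encoder; a real system would run actual encoder processes across many machines rather than threads in one process):

```python
from concurrent.futures import ThreadPoolExecutor

def encode_chunk(chunk):
    """Placeholder for invoking a real video encoder on one GOP chunk."""
    return f"encoded({chunk})"

def encode_video(chunks):
    """Encode GOP-aligned chunks in parallel and reassemble them in order.

    Chunks are independent because each starts at a keyframe; map()
    preserves the original chunk order in the results.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(encode_chunk, chunks))
```

Since encoding can also begin as soon as the first chunks arrive, upload and encoding overlap, which is where much of the quoted 10x end-to-end speed-up comes from.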

AI Encoding

  • 26:25 - Facebook also tries to be efficient with bitrates since this affects users' data plans. Creating smaller files without using an exorbitant amount of CPU is a hard problem because modern encoders have so many combinations of encoding settings.
  • 27:17 - The key insight is that not only is each video different, but also that even within a video the encoding settings could be different for each scene.
  • 27:53 - Facebook uses AI to optimise the encoding settings. A training set is used to train a neural net model which can then come up with the right settings for each scene. The distributed encoding system naturally lends itself to this kind of approach.
  • 28:27 - AI encoding resulted in 20% smaller files than H.264.
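
To make the per-scene idea concrete, here is a toy stand-in for the learned model: it maps hypothetical scene features to encoder settings with hand-written rules. Facebook's actual system trains a neural net for this mapping; the names, features and thresholds below are purely illustrative.

```python
def settings_for_scene(motion: float, detail: float) -> dict:
    """Toy stand-in for a learned model: map scene features to settings.

    A real system would train a model on (scene features -> best settings)
    pairs; this hand-written rule just illustrates per-scene adaptation.
    CRF is a quality knob where lower values spend more bits.
    """
    crf = 23
    if motion > 0.7:
        crf -= 4   # fast motion: spend more bits to avoid artifacts
    if detail < 0.3:
        crf += 4   # flat scenes: compress harder with little visible loss
    return {"crf": crf, "preset": "slow" if detail > 0.7 else "medium"}

def encode_plan(scenes):
    # One settings dict per scene, instead of one for the whole video.
    return [settings_for_scene(motion, detail) for motion, detail in scenes]
```

Because the distributed platform already encodes each GOP chunk separately, applying different settings per scene adds no extra orchestration.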

New Live Features

  • 28:55 - Facebook Live now allows people watching the stream to be invited to join it directly and ask questions during the stream. This requires very low latency - in the order of hundreds of milliseconds.
  • 29:49 - This is done using a different protocol - WebRTC - which is typically used for video calling.

Community comments

A complete joke by Tim McClure

Hey Sachin - so you are targeting single milliseconds latency - wow talking about defeating physics - you should be completely embarrassed - you did not correct the clueless interviewer - chunk-base protocol lucky to be under 10 seconds although in most of your use cases it does not matter - really appreciate your honesty -

Re: A complete joke by Wesley Reisz

@Tim thank you for pointing this out. I obviously misspoke. We're updating both the podcast and the text to indicate latency measured in seconds.
