BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage Presentations Streaming a Million Likes/Second: Real-Time Interactions on Live Video

Streaming a Million Likes/Second: Real-Time Interactions on Live Video

Bookmarks
49:35

Summary

Akhilesh Gupta does a technical deep-dive into how Linkedin uses the Play/Akka Framework and a scalable distributed system to enable live interactions like likes/comments at massive scale at extremely low costs across multiple data centers.

Bio

Akhilesh Gupta is the technical lead for LinkedIn's Real-time delivery infrastructure and LinkedIn Messaging. He has been working on the revamp of LinkedIn’s offerings to instant, real-time experiences. Before this, he was the head of engineering for the Ride Experience program at Uber Technologies in San Francisco.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Gupta: Anyone know what the largest live stream in the world was? Think of something that grinds an entire nation to halt.

Participant 1: Royal wedding.

Gupta: That was the second largest.

Participant 2: Cricket match.

Gupta: Yes, cricket. Believe it or not, it was the semifinal of the Cricket World Cup last year between India and New Zealand. More than 25 million viewers watched the match at the same time. Also, overall the match crossed 100 million viewers. As this gentleman said, the second largest was the British royal wedding, which was more than 80 million viewers concurrently. Remember this, we'll come back to this.

This is me and my team. We call ourselves the Unreal Real-Time Team, and we love cricket and we love coding. We believe in something. We believe that problems in distributed systems can be solved by starting small. Solve the first problem, and add simple layers in your architecture to solve bigger problems.

Today I'm going to tell you a story of how we built a platform called the Realtime Platform using that principle to solve how we can stream or have many people interact simultaneously on live videos. I hope that a lot of you will be able to learn something from this and also apply it to your own systems.

Real-Time Interactions on Live Video?

What is a live video? A live telecast, or live conference broadcast, a sports match, all of them are examples of live videos. How many of you have interacted with a live stream on YouTube or Facebook? Perfect, so you know the general idea. The difference that these streams have is that they allow viewers to interact with each other. What is real-time interaction on live video? The easiest way is to just look at it or try it on your phones with the demo. People are able to like, comment, and just ask questions, and interact with each other. This is an example of a LinkedIn iOS app doing a LinkedIn live video. Similarly, on desktop the experience is very similar. There are many rich interactions that happen there. This particular live stream is from the LinkedIn Talent Connect conference that happened in September of last year.

I want to talk about the simplest interaction here, which is, how do these likes get distributed to all these viewers in real time all at the same time? Let's say that a sender S likes the video and wants to send it to receiver A. The sender S sends the like to the server with a simple HTTP request. How do we send the like from the server back to receiver A? Publishing data to clients is not that straightforward. This brings us to the first challenge. What is the delivery pipe to send stuff to the clients? By the way, this is how I've structured the presentation today. I'm going to introduce problems and I'm going to talk about how we solve them in our platform. These are all simple layers of architecture that we've added to a platform to solve each of them.

Challenge 1: The Delivery Pipe

As discussed before, sender S sends the like to what we call the likes backend, which is the backend that stores all these likes with a simple HTTP request. This backend system now needs to publish the like over to the real-time delivery system. Again, that happens with a simple HTTP request. The thing we need is a persistent connection between the real-time delivery system and receiver A. Let's talk a little bit more about the nature of this connection, because that's the one that we care about here.

Log in at linkedin.com, and go to this URL, tiny.cc/realtime. This time you will actually see what's happening behind the scenes. You should be logged in for this. For those of you who were successful, you should see something like this on the screen. This is literally your persistent connection with LinkedIn. This is the pipe that we have to send data over to your phones or to your laptops. This thing is extremely simple. It's a simple HTTP long poll, which is a regular HTTP connection where the server holds onto the request. It just doesn't disconnect it.

Over this connection, the user technology called server-sent events, and this allows us to stream chunks of data over what is called the EventSource interface. The client doesn't need to make subsequent requests. We can just keep streaming data on the same open connection. The client makes a normal HTTP GET request. It's as simple as a regular HTTP connection request. The only difference is that the Accept header says event-stream. That's the only difference from a regular HTTP connection request.

The server responds with a normal HTTP 200 OK and sets the content type to event-stream, and the connection is not disconnected. Chunks of data are sent down without closing the connection. You might receive, for example, a "like" object, and later you might receive a "comment" object. Without closing the connection, the server is just streaming chunks of data over the same open HTTP connection request. Each chunk is processed independently on the client through what is called the EventSource interface, and as you can see, there is nothing terribly different from a regular HTTP connection, except that the Content-Type is different, and you can stream multiple chunks of bodies on the same open HTTP request.

How does this look like on the client side on web? The client first creates an EventSource object with a target URL on the server. Then, it defines these event handlers, which will process each chunk of data independently on the client. Most browsers support the EventSource interface natively. On Android and iOS there are lightweight libraries that are available to implement the EventSource interface on these clients.

Challenge 2: Connection Management

We now know how to stream data from the server to the client, and we did this by using HTTP long poll with server-sent events. What is the next challenge? Think of all the thousands of Indians trying to watch cricket. The next challenge is multiple connections, maybe thousands of them, and we need to figure out how to manage these connections.

Connection management – At LinkedIn, we manage these connections using Akka. Akka is a toolkit for building highly confident, message-driven applications. Anyone familiar with Akka Actors? A roomful. Yes, not Hollywood actors, they're this very simple thing. It's this little, small guy. This is the only concept you need to know to understand the rest of the presentation. Akka Actors are objects which have some state, and they have some behavior. The behavior defines how the state should be modified when they receive certain messages. Each actor has a mailbox, and they communicate exclusively by exchanging messages.

An actor is assigned a lightweight thread every time there is a message to be processed. That thread will look at the behavior that is defined for the message and modify the state of the Akka Actor based on that definition. Then, once that is done this thread is actually free to be assigned to the next actor. Since actors are so lightweight, there can be millions of them in the system, and each can have their own state and their own behavior. A relatively small number of threats, which is proportionate to the number of cores, can be serving these millions of actors all on the same time, because a thread is assigned to an actor only when there is something to process.

In our case, each actor is managing one persistent connection, that's the state that it is managing. As it receives an event, the behavior here is defining how to publish that event to the EventSource connection. Those many connections can be managed by the same machine using this concept of Akka Actors. Let's look at how Akka Actors are assigned to an EventSource connetion. Almost every major server frame will support the EventSource interface natively. At LinkedIn we use the Play Framework, and if you're familiar with Play, we just use a regular Play controller to accept the incoming connection.

Then, we use the Play EventSource API to convert it into a persistent connection, and assign it a random connectionId. Now we need something to manage the lifecycle of these connections, and this is where Akka Actors fit in. This is where we create an Akka Actor to manage this connection, and we instantiate an Akka Actor with the connectionId, and the handle to the EventSource connection that it is supposed to manage. Let's get [inaudible 00:11:35] and see how the concept of Akka Actors allows you to manage multiple connections at the same time.

Each client connection here is managed by its own Akka Actor, and each Akka actor in turn, all of them, are managed by an Akka supervisor actor. Let's see how a like can be distributed to all these clients using this concept. The likes backend publishes the like object to the supervisor Akka Actor over a regular HTTP request. The supervisor Akka Actor simply broadcasts the like object to all of its child Akka Actors here. Then, these Akka Actors have a very simple thing to do. They just need to take the handle of the EventSource connection that they have and send the event down through that connection. For that, it looks something very simple. It's eventSource.send, and the like object that they need to send. They will use that to send the like objects down to the clients. What does this look like on the client side? The client sees a new chunk of data, as you saw before, and will simply use that to render the like on the screen. It's as simple as that.

In this section we saw how relevant source connection can be managed using Akka Actors, and therefore, you can manage many connections on a single machine. What's the next challenge? Participant 3: Fan-out. Gupta: Fan-out is one. Before that.

Participant 4: [inaudible 00:13:19]

Gupta: Mailbox [inaudible 00:13:22]. We're now already talking about big, big scale. Even before that, something simple. I'll give you a hint. My wife and I always want to watch different shows on Netflix.

Participant 5: [inaudible 00:13:36]

Gupta: Yes. The thing that we did just now is just broadcast the like blindly to everybody without knowing which particular library they're currently actually watching.

Challenge 3: Multiple Live Videos

We don't know how to make sure that a like for, let's say, the red live video goes to the red client, and the green live video goes to the green client. Let's assume that this client here with connection id3 is watching the red live video, and this client here with connection id5 is watching the green live video. What we need is a concept of subscription, so the client can inform the server that this is the particular live video that they're currently watching.

When client 3 starts watching the red live video, all it does is it sends a simple subscription request using a simple HTTP request to our server. The server will store the subscription in an in-memory subscriptions table. Now the server knows that the client with connection id3 is watching the red live video. W hy does in-memory work? There are two reasons. The subscription table is completely local. It is only for the clients that are connected to this machine.

Secondly, the connections are strongly tied to the lifecycle of this machine. If the machine dies, the connection is also lost, and therefore, you can actually store these subscriptions in-memory inside these frontend nodes. We'll talk a little bit more about this later.

Similarly, client 5 also subscribes to live video 2, which is the green live video. Once all the subscriptions are done, this is the state of the front end of the real-time delivery system. The server knows which clients are watching which live videos.

When the backend publishes a like for the green live video this time, all that the supervisor actor has to do is figure out which are all the clients that are subscribed to the green live video, which in this case is clients 1, 2, and 5. The corresponding Akka Actors are able to send the likes to just those client devices. Similarly, when a like happens on the red live video these these actors are able to decide that it is designed only for connection ids 3 and 4, and is able to send them the likes for the videos that they're currently watching.

In this section we introduce the concept of subscription, and now we know how to make sure that clients are only receiving likes for the videos that they're currently watching. What's the next challenge? Now we can go back to the gentleman here. Somebody already said here that there could be millions and millions of connections. There are just more number of connections than what a single machine can handle. That's the next challenge.

Challenge 4: 10K Concurrent Viewers

We thought really hard about this. This is where we were a little stuck, and that's us thinking really hard. We finally did what every backend engineer does to solve scaling challenges. You already know. We added a machine. We add a machine and we start calling these frontend servers. We introduce a real-time dispatcher whose job is to dispatch a published event between the newly introduced frontend machines, because now we have more than one.

Now, can the dispatcher node simply send a published event to all the frontend nodes? Yes, it can. It's not that hard. It can, but it turns out that it's not really efficient if you have a small live video with only a few viewers that are connected to just a few frontend machines. There's a second reason which I'll come back to a little later, but for now, let's assume that the dispatcher can't simply send a like to all the frontend machines blindly.

Given that the dispatcher now needs to know which frontend machine has connections that are subscribed to a particular live video. We need these frontend machines to tell the dispatcher whether it has connections that are subscribed to a particular live video. Let's assume that frontend node1 here has connections that are subscribed to the red live video, and frontend node 2 here has connections that are subscribed to both the red and the green live video. Frontend node1 would then send a simple subscription request, just like the clients were sending to the frontend servers, and tell the real-time dispatcher that it has connections that are watching the red live video. The dispatcher will create an entry in its own subscriptions table to figure out which frontend nodes are subscribed to which live videos. Similarly, node2 here subscribes to both the red live video and the green live video.

Let's look at what happens when an event is published. After a few subscriptions, let's assume that this is the state of the subscriptions in the real-time dispatcher, and note that a single frontend node could be subscribed to more than one live videos. Now it can have connections that are watching multiple live videos at the same time. In this case, for example, node2 is subscribed to both the red live video and the green live video.

This time the likes backend publishes a like on the green live video to the real-time dispatcher, and the dispatcher is able to look up its local subscriptions table to know that nodes 2, 3, and 5 have connections that are subscribed to the green live video. It will dispatch them to those frontend nodes over a regular HTTP request. What happens next? That you've already seen. These frontend nodes will look up their own in-memory subscriptions table that is inside them to figure out which of their connections are watching the green live video and dispatch the likes to just those ones.

We now have this beautiful system where the system was able to dispatch between multiple frontend nodes, which are then able to dispatch to many, many clients that are connected to them. We can scale to almost any number of connections, but what is the bottleneck in the system? The dispatcher is the bottleneck in the system. It never ends. The next challenge is that we have this one node, which is what we're calling the dispatcher, and if it gets a very high published rate of events then it may not be able to cope up.

Challenge 5: 100 Likes/Second

That takes us to challenge number 5, which is a very high rate of likes being published per second. Once again, how do we solve scaling challenges? You add a machine. Engineers just do the most lazy thing and it usually works out pretty well. We add another dispatcher node to handle the high rate of likes being published. Something about it to note here, the dispatcher nodes are completely independent of the frontend nodes. Any frontend node can subscribe to any dispatcher node, and any dispatcher node can publish to any frontend node. There is no persistent connections here. The persistent connections are only between frontend nodes and the clients, not here.

This results in another challenge, the subscriptions table can no longer be local to just one dispatcher load. Any dispatcher node should be able to access that subscriptions table to figure out which frontend node a particular published event is destined for. Secondly, I tricked you a little bit before. This subscriptions table can't really live in-memory in the dispatcher node. It can live in-memory in the frontend node, but not in the dispatcher node. Why? Because even if a dispatcher node is lost, let's say this one just dies, then we can't afford to lose this entire subscriptions data. For both of these reasons we pull out their subscriptions table into its own key value store which is accessible by any dispatcher node at any time.

Now, when a like is published by the likes backend for the red live video on a random dispatcher node, and the green live video on some other random dispatcher node, each of them are able to independently query the subscriptions table that is residing in the key value store. They're able to do that because the subscriptions table is completely independent of these dispatcher nodes, and the data is safe there. Our dispatcher nodes dispatch the likes based on what is in the subscriptions table, or with regular HTTP requests to the frontend nodes.

Challenge 6: 100 Likes/S, 10K Viewers Distribution of 1M Likes/S

I think we now have all the components to show you how we can do what I promised in the title of this talk. If 100 likes are published per second by the likes backend to the dispatcher, and there are 10k viewers that are watching the live video at the same time, then we're effectively distributing a million likes per second. I'm going to start from the beginning and show you everything in one flow, because everyone tells me that I've got to repeat myself if I'm going to make sure that you remember something when you walk out of this talk.

This is how a viewer starts to watch a live video, and at this time the first thing that the viewer needs to do is subscribe to the frontend node, and subscribe to the library or topic that they're currently watching. The client sends a subscription request to the frontend node, and the frontend node stores the subscription in the in-memory subscriptions table. The same happens for all said subscriptions from all the clients. Let's go back [inaudible 00:24:35].

Now the subscription has reached the frontend nodes. The frontend node, as I said before, now has to subscribe to the dispatcher nodes, because the dispatcher will lead the node during the published step which frontend nodes have connections that are subscribed to a particular live video, so let's look at that flow. The frontend node sends a subscription request to the dispatcher, which creates an entry in the key value store that is accessible by any dispatcher node. In this case, node1 has subscribed to live video 1, and node2 is subscribing to live video 2. This is the end of the subscriptions flow, so now we need to look at what happens during the published flow.

The published flow starts when a viewer starts to actually like a live video, so different viewers are watching different live videos, and they're continuously liking them. All these requests are sent over regular HTTP requests to the likes backend, which stores them and then dispatches them to the dispatcher.

It does so with a regular HTTP request to any random dispatcher node, and they look up the subscriptions table to figure out which frontend nodes are subscribed to those likes and dispatch them to the subscribed frontend nodes. The likes have now reached the frontend nodes, and we have the last step which we began the presentation with. They need to send it to the right client devices. Each frontend node will look up its local subscriptions table, and this is done by the supervisor Akka Actor to figure out which Akka Actors to send these like objects to. They will dispatch the likes to the appropriate connections based on what they see in the subscriptions table.

Done. We just distributed a million likes per second with a fairly straightforward and iteratively designed, scalable distributed system. This is the system that we call the Real-Time Platform at LinkedIn. By the way, it doesn't just distribute likes. It can also do comments, typing indicators, seen receipts, all of our instant messaging works on this platform, and even presence. Those green online indicators that you see on LinkedIn are all driven by this system in Real-Time. Everything is great. We're really happy, and then, LinkedIn adds another data center.

Bonus Challenge: Another Data Center

This made us really stressed. We don't know what to do, so we went back to our principle. We said, "Ok, how can we use our principle to make sure that we can use our existing architecture and make it work with multiple data centers?" Let's look at that. Let's take the scenario where a like is published to a red live video in the first data center, so this is DC-1. Let's just assume that this is the first data center. Let's also assume that there are no viewers of the red live video in the first data center. Remember I spoke about subscriptions in the dispatcher? It helps here, because now we might prevent a lot of work in DC-1 because we know whether we have any subscriptions for the red live video in DC-1.

We also know that in this case there are no viewers for the red live video in DC-2, but there are viewers of the red live video in DC-3. Somehow we need to take this like and send it to this guy over here, really far away. Let's start. The likes backend gets the like for the red live video from the viewer in DC-1, and it does exactly what it was doing before. It's not the likes backend's responsibility, it's the platform's responsibility. We are building a platform here, and therefore, hiding all the complexity of the multiple data centers from the users that are trying to use this platform. It will just publish the like to the dispatcher in the first data center just like it was doing before. Nothing changes there.

Now that the dispatcher in the first data center has received the like, the dispatcher will check for any subscriptions, again, just like before, in its local data center. This time it saved a ton of work because there are no viewers of the red live video in DC-1. How do we get the like across to all the viewers in the other data centers. That's the challenge. Any guesses?

Participant 6: Add another dispatcher.

Gupta: No, don't add another dispatcher. We already have too many dispatchers.

Participant 7: [inaudible 00:29:47]

Gupta: Ok, so we can do cross-colo subscriptions, cross data center subscriptions. What's another idea?

Participant 8: You can broadcast to any DC.

Gupta: Good, broadcast to any DC. We'll talk a little bit about the tradeoff between subscribing in a cross data center fashion versus publishing in a cross data center fashion. It turns out that publishing in a cross data center fashion is better here, and we'll talk a little bit about that a little later. Yes, this is where we do a cross colo, or a cross data center publish to dispatchers in all of the peer nodes. We're doing that so that we can capture viewers that are subscribed to the red live video in all the other data centers.

The dispatcher in the first data center simply dispatches the likes to all of its peer dispatchers in all the other data centers, and in this case, a subscriber is found in DC-3 but not in DC-2. By the way, this dispatcher is doing exactly what it would've done if it received this like locally in this data center. There's nothing special that it is doing. It's just that this dispatcher distributed the like all over to all the dispatchers in the peer data centers. The viewer in DC-3 simply gets the like just like it would normally do, because the dispatcher was able to find the subscription information in DC-3. This viewer with the green live video does not get anything.

This is how the platform can support multiple data centers across the globe by keeping subscriptions local to the data center, while doing a cross colo fan-out during publish.

Performance & Scale

Finally, I want to talk a little bit about the performance of the system. It looks like everybody is here because, hey, scale. We did this experiment where we kept adding more and more connections to the same frontend machine. We just kept on going, and wanted to figure out how many persistent connections a single machine can hold. Any guesses?

Participant 9: A million.

Gupta: No, not that many. We also are doing a lot of work. It turns out that we were able to have 100,000 connections on the same machine. Yes, you can go to a million, but at the same time, because we're also doing all this work, and because we use the system not just for distributing likes but also for all the other things that LinkedIn has, we were able to get to 100,000 connections per frontend machine. Anyone remember the second largest live stream?

Participant 10: Royal wedding.

Gupta: The royal wedding had 18 million viewers at peak, so we could do that with just 180 machines. A single machine can do 100,000 connections, and so with 180 machines you're able to have persistent connections for all the 18 million viewers that are currently streaming the royal wedding. Of course, we just didn't get to this number easily, so we hit a bunch of file descriptor limits, port exhaustion, even memory limits. Luckily we documented all of that at this link, tiny.cc/linkedinscaling. I hope that you will be able to get something out of reading something like this, because it's very interesting. It's just like regular scaling challenges, it's just that we hit it in context of trying to expand the number of connections that we could hold on a single machine.

How about other parts of the system? How many events per second can be published to the dispatcher node? Before you answer this question, I want to talk about something really important about the design of the system which makes it massively scalable. The dispatcher node only has to publish an incoming event to a maximum of the number of frontend machines. It doesn't have to worry about all the connections that these frontend machines are in turn holding. It only cares about this green fan-out here, which is the number of frontend machines that this dispatcher can possibly publish an event to, but it doesn't have to worry about this red fan-out. That's the part that the frontend machines are handling, and they're doing that with in-memory subscriptions, with Akka Actors, which are highly, highly efficient in this. Now with that context, what do you think is the maximum events that you can publish to this dispatcher per second? Participant 11: Ten thousand.

Gupta: Very close. That's a very good guess. It turns out for us that number turned out to be close to 5,000, so 5,000 events can be published per second to a single dispatcher node. Effectively, we can publish 50,000 likes per second to these frontend machines with just 10 dispatcher machines. By the way, this is just the first part of the fan-out. These 50,000 likes per second will then be fanned out even more by all the frontend machines that are able to do that very efficiently. That's a multiplicative factor there, and that will result in millions of likes being distributed per second.

Lastly, let's look at the time, because everybody really cares about latency. You're building a real-time system so you got to make sure that things are super fast. Let's talk about the end-to-end latency. If you recall the time, T1, at which the likes backend publishes the like to our real-time platform, which is the dispatcher machine, and we record the time, T2, at which point we have sent the like over the persistent connection to the clients. The reason we are measuring it there is because you can't really control the latency outside your data center. I mean, you have some control over it, but that's the one that the platform really cares about. Then, the data turns out to be just 75 milliseconds at p90. The system is very fast, as there is just one key value lookup here and one in-memory lookup here, and the rest is just network calls, and very few network calls.

These are some performance characteristics of the system. This end-to-end latency measurement is also a very interesting thing. How do you really do that? Most of you must be familiar with measuring latencies for a request response system. You send an incoming request and the same machine can measure when the response is sent out, and therefore, you can say that, "It took this much time." In this case, there are multiple systems involved. You're going from the dispatcher to the frontend node, and then to the client. How do you measure latencies for such one-way flows across many systems? That is also a very interesting problem, and we wrote about it. We wrote a system that we built using nearline processing, using Samza. Samza is another technology that we use at LinkedIn, and you can use that to measure latencies across end-to-end systems across many machines.

We wrote about it at tiny.cc/linkedinlatency. Don't have the time to dive into it here, but I would love to, and I hope that you get something out of reading something like this. If you have a system where you want to measure latencies across many different parts of the stack, you can use something like this to measure latencies.

Why does the system scale? I think it scales because you can add more frontend machines or more dispatcher machines are your traffic increases. It's just completely horizontally scalable.

The other thing that I mentioned at the beginning of this talk is that we also extended the system to build presence, which is this technology where you can understand when somebody goes online and offline. Now that we have these processing connections we know when they were made. We know when they were disconnected, so we also know when somebody came online and when somebody went offline, but it isn't that easy, because mobile devices are notorious. They will sometimes just have a bad network. They might disconnect and reconnect without any reason. How do we average out or produce all that noise to figure out when somebody's actually online and when they're offline, and not just jitter all the way where you keep going offline and online, because you have connections and disconnections simply because of the network that you have?

We wrote about that at tiny.cc/linkedinpresence, where we used the concept of persistent connections to understand how somebody goes online and offline, and we built the presence technology on top of the exact same platform, so I hope that's also useful to you.

Key Takeaways

That is probably a lot to consume in the last few minutes, so I'll try to see if I can help you remember some of this. Real-time content delivery can enable dynamic interactions between users of your apps. You can do likes, you can do comments, you can do polls, discussions. Very powerful stuff, because it really engages your users.

The first piece you need is a persistent connection. For that, there is built-in support for EventSource in most browsers, and also on most server frameworks. There are also easily available client libraries that you can use for iOS and Android. Play and Akka Actors are powerful frameworks to manage connections in a very efficient way that can allow millions of connections to be managed on your server side. Therefore, they can allow millions of viewers to interact with each other. Everyone remember, Akka Actors, way cooler than Hollywood actors.

The principle I started this presentation with that challenges in distributed systems can be solved by starting small. Solve the first problem and then build on top of it. Add simple layers in your architecture to solve bigger challenges. This is all we did throughout this presentation. When you hit a limit, horizontally scaling the system is usually a good idea. Add a machine, distribute your work.

The Real-Time Platform that I described to you can be built on almost any server or storage technology. You can use Node.js. You can use Python. All of these server frameworks support some methodology of maintaining persistent connections. For the key value store, you can use Couchbase, Redis, MongoDB, anything that makes you the happiest, anything that you're already using.

Most importantly, you can do the same for your app. Real-time interactions are very powerful, and I feel that if you use some of the principles that I shared with you, you can do some pretty interesting stuff, and pretty dynamic experiences in your own apps.

Thank you, everyone, for attending this session. I'm a proud Indian. I work at LinkedIn in the U.S., and I'm so glad that I got this opportunity to talk to you here at QCon London. This talk and all of its slides will be available at tiny.cc/qcon2020, I'm assuming very soon. There is also an AMA session at 1:40 p.m. where you can come and ask me anything, not just related to this, but anything else that you have in your mind. That's happening at Guild.

Questions and Answers

Participant 12: Do you have any fallbacks for clients that don't support server-sent events? Or do you just say modern browsers are our focus here?

Gupta: The beauty of server-sent events is that they're literally a regular HTTP request. There's absolutely no difference between what a regular HTTP connection would do. In fact, WebSockets are something that sometimes get blocked by firewalls in certain systems, and we have never experienced a case where server-sent events don't work. Because it's a regular distributed connection, most firewalls will not block it. Most clients will understand it, and we have never seen a case where server-sent events doesn't work.

Participant 13: How do you synchronize your video stream with likes with time, basically?

Gupta: I think the question here is that once these likes have happened, how do you make sure that the next time somebody watches this video the likes show up at the same time? Is that what you're asking?

Participant 13: Yes, and also, the video streams are delayed a little bit on different servers, and your likes are happening in different places.

Gupta: Yes, I think you must have noticed here that there is a delay. I think the question here is that, "I liked at moment X, but maybe the broadcaster sees it at moment Y." Yes, there is a delay, and some of it is simply because of natural causes, you're just speed of light. The other is that there is also sometimes something that we do to make sure that the broadcaster can cut something off if something is seriously wrong. The good thing here is that once somebody has pressed like, it will show up to all the viewers almost instantaneously. You can actually try it right now. If you press Like you should actually be able to see it almost immediately. The distribution is real-time, but yes, there may be a delay between when you think that the broadcaster said something versus when you actually liked it, and that is natural. I think there are also security reasons to do so.

Participant 14: My question is, do you have any consistency guarantees, especially in view of dispatcher failure even across data centers?

Gupta: Yes, great question – what about consistency? How do you show guarantees? How do you make sure that a like will actually get to its destination? The short answer is that we don't. Because in this case, what we are going for is speed, and we're not going for complete guarantees for whether something will make it to the end. Having said that, we measure everything. We measure the cross-colo dispatch. We measure the dispatchers sending requests to the frontends, and we also measure whether something that was sent by the frontend was actually received by the client. If we see our [inaudible 00:46:27] falling, we will figure out what the cause is and we will fix it.

Now, I do want to share something else now that you asked this question, which is Kafka. A natural question is, "Why not just do this with Kafka?" If you do it with Kafka, then yes, you do get that, because the way you would do it with Kafka is that the likes backend would publish a like over to a live video topic that is defined in Kafka, and then each of these frontend machines would be consumers for all the library or topics that are currently live. You already see a little bit of a problem here, which is that these frontend servers are now responsible for consuming every single live video topic, and each of them needs to consume all of them because you never know which connection is subscribed to which live video, and connected to this frontend server.

What this gives you is guarantees. You cannot drop an event anywhere here in the stack, but you can drop an event when you send it to the client from the frontend server, but you can detect that. In fact, EventSource interface provides a built-in support for it. It has this concept of where you are. It's like a number that tells you where you are in the stream, and then if their things get dropped, the frontend server, the next time it connects it will tell you that it was at point X, and the frontend server can start consuming from the topic at point X. What you give away here is speed, and also the fact that the frontend servers will stop scaling after a while, because each of them need to consume from these streams, and as you add frontend machines that doesn't help. Each frontend machine now needs to still consume all the events from all the Kafka topics.

Participant 15: You have 100 connections to the clients, and some of the clients are very slow. They might not be consuming your data properly on the pipe. How do you ensure that you don't have memory exhaustion on the side of [crosstalk 00:48:41]?

Gupta: Notice that when the frontend server sends the data, or has the persistent connection to the client, it is actually a fire and forget. The frontend server itself is not blocking on sending the data to the client. It just shoves it into the pipe and forgets about it, so there is no process, or no thread that is waiting to figure out whether the data actually went to the client. Therefore, no matter what these clients are doing, they might be dropping events, they might not be accepting it because something is wrong on the client side. The frontend server is not impacted by that. The frontend server's job is to just dispatch it over the connection and be done with it, again, because we are going for speed and not for [crosstalk 00:49:24].

 

See more presentations with transcripts

 

Recorded at:

Mar 30, 2020

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Community comments

  • Presentation feedback and few follow up questions

    by Renjith Nair,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Great presentation. Well structured and interesting content. Wanted to ask you the following questions regarding the presentation.


    • 1. The latency measurement using Samza Aeon (tiny.cc/linkedinlatency) looked more complicated than the actual streaming infrastructure. Was the reason to use that one purely based on existing infrastructure (i.e existing kafka based events handling)

    • 2. Did you look into other systems like Apache Pulsar? Was the limitations same as that of Apache Kafka (i.e all frontend servers needed to subscribe to all streams)

    • 3. Was there ever a risk of more than once delivery since all frontend servers could connect to any dispatcher at a time?(especially during connection loss and reconnection)

  • Re: Presentation feedback and few follow up questions

    by Akhilesh Gupta,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Thanks for your questions, Renjith.
    1) Yes, we do have significant existing infrastructure for Kafka at LinkedIn. Additionally, we developed that latency measurement system for much more than measuring the latency for end to end flows in our Realtime Platform. We use it for various complex asynchronous one way flows at LinkedIn including latency for push notifications, glass to glass message delivery latency, etc. which involve tracking events being emitted by heterogeneous systems.
    2) Yes, very similar limitations:
    - Each frontend server needs to consume from all topics
    - If consumption can't keep up, adding more frontend servers doesn't help
    - Cross-data center publish needs us to use things like topic mirror makers which slows things down significantly
    - Live video is a broadcast topic (many subscribers for the same topic). But, we also have personal topics (one subscriber per topic). e.g. messages sent to a particular member. Need millions of topics in Kafka/Pulsar to support that at LinkedIn scale.
    3) I may not have explained this well but frontend servers don't "connect" to a dispatcher node, they simply add a subscription to the KV store via a random dispatcher node. Then, when an event is published to a random dispatcher node, it looks up the KV store to determine which frontend nodes to forward that event to with a regular HTTP request. Thus, we cannot have more than once delivery semantics. In fact, we have at most once delivery semantics.

    Hope that helps, please feel free to follow up on any of these.

  • Amazing preso and few more questions

    by Priya Venkatesan,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Thanks for giving us all a great deep dive in to the system architecture. It is very relevant to my current project and had a few questions.

    1. When the Front-end subscribes to the Dispatcher, how does the dispatcher push info to it?
    My assumption: Front-end sends a simple HTTP POST with its node info and obtains a 200 back from the Dispatcher. Is there a callback from Dispatcher back to the front-end?

    2. How does the Likes Back end "push" info the Dispatcher?
    My assumption: It simply does a HTTP POST to the Dispatcher endpoint. If so, has there been any concerns about out of order submissions from the Likes back end. Since the requests themselves are multi-threaded, could a comment to another comment or a like be sent out of order?

  • Re: Amazing preso and few more questions

    by Akhilesh Gupta,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Thanks for your questions, Priya.
    1) Subscription: Front-end indeed does a simple HTTP POST request to the dispatcher with its node info and the dispatcher stores the subscription in the KV store which maps topics to the frontend nodes that are subscribed to that topic and returns a 200 OK to the front-end node.
    Publish: When an event is published to a random dispatcher node, it looks up the KV store to determine which frontend nodes to forward that event to with a regular HTTP POST request.

    2) Yes, the Likes backend does a simple HTTP POST to the Dispatcher publish endpoint. Yes, the system does not guarantee order and the system does not guarantee delivery. However, each like publish is associated with a particular comment and thus, cannot possibly be pushed to the wrong comment. Also, out of order likes are not a deal-breaker. Out of order comments might be a concern in certain use cases and in those scenarios, either the client can "sort" a newly received comment into the right location using its created timestamp OR it can simply add comments in the order they are received and simply reshuffle them the next time the comment thread is opened to sort them in the right order.

  • Re: Amazing preso and few more questions

    by Priya Venkatesan,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Thank you very much for your answers.

    I am inferring that each front-end server implements a callback(webhooks approach), where the Dispatcher "publishes" messages through HTTP POST.

    Can you kindly clarify this? This is the last piece that connects all the dots for me.

  • Re: Amazing preso and few more questions

    by Akhilesh Gupta,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    No, when the front-end server does the subscription request to the dispatcher, the dispatcher stores the hostname of the front-end server in the subscriptions table in the KV store to be able to publish to that specific front-end server using a HTTP POST during the publish flow. This HTTP POST to the front-end server is over a simple HTTP endpoint called POST /publish defined on each front-end server which accepts the topic and the event to be published on that topic.

  • Re: Amazing preso and few more questions

    by Priya Venkatesan,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Thanks!

  • Kafka Question

    by Fran Ren,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    First of all, amazing video thanks.

    Just a small question because at the end, Kafka was mentioned and one of the downside was because then the frontend machine need to receive all likes from all live streams. Why can't you partition the Kafka topic by stream IDs (sure you might end up having imbalanced partition), and then having the frontend machine just subscribe to specific partitions? Essentially, saving the efforts for building the dispatcher and the KV store.

  • Re: Kafka Question

    by Akhilesh Gupta,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    That is because the frontend machines don't know in advance which stream IDs its connected clients would be interested in. In most cases, a frontend machine would have a diverse set of clients interested in almost all the streams connected to it resulting in it being forced to subscribe to almost all the live streams.

  • Connections actors-based filtering

    by Bryzgalov Anton,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Hi Akhilesh, I have recently watched your talk from QCon 2020 on LinkedIn real-time platform. First of all, thank you for an interesting and amazingly structured talk!

    I have a question: why have you decided to filter message per subscription at supervisor actor and not on the level of every connection actor itself? Meaning that supervisor actor sends all the messages it received to all the actors and they decide whether to process it on their own based on which live the like relates to.

    This looks like a simpler solution although not representative when following the principle of your talk (adding simple layers).

    Is it done for performance reasons to not activate sleeping actors if they don't have to send events to a corresponding device?

  • Re: Connections actors-based filtering

    by Akhilesh Gupta,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Two reasons:
    1. Subscription requests from clients to frontend server nodes are not going to individual actors but to the node as a whole and thus, individual actors would also have to lookup the same common subscriptions table that the supervisor actor is looking at instead of being able to make the decision themselves.
    2. There could be hundreds of thousands of connection actors on a single node and if we make all of them receive a message and do some work for each published event, we would end up using CPU time for each of them even if only one of them is subscribed to the event.

  • Clarification and Question

    by ashish arora,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    I was asked this question in facebook interview. I wish I would have read this article.

    Clarification
    1. what is a dispatcher node. Is this a Kafka queue having multiple hosts and one host is called dispatcher node or it is simply ec2 node. dispatcher is more of a service which sends http request to relevant front end nodes.

  • Quick questions

    by Shashank Tiwari,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Hey Akhilesh, Thanks for an amazing presentation. I have a few questions about this listed below,

    1. How does the KV store get updated when a front end dies?

    2. How do the dispatchers find each other? What happens when one dies?

    3. How does dispatcher broadcast prevent infinite loops? (1 publishes to 2 and 3, 2 publishes to 1 and 3, and so on) Why not use gossip protocol instead?

    Thank you

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

BT