
Nearline Recommendations for Active Communities @LinkedIn


Summary

Hema Raghavan focuses on technologies they have built to power LinkedIn’s “People You May Know” product. She describes their nearline platform for notification recommendation and shows that delivering the right information to the right user at the right time is critical to building an actively engaged community.

Bio

Hema Raghavan heads the team that builds AI and ML solutions at LinkedIn for fueling the professional social network’s growth. Prior to that, she was a Research Staff Member at IBM T.J. Watson. She started her industry career at Yahoo Labs.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Hello, everyone. I'm Hema Raghavan and, as introduced, I head AI and machine learning for growth and notifications at LinkedIn. Prior to LinkedIn, I was at IBM Research and I've also worked at Yahoo Labs. I've worked in search, advertising, and more recently, recommender systems. In today's talk, I will be talking about building near real-time contextual recommendations for active communities on LinkedIn, and that's a mouthful of a title, so we'll break it down. First, I assume most of you here use the LinkedIn app. How many people use LinkedIn? Great. So we use LinkedIn for networking, for finding jobs, and so on. What you'll understand through this talk is what it means to have an active professional community, how LinkedIn can help you build that active professional community, and also why you need near real-time platforms that can do contextual recommendations; all of that will start making sense by the end of this talk.

Economic Opportunity for Every Member of the Global Workforce

So at LinkedIn, we have a very big vision, and we are only part of the way there: our vision is to create economic opportunity for every member of the global workforce. The way we do this is we use AI and machine learning to connect the world's professionals to make them more productive and successful. Note the keyword "connect"; connections are what drive most of our professions and careers. Many of us here are here to learn, but we're also here to connect with people. That's what propels our careers, and that's the founding principle of LinkedIn. To build your connections on LinkedIn, one way to do it would be, for every person you know, to search for them, look at their profile, and actually hit connect. But for 10 years or so, we've had a critical recommender engine which actually helps people build their connections.

People You May Know

The product here is called People You May Know. It's been there for over 10 years now and its mission is to connect our members to the people who matter most to them professionally. This allows them to access opportunities in the LinkedIn ecosystem. The way we do this is we mine data sources, which is LinkedIn's Economic Graph. The Economic Graph is not just the connection network; it also entails companies, schools, and all the other nodes that you can have links to. We use AI, ML, and graph mining techniques to build algorithms on top of this graph, and this is how we build our connections. It's intuitive why building a network is useful, but I'll say a little more here. The key is that by being connected, in some sense, you stay informed. We do that in workplaces, we do that in the actual physical world, and on LinkedIn, your connections help you stay informed on the feed. So if someone shares something, someone's reading an article, or a bunch of your network is going to QCon, you might discover that on the LinkedIn feed.

If you're looking for a job, it's actually again very clear why connections are useful. So for example, if you're looking for a new job, most people reach out to their professional connections for a referral and that's how connections help you advance your career. And then if you're a hiring manager or even if you're a tech lead which many of you in the audience are, if you want to ask a technical question, you might actually solve the problem faster by reaching out to someone in your network. So connections help you work smarter and of course, if you're a recruiter or a hiring manager, the value proposition is obvious.

High Quality Relevant Connections Matter

Now, for LinkedIn, we've had connection data that we've built over several years, and the fact that high quality relevant connections matter is something we see in our data on many different metrics. The graphs here show you three different metrics; the X axis is the number of connections, and then you'll see the number of InMails received; InMails are often recruiter mails. The number of opportunities that come to you actually increases with the number of connections you have. Messages can be recruiter messages, or they can be even just people asking for your advice or seeking out help. And then in general, the value for LinkedIn itself, like how many daily active users we have or how much engagement we have on our site, increases as the member has more connections. So connections are critical for growth at LinkedIn, and when I say growth, at LinkedIn we use the term "growth" to mean the process of bringing users into LinkedIn; not just signups, but taking them through the journey to when they actually engage with LinkedIn on a periodic cadence and understand what the value of LinkedIn is.

Developing a True North Metric

So growth is essentially growing LinkedIn, but making sure that our members actually understand what the value of LinkedIn is. At LinkedIn, one of the ways we approach our problems is to often think of a true north metric. So we will take this mission, right? For example, we have the larger vision of connecting people to economic opportunity, but here within growth, we're thinking about how we bring people to the site so they actually see value. So we typically define a metric, a success criterion, that actually measures the quality of how our product is doing for this value proposition. And we have moved on from, you know, raw metrics. A lot of companies do daily active users or monthly active users, but we often try to take something that's more nuanced, beyond raw counts.

Once you've arrived at a true north metric, we often use data science, and many of you saw the keynote this morning. So you'll have a true north, but our data scientists will go in and actually say, "Hey, what are the products we can actually build to move this true north?" And then we have a large framework for A/B testing that lets us inform product decisions.

So the true north for the growth team is essentially to get engaged members with high quality connections, because we think that's the first stage; once you have that, you will get the value of LinkedIn. Towards this goal, within the PYMK group, what they know they can drive, and this is established through correlation or causal analysis models, is the connections component. So the proposed metric would be PYMK invitations sent and accepted. Typically, when you come into the recommendation engine, you may hit invite, invite, invite. Typical recommendation products would measure CTR, and that would have just been how many invites you sent. But we really want to look at the downstream. We want to see, given an invitation, do you actually get an acceptance. So we look at the downstream impact, and that's what I meant by actually making sure that your metric makes sense. It's not just something that is easy to move; you want to make sure that you send invitations and those invitations are accepted. And we monitor both. You certainly don't want a large invitation volume and a very low connection volume, or in an A/B test, you don't want to be driving up invitations so much that your connection accepts are not actually moving at the same rate. So metrics in some sense are the conscience of your product; they keep you honest, they keep you close to what value you bring.

Likewise, we also look at the recipient side. So for example, we will look at how many inbound invitations a member gets. If you're flooded with invitation requests and you're getting notifications, that's a pretty bad experience. So we want to maintain a base rate for that as well, and in our A/B tests, we will look at both invitations sent and received, and both the rates. So now we've defined the metric. How do we go about building a product like People You May Know?

Connecting the World’s Professionals

And for this, I will take a small sample graph. And let's say there's a new user, Lucille, who comes in. Let's say she's just joined as an intern at a particular company, she's a student, and her manager tells her that LinkedIn is the place to be and sends her a connection request which she accepts because you generally accept your manager's connection request. And so, she's built her first edge on the LinkedIn graph. Now, who do we recommend to Lucille so that she builds this network, right? Because if she has just one connection, she's not really going to get a lot of value out of LinkedIn.

So in the limit, this becomes an N squared problem because we could recommend anyone to anyone. Right now, we have close to 600 million members on the graph, and that's just computationally infeasible. So a common heuristic that makes sense, and is actually established in the social science literature, is that friends of my friends are likely to be my friends. It's homophily; there are many different terms for it in the social science literature, and it works in the real world as well.

So with that, we can use the simple heuristic that we look at Dominiq's network. That brings the candidates from N squared down to four in this case, and we ask which of these members is, you know, a potential recommendation for Lucille. Now, this is a small enough problem that we can show all of them in a rank sorted list. But you can very well imagine that if Lucille had 30 or 40 connections, and each of them had hundreds of connections, that would blow up, so you have a large candidate set and the question is how you're going to rank it.
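As a rough illustration of this friends-of-friends candidate generation, here is a minimal sketch in Python. The toy graph, the member names, and the adjacency-dictionary representation are all hypothetical; the real system mines LinkedIn's Economic Graph at far larger scale.

```python
from collections import defaultdict

def friends_of_friends(graph: dict[str, set[str]], member: str) -> dict[str, int]:
    """Return second-degree candidates for `member`, with the number of
    shared first-degree connections as a simple candidate-level signal."""
    first_degree = graph.get(member, set())
    candidates = defaultdict(int)
    for friend in first_degree:
        for fof in graph.get(friend, set()):
            if fof != member and fof not in first_degree:
                candidates[fof] += 1  # count of common connections
    return dict(candidates)

# Toy graph from the talk: Lucille's only connection is Dominiq.
graph = {
    "Lucille": {"Dominiq"},
    "Dominiq": {"Lucille", "Alice", "Bob", "Carol", "Erick"},
}
print(friends_of_friends(graph, "Lucille"))
# {'Alice': 1, 'Bob': 1, 'Carol': 1, 'Erick': 1}  -> four candidates, as in the talk
```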

So the second piece of intuition, again very natural, is that people I know share common connections. They may have common institutions, skills, and so on and so forth. And we bring this into our models. So as we said, Lucille and Dominiq work at a given company. Let's say Erick is another person at this company. So perhaps it makes sense that the top of this ranked list is Erick, and then we recommend Erick to Lucille or Lucille to Erick, vice versa. Depending on who comes to the site first, we can actually recommend one way or the other. One sends an invitation to the other, the other accepts it, and this edge is built. So now, Lucille has started building her network.

Typical Playbook for Recommendation Systems

Now, this playbook for recommendation systems, where there is candidate generation and then something that reranks those candidates, appears everywhere; it appears in newsfeed ranking, it appears in search. For those of you who work in search, you'll typically have heuristics or simpler models that generate a first class of candidates, and then you may rerank them using deeper models. And when I say deep, something like deep learning can actually be applied there. You can put in things which are computationally more expensive, more features. And so, the first layer focuses on recall, the second layer focuses on precision.

Candidate Generation

For graph algorithms, and some of this even applies to follow-recommendation-like problems which appear in companies like Twitter or Pinterest, and so on, where you may actually navigate the graph, you'll see a second degree network. And then you also have extensions like personalized PageRank. Personalized PageRank is actually fairly intuitive to understand. It's a random walk algorithm. Given a starting node, it computes the probability of landing at a destination node. You might do a random walk: given a node, you might land at any one of the neighbors, and then you may jump from there to any one of the other neighbors. And if you did this a few times, where would you end up? That's what personalized PageRank computes. It's an extension of friends of friends essentially, and it lets us go beyond the second degree network. And it really helps in the case when the initial candidate generation phase has very low liquidity, a small candidate set. So for example, in the Lucille example, when she had only one connection, which was Dominiq, personalized PageRank would have helped you extend the candidate list to even perhaps a third degree network. But again, it doesn't go right up to N squared.
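Here is a minimal Monte Carlo sketch of the personalized PageRank idea just described: repeated random walks from a source node with a restart probability, counting where the walks spend their time. The function name, parameters, and the adjacency-dict graph format are illustrative assumptions, not LinkedIn's implementation.

```python
import random
from collections import Counter

def personalized_pagerank(graph, source, walks=10_000, restart_prob=0.15, max_steps=20):
    """Estimate personalized PageRank by simulation: walk from `source`,
    restarting with probability `restart_prob`, and count visited nodes."""
    visits = Counter()
    for _ in range(walks):
        node = source
        for _ in range(max_steps):
            neighbors = graph.get(node)
            if not neighbors or random.random() < restart_prob:
                node = source                      # teleport back to the start node
            else:
                node = random.choice(list(neighbors))
            visits[node] += 1
    total = sum(visits.values())
    # Higher scores = nodes the biased walk keeps landing on = candidate recommendations.
    return {n: c / total for n, c in visits.items() if n != source}
```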

So once we have candidates from these graph algorithms, typically, scoring tries to predict the probability of a connection. Here we're applying node features, and node features are, you know, what skills a person may have, what school the person is at, the company, so on and so forth, the general propensity of the person to invite someone or accept an invite, and so on. And then we have features of the edges, which could be common connections, or the commonality of schools, skills, so on and so forth. So you have all of this. You put this in your favorite decision tree or logistic regression, or whatever favorite model you have, and you have an output probability, and you use this for ranking.
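To make the scoring step concrete, here is a hedged sketch using scikit-learn's logistic regression over made-up node and edge features; the feature names, the tiny training set, and the choice of logistic regression over, say, gradient-boosted trees are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features: node features for source and destination
# (invite propensity, accept propensity) plus edge features
# (common connections, same company, same school).
X_train = np.array([
    # src_invite, dst_accept, common_conns, same_company, same_school
    [0.8, 0.7, 12, 1, 0],
    [0.2, 0.1,  0, 0, 0],
    [0.6, 0.9,  3, 0, 1],
    [0.1, 0.3,  1, 0, 0],
])
y_train = np.array([1, 0, 1, 0])   # 1 = invitation sent and accepted

model = LogisticRegression().fit(X_train, y_train)

# Score a candidate pair; the output probability is used for ranking.
candidate = np.array([[0.5, 0.8, 4, 1, 0]])
p_connect = model.predict_proba(candidate)[0, 1]
print(f"probability of connect: {p_connect:.2f}")
```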

PYMK Architecture

And as I said, PYMK has existed for a long time, and for some of you who have attended talks from the team at meetups or other forums, you may have seen a typical architecture diagram. This has often involved computing offline in batch. And in fact, PYMK, I'm kind of proud to say, has been the birthplace for a lot of LinkedIn's open source architecture. So for example, Voldemort, or some of the canonical examples that are in the Kafka paper for why Kafka was built, and so on, actually stem from PYMK.

So typically, candidate generation, this whole friend-of-friend calculation or personalized PageRank, is done in Hadoop; it could be MapReduce or Spark. Then you do scoring, and then you push to a key value store. The key value store in today's architecture is Venice, which is a successor to Voldemort. Once you have data in your key value store, the keys are members, and when a member comes to the site, you look up the key on the member ID. We might apply some real-time signals. A very simple example of a real-time signal may be the fact that Lucille just joined this company x, y, z. So then you can actually rerank the candidates based on the context that she just joined it. Maybe she did a profile update half an hour ago. It could be a very simple real-time signal.

Then you do some rescoring and you output the set of candidates. What you see feeding back is that once a member starts clicking, you generate tracking events, and these go through Kafka back into our offline infrastructure. These offline tracked events are used for A/B testing, reporting, and feed back into our model training pipeline. So this cycle just continues, and models can be run on a periodic basis. The Netflix talk talked about auto-training pipelines that you want to run at a regular cadence and so on, and you can instrument all of that through an architecture like this.
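To make the serving path concrete, here is a minimal sketch of the lookup-then-contextual-rerank idea, assuming a hypothetical in-process dictionary standing in for a key value store such as Venice, made-up candidate tuples, and an invented "recent profile update" signal; it is not LinkedIn's actual service.

```python
import time

# kv_store: member_id -> list of (candidate_id, offline_score, candidate_company),
# precomputed offline and pushed to a key value store.
kv_store = {
    "lucille": [("erick", 0.42, "acme"), ("alice", 0.40, "other"), ("bob", 0.35, "other")],
}

# Hypothetical real-time signal: member_id -> (field, value, timestamp).
recent_profile_updates = {
    "lucille": ("company", "acme", time.time() - 1800),   # joined Acme 30 minutes ago
}

def serve_pymk(member_id, boost=0.2, freshness_window=3600):
    """Look up precomputed candidates, then rerank with a fresh contextual signal."""
    candidates = kv_store.get(member_id, [])
    update = recent_profile_updates.get(member_id)
    rescored = []
    for cand_id, score, cand_company in candidates:
        # Boost candidates that match a fresh signal (e.g. the member's new company).
        if (update and update[0] == "company" and update[1] == cand_company
                and time.time() - update[2] < freshness_window):
            score += boost
        rescored.append((cand_id, score))
    return sorted(rescored, key=lambda x: x[1], reverse=True)

print(serve_pymk("lucille"))
```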

Data Processed Grows Linearly

Now, what happens with batch offline is that the data grows and it gets computationally heavier and heavier. And you're precomputing; in that previous architecture, you're precomputing for every member, many of whom do not visit your site. So you have a large, heavy computation, and the fact that the data processed grows linearly is not a surprise. But at LinkedIn, for every member you add, that is, every node you add to the graph, the number of edges, and for us it's really the number of edges that you're processing, grows super-linearly. Every edge creates a compounding effect on the second degree network or on the PageRank-like candidates that you're generating. So that grows super-linearly, and this blows up our offline infrastructure.

Scalability of Batch Offline

To give you a little more of a window into why that happens, I'm showing a simple example with two tables. One has the node features: a table with member IDs and a set of features. Then you have pair features, and pair features are really edge features: you have a source and a destination, two members, and a set of features. And you do two joins. You do a first join where the source ID is matched against the member ID, and then you do a destination-side join where you join it back to the member ID table. So you get this big fat table in the middle. You put that through your scoring and you get a result, and that's when you get your probability of connect. But this table here is big. That's where we are actually shuffling. By mid 2015, we were shuffling trillions of records in MapReduce, so we took on lots of cost-to-serve initiatives. We started getting smarter and smarter about how we did joins.
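A rough PySpark sketch of the two joins just described, under the assumption of hypothetical parquet paths and column names; it only illustrates where the "big fat table in the middle" comes from, not the optimized joins discussed next.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pymk-scoring").getOrCreate()

node_features = spark.read.parquet("/data/node_features")   # member_id + feature columns
pair_features = spark.read.parquet("/data/pair_features")   # src_id, dst_id + edge features

# Join the source side, then the destination side, producing the large
# intermediate table that gets shuffled and then fed to the scoring model.
# (In a real job you would also prefix the feature columns to avoid name clashes.)
src = node_features.withColumnRenamed("member_id", "src_id")
dst = node_features.withColumnRenamed("member_id", "dst_id")

scoring_input = (pair_features
                 .join(src, on="src_id")     # source-side join
                 .join(dst, on="dst_id"))    # destination-side join
```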

Need Smart Joins

And we have a blog post about this, so you can actually see the algorithms we developed. One of them was just getting smarter about how we partition the data, and that's what we call the 2D Hash Join algorithm. Even triangle closing follows this pattern of the big middle, and that's easy to see, because I already talked about how a single node compounds the number of computations. So you have this big fat middle, and we got smarter there by doing matrix multiplication.

So we started taking these jobs which would take several hours in our offline MapReduce, and in fact, we had the reputation in the company that when the PYMK job is running on the Hadoop queue, nobody else can get anything done. As we got smarter and smarter, wanting to decrease cost to serve, we started doing these optimizations. We brought some of our compute costs down to several tens of seconds, and this was amazing.

Freshness Matters

But something we observed besides cost to serve was on metrics. Our PYMK sent-and-accepted lit up by huge numbers every time we did a cost-to-serve improvement. And so, we saw that freshness mattered. Now, why does freshness matter? We had just reduced cost-to-serve time, so the index in our key value store was way more fresh: instead of starting from a snapshot of the graph that was several days old, it was much closer in time to where the graph was when the user came in. Why that lit up our true north metrics was a mystery to us at first.

Why Near Real-Time PYMK

So we started digging deeper and deeper, and this is where our data science hat came on. All our data showed us, in our analysis, that network building is contextual and it often involves people exploring a cohort or a subnetwork in a session. People don't come and engage with PYMK on a daily basis, even though it's the second tab on LinkedIn. But when they come in, they'll go click, click, click and they'll build a network. And that context really matters. So if you're connecting to your QCon network, it's very likely that in the subsequent scrolls, if we show you people you're meeting here, you're going to connect. If I interleave one person from QCon with an old high school buddy, the odds are that's not how network building works. And if you analyze your LinkedIn network, it's likely that you built subnetworks like these over time. This is just a mock example of someone who went to Stanford, built a network, and then perhaps worked at LinkedIn, Yahoo, and maybe some other places. And each of these subnetworks was probably built at a moment in time.

Near Real-Time Recommendation Platform: GAIA

So all of these intuitions, including the cost to serve of our offline pipelines, motivated us to try a proof of concept for a near real-time recommendation platform. We have a pretty mixed group of skill sets in the team, so the distributed systems people came in and they said, "You know what? I'm going to try doing this in memory." What they did was they built a snapshot of the graph on Hadoop and pushed it to a large RAM machine so we would ingest the graph in memory, but every update that came to the graph, so every new connection that was made in the LinkedIn ecosystem, made its way into GAIA through Kafka. So at any given point of time, this platform, which we called GAIA, had the true representation of the graph, the only latency being the Kafka latency. And our systems experts also built a very simple API where our data scientists could just write random-walk-like algorithms. They could bias these random walks to walk around the neighborhood of a school or the neighborhood of a company.
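Below is a highly simplified sketch of the GAIA idea: an in-memory adjacency structure bootstrapped from an offline snapshot, updated by connection events as they would arrive from Kafka, plus a biased random walk API. The class and method names, the event schema, and the bias weighting are assumptions for illustration, not the production system.

```python
import random
from collections import defaultdict

class InMemoryGraph:
    """Minimal sketch: graph loaded from an offline snapshot, then kept fresh
    by applying new-edge events streamed from Kafka."""

    def __init__(self, snapshot_edges):
        self.adj = defaultdict(set)
        for src, dst in snapshot_edges:
            self.add_edge(src, dst)

    def add_edge(self, src, dst):
        self.adj[src].add(dst)
        self.adj[dst].add(src)

    def apply_event(self, event):
        # `event` is a hypothetical Kafka message, e.g. {"type": "connect", "src": ..., "dst": ...}
        if event["type"] == "connect":
            self.add_edge(event["src"], event["dst"])

    def biased_walk(self, start, node_attrs, steps=10, bias=3.0):
        """Random walk that prefers neighbors sharing an attribute
        (e.g. the same school or company) with the start node."""
        target = node_attrs.get(start)
        node, visited = start, []
        for _ in range(steps):
            neighbors = list(self.adj[node])
            if not neighbors:
                break
            weights = [bias if target is not None and node_attrs.get(n) == target else 1.0
                       for n in neighbors]
            node = random.choices(neighbors, weights=weights, k=1)[0]
            visited.append(node)
        return visited
```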

And after the deployment of GAIA, we actually saw the biggest site-wide improvement in connections that we had seen in the history of the product. So you're 10 years into the product, you're moving at a regular pace, you're improving, you've never plateaued, but we actually saw one of the biggest improvements in connections made. We also saw a bunch of other metrics move, especially signups. The moment a user comes in, one of the first few screens they see is PYMK, and now you have a more real, more updated representation of the graph. And so, all of those metrics actually started moving.

Now, one thing that came out is we had done all this analysis and we actually understood why a near real-time platform was helping. It started actually tickling our product partners' minds to start thinking about different user experiences.

Conversational "Network Builder"

So for a long time, our product had been fairly static. You would see PYMK recommendations as a ranked list and you would just go click, click, click. But now, let's take the example where Dominiq invites Lucille and Lucille accepts. When she accepts, Dominiq gets a notification which says, "Congrats, we're adding Lucille to your network." And then on that subsequent screen, Dominiq sees Lucille's first degree network. So you're actually exploring that subnetwork in near real-time. It actually whetted the product managers' appetite to start thinking of exploring subgraphs in near real-time.

Platforms Unlock New Product Experiences

Our India team always knew that people in India have a behavior of network building where, in the third or fourth year of their university program, they connect to alumni. This is an inherent part of the culture for job seeking; people pay it forward, so basically every group pays it back to the next group, in turn. And so, they said, "You know what? We'll just start showing PYMK in batches of alumni." What GAIA allowed us to do was actually explore these alumni cohorts. So if I know which school I'm graduating from, it would actually explore that subnetwork of the graph in near real-time. Through all this, we discovered that context matters, near real-time context matters, and actually having a platform that was near real-time was really valuable, not only from a cost to serve perspective, but also for how we shaped our product.

Active Professional Community

Now, we've talked about edge building, but what good is an edge on its own if it doesn't help you? So towards our goal of helping our members find their dream careers, at LinkedIn, we believe that it's important for a member to have an active professional community. What does an active professional community mean? It really means that when you look for help, you get the help you want. For example, if I have a question to ask about a paper on blockchain, someone in my network can actually help me answer that and is willing to help me answer that. Alternatively, if I have a question about machine learning, there may be some other subnetwork of mine that is relevant and is willing to help me. And we also find, from LinkedIn's perspective, that building this active community where people are willing to share and then get feedback actually helps people share more. Talking about that a little bit more: if I share and I get no feedback at all, the odds that I'm going to put myself out there and ask for help are very low.

So what we find is that if we can steer edge building such that we know people will get the help they need, then we can help people build these active communities. Let me explain that a little bit more. This was the little graph that we built earlier, and let's now take an example where Alice shares a post. Like in all social networks, this becomes an eligible candidate for the feed of all of Alice's first degree network. Now, let's assume Carol, Bob, and Erick come to the site maybe an hour later and they see this. This is how Facebook behaves; all your social networks behave in this manner. And let's say Bob comments on the post, and that creates the viral cascade where Carol gets the information that Alice is talking and Bob is commenting. It may actually propel Carol to comment more, and this is the social behavior that you see on the network.

Biasing Connection Recommendations for an Active Community

Now, in this example, what happens is Lucille is kind of in a pocket where no information is going to her, because she's connected to Erick and Erick is a passive consumer. Erick consumes, but he doesn't really share or comment, or do any of the social actions which actually create further information cascades. And Dominiq isn't even coming to the site. So Lucille really has an empty feed at this point, and she's unlikely to see value from LinkedIn. So how could we potentially bias edge building so that Lucille actually sees some value from LinkedIn? When we're looking at candidate generation or scoring, we could potentially consider which of these candidates are likely to be Lucille's active community. That's what we mean by helping build an active community, and towards that goal, perhaps connecting Lucille to Alice makes sense.

And so, the score ends up being more than just a probability of connect. It ends up including not just the probability of connection, but also the probability of a potential conversation across this edge, because we don't want to just build passive edges that are never going to help you. And you can take this objective function to mean not just help in the context of the feed; it can be other kinds of help as well. And we tune these hyper-parameters using an online parameter selection framework; there's more written about that in the literature.
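A tiny sketch of what such a blended objective might look like; the alpha weighting, the probabilities, and the candidate names are made up, and as the talk notes, in practice the hyper-parameters would be tuned with an online parameter selection framework.

```python
def edge_score(p_connect, p_conversation, alpha=0.3):
    """Blend the probability of a connection with the probability that the
    resulting edge produces a conversation (i.e. actually helps the member)."""
    return p_connect + alpha * p_conversation

# Rank candidates by the blended score instead of p(connect) alone.
candidates = {"alice": (0.40, 0.60), "erick": (0.42, 0.05)}
ranked = sorted(candidates, key=lambda c: edge_score(*candidates[c]), reverse=True)
print(ranked)   # Alice now outranks Erick because she is an active sharer
```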

Notifications

So that brings me to the end of the first half of my talk, where I talked about building this graph, the graph that is LinkedIn. In the second half, I'm going to talk about the role of notifications, and as I said, in growth we think of the general problem as showing you the value of LinkedIn. I will continue to use my running example of the simple graph to show you how notifications are going to help. And again, I will show you why we need near real-time platforms and why we also needed to move from batch offline to near real-time for a similar problem.

So now, coming back to this example where Alice has an active community, she has people who give her feedback, and so on and so forth, we've discovered that Dominiq was kind of passive. What if this conversation or this share was really relevant to Dominiq, and if he had actually seen it, it would have been super useful? We could potentially send Dominiq a notification saying that Alice shared this piece of content, you might want to look at it. Again, a lot of social networks do this. If this piece of content is very relevant to Dominiq's interests, which we can of course infer from Dominiq's past behavior on other pieces of content and so on, it perhaps may motivate Dominiq to do another share, and that creates further viral cascades on the network. Maybe it prompts Lucille, because Dominiq is the manager; again, this is a behavior we often see, that if your manager comments on something, you feel like you've got to say something, too. So maybe Lucille is going to comment, and then she suddenly becomes an active participant. She's a new user, but she's become an active participant on LinkedIn.

So we can shape these graph algorithms, and even shape the notification problem, to create these viral actions and actually create pockets of conversations that are completely relevant to our users on LinkedIn. We have many notifications that come from LinkedIn, but one of them is the shared-by-your-network notification, and I'm going to use that as a running example in the rest of my talk.

The idea of a shared by your network notification is that a member never misses out on a conversation that is timely and important for them. This is what it looks like. It's the third tab on your app. As I said, there are many different notifications, but the top one which actually says Dominiq shared an update is what it typically looks like.

As many of you know, we have many different ways of getting notifications today. We have push, which is probably the most invasive: it comes, the phone buzzes, it catches your attention, it makes you feel like you need to react. There's badging, so we can choose not to push but just put that little red dot on your app saying, "Hey, you're missing out on something." We may send you an email, which is the least invasive, and then we may just send it to your tab. So there are many different ways, and we call these notification channels. You can think of them as just different places we can send you the notification.

And here's a graph which shows the number of sessions that come from the mobile app. For any of you who work on a consumer-facing internet product, you probably see your app sessions growing at a much faster rate than sessions that come from desktop and mobile web. And this is great. So you have more channels, but it also leads to something like this, because most of our apps start looking like this. It creates notification fatigue, and in the worst case, a member may completely tune out; they may disable notifications. As I said, push notifications are a great way to get the user back in, but if you're too noisy, they may just disable push or, in the worst case, uninstall the app. And at that point, you've lost the user.

So the key problems for notifications are to send the right message at the right time, on the right channel, and to send as few notifications as possible. So there's a nice class of problems that resides in the notification space. If your company isn't using intelligence for notifications, I think it's probably worth considering. In the next part, I'll talk about the right message, and later, I'll talk about minimizing the total number of notifications. I'm happy to chat offline about getting the timing right or the channels right, because each of them is a very deep and interesting problem.

Notifications also follow the typical playbook for recommendation systems, so you have candidate generation. At LinkedIn, we talked about shared by your network as an example, but candidate generation can come from anywhere. For example, the jobs team may decide that there's a set of jobs that you should actually see: we know you're a job seeker and we're going to send you the jobs that you're interested in. If you're a recruiter, maybe the corresponding product team has a set of candidate notifications that they want to notify you about, so on and so forth. So there's a set of candidates. It's different from the PYMK problem in that in PYMK, even the candidate generation is trying to solve one problem. In this case, candidates come from many different products, all vying for your eyeballs, and shared by your network is just one example of that.

Notification Ecosystem

Candidate generation can happen offline or it can happen in near real-time. We have both platforms at LinkedIn, and often when a product decides they want to send notifications, we'd debate whether it's an offline use case or an online use case. A good example of an offline use case would be, say, work anniversaries or birthdays. It's interesting how chatty people get with their ex-colleagues in the context of birthdays, but people just love that. They really engage with that form of notification because, you know, it's that forgotten ex-colleague, but you get a notification which says, "Wish x, y, z a happy birthday," and then they'll have a set of back-and-forth messages. And some people engage with that more than others, so you have to use intelligence to know whether that's interesting or not to a given member.

So we have near real-time and offline platforms. Offline is useful in cases like the birthday notification, because you know the set of birthdays that are coming up. You can batch process all of them offline and you can put in smarter models; you know what's coming, so you can decide to control volume early on. And then we have a centralized decision making system called air traffic controller (ATC), which essentially takes all the candidates and scores them. It also handles problems like message spacing, meaning it should not be sending you messages in the middle of the night, or at times that it has figured out are more disruptive to you, and it also tries to control volume because it has the picture of the entire universe. Once ATC makes the decision, the notification can go to UI and decoration services. ATC also decides what channel the notification goes through, and then correspondingly decoration happens. And all of this messaging, these are not API calls, happens through a series of Samza processors with message passing through Kafka. So Concourse is a set of Samza processors which push down to Kafka, then it goes down to ATC, which is another set of Samza processors, and so on and so forth. The offline part is still batch, but we have mechanisms to push from Hadoop into Kafka.
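To give a feel for the flow, here is a single-process, much-simplified stand-in for the pipeline just described: a "Concourse-like" candidate-generation stage followed by an "ATC-like" decision stage. The function names, thresholds, and event shapes are invented for illustration and are not the actual Samza/Kafka implementation.

```python
def concourse_stage(share_event, graph, affinity):
    """For a share event, emit notification candidates for first-degree
    connections with enough affinity to the sharer/content."""
    sharer = share_event["actor"]
    return [
        {"recipient": r, "item": share_event["item"], "score": affinity.get((sharer, r), 0.0)}
        for r in graph.get(sharer, set())
        if affinity.get((sharer, r), 0.0) > 0.3       # illustrative cutoff
    ]

def atc_stage(candidate, last_notified_at, now, min_spacing_s=4 * 3600):
    """Centralized decision: enforce message spacing and pick a channel."""
    recipient = candidate["recipient"]
    if now - last_notified_at.get(recipient, 0) < min_spacing_s:
        return None                                   # too soon: drop or defer
    channel = "push" if candidate["score"] > 0.7 else "in_app"
    return {**candidate, "channel": channel}
```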

Batch Offline or Near Real-Time

So notifications can be, as I said, batch offline or near real-time, and there's value to either of the use cases. But in some cases, and the hypothesis is that especially in the case of breaking news or when conversations are happening, people like to jump into the conversation while it's happening, especially in a professional context, which is why maybe a lot of you use Slack or similar messaging applications in your workplace. There's some conversation that's happening that's just timely and you want to jump right in. It's different from the email use case, which is kind of batch. For that, we hypothesized, and this was a hypothesis we had over a year ago, that maybe for shared by your network, especially for certain kinds of shared by your network, depending on the topic, depending on the newsworthiness and so on, decreasing the notification latency from hours, since it was a batch offline process, to a few seconds could actually help build an active community. So this is the example where Dominiq gets the notification that this active conversation is happening in near real-time.

So we built Concourse; at LinkedIn, we like these fancy names, as probably many of you do. We built our near real-time candidate generation platform. In this example, Alice creates a post, which generates an event in Kafka, and it goes through Concourse. Concourse is again a set of Samza processors that looks at it: it looks at the content of the post, it looks at Alice's first degree network, and it decides which of her connections or which of her followers should get this as a notification. In terms of modeling, it may look at behavioral signals between Alice and her first degree network; we may look at how often they interact, and affinity to the content as well. So we talked about the fact that Dominiq perhaps really liked that topic and he was likely to share.

Results: Near Real-Time Candidate Generation

Concourse may choose to filter out some of the edges and propagate the information to only some of them. Then from Concourse, it goes down to ATC, and as I mentioned, ATC does reranking, scoring, message spacing, and so on and so forth. And again, once we moved the shared-by-your-network notification from an offline use case to a near real-time use case, we saw one of our biggest improvements in sessions. People really liked this super engaged behavior, getting the notification on time.

Scoring

For scoring, the models look a little different. We look at the incremental probability of your visiting given the notification. We don't want to send you the notification just because we say it's relevant; if you're going to visit anyway and see that content on your feed, we don't want to send you the notification. That keeps the volume down in some sense. And then we model the expected value of the session. We don't just ask whether you come in, or click, or tap on the notification; we ask what the expected value is. What's the probability that you're likely to do a share? And once you do a share, more people are likely to comment and like, so we try to estimate that downstream viral impact as well.
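A minimal sketch of that scoring idea, combining the incremental visit probability with a crude expected session value; all numbers and the specific functional form are illustrative assumptions, not the production model.

```python
def notification_value(p_visit_with, p_visit_without, p_share, downstream_value):
    """Score a candidate notification by the *incremental* probability of a
    visit times the expected value of the resulting session, including a
    rough estimate of downstream viral impact (likes/comments on a re-share)."""
    incremental_visit = max(p_visit_with - p_visit_without, 0.0)
    expected_session_value = 1.0 + p_share * downstream_value
    return incremental_visit * expected_session_value

# A member who would have visited anyway contributes little incremental value.
print(notification_value(0.9, 0.85, 0.2, 5.0))   # small score
print(notification_value(0.6, 0.10, 0.2, 5.0))   # much larger score
```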

Notification Relevance Problems

And that brings me to the last problem in the notification space. I'll spend a couple of minutes on it; this is also published work, published in KDD more than two years ago. We formulate the volume minimization problem as minimizing the total volume of sends, subject to multiple constraints. This class of problems ends up being a little different from what you see in the standard machine learning literature, which is estimating some kind of score. It's actually a large scale LP solver, a large scale linear program which says minimize this quantity, subject to some other constraints. And with the launch of the ATC volume optimization, we actually saw that we cut down our send volume. These charts are for email, but it applies to any channel. And we saw a huge reduction in the number of complaints. With email, you can see complaints, you can see unsubscribes, so you can actually measure these metrics in A/B tests, and we saw huge reductions with very little page view loss or session loss.
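As a toy illustration of the LP idea, here is a sketch using SciPy's linprog: minimize total expected sends subject to a floor on expected sessions. The probabilities, the single constraint, and the relaxation to per-candidate send probabilities are made-up simplifications of the published formulation.

```python
import numpy as np
from scipy.optimize import linprog

# Choose send probabilities x_i in [0, 1] for each (member, notification)
# pair, minimizing total sends while keeping expected sessions above a floor.
p_session = np.array([0.05, 0.30, 0.10, 0.60])   # P(session | send) per candidate
min_expected_sessions = 0.8

c = np.ones_like(p_session)        # objective: minimize total expected sends
A_ub = [-p_session]                # -sum(p_i * x_i) <= -min_expected_sessions
b_ub = [-min_expected_sessions]
bounds = [(0, 1)] * len(p_session)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print(res.x)   # sends concentrate on the candidates most likely to drive a session
```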

Product Optimization

Now, with the launch of a near real-time platform, again in the notification space, it tickled our product managers' minds. It's like, "Hey, we have a near real-time platform. So now, let's just make sure that when we've sent this person a notification on something, we don't waste real estate on the feed for the same item." So you can start to actually jointly optimize your product across all of these different tabs, all of these different experiences, and having a near real-time view, having state in near real-time, really helped.

Summary

So through both parts of the talk, what we learned was that online/nearline computations captivate the user in the moment, and it's not just about driving your metrics; it also started shaping our product. So there's this cycle: we may think of a product, or we may start with a platform. In both cases, we actually built the platform, it helped move our metrics, and then it changed our product. We see this continuous cycle, and we've learned that we just need all the people with all of these skill sets to keep talking and keep iterating on ideas at the same time. So with that, I'm happy to take questions.

Questions and Answers

Man 1: Thank you for the talk. GAIA has the entire graph in memory or are you segmenting?

Raghavan: We have the entire graph in memory. These are beefy machines, but you can use some heuristics to prune certain kinds of nodes out of it.

Man 1: But you're pruning nodes or attributes?

Raghavan: No, we don't prune any attributes because we have fairly rich models. The heuristics we use often aim at pruning nodes.

Man 2: At LinkedIn, you also have the paid members and the unpaid members. Do you have the segmentation of the candidates that you've been talking about even in the modeling as well?

Raghavan: Not for our core consumer products, but certainly as a premium member, you get a different experience. So for those products, you may have explicit tuning of the models. But, yes, not for the core.

Man 2: So no extra special notification for the premium members which come in?

Raghavan: Again, that's a set of products that may send you a notification, but your PYMK is not going to be different. So we're not going to tune your PYMK.

Man 3: Do you guys do embeddings of your users as well? And if so, have you considered doing things like KNN and other ways of actually comparing similarity of users?

Raghavan: Yes, we do do embeddings. What I covered in those node features were human-interpretable features, but definitely, some of those vectors are actually embeddings.


Recorded at:

Dec 27, 2018
