Personalized Notifications at Twitter

by Andrew Morgan on Jun 30, 2017. Estimated reading time: 3 minutes

Gary Lam, staff engineer at Twitter, spoke about personalized notifications at QCon London 2017. His talk gave a high-level overview of Twitter's personalization and recommendation algorithms and explained how they work at scale, despite the large volumes of data and the bimodal nature of Twitter.

Personalized fanout means sending a notification to a user only if it matches their interests. Lam's example is a Tweet by Elon Musk about electric cars: rather than all of his followers receiving a notification, only those who like electric cars would.

Lam explained that the personalized fanout algorithm works by keeping track of two things:

  1. Recent engagements with entities: These are likes, replies, and other interactions that a person has had with a particular entity, such as a hashtag or account. Lam stresses the importance of keeping this data up to date, as users will only be interested in what they have been Tweeting about recently.
  2. Top followings: Although a user may follow hundreds of other users, only certain ones will make it into their top followings; these are the accounts a person would be most interested in hearing about.

When the algorithm is applied, the first step is to extract the entities from the Tweet. Then, for each follower, one check determines whether any of those entities appear among the follower's recent engagements, and another determines whether the Tweet comes from one of the follower's top followings. If both conditions hold, the follower receives a notification, since they are likely to be interested in the Tweet.
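The two checks can be illustrated with a minimal Python sketch. It assumes that entities are simply the hashtags and @mentions in the Tweet text (a hypothetical stand-in, since the talk summary does not describe Twitter's entity extraction), and that each follower's recent engagements and top followings are available as in-memory sets. The names `extract_entities`, `FollowerState`, and `personalized_fanout` are illustrative only.

```python
import re
from dataclasses import dataclass, field

# Hypothetical entity extractor: treats hashtags and @mentions as entities.
ENTITY_PATTERN = re.compile(r"[#@]\w+")


def extract_entities(text: str) -> set[str]:
    """Return the lower-cased hashtags and @mentions found in a Tweet."""
    return {match.lower() for match in ENTITY_PATTERN.findall(text)}


@dataclass
class FollowerState:
    # Entities (hashtags, accounts) the follower recently engaged with.
    recent_engagements: set[str] = field(default_factory=set)
    # Accounts the follower is most interested in hearing from.
    top_followings: set[str] = field(default_factory=set)


def personalized_fanout(author: str, text: str, followers: dict[str, FollowerState]) -> list[str]:
    """Return the IDs of followers who should be notified about this Tweet."""
    entities = extract_entities(text)
    recipients = []
    for follower_id, state in followers.items():
        matches_interest = bool(entities & state.recent_engagements)
        from_top_following = author in state.top_followings
        # Notify only when both conditions described in the talk hold.
        if matches_interest and from_top_following:
            recipients.append(follower_id)
    return recipients


followers = {
    "alice": FollowerState({"#electriccars"}, {"elonmusk"}),
    "bob": FollowerState({"#football"}, {"elonmusk"}),
    "carol": FollowerState({"#electriccars"}, set()),
}
print(personalized_fanout("elonmusk", "New #ElectricCars update", followers))  # ['alice']
```

Here only "alice" qualifies: "bob" has no matching engagement and "carol" does not count the author among her top followings.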

Lam explains that the main problem with personalized fanout is asymmetry. When a user with millions of followers Tweets, the algorithm must be applied to every single one of them, whereas other users may have only a couple of followers.

To work around this, Lam explains, Twitter relies on data co-location. Users are sharded, and each user's recent engagements and top followings are stored on the same shard as the user. As a result, running the algorithm requires no network hops, which greatly reduces latency.
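A rough sketch of co-location follows. It assumes hash-based partitioning by user ID (the talk summary does not say how Twitter assigns users to shards) and models shards as in-process objects rather than separate servers. `shard_for`, `Shard`, and `ShardedStore` are hypothetical names. The point it illustrates is that a follower's fanout check reads only state held on that follower's own shard.

```python
import zlib
from collections import defaultdict


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user to a shard using a stable hash (an assumed partitioning scheme)."""
    # Python's built-in hash() is randomized per process for strings,
    # so a checksum keeps shard assignments stable across restarts.
    return zlib.crc32(user_id.encode("utf-8")) % num_shards


class Shard:
    """Keeps each user's recent engagements and top followings side by side."""

    def __init__(self) -> None:
        self.recent_engagements: dict[str, set[str]] = {}
        self.top_followings: dict[str, set[str]] = {}

    def should_notify(self, follower_id: str, author: str, entities: set[str]) -> bool:
        # Both lookups are local dictionary reads: no network hop per follower.
        engaged = entities & self.recent_engagements.get(follower_id, set())
        return bool(engaged) and author in self.top_followings.get(follower_id, set())


class ShardedStore:
    def __init__(self, num_shards: int) -> None:
        self.shards = [Shard() for _ in range(num_shards)]

    def put(self, user_id: str, engagements: set[str], top_followings: set[str]) -> None:
        # Co-location: both pieces of state land on the user's own shard.
        shard = self.shards[shard_for(user_id, len(self.shards))]
        shard.recent_engagements[user_id] = set(engagements)
        shard.top_followings[user_id] = set(top_followings)

    def fanout(self, author: str, entities: set[str], follower_ids: list[str]) -> list[str]:
        # Group followers by shard so that each shard evaluates only its own users.
        by_shard: dict[int, list[str]] = defaultdict(list)
        for follower_id in follower_ids:
            by_shard[shard_for(follower_id, len(self.shards))].append(follower_id)
        recipients = []
        for index, members in by_shard.items():
            shard = self.shards[index]
            recipients.extend(f for f in members if shard.should_notify(f, author, entities))
        return sorted(recipients)


store = ShardedStore(num_shards=4)
store.put("alice", {"#electriccars"}, {"elonmusk"})
store.put("bob", {"#football"}, {"elonmusk"})
print(store.fanout("elonmusk", {"#electriccars"}, ["alice", "bob"]))  # ['alice']
```

In a distributed deployment, the grouping step is what lets a single Tweet be sent once per shard rather than once per follower.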

Lam points out that recent engagements don't need to be kept for very long, as they are short-lived by nature. For this reason, they are kept in memory.
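Because engagements only matter for a short window, an in-memory store can simply forget old entries. The following is a minimal sketch assuming a fixed time-to-live per engagement; the actual retention window is not given in the talk summary, and `RecentEngagements` is a hypothetical name. An injectable clock makes the expiry easy to demonstrate.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class RecentEngagements:
    """In-memory store that forgets engagements older than a time-to-live."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        # Per user: a queue of (timestamp, entity) pairs in arrival order.
        self._events: dict[str, deque] = defaultdict(deque)

    def _expire(self, user_id: str) -> None:
        # Entries are appended in time order, so expired ones sit at the front.
        cutoff = self.clock() - self.ttl
        events = self._events.get(user_id)
        while events and events[0][0] < cutoff:
            events.popleft()

    def record(self, user_id: str, entity: str) -> None:
        self._expire(user_id)
        self._events[user_id].append((self.clock(), entity))

    def entities(self, user_id: str) -> set[str]:
        self._expire(user_id)
        return {entity for _, entity in self._events.get(user_id, ())}


now = [0.0]  # fake clock for the example
store = RecentEngagements(ttl_seconds=60, clock=lambda: now[0])
store.record("alice", "#electriccars")
now[0] = 30
store.record("alice", "#spacex")
now[0] = 70
print(store.entities("alice"))  # {'#spacex'}
```

At time 70 the first engagement (recorded at time 0) is older than the 60-second window and is dropped, so memory use stays bounded by recent activity.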

If a shard goes down, its data must be rebuilt quickly so that users still receive their notifications, and this rebuilding has been heavily optimized. It is done by replaying all of the last day's Tweets from a queue, but batching the messages and removing redundant data before feeding them to the shard. This slimmed-down stream is known as the "slim firehose".
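The replay-and-slim idea might look roughly like the sketch below. The message schema (`tweet_id`, `author`, `entities`, plus a heavy `payload` field) is hypothetical, and "removing redundant data" is interpreted here as dropping fields the shard does not need and skipping duplicate deliveries of the same Tweet; the talk summary does not spell out which data is removed.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator


@dataclass(frozen=True)
class SlimEvent:
    """Only the fields a shard needs to rebuild its state (an assumption)."""
    tweet_id: int
    author: str
    entities: frozenset[str]


def slim_firehose(messages: Iterable[dict], batch_size: int) -> Iterator[list[SlimEvent]]:
    """Turn replayed Tweet messages into batches of slimmed, de-duplicated events."""
    seen: set[int] = set()

    def slimmed() -> Iterator[SlimEvent]:
        for msg in messages:
            if msg["tweet_id"] in seen:  # skip redelivered duplicates
                continue
            seen.add(msg["tweet_id"])
            # Keep only what the shard needs; heavy fields are dropped here.
            yield SlimEvent(msg["tweet_id"], msg["author"], frozenset(msg.get("entities", ())))

    events = slimmed()
    while batch := list(islice(events, batch_size)):
        yield batch


replayed = [
    {"tweet_id": 1, "author": "elonmusk", "entities": ["#electriccars"], "payload": "<full Tweet JSON>"},
    {"tweet_id": 1, "author": "elonmusk", "entities": ["#electriccars"], "payload": "<full Tweet JSON>"},
    {"tweet_id": 2, "author": "nasa", "entities": ["#mars"], "payload": "<full Tweet JSON>"},
    {"tweet_id": 3, "author": "bbc", "entities": [], "payload": "<full Tweet JSON>"},
]
for batch in slim_firehose(replayed, batch_size=2):
    print([event.tweet_id for event in batch])
# [1, 2]
# [3]
```

Batching amortizes per-message overhead on the recovering shard, while slimming reduces the bytes it has to ingest.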

Top followings are calculated by an offline machine learning algorithm, which looks at the historical interactions between users. Because they are calculated in advance, the data can be copied onto disk on the shard at boot time and then lazily loaded when required.
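The sketch below illustrates the precompute-then-lazy-load pattern. Twitter uses a machine learning model for the offline step; here it is replaced by a plain interaction-count ranking, which is only a stand-in. The snapshot is written with Python's standard `shelve` module, and `LazyTopFollowings` reads a user's entry from disk only on first access. All function and class names are hypothetical.

```python
import os
import shelve
import tempfile
from collections import Counter
from typing import Iterable


def compute_top_followings(interactions: Iterable[tuple[str, str]], k: int) -> dict[str, list[str]]:
    """Offline job: rank each user's followings by historical interaction count.

    A simple counting heuristic standing in for Twitter's ML model.
    """
    counts: dict[str, Counter] = {}
    for user, followed in interactions:
        counts.setdefault(user, Counter())[followed] += 1
    return {user: [acct for acct, _ in c.most_common(k)] for user, c in counts.items()}


def write_snapshot(path: str, top: dict[str, list[str]]) -> None:
    """Persist the precomputed result; this file is what gets copied to a shard."""
    with shelve.open(path, flag="n") as db:
        db.update(top)


class LazyTopFollowings:
    """Reads a user's top followings from the on-disk snapshot only when first needed."""

    def __init__(self, path: str) -> None:
        self._db = shelve.open(path, flag="r")
        self._cache: dict[str, list[str]] = {}

    def get(self, user_id: str) -> list[str]:
        if user_id not in self._cache:
            self._cache[user_id] = self._db.get(user_id, [])
        return self._cache[user_id]

    def close(self) -> None:
        self._db.close()


interactions = [
    ("alice", "elonmusk"), ("alice", "elonmusk"), ("alice", "nasa"),
    ("alice", "bbc"), ("alice", "nasa"), ("alice", "elonmusk"),
]
path = os.path.join(tempfile.mkdtemp(), "top_followings")
write_snapshot(path, compute_top_followings(interactions, k=2))
top_store = LazyTopFollowings(path)
print(top_store.get("alice"))  # ['elonmusk', 'nasa']
top_store.close()
```

Opening the snapshot is cheap at boot; the cost of reading an entry is paid only for users whose Tweets or notifications actually touch the shard.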

Lam also spoke about recommendations. These differ slightly from personalized fanout in that a user does not have to be a follower to receive a notification; they only have to be potentially interested in the content.

In this case, rather than feeding in events, the system can loop over each user. Lam explains that this makes it easier to utilize resources, as the number of users, and thus the load, can easily be predicted. During this process, several steps take place:

  1. Fatigue: If a user is not interested in notifications, or does not engage with them, they will not be sent any.
  2. Candidate sources: Each user ID is mapped to notifications that may be relevant to that user. Two technologies Lam pointed out as helping with this are GraphJet, Twitter's real-time graph processing library, and Scalding, Twitter's Scala library for writing offline MapReduce jobs.
  3. Ranking: Machine learning is used to pick the best notifications for the user.
  4. Push: Pushing the notification to the user's device.
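The four steps can be sketched as a single pass over a known list of users, which is what makes the load predictable. Everything below is an assumption for illustration: the fatigue rule (a minimum open rate once enough notifications have been sent), candidate sources modeled as plain functions (standing in for GraphJet- or Scalding-backed sources), a scoring function standing in for the ML ranker, and one notification pushed per user per pass.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class UserActivity:
    # Hypothetical engagement signal: notifications sent vs. opened recently.
    sent: int
    opened: int


def is_fatigued(activity: UserActivity, min_open_rate: float = 0.05, min_sample: int = 20) -> bool:
    """Assumed fatigue rule: stop sending once a user rarely engages."""
    if activity.sent < min_sample:
        return False
    return activity.opened / activity.sent < min_open_rate


def run_recommendation_pass(
    users: Iterable[str],
    activity: dict[str, UserActivity],
    candidate_sources: list[Callable[[str], list[str]]],
    score: Callable[[str, str], float],
    push: Callable[[str, str], None],
) -> int:
    """Loop over every user once; return the number of notifications pushed."""
    pushed = 0
    for user in users:
        # 1. Fatigue: skip users who do not engage with notifications.
        if is_fatigued(activity.get(user, UserActivity(0, 0))):
            continue
        # 2. Candidate sources: gather possibly relevant notifications.
        candidates = {c for source in candidate_sources for c in source(user)}
        if not candidates:
            continue
        # 3. Ranking: pick the highest-scoring candidate (sorted for deterministic ties).
        best = max(sorted(candidates), key=lambda c: score(user, c))
        # 4. Push: deliver it to the user's device.
        push(user, best)
        pushed += 1
    return pushed


outbox: list[tuple[str, str]] = []
scores = {"tweet:1": 0.2, "tweet:2": 0.9, "tweet:3": 0.5}
pushed = run_recommendation_pass(
    users=["alice", "bob"],
    activity={"bob": UserActivity(sent=100, opened=1)},
    candidate_sources=[lambda u: ["tweet:1", "tweet:2"], lambda u: ["tweet:3"]],
    score=lambda u, c: scores[c],
    push=lambda u, c: outbox.append((u, c)),
)
print(pushed, outbox)  # 1 [('alice', 'tweet:2')]
```

"bob" is skipped as fatigued (1 open out of 100 sends), while "alice" receives the highest-ranked candidate.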

The full talk is available online. It was preceded by a talk from Saurabh Pathak on delivering notifications in real time, which is also summarised in an article.
