Forecasting at Twitter

by Manuel Pais on Dec 30, 2013

Arun Kejariwal of Twitter spoke at Velocity Conf London last month about the forecasting algorithms Twitter uses to proactively predict system resource needs as well as business metrics such as the number of users or tweets. Given the dynamic nature of their data stream, the team found that a refined ARIMA model works well once the data is cleansed, including the removal of outliers.

Besides the actual forecast correctness (assessed a posteriori by comparing estimates to actual results over time), other important criteria for assessing forecasting applicability at Twitter are the model's ability to handle both seasonality (i.e. accommodate the recurring patterns of usage on a daily basis) and trends (bursts of usage during major sports events, for example). Without adequate forecasting models, the underlying seasonality becomes harder to cope with:

As the user base and engagement grows, forecasting business metrics such as Tweets, Favorites, Photos, etc becomes non-trivial owing to an underlying trend and the aforementioned seasonality. In such cases, use of linear regression for forecasting is ill-advised as linear regression does not capture the seasonality of the time series. To alleviate this limitation, we have been exploring the use of ARIMA models which explicitly model trend and seasonal component of a given time series and consequently yield statistically robust forecasts.
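To illustrate the point about linear regression, here is a minimal sketch on synthetic data. It is not ARIMA and not Twitter's code: it compares a plain linear trend against the same trend plus per-hour seasonal offsets, a much simpler stand-in for a model with an explicit seasonal component. The series, the 24-sample period and all function names are assumptions made for the example.

```python
import math

def fit_line(y):
    """Ordinary least squares fit of y against t = 0..n-1; returns (intercept, slope)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope

def seasonal_offsets(y, intercept, slope, period):
    """Average detrended value at each position of the seasonal cycle."""
    buckets = [[] for _ in range(period)]
    for t, v in enumerate(y):
        buckets[t % period].append(v - (intercept + slope * t))
    return [sum(b) / len(b) for b in buckets]

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

PERIOD = 24  # assumption: hourly samples with a daily cycle

# Synthetic metric: upward trend plus a daily sinusoidal pattern (14 days).
series = [100 + 0.5 * t + 20 * math.sin(2 * math.pi * t / PERIOD)
          for t in range(PERIOD * 14)]
train, test = series[:PERIOD * 12], series[PERIOD * 12:]

intercept, slope = fit_line(train)
offsets = seasonal_offsets(train, intercept, slope, PERIOD)

# Forecast the two held-out days with and without the seasonal component.
horizon = range(len(train), len(series))
linear_forecast = [intercept + slope * t for t in horizon]
seasonal_forecast = [intercept + slope * t + offsets[t % PERIOD] for t in horizon]

linear_rmse = rmse(test, linear_forecast)
seasonal_rmse = rmse(test, seasonal_forecast)
print(f"linear only: {linear_rmse:.2f}, trend + seasonality: {seasonal_rmse:.2f}")
```

On this synthetic series the trend-only forecast misses the daily swing entirely, so its error stays close to the amplitude of the cycle, while the forecast with seasonal offsets tracks the held-out days far more closely.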

Nevertheless, applying the ARIMA model blindly to a time series does not necessarily result in statistically robust forecasts, mostly because irregularities in the data can affect model building and subsequent forecasting. If an abnormal time period does not show the seasonality, the overall forecast seasonality nearly disappears as well. Furthermore, if the boundary data points of a given time period happen to be outliers, the overall forecast is skewed. An initial forecast therefore needs to be analysed, and some data cleansing might be needed to reach a more accurate and useful forecast. Arun also mentioned that outliers are reported to development teams so they can investigate whether code changes caused them.
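The article does not say how Twitter detects or removes outliers. As one possible illustration of the cleansing step, the sketch below flags points that sit far from the median of their own seasonal position, using the modified z-score based on the median absolute deviation, and replaces them with that median. The threshold of 3.5, the synthetic data and the function name `cleanse` are assumptions for the example.

```python
import math
import statistics

def cleanse(series, period, threshold=3.5):
    """Replace seasonal outliers with the median of their seasonal position.

    A point is flagged when its modified z-score,
    0.6745 * |x - median| / MAD, exceeds the threshold.
    Returns the cleaned copy and the sorted indices that were replaced.
    """
    cleaned = list(series)
    outliers = []
    for phase in range(period):
        indices = range(phase, len(series), period)
        values = [series[i] for i in indices]
        med = statistics.median(values)
        mad = statistics.median(abs(v - med) for v in values)
        if mad == 0:
            continue  # no spread at this position, nothing to compare against
        for i in indices:
            if 0.6745 * abs(series[i] - med) / mad > threshold:
                outliers.append(i)
                cleaned[i] = med
    return cleaned, sorted(outliers)

PERIOD = 24  # assumption: hourly samples with a daily cycle

def clean_value(t):
    """Synthetic metric: daily cycle plus small deterministic pseudo-noise."""
    noise = ((t * 37) % 11 - 5) * 0.4
    return 100 + 20 * math.sin(2 * math.pi * t / PERIOD) + noise

series = [clean_value(t) for t in range(PERIOD * 7)]

# Inject a downward spike in the first time period.
spike_index = 5
series[spike_index] -= 50

cleaned, outliers = cleanse(series, PERIOD)
print("flagged indices:", outliers)
```

The flagged indices are also what one would hand to a development team for investigation, in the spirit of the reporting step described above. A series with a strong trend would need detrending first, since this sketch compares raw values across periods.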

ARIMA forecast with a downwards spike (outlier) in the first time period (graph courtesy of Arun Kejariwal)

ARIMA forecast without first time period containing outlier (graph courtesy of Arun Kejariwal)


Besides ARIMA, other models (e.g. Holt-Winters, spline and linear regression) are applied at Twitter depending on the type of resource being forecast, as Arun told InfoQ:

We have been exploring a large set of forecasting models. Which model to use is context dependent and directly relates to model selection problem (which is an active area of research).
In the absence of seasonality, use of linear regression is preferred as it is, relatively speaking, computationally less expensive than other models. In the presence of a non-linear trend, one can employ a quadratic model or some such. However, in the presence of trend and seasonality, selection of a forecasting [model] becomes non-trivial.
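The first branch of that decision, whether a series is seasonal at all, can be approximated with a simple check. The sketch below is a hypothetical heuristic, not a procedure described by Arun: it removes a linear trend, measures the autocorrelation at the assumed seasonal lag, and suggests linear regression when that correlation is weak. The 0.5 threshold, the synthetic series and the function names are assumptions.

```python
import math

def detrend(y):
    """Subtract the least squares line from y."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    slope = (sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [v - (y_mean + slope * (t - t_mean)) for t, v in enumerate(y)]

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    mean = sum(x) / len(x)
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        return 0.0
    num = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(len(x) - lag))
    return num / denom

def suggest_model(y, period, threshold=0.5):
    """Suggest a model family from the strength of the seasonal autocorrelation."""
    if autocorrelation(detrend(y), period) > threshold:
        return "seasonal model"      # e.g. ARIMA or Holt-Winters
    return "linear regression"       # cheaper, adequate without seasonality

PERIOD = 24  # assumption: hourly samples with a daily cycle

def noise(t):
    """Small deterministic pseudo-noise so the example is reproducible."""
    return ((t * 37) % 11 - 5) * 0.4

n = PERIOD * 14
trend_only = [100 + 0.5 * t + noise(t) for t in range(n)]
seasonal = [100 + 0.5 * t + 20 * math.sin(2 * math.pi * t / PERIOD) + noise(t)
            for t in range(n)]

print(suggest_model(trend_only, PERIOD))
print(suggest_model(seasonal, PERIOD))
```

This covers only the seasonal versus non-seasonal split. Choosing among seasonal models, or detecting a non-linear trend, is the harder model selection problem the quote refers to and is left out here.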

According to Arun, Twitter’s forecasts are typically limited to a few weeks ahead for technical issues (e.g. in-house system capacity upgrades). Longer time spans are applied occasionally for some business metrics (e.g. number of users). In the near future they plan to use forecasts for elastic scaling as well.
