Funnel Analysis at Twitter for Improving User Engagement
Funnel analysis examines a sequence of events to understand and improve user engagement on a website or mobile application. The Data Science team at Twitter uses it to learn how users interact with the user interface during flows such as signing up or tweeting. By analyzing user interaction logs, the team aims to ensure that it delivers product features that are useful and engaging for users.
Krist Wongsuphasawat, Staff Data Scientist at Twitter, recently wrote about the team's experimental visual analytics approach, which counts occurrences of a specified sequence of events and also aggregates and visualizes information about what happens between the steps of that sequence to provide a broader perspective.
Log analysis can be as simple as counting a single event, such as a click on the Tweet button. However, this event only opens the Tweet composer and doesn't necessarily mean the user successfully Tweeted. Funnel analysis, or counting funnels, provides a broader picture, for example by showing how often users abandon a Tweet after they start composing it. This leads to more exploratory questions, but the scale of Twitter's data makes the analysis challenging: the data includes more than 10K types of events and hundreds of millions of users. The Twitter team has built a unified logging infrastructure that captures user activities across all clients, which makes these logs one of the largest datasets in the organization.
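The difference between counting a single event and counting a funnel can be illustrated with a minimal Python sketch. It is not Twitter's implementation; the event names and the `count_funnel` helper are hypothetical and chosen only for illustration.

```python
def count_funnel(sessions, steps):
    """For each funnel step, count the sessions that reached it in order."""
    reached = [0] * len(steps)
    for events in sessions:
        next_step = 0  # index of the funnel step this session must hit next
        for event in events:
            if next_step < len(steps) and event == steps[next_step]:
                reached[next_step] += 1
                next_step += 1
    return reached


# Hypothetical sessions: each one is a user's ordered list of event names.
sessions = [
    ["open_app", "click_tweet_button", "type_text", "send_tweet"],
    ["click_tweet_button", "type_text", "close_composer"],
    ["click_tweet_button", "close_composer"],
    ["open_app", "scroll_timeline"],
]
steps = ["click_tweet_button", "type_text", "send_tweet"]

funnel_counts = count_funnel(sessions, steps)
for step, count in zip(steps, funnel_counts):
    print(f"{step}: {count}")
```

In this toy data, three sessions open the composer but only one ends with a sent Tweet, which is the kind of drop-off that a single-event count would hide.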
The team designed Flying Sessions, an experimental visual analytics tool for funnel analysis, to support funnel exploration with less effort and to provide more information than simple counting. The tool helps data scientists extract insights from the log data. Users can specify the granularity of their analyses by selecting parts of the sessions. The tool then provides aggregated results that can be explored interactively in a visualization interface.
The data pipeline for the solution consists of three stages: sessionization, segmentation, and aggregation. The technologies used for the visual analytics tool include Hadoop, Scalding, D3 and d3Kit. Scalding is used to filter and summarize the large volume of raw log event data in Hadoop into smaller JSON files, which are then visualized in a web user interface built with D3 and d3Kit.
In the sessionization stage, a session is created for each user, and the user's consecutive events are added to it, sorted by timestamp.
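A minimal Python sketch of sessionization might look like the following. The record layout `(user_id, timestamp, event)` and the `sessionize` function are assumptions made for illustration; the production pipeline runs in Scalding on Hadoop and may split sessions differently.

```python
from collections import defaultdict


def sessionize(log_records):
    """Group (user_id, timestamp, event) records into one session per user."""
    by_user = defaultdict(list)
    for user_id, timestamp, event in log_records:
        by_user[user_id].append((timestamp, event))
    # Sort each user's records by timestamp and keep only the event names.
    return {
        user_id: [event for _, event in sorted(records, key=lambda r: r[0])]
        for user_id, records in by_user.items()
    }


# Hypothetical raw log records, deliberately out of order.
log_records = [
    ("alice", 30, "send_tweet"),
    ("bob", 12, "click_tweet_button"),
    ("alice", 10, "click_tweet_button"),
    ("alice", 20, "type_text"),
    ("bob", 15, "close_composer"),
]

user_sessions = sessionize(log_records)
```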
The segmentation stage extracts subsequences of events relevant to the analyst-specified alignment points from sessions, and groups the subsequences into segments based on the alignment points they contain. This stage involves identifying session fragments and then extracting and grouping the subsequences between alignment points.
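One possible reading of this stage is sketched below in Python. It assumes that a segment is keyed by a pair of consecutive alignment points and holds the events that occur between them; the `segment` function and the event names are hypothetical, and the actual tool may define segments differently.

```python
from collections import defaultdict


def segment(sessions, alignment_points):
    """Group the events found between consecutive alignment points."""
    segments = defaultdict(list)
    for events in sessions:
        # Positions in the session where an alignment point occurs.
        positions = [i for i, e in enumerate(events) if e in alignment_points]
        for start, end in zip(positions, positions[1:]):
            key = (events[start], events[end])
            segments[key].append(events[start + 1:end])
    return dict(segments)


# Hypothetical sessions and analyst-specified alignment points.
sessions = [
    ["open_app", "click_tweet_button", "type_text", "add_photo", "send_tweet"],
    ["click_tweet_button", "type_text", "send_tweet",
     "click_tweet_button", "send_tweet"],
]
alignment_points = {"click_tweet_button", "send_tweet"}

segments = segment(sessions, alignment_points)
```

Here every subsequence between opening the composer and sending a Tweet lands in the same segment, so later stages can summarize what users did in between.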
Finally, the aggregation stage pipes the segments from the previous stage through a variety of aggregators in parallel to produce summaries (e.g., average count of events) that can be visualized in the front end. This design allows new aggregators to be added when there is demand for additional types of summaries.
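The pluggable aggregator design can be sketched in Python as a registry of independent functions applied to each segment. The aggregator names, the `aggregate` function, and the segment label are illustrative assumptions; this sketch runs the aggregators one after another, although their independence is what would allow them to run in parallel.

```python
import json
from collections import Counter


def segment_count(subsequences):
    return len(subsequences)


def average_event_count(subsequences):
    if not subsequences:
        return 0.0
    return sum(len(s) for s in subsequences) / len(subsequences)


def event_frequencies(subsequences):
    return dict(Counter(e for s in subsequences for e in s))


# Adding a new kind of summary only requires registering another function.
AGGREGATORS = {
    "segment_count": segment_count,
    "average_event_count": average_event_count,
    "event_frequencies": event_frequencies,
}


def aggregate(segments, aggregators=AGGREGATORS):
    """Apply every aggregator to every segment independently."""
    return {
        name: {agg_name: agg(subs) for agg_name, agg in aggregators.items()}
        for name, subs in segments.items()
    }


# Hypothetical segment: subsequences between two alignment points.
segments = {
    "click_tweet_button -> send_tweet": [
        ["type_text", "add_photo"],
        ["type_text"],
        [],
    ],
}

summaries = aggregate(segments)
print(json.dumps(summaries, indent=2))  # JSON that a front end could load
```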
The Twitter team also envisions a few future design improvements, including new types of aggregation that employ pattern mining or more sophisticated algorithms.