LinkedIn recently described how LIquid, its graph database, automates the indexing and real-time access of all connections among members, schools, skills, companies, positions, jobs, events, and more. This knowledge graph, known as the Economic Graph, holds 270 billion edges and is still growing. It currently handles a workload of 2 million queries per second.
LinkedIn migrated its "People You May Know" (PYMK) recommendation system from the legacy GAIA system to LIquid. The change significantly improved throughput, latency, and CPU utilization: throughput rose from 120 to 18,000 queries per second (QPS), average latency dropped from over 1 s to under 50 ms, and CPU utilization fell by more than a factor of three. LIquid also introduced new database indexing techniques that allow real-time data querying, enabling up-to-the-second recommendations.
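To put those figures in proportion, the short calculation below derives the improvement factors. It is only a sketch: the latency values are the bounds quoted above ("over 1s" and "under 50ms"), so the latency factor is a lower bound, and the variable names are illustrative rather than taken from LinkedIn.

```python
# Figures quoted in the article; latencies are bounds, not exact measurements.
qps_before, qps_after = 120, 18_000
latency_before_ms, latency_after_ms = 1_000, 50

# Ratio of new to old throughput, and of old to new latency.
throughput_gain = qps_after / qps_before
latency_gain = latency_before_ms / latency_after_ms

print(f"throughput: {throughput_gain:.0f}x, latency: at least {latency_gain:.0f}x")
```

Under these assumptions, throughput grew by a factor of 150 and latency improved by a factor of at least 20.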
The figure above illustrates the architecture of the system. This architecture employs LIquid to answer graph queries with short latencies and acceptable hardware costs. A second-stage ranking function is then applied to the hundreds of candidates generated through LIquid queries on the Economic Graph. This ranking function uses machine-learned features from Venice and analytics insights from Apache Pinot to score and select the top candidates. A filtering step then prepares the ranked list for rendering and final scoring.
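The staged flow described here (candidate generation from the graph, ranking, then filtering) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not LinkedIn's implementation: the graph, the `PROFILE_SCORE` feature table, the `BLOCKED` filter input, and every function name are hypothetical stand-ins for the Economic Graph, the feature and analytics stores, and the production filter.

```python
from collections import Counter

# Toy undirected connection graph (hypothetical data, not the Economic Graph).
CONNECTIONS = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}

# Hypothetical per-member feature, standing in for machine-learned features.
PROFILE_SCORE = {"dan": 0.9, "eve": 0.4}

# Hypothetical filter input: members who must not be recommended to a given member.
BLOCKED = {"ann": {"eve"}}


def generate_candidates(member):
    """Stage 1: second-degree connections, counted by number of shared connections."""
    shared = Counter()
    for friend in CONNECTIONS[member]:
        for fof in CONNECTIONS[friend]:
            if fof != member and fof not in CONNECTIONS[member]:
                shared[fof] += 1
    return shared


def rank(candidates):
    """Stage 2: combine the graph signal with the feature, highest score first."""
    scored = {c: n + PROFILE_SCORE.get(c, 0.0) for c, n in candidates.items()}
    return sorted(scored, key=lambda c: (-scored[c], c))


def filter_results(member, ranked, limit=10):
    """Stage 3: drop blocked members and truncate the list for rendering."""
    blocked = BLOCKED.get(member, set())
    return [c for c in ranked if c not in blocked][:limit]


def recommend(member):
    return filter_results(member, rank(generate_candidates(member)))


print(recommend("ann"))  # ['dan']
```

Keeping the stages separate mirrors the division of labor in the article: the graph query only has to produce a few hundred plausible candidates quickly, while the more expensive scoring runs on that small set.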
LIquid's design allows it to scale up to ten times its current size, accommodating organic growth and new semantic domains for LinkedIn's 930+ million members. It provides 99.99% availability and automatically scales to accommodate an increase in the graph's size and activity volume.
The graph database uses a composable and declarative query language based on Datalog, enabling developers to access and utilize data efficiently. A composable language lets developers build on existing features (called modules). A declarative language lets them focus on expressing what they intend to build, while LIquid automates efficient data access. This setup allows developers to change a dataset quickly, significantly reducing the time needed to adjust and update databases.
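To give a feel for what "composable" and "declarative" mean in a Datalog-style setting, here is a minimal Python sketch. It is not LIquid's query language or API: relations are modeled as plain sets of tuples, each rule is a hypothetical function whose docstring shows the Datalog-like rule it mimics, and the data is invented for illustration.

```python
# Facts: each relation is a set of tuples (hypothetical toy data).
CONNECTED = {("ann", "bob"), ("bob", "dan"), ("ann", "cat"), ("cat", "dan"), ("bob", "eve")}
WORKS_AT = {("ann", "acme"), ("dan", "acme"), ("eve", "initech")}


def symmetric(rel):
    """Connected(b, a) :- Connected(a, b)."""
    return rel | {(b, a) for a, b in rel}


def second_degree(connected):
    """SecondDegree(a, c) :- Connected(a, b), Connected(b, c), a != c, not Connected(a, c)."""
    return {
        (a, c)
        for a, b in connected
        for b2, c in connected
        if b == b2 and a != c and (a, c) not in connected
    }


def coworker_candidates(second, works_at):
    """CoworkerCandidate(a, c) :- SecondDegree(a, c), WorksAt(a, org), WorksAt(c, org)."""
    return {
        (a, c)
        for a, c in second
        for m1, org1 in works_at
        for m2, org2 in works_at
        if m1 == a and m2 == c and org1 == org2
    }


# Composition: the second rule is built on the output of the first.
EDGES = symmetric(CONNECTED)
SECOND = second_degree(EDGES)
COWORKERS = coworker_candidates(SECOND, WORKS_AT)

print(sorted(COWORKERS))  # [('ann', 'dan'), ('dan', 'ann')]
```

Each rule states which tuples belong in the result rather than how to traverse the data, and `coworker_candidates` reuses `second_degree` without knowing how it was computed. In a real Datalog engine the evaluation strategy (join order, index use) is chosen by the system, which is the part the article says LIquid automates.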
Bogdan Artintescu, director of engineering at LinkedIn, describes the roadmap for LIquid:
The roadmap to enable members to do anything requires increased sophistication in answering questions for our members. This can be improved along two main axes. First, the complexity of the query and the variety of the data sources added to the Economic Graph will enable new features to be developed and surfaced. Second, enriching the data will improve the ability to reason over it. This can be achieved through creating derived data (either through deterministic algorithms or probabilistic machine-learned methods) or improved reasoning through richer semantics in the Knowledge Graph (KG) schema. We plan to focus on both high-performance graph compute and analytics and building a KG ecosystem to enable our developers to further enhance our member experience.
The successful implementation of LIquid has inspired other teams within LinkedIn and sister teams at Microsoft to adopt it as a graph index.