Swiggy Improves Search Autocomplete Using Real Time Machine Learning Ranking

Swiggy detailed the architecture of the company's real-time machine-learning ranking system for autocomplete search suggestions, describing how the platform combines OpenSearch retrieval, feature stores, and learning-to-rank models while operating under strict latency requirements. The system replaced a hand-tuned heuristic ranking approach with a learned ranking model running directly inside OpenSearch, avoiding additional services or network hops while improving autocomplete relevance.

According to the company, autocomplete requests are particularly sensitive to latency because every keystroke can trigger a new search query. Traditional autocomplete systems, therefore, tend to rely on lexical matching and static ranking rules optimized for speed. Swiggy's newer approach separates the workflow into two stages: candidate generation and ranking.

When a user begins typing, the system first retrieves a broad set of candidate suggestions using OpenSearch lexical retrieval combined with embedding-based similarity search. This retrieval layer is optimized for recall and fast response times. The candidate suggestions are then passed into a ranking layer where machine learning models reorder results based on predicted relevance.
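The two-stage flow described above can be illustrated with a minimal sketch. It is not Swiggy's implementation: the in-memory `CATALOG`, the function names, and the popularity-only scoring are hypothetical, and a plain prefix match stands in for the OpenSearch lexical and embedding retrieval.

```python
from typing import Callable, Dict, List

# Hypothetical catalog: suggestion text -> popularity score (illustrative values).
CATALOG: Dict[str, float] = {
    "pizza": 0.9,
    "pizza hut": 0.7,
    "paneer tikka": 0.8,
    "pasta": 0.6,
    "biryani": 0.95,
}


def generate_candidates(prefix: str, catalog: Dict[str, float], limit: int = 50) -> List[str]:
    """Stage 1: recall-oriented retrieval.

    A simple prefix match is an assumption standing in for lexical plus
    embedding-based retrieval; it returns at most `limit` candidates.
    """
    normalized = prefix.lower().strip()
    matches = [text for text in catalog if text.startswith(normalized)]
    return matches[:limit]


def rank(candidates: List[str], score: Callable[[str], float]) -> List[str]:
    """Stage 2: reorder candidates by a relevance score, highest first."""
    return sorted(candidates, key=score, reverse=True)


def autocomplete(prefix: str, catalog: Dict[str, float], top_k: int = 3) -> List[str]:
    """Run both stages and return the top_k suggestions."""
    candidates = generate_candidates(prefix, catalog)
    # Popularity is a placeholder for the learned model's predicted relevance.
    return rank(candidates, score=lambda text: catalog[text])[:top_k]


if __name__ == "__main__":
    print(autocomplete("p", CATALOG))
```

Keeping retrieval and scoring behind separate functions mirrors the article's point that each stage can be tuned on its own: the first for recall and speed, the second for relevance.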

The ranking system incorporates real-time signals such as user interaction history, click behavior, query context, and item popularity. These features are combined with offline-trained models that are deployed for online inference. A feature store serves both precomputed and streaming features, so the system avoids expensive real-time computation while still reacting to recent user behavior. The ranking layer uses a learning-to-rank approach integrated with OpenSearch, typically implemented with a framework such as OpenSearch LTR, which can serve models trained with libraries such as RankLib and gradient-boosted tree libraries such as XGBoost for ranking and re-ranking tasks.
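As a rough illustration of how a learned model can rerank results inside the search engine, the sketch below builds a request body that uses the `sltr` query from the learning-to-rank plugin inside a standard `rescore` block. The field name `suggestion`, the model name `autocomplete_ltr_v1`, the `keywords` parameter, and the window size are assumptions made for the example; the article does not describe Swiggy's actual queries or feature set.

```python
import json
from typing import Any, Dict


def build_autocomplete_query(
    prefix: str,
    model_name: str,
    window_size: int = 50,
    size: int = 10,
) -> Dict[str, Any]:
    """Build a search body: lexical retrieval first, then LTR rescoring.

    `suggestion` (field) and `keywords` (feature-template parameter) are
    hypothetical names; they must match whatever the index mapping and the
    stored feature set actually define.
    """
    return {
        "size": size,
        # Stage 1: cheap, recall-oriented lexical match on the typed prefix.
        "query": {"match_phrase_prefix": {"suggestion": {"query": prefix}}},
        # Stage 2: rescore only the top `window_size` hits with the stored model.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"keywords": prefix},
                        "model": model_name,
                    }
                },
                # Let the model score fully determine the final order.
                "query_weight": 0.0,
                "rescore_query_weight": 1.0,
            },
        },
    }


if __name__ == "__main__":
    body = build_autocomplete_query("piz", "autocomplete_ltr_v1")
    print(json.dumps(body, indent=2))
```

Because the rescoring runs in the same request as retrieval, no separate ranking service call is needed, which is the property the article credits for avoiding extra network hops.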

The autocomplete platform also includes a continuous feedback loop that retrains ranking models using live user interaction data. Click-through rates, conversions, and ordering behavior are streamed into offline training pipelines where updated ranking models are generated and stored in a model registry before deployment into the online ranking service.
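One way to picture the feedback loop is as a job that turns logged interactions into graded training rows. The sketch below writes rows in the LibSVM-style ranking format that RankLib reads (`<label> qid:<id> <feature>:<value> ... # comment`). The `Interaction` record, the 0/1/2 grading scheme, and the feature ids are assumptions for illustration; the article does not specify how Swiggy derives labels.

```python
from typing import Dict, List, NamedTuple


class Interaction(NamedTuple):
    """Hypothetical log record for one suggestion shown to a user."""
    query_id: int
    suggestion: str
    clicked: bool
    ordered: bool
    features: Dict[int, float]  # 1-based feature id -> feature value


def grade(interaction: Interaction) -> int:
    """Map behavior to a relevance label (assumed scheme).

    2 = the user went on to order, 1 = clicked only, 0 = ignored.
    """
    if interaction.ordered:
        return 2
    if interaction.clicked:
        return 1
    return 0


def to_training_line(interaction: Interaction) -> str:
    """Format one interaction as a LibSVM-style ranking row."""
    feats = " ".join(
        f"{fid}:{value:g}" for fid, value in sorted(interaction.features.items())
    )
    return f"{grade(interaction)} qid:{interaction.query_id} {feats} # {interaction.suggestion}"


def build_training_set(log: List[Interaction]) -> List[str]:
    """Group rows by query id, as ranking trainers expect contiguous groups."""
    ordered_log = sorted(log, key=lambda item: item.query_id)
    return [to_training_line(item) for item in ordered_log]


if __name__ == "__main__":
    log = [
        Interaction(2, "pasta", False, False, {1: 1.0, 2: 0.6}),
        Interaction(1, "pizza", True, True, {2: 0.5, 1: 3.0}),
    ]
    for line in build_training_set(log):
        print(line)
```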

Typical ML ranking service vs OpenSearch LTR latency (Source: Swiggy Blog Post)

The architecture is designed to operate under strict performance requirements. Autocomplete requests are highly interactive and require low-latency responses, which leads to design choices that favor lightweight models and optimized inference paths. Rather than relying on complex deep models in the online path, the system balances model complexity with serving efficiency to maintain responsiveness at scale.

Training and deployment setup for the Autocomplete ML model using OpenSearch LTR (Source: Swiggy Blog Post)

Because click-through rates and conversion signals flow continuously into the offline training pipelines, the ranking models adapt to evolving user behavior and emerging query patterns. As a result, the autocomplete system can follow new trends without manual rule updates.

According to Swiggy’s engineers, the design integrates machine learning into a traditionally rule- and retrieval-driven component without compromising latency. The separation of candidate generation and ranking allows each stage to be optimized independently, while the use of feature stores and streaming pipelines keeps the training and serving environments consistent.

About the Author

Leela Kumili
