Agoda Builds Multimodal Content System to Bridge Images and Reviews in Travel Discovery

Agoda has built a multimodal content system that unifies hotel images and guest reviews into a shared topic-based structure. The goal is to connect visual content and written feedback so that users see hotel attributes described consistently across images and reviews. The system processes more than 700 million images along with multilingual reviews in over 40 languages.

Aditya Kumar Ray, VP at Flyshop, wrote on LinkedIn:

In modern travel tech, data is no longer just about inventory and pricing; it’s about understanding content context at scale.

The core redesign introduces a shared topic taxonomy that replaces fragmented pipelines with a unified semantic layer. Previously, images and reviews were processed separately, each with its own ranking and retrieval logic, which made it difficult to correlate what users saw in photos with what was described in reviews. As a result, hotel features were interpreted inconsistently across modalities. By introducing topics such as Pool, Breakfast, Room Quality, and Location as shared anchors, the system maps both visual and textual signals into a common representation space.

Maps image tags and review tags into a shared topic taxonomy (Source: Agoda Blog Post)
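The mapping into shared topics can be illustrated with a minimal Python sketch. The alias table, the `to_topic` helper, and the sample tags are hypothetical and chosen for illustration only; the article does not describe how Agoda's taxonomy is actually stored or matched.

```python
from typing import Optional

# Hypothetical taxonomy: canonical topic -> raw tags that normalize to it.
TOPIC_ALIASES = {
    "Pool": {"pool", "swimming pool", "infinity pool"},
    "Breakfast": {"breakfast", "breakfast area", "breakfast buffet"},
    "Room Quality": {"room", "clean room", "spacious room"},
}

# Invert the table once so each lookup is a single dictionary access.
TAG_TO_TOPIC = {
    alias: topic
    for topic, aliases in TOPIC_ALIASES.items()
    for alias in aliases
}


def to_topic(raw_tag: str) -> Optional[str]:
    """Map a raw image or review tag to a canonical topic, or None if unknown."""
    return TAG_TO_TOPIC.get(raw_tag.strip().lower())


# Tags from two modalities land in the same topic space.
image_tags = ["Swimming Pool", "breakfast area", "lobby"]
review_tags = ["pool", "breakfast buffet"]

image_topics = {to_topic(t) for t in image_tags} - {None}
review_topics = {to_topic(t) for t in review_tags} - {None}

# Topics supported by both photos and reviews.
shared_topics = image_topics & review_topics
```

Because both modalities resolve to the same topic names, correlating them reduces to a set intersection rather than a cross-pipeline join.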

Images are processed using classification models that generate semantic labels such as pool, beach view, and breakfast area, which are normalized into canonical topics. In parallel, reviews are processed through NLP pipelines that extract key phrases, representative snippets, and sentiment signals, all aligned to the same topic taxonomy. Each topic therefore functions as a pre-aggregated multimodal package containing curated images, multilingual review excerpts, and sentiment metadata. Associations are precomputed offline and served through a low-latency retrieval layer, which avoids runtime joins.
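A minimal sketch of the pre-aggregation idea follows, assuming image and review rows have already been tagged with a canonical topic. The `TopicPackage` class, the row layouts, and the `(hotel_id, topic)` key are assumptions made for this example, not Agoda's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TopicPackage:
    """Hypothetical per-topic bundle built offline and served by key."""
    images: List[str] = field(default_factory=list)
    snippets: List[Tuple[str, str]] = field(default_factory=list)  # (language, text)
    sentiments: List[float] = field(default_factory=list)

    @property
    def mean_sentiment(self) -> Optional[float]:
        if not self.sentiments:
            return None
        return sum(self.sentiments) / len(self.sentiments)


def build_packages(image_rows, review_rows) -> Dict[Tuple[str, str], TopicPackage]:
    """Offline step: group both modalities under one (hotel_id, topic) key."""
    packages = defaultdict(TopicPackage)
    for hotel_id, topic, url in image_rows:
        packages[(hotel_id, topic)].images.append(url)
    for hotel_id, topic, lang, text, sentiment in review_rows:
        package = packages[(hotel_id, topic)]
        package.snippets.append((lang, text))
        package.sentiments.append(sentiment)
    return dict(packages)


image_rows = [
    ("h1", "Pool", "img/1.jpg"),
    ("h1", "Pool", "img/2.jpg"),
    ("h1", "Breakfast", "img/3.jpg"),
]
review_rows = [
    ("h1", "Pool", "en", "Great pool", 1.0),
    ("h1", "Pool", "es", "La piscina es preciosa", 0.5),
]

store = build_packages(image_rows, review_rows)

# Serving step: one key lookup returns images, snippets, and sentiment together.
pool = store[("h1", "Pool")]
```

At request time the serving path only reads a single precomputed value, which is the property the article attributes to avoiding runtime joins.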

The ingestion and enrichment workloads run as PySpark jobs orchestrated via Kubeflow, providing large-scale distributed processing across millions of reviews and hundreds of millions of images. The resulting topic-level artifacts are stored in Couchbase, which acts as the low-latency serving layer for production traffic.

Multimodal image pipeline (Source: Agoda Blog Post)

The design trades freshness for performance: correlation logic moves into offline computation, and the system relies on a stable taxonomy. While this improves latency and scalability, it also requires careful governance of topic definitions to avoid drift across languages and domains. The multilingual normalization layer maps semantically equivalent content to the same topics across more than 40 languages, which is critical for global consistency.
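One way to picture multilingual normalization and the governance it requires is the sketch below. The term table, the language codes, and the `topic_for` and `missing_coverage` helpers are hypothetical; a production system would more likely rely on learned models than on a hand-written lexicon.

```python
import unicodedata
from typing import Dict, List, Optional

# Hypothetical lexicon: topic -> language code -> terms that express the topic.
MULTILINGUAL_TERMS = {
    "Pool": {"en": {"pool"}, "es": {"piscina"}, "de": {"schwimmbad"}},
    "Breakfast": {"en": {"breakfast"}, "es": {"desayuno"}, "de": {"frühstück"}},
}


def normalize(text: str) -> str:
    """Unicode-normalize and case-fold so equivalent spellings compare equal."""
    return unicodedata.normalize("NFKC", text).casefold().strip()


def topic_for(term: str, lang: str) -> Optional[str]:
    """Return the canonical topic for a term in the given language, if any."""
    key = normalize(term)
    for topic, by_lang in MULTILINGUAL_TERMS.items():
        if key in {normalize(t) for t in by_lang.get(lang, set())}:
            return topic
    return None


def missing_coverage(required_langs: List[str]) -> Dict[str, List[str]]:
    """Governance check: list languages in which a topic has no terms defined."""
    gaps = {}
    for topic, by_lang in MULTILINGUAL_TERMS.items():
        covered = {lang for lang, terms in by_lang.items() if terms}
        missing = sorted(set(required_langs) - covered)
        if missing:
            gaps[topic] = missing
    return gaps


gaps = missing_coverage(["en", "es", "de", "ja"])
```

A check like `missing_coverage` could run whenever topic definitions change, flagging languages where a topic would silently drop out of the shared taxonomy.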

Agoda Engineering stated that the architecture is extensible, allowing integration of additional content sources such as structured property metadata and user-generated media into the same topic framework, strengthening long-term semantic coverage.
