Machine Learning Content on InfoQ
-
QCon London 2026: behind Booking.com's AI Evolution: the Unpolished Story
Jabez Eliezer Manuel, senior principal engineer at Booking.com, presented “Behind Booking.com's AI Evolution: the Unpolished Story” at QCon London 2026. Manuel discussed how Booking.com has evolved over the past 20 years and the challenges the company faced in incorporating AI.
-
DoorDash Builds DashCLIP to Align Images, Text, and Queries for Semantic Search Using 32M Labels
DoorDash has launched a multimodal machine learning system that aligns product images, text, and user queries in a shared embedding space. Trained on 32 million labeled query-product pairs using contrastive learning, the system improves semantic search, product ranking, and advertising relevance. The embeddings also support other machine learning tasks across the marketplace.
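The summary says only that DashCLIP is trained with contrastive learning. As an illustration, here is a minimal sketch of the standard CLIP-style symmetric contrastive (InfoNCE) loss, assuming DashCLIP uses something similar: each query should score highest against its own paired product, and every other product in the batch serves as a negative. The function names and the temperature of 0.07 are illustrative assumptions, not DoorDash's code.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def clip_style_loss(query_embs, product_embs, temperature=0.07):
    """Symmetric InfoNCE: query i is paired with product i; all other
    products in the batch act as negatives for it (and vice versa)."""
    q = [l2_normalize(v) for v in query_embs]
    p = [l2_normalize(v) for v in product_embs]
    n = len(q)
    # Cosine similarities scaled by the temperature become the logits.
    logits = [[sum(a * b for a, b in zip(q[i], p[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct "class" for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]
        return total / len(rows)

    # Query-to-product direction uses rows; product-to-query uses columns.
    columns = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(columns))
```

With toy two-dimensional embeddings, correctly paired queries and products give a loss near zero, while swapped pairs give a much larger loss; training pushes the encoders toward the former.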
-
Google Researchers Propose Bayesian Teaching Method for Large Language Models
Google Research has proposed a training method that teaches large language models to approximate Bayesian reasoning by learning from the predictions of an optimal Bayesian system. The approach focuses on improving how models update beliefs as they receive new information during multi-step interactions.
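To make "learning from the predictions of an optimal Bayesian system" concrete, the sketch below implements a small exact Bayesian learner: it keeps a posterior over hypotheses, updates it with Bayes' rule after each observation, and emits its posterior-predictive probability at every step. Under the approach described, targets like these are what a language model would be trained to imitate. The coin-bias task, the hypothesis grid, and the function names are hypothetical, chosen for illustration only; the fine-tuning step itself is not shown.

```python
def bayes_update(prior, likelihoods):
    """Posterior is proportional to prior times likelihood, renormalized."""
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(unnormalized.values())
    return {h: w / z for h, w in unnormalized.items()}


def teacher_targets(hypotheses, prior, observations):
    """Return the Bayesian posterior-predictive P(next observation = 1)
    after each step of a multi-step interaction.

    Each hypothesis h is a probability that an observation equals 1
    (a hypothetical coin-bias task used only for illustration).
    """
    belief = dict(prior)
    targets = []
    for obs in observations:
        likelihoods = {h: (h if obs == 1 else 1.0 - h) for h in hypotheses}
        belief = bayes_update(belief, likelihoods)
        # Predictive probability averages each hypothesis by its posterior weight.
        targets.append(sum(belief[h] * h for h in hypotheses))
    return targets
```

For example, with hypotheses `[0.2, 0.5, 0.8]` and a uniform prior, one observed `1` moves the predictive probability from 0.5 to 0.62, and each further `1` raises it again. That gradual, evidence-weighted belief revision is the behavior the method aims to teach.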
-
Scaling Human Judgment: How Dropbox Uses LLMs to Improve Labeling for RAG Systems
To improve the relevance of responses produced by Dropbox Dash, Dropbox engineers began using LLMs to augment human labeling, which plays a crucial role in identifying the documents that should be used to generate the responses. Their approach offers useful insights for any system built on retrieval-augmented generation (RAG).
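A common way to decide whether LLM-generated relevance labels can stand in for human ones is to measure agreement on a set that both have labeled. The sketch below computes Cohen's kappa, which corrects raw agreement for agreement expected by chance. This is a generic technique offered as an assumption; the summary does not say which agreement measure, if any, Dropbox uses, and the label values are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelers (e.g. human vs. LLM)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Fraction of items on which the two labelers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each labeler assigned labels at random
    # according to its own label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both labelers used a single identical label everywhere; kappa is
        # undefined, so treat it as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

A team might, for instance, label a seed set by hand, run the LLM labeler on the same documents, and scale up LLM labeling only when kappa clears a threshold it has chosen.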
-
Enhancing A/B Testing at DoorDash with Multi-Armed Bandits
While experimentation is essential, traditional A/B testing can be excessively slow and expensive, according to DoorDash engineers Caixia Huang and Alex Weinstein. To address these limitations, they adopted a "multi-armed bandits" (MAB) approach to optimize their experiments.
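Unlike a fixed A/B split, a bandit shifts traffic toward better-performing variants while the experiment is still running. The sketch below uses Beta-Bernoulli Thompson sampling, one widely used MAB algorithm; the summary does not say which algorithm DoorDash adopted, and the simulated conversion rates are hypothetical.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling over len(true_rates) variants.

    true_rates are hypothetical per-variant conversion probabilities that the
    algorithm cannot see; it observes only the simulated rewards.
    Returns how many times each variant was served.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(rounds):
        # Draw a plausible conversion rate for each variant from its
        # Beta(successes + 1, failures + 1) posterior (uniform prior).
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        converted = rng.random() < true_rates[arm]
        pulls[arm] += 1
        if converted:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls
```

Running `thompson_sampling([0.05, 0.15], 5000)` serves most of the traffic to the second variant. That is the efficiency gain over A/B testing: fewer users are exposed to the weaker variant while evidence accumulates.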
-
DoorDash Applies AI to Safety across Chat and Calls, Cutting Incidents by 50%
DoorDash deploys SafeChat, an AI-driven safety system for moderating chat, images, and voice calls between Dashers and customers. Using a layered text moderation architecture, machine learning models, and human review, SafeChat detects unsafe content in real time, enabling immediate actions and reducing low- and medium-severity safety incidents by roughly 50 percent.
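A layered moderation design typically puts cheap checks first and more expensive ones later, so most messages are resolved quickly and only uncertain cases reach people. The sketch below shows that general shape: deterministic rules, then an ML score, then human review for the uncertain band. The blocklist, thresholds, and `model_score` callable are hypothetical placeholders, not SafeChat's actual layers or values.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    action: str  # "allow", "block", or "human_review"
    layer: str   # which layer made the call


# Hypothetical placeholder; a real system would maintain curated term lists.
BLOCKLIST = {"slur_placeholder"}


def moderate(message: str, model_score: Callable[[str], float],
             block_at: float = 0.9, review_at: float = 0.5) -> Decision:
    words = set(message.lower().split())
    # Layer 1: cheap deterministic rules catch obvious violations immediately.
    if words & BLOCKLIST:
        return Decision("block", "rules")
    # Layer 2: an ML classifier scores everything else (0 = safe, 1 = unsafe).
    score = model_score(message)
    if score >= block_at:
        return Decision("block", "model")
    # Layer 3: the uncertain middle band is routed to human reviewers.
    if score >= review_at:
        return Decision("human_review", "model")
    return Decision("allow", "model")
```

Tuning `block_at` and `review_at` trades automated enforcement against human review workload.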
-
AWS Hikes EC2 Capacity Block Rates by 15% in Uniform ML Pricing Adjustment
AWS has raised EC2 Capacity Block prices for ML by 15% across all regions, affecting GPU-based workloads. The uniform increase applies to top-tier instances powered by NVIDIA GPUs and underscores supply chain pressures and inflation. With limited alternatives, organizations face higher costs, which makes effective workload optimization and cost management strategies more important.
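A quick worked calculation shows what a 15% increase means for budgeting. The spend and block-hour figures below are hypothetical. Note that keeping spend flat requires cutting usage by about 13% (1 − 1/1.15), not 15%.

```python
def cost_after_increase(current_cost, increase_pct=15.0):
    """New cost for the same usage after a percentage price increase."""
    return current_cost * (1 + increase_pct / 100)


def hours_to_cut(current_hours, increase_pct=15.0):
    """Block-hours to drop so total spend stays flat after the increase."""
    return current_hours - current_hours / (1 + increase_pct / 100)
```

For a hypothetical $10,000 of monthly Capacity Block spend, the same usage now costs about $11,500; a team running 1,000 block-hours would need to drop roughly 130 of them to stay on budget.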
-
Swiggy Rolls out Hermes V3: from Text-to-SQL to Conversational AI
Swiggy has released Hermes V3, a GenAI-powered text-to-SQL assistant that enables employees to query data in plain English. The Slack-native system combines vector retrieval, conversational memory, agentic orchestration, and explainability to improve SQL accuracy and support multi-turn analytical queries.
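In text-to-SQL systems, vector retrieval is commonly used to pick only the table descriptions relevant to a question before prompting the model, and conversational memory carries recent turns forward. The sketch below shows that pattern under stated assumptions: a toy bag-of-words `embed` stands in for a real embedding model, and the table names, prompt layout, and three-turn memory window are hypothetical, not Hermes V3's design.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy stand-in for a real embedding model: bag-of-words token counts.
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing tokens
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_tables(question, table_docs, k=2):
    """Rank table descriptions by similarity to the question; keep the top k."""
    q = embed(question)
    ranked = sorted(table_docs,
                    key=lambda name: cosine(q, embed(table_docs[name])),
                    reverse=True)
    return ranked[:k]


def build_prompt(question, table_docs, history, k=2):
    """Assemble retrieved schema context plus recent turns for the SQL model."""
    tables = retrieve_tables(question, table_docs, k)
    schema = "\n".join(f"{t}: {table_docs[t]}" for t in tables)
    turns = "\n".join(history[-3:])  # short conversational memory window
    return (f"Relevant tables:\n{schema}\n\n"
            f"Conversation so far:\n{turns}\n\n"
            f"Question: {question}\nSQL:")
```

Keeping only the top-k tables in the prompt limits the schema the model must reason over, which is one plausible reason retrieval improves SQL accuracy.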
-
Benchmarking beyond the Application Layer: How Uber Evaluates Infrastructure Changes and Cloud SKUs
Uber’s Ceilometer framework automates infrastructure performance benchmarking below the application layer. It standardizes testing across servers, workloads, and cloud SKUs, helping teams validate changes, identify regressions, and optimize resources. Future plans include AI integration, anomaly detection, and continuous validation.
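Identifying regressions usually comes down to comparing repeated benchmark runs on a baseline against runs on a candidate (a new kernel, config, or cloud SKU). The sketch below compares medians against a tolerance. The 5% tolerance and the sample numbers are hypothetical, and a production framework would likely add a statistical test across many runs; Ceilometer's actual method is not described in the summary.

```python
from statistics import median


def detect_regression(baseline, candidate, higher_is_better=True,
                      tolerance_pct=5.0):
    """Compare median results of repeated benchmark runs.

    Returns (change_pct, regressed). Medians damp the effect of one-off
    noisy runs. higher_is_better is True for throughput-like metrics and
    False for latency-like metrics.
    """
    b, c = median(baseline), median(candidate)
    change_pct = (c - b) / b * 100
    if higher_is_better:
        regressed = change_pct < -tolerance_pct
    else:
        regressed = change_pct > tolerance_pct
    return change_pct, regressed
```

For example, a throughput drop from a median of 1,000 to 905 requests per second (−9.5%) is flagged, while a small latency change within tolerance is not.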
-
QCon AI NY 2025: Designing AI Platforms for Reliability: Tools for Certainty, Agents for Discovery
At QCon AI NY 2025, Aaron Erickson argued for treating agentic AI as an engineering challenge and achieving reliability by blending probabilistic and deterministic systems. He called for clear operational structures to minimize risk and optimize performance, highlighting specialized agents and deterministic paths as ways to improve accuracy and control in AI workflows.
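One way to read "tools for certainty, agents for discovery" is as a router: requests that match a known, well-defined shape go down a deterministic code path, and only open-ended requests reach a probabilistic agent. The sketch below shows that split. The route pattern, `get_order_status` handler, and `agent` callable are hypothetical illustrations, not an API from the talk.

```python
import re


def get_order_status(match):
    # Hypothetical deterministic tool: in practice, a database or API lookup.
    return f"status lookup for order {match.group(1)}"


# Deterministic routes: (compiled pattern, handler). Checked in order.
DETERMINISTIC_ROUTES = [
    (re.compile(r"^status of order (\d+)$"), get_order_status),
]


def handle(request, agent):
    """Prefer a deterministic tool path; fall back to the agent otherwise.

    Returns (path_taken, result) so callers can log which path ran.
    """
    normalized = request.strip().lower()
    for pattern, handler in DETERMINISTIC_ROUTES:
        match = pattern.match(normalized)
        if match:
            return "deterministic", handler(match)
    # No known shape matched: hand the open-ended request to the agent.
    return "agent", agent(request)
```

Returning which path ran keeps the probabilistic share of traffic observable, which supports the kind of operational control the talk emphasized.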
-
AWS Expands Well-Architected Framework with Responsible AI and Updated ML and Generative AI Lenses
At AWS re:Invent 2025, AWS expanded its Well-Architected Framework with a new Responsible AI Lens and updated Machine Learning and Generative AI Lenses. The updates provide guidance on governance, bias mitigation, scalable ML workflows, and trustworthy AI system design across the full AI lifecycle.
-
QCon AI New York 2025: AI Platform Scaling at LinkedIn
At QCon AI NY 2025, LinkedIn's Prince Valluri and Karthik Ramgopal unveiled an internal platform for AI agents, prioritizing execution over intelligence. By using structured specifications within a robust orchestration layer, they enhance agent observability and interoperability while ensuring human accountability.
-
Meta's Optimization Platform Ax 1.0 Streamlines LLM and System Optimization
Now stable, Ax is an open-source platform from Meta designed to help researchers and engineers apply machine learning to complex, resource-intensive experimentation. Over the past several years, Meta has used Ax to improve AI models, accelerate machine learning research, tune production infrastructure, and more.
-
Lyft Rearchitects ML Platform with Hybrid AWS SageMaker-Kubernetes Approach
Lyft has rearchitected its machine learning platform LyftLearn into a hybrid system, moving offline workloads to AWS SageMaker while retaining Kubernetes for online model serving. Its choice of managed services where operational complexity was highest, and custom infrastructure where control mattered most, offers a pragmatic alternative to unified platform strategies.
-
Karrot Improves Conversion Rates by 70% with New Scalable Feature Platform on AWS
Karrot replaced its legacy recommendation system with a scalable architecture that leverages various AWS services. The company sought to address challenges related to tight coupling, limited scalability, and poor reliability in its previous solution, opting instead for a distributed, event-driven architecture built on top of scalable cloud services.
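The shift from tight coupling to an event-driven design can be sketched in a few lines: services publish events, a separate consumer maintains user features, and the recommender only reads the resulting feature store. Below, an in-memory queue and dictionary stand in for the managed AWS services Karrot uses; the event schema and feature names are hypothetical.

```python
from collections import defaultdict, deque


class EventBus:
    """In-memory stand-in for a managed stream or queue."""

    def __init__(self):
        self.queue = deque()

    def publish(self, event):
        self.queue.append(event)

    def drain(self):
        while self.queue:
            yield self.queue.popleft()


class FeatureStore:
    """In-memory stand-in for a feature store read by the recommender."""

    def __init__(self):
        self.features = defaultdict(lambda: {"views": 0, "last_category": None})

    def get(self, user_id):
        return dict(self.features[user_id])  # return a copy to readers


def run_feature_consumer(bus, store):
    # The consumer updates features independently of whichever service
    # emitted the event, so producers and the recommender stay decoupled.
    for event in bus.drain():
        feats = store.features[event["user_id"]]
        if event["type"] == "item_viewed":
            feats["views"] += 1
            feats["last_category"] = event["category"]
```

Because producers only publish events, a failure or slowdown in feature computation does not block the services emitting those events, which is the reliability benefit the summary attributes to the new design.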