RAG-Powered Copilot Saves Uber 13,000 Engineering Hours

Uber recently detailed how it built Genie, an AI-powered on-call copilot designed to improve the efficiency of on-call support engineers. Genie leverages Retrieval-Augmented Generation (RAG) to provide accurate real-time responses and significantly enhance the speed and effectiveness of incident response.

Since its launch in September 2023, Genie has significantly impacted Uber's support teams. It has answered over 70,000 questions across 154 Slack channels, saving approximately 13,000 engineering hours with a helpfulness rate of 48.9%, as measured by its users.

Uber's on-call engineers often spend significant time answering repetitive queries, while fragmented documentation makes it difficult for users to find answers on their own. The resulting long response times and lost productivity were the driving motivation for building Genie.

Uber used Retrieval-Augmented Generation (RAG) to power Genie. RAG is a method that combines the strengths of information retrieval systems with generative AI models to produce accurate and relevant responses. It allowed Uber to deploy a solution quickly by leveraging existing knowledge sources, eliminating the need for the extensive example data that fine-tuning an AI model would have required.

Genie pulls data from various internal sources, such as Uber's wiki, Stack Overflow, and engineering documents. The information is scraped, converted into vector embeddings using OpenAI models, and stored in Search In Action (SIA), Uber's in-house vector database. To avoid leaking sensitive information, Genie ingests only pre-approved data sources that contain no sensitive data.
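The ingestion step can be illustrated with a minimal sketch. The Python example below assumes OpenAI's embeddings API with the "text-embedding-3-small" model, and uses a plain in-memory list as a stand-in for SIA, whose API is not public; the document structure is an illustrative assumption, not Uber's actual implementation.

```python
# Minimal ingestion sketch: embed scraped documents and store the vectors.
# Assumptions: OpenAI's Python SDK; "text-embedding-3-small" as the model;
# an in-memory list standing in for SIA, Uber's internal vector database.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

documents = [
    {"id": "wiki-123", "text": "How to roll back a failed deployment ..."},
    {"id": "so-456", "text": "Restart the service with the on-call runbook ..."},
]

vector_store = []  # stand-in for SIA

for doc in documents:
    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=doc["text"],
    )
    vector_store.append({
        "id": doc["id"],
        "text": doc["text"],
        "embedding": resp.data[0].embedding,
    })
```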


Genie's overall architecture (source)

When a user asks a question in Slack, the query is translated into an embedding, which Genie uses to fetch contextually similar data from the vector database. Genie then feeds this data into a Large Language Model (LLM), which generates a response grounded in the retrieved information.
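A hedged sketch of that query path follows, continuing the assumptions above: the OpenAI SDK, cosine similarity over an in-memory store (in place of SIA), and illustrative model names and prompt wording. The store entries are expected to have the same shape as in the ingestion sketch.

```python
# Query-time sketch: embed the question, retrieve the most similar documents,
# and ask the LLM to answer from that retrieved context only.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def top_k(question: str, store: list[dict], k: int = 3) -> list[dict]:
    # Rank stored documents by cosine similarity to the question embedding.
    q = embed(question)
    def cosine(v: list[float]) -> float:
        v = np.array(v)
        return float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
    return sorted(store, key=lambda d: cosine(d["embedding"]), reverse=True)[:k]

def answer(question: str, store: list[dict]) -> str:
    context = "\n\n".join(d["text"] for d in top_k(question, store))
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice, not confirmed by the article
        messages=[
            {"role": "system",
             "content": "Answer the on-call question using only this context:\n\n"
                        + context},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```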

Uber has implemented a metrics framework to improve Genie's performance through continuous real-time user feedback. After Genie responds to a question, users can provide feedback by selecting options such as "Resolved," "Helpful," or "Not Relevant."


The flow of user feedback for Genie (source)

This feedback is collected via a Slack plugin and processed using Uber's internal data streaming systems, which write metrics to a Hive table for analysis. The feedback loop allows Uber's teams to track Genie's helpfulness and refine its responses based on real user experiences.
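As a rough illustration of the metric side, the sketch below aggregates feedback events into a helpfulness rate of the kind Uber reports (48.9%). The event schema and label names are assumptions based on the buttons described above, not Uber's actual Hive schema.

```python
# Sketch: aggregate Slack feedback events into a helpfulness rate.
# Assumed schema: one dict per answered question, with the label the user
# clicked ("Resolved", "Helpful", "Not Relevant") or None if no click.
from collections import Counter

def helpfulness_rate(events: list[dict]) -> float:
    """Share of rated answers marked Resolved or Helpful."""
    labels = Counter(e["label"] for e in events if e["label"] is not None)
    rated = sum(labels.values())
    if rated == 0:
        return 0.0
    return (labels["Resolved"] + labels["Helpful"]) / rated

events = [
    {"question_id": 1, "label": "Resolved"},
    {"question_id": 2, "label": "Not Relevant"},
    {"question_id": 3, "label": "Helpful"},
    {"question_id": 4, "label": None},  # user never rated the answer
]
print(f"helpfulness: {helpfulness_rate(events):.1%}")  # -> 66.7%
```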

For performance evaluation, Uber designed a custom evaluation pipeline that assesses various metrics, such as hallucination rates and the relevance of responses. This pipeline processes historical data, including Slack metadata, user feedback, and Genie's previous responses. It runs these through a scoring system powered by the LLM, which acts as a judge.
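The article doesn't publish the pipeline's prompts, but an LLM-as-judge scorer of this general shape might look like the sketch below; the rubric, model name, and JSON output schema are all assumptions for illustration.

```python
# Sketch: LLM-as-judge scoring of a historical Genie answer against the
# context it retrieved. The rubric and output schema are illustrative guesses.
import json
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an on-call copilot's answer.
Given the retrieved context, the question, and the answer, return JSON with:
- "hallucination": true if the answer states facts not supported by the context
- "relevance": an integer 1-5 for how well the answer addresses the question
"""

def judge(question: str, context: str, answer: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative judge model
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": json.dumps(
                {"context": context, "question": question, "answer": answer})},
        ],
    )
    return json.loads(resp.choices[0].message.content)
```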

Uber has also incorporated a document evaluation process to ensure the quality of the information Genie retrieves and uses in its responses. The system transforms the scraped knowledge base into a structured format where each document is represented as a row.


Workflow of the document evaluation app (source)

Genie assesses each document's clarity, accuracy, and usefulness by feeding it into the LLM with a custom evaluation prompt. The LLM returns a score and actionable suggestions for improving the document. This process helps maintain a high standard for the underlying documentation, keeping Genie's responses reliable and effective.
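A sketch of that per-document pass is shown below. The article doesn't disclose Uber's actual evaluation prompt, so the prompt text, score scale, and output format here are assumptions.

```python
# Sketch: score one knowledge-base document for clarity, accuracy, and
# usefulness, and collect improvement suggestions. Prompt text is assumed.
import json
from openai import OpenAI

client = OpenAI()

DOC_EVAL_PROMPT = """Rate the document below for an on-call copilot's
knowledge base. Return JSON with integer 1-5 scores for "clarity",
"accuracy", and "usefulness", plus a "suggestions" list of concrete edits.
"""

def evaluate_document(doc_text: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": DOC_EVAL_PROMPT},
            {"role": "user", "content": doc_text},
        ],
    )
    return json.loads(resp.choices[0].message.content)
```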
