Accelerating LLM-Driven Developer Productivity at Zoox


Summary

Amit Navindgi discusses the systematic shift at Zoox from fragmented documentation to an AI-driven ecosystem. He explains how they built "Cortex," a secure platform integrating RAG, multi-modal LLMs, and contributor-friendly agent APIs. He shares practical strategies for driving adoption through AI champions and hackathons, emphasizing the move from deterministic workflows to autonomous agents.

Bio

Amit Navindgi is a Staff Software Engineer at Zoox, where he leads Zoox Intelligence — an initiative applying Large Language Models (LLMs) across engineering, operations, customer support, and autonomy. His expertise spans Applied AI, Observability, Semantic Search, Experimentation Platforms, Data Engineering, Frontend Development, and On-call and Incident Management Systems.

About the conference

Software is changing the world. QCon San Francisco empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Amit Navindgi: Travel back three years with me. It is your first week at your new company. You open the docs, and it feels like reading a novel. You bounce between Confluence, GitHub, Slack, and random PDFs, trying to reverse engineer how anything works. Back then, if you had a question, you had two options: read more docs, or, even worse, ask another human. Terrifying times. It was normal, though. We all just powered through it. Fast forward to today, and developers expect AI everywhere: AI in their editor, AI in Slack, in all the tools. When AI is missing, it feels broken. This talk is about that shift, not hype, not magic, but a systematic approach through the entire developer lifecycle.

My name is Amit. I lead Applied AI Initiatives at Zoox under a company-wide effort called Zoox Intelligence. My team focuses on one question: how do we make developers productive using LLMs? Everything you see today is from real production work, real wins, and a healthy number of mistakes.

Roadmap

Here's our plan for today. We'll start by looking at the developer lifecycle as it exists in a complex enterprise. Then we'll talk about the platform we built to address some of the constraints of that lifecycle. We'll take a look at some of the applications we built to address different workflows. Crucially, we'll cover how we drove adoption, because building it is only half the battle. Finally, we'll wrap with some key takeaways that you can bring back to your teams.

Developer Lifecycle

Let's start with that developer lifecycle. Let us zoom in on what a new developer actually goes through. It never starts with writing code, it starts with information discovery. They spend days, sometimes weeks, trying to piece together how things work. Are there any docs? Where do the docs live? That phase alone can burn the first month. Then there is personal productivity. What internal tools are available? How do they work? Which dashboards matter? Who are the subject matter experts? This is not just one big blocker, it is 100 tiny speed bumps. Finally, only after that do they reach software development, their actual job. Getting to the point where they can ship meaningful code can easily take a month, or more in some teams. The moment they ship code, they are now supporting their customers. They're answering questions and helping customers, which in our case means all the internal developers. A single support issue can easily burn half a day, because information is scattered across systems. At Zoox, we decided to take a look at this lifecycle and ask one question: can AI help remove friction? Can it make some of these phases shorter, and make developers feel like they are in 2025 and not 2022? This is where we eventually landed. Same developer lifecycle, completely different experience. The rest of this talk is the journey in between: how we went from that slow, fragmented lifecycle to this ecosystem of workflows, agents, and custom applications.

The Platform

To understand that journey, we need to start with the platform we built. Before we could build the platform, we had to ask why. Why build a platform at all? Why not just give everyone access to ChatGPT and call it a day? Let me ask you all something. Do you think your company's code, data, and customer information are your secret sauce? We face the same constraint. We can't just paste sensitive code or customer data into a public tool. We have enterprise constraints. We need to make LLMs accessible, but we need to get there safely. We started by listing our requirements. First, obviously, everything had to be secure: internal code, customer information, vehicle data in our case. None of it can leave our network. Then, we had to treat PII correctly. That includes employee information and rider data from our vehicles. If you've taken Zoox rides, all your data. This is non-negotiable in an enterprise environment. It also had to be fast. A system that takes minutes to respond is not really useful for building real-time or interactive applications on top of. Then there is the reality of being an autonomous vehicle company. Our world is not just text, it is images, videos, audio, all of the modalities. The platform had to support multiple modalities. Then, we also needed deep integrations. A generic model that only knows public facts is not very useful on a day-to-day basis. In those cases, we just let people use ChatGPT, for instance. It needs to have access to Zoox-specific systems, services, and internal knowledge to provide meaningful answers. The last one was critical: the platform had to be contributor-friendly. If only my team can add tools or integrations, we become the bottleneck. We needed a surface that every engineer at Zoox could extend. These constraints helped shape every layer of what came next.

Let's walk through how we actually built it, step-by-step. Phase one was straightforward. We are an Amazon company. We started by building a gateway to AWS Bedrock, which let us avoid long procurement cycles and gave us secure, in-network access to state-of-the-art models like Claude. Additionally, we also had access to Nova models. This became the foundation of our platform. We call it Cortex. Then we added basic inference APIs for text, images, and videos to support multiple modalities. At this stage, Cortex could answer simple questions like, what is the capital of France? This in itself is not very useful. We don't pay our engineers to ask these questions every day. We were still setting up the foundation, and we wanted to support multiple other providers. LiteLLM made that simple. We initially started out with our own abstraction layer, converting request and response schemas, but over the past few months we've been migrating to LiteLLM because it handles the multi-provider story really well. Then we added support for Gemini models, because Gemini models are better at vision tasks, and as you saw, we deal with a lot of visual imagery. We added GCP to unlock access to Gemini. Thanks to LiteLLM, that was super easy. Now we have our core inference APIs. We can talk to multiple providers securely, but it doesn't know anything about Zoox yet. Our real goal was to make Cortex understand Zoox, and this is where the magic happens. How do we go from answering, what is the capital of France, to a question like this: what does VH6 mean? VH6 is Zoox's sixth-generation vehicle. Some of you might have seen it in San Francisco. A public LLM has no idea about this. The answer, as you can see here, even includes a reference to an internal Confluence document. The solution, of course, is retrieval: Retrieval-Augmented Generation, or RAG. We built pipelines to ingest data from our scattered sources like Confluence, Slack, GitHub READMEs, any documentation that's useful to teams at Zoox, and then created knowledge bases out of it. We found that it's much more effective to create one knowledge base per data source. This radically reduces the semantic search space for different prompts, and gives us much better retrieval accuracy.
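
To make that "one knowledge base per data source" idea concrete, here is a minimal Python sketch. The class, function names, and toy embedding are illustrative only, not Zoox's actual ingestion pipeline.

```python
# Hypothetical sketch of "one knowledge base per data source"; the names and
# toy embedding are illustrative, not Zoox's actual ingestion pipeline.
from dataclasses import dataclass, field

@dataclass
class KnowledgeBase:
    """A per-source index; keeping sources separate narrows the semantic search space."""
    name: str
    chunks: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

def embed_text(text: str) -> list[float]:
    # Placeholder embedding; in practice this would call a real embedding model.
    return [float(len(text)), float(sum(map(ord, text)) % 997)]

def ingest(kb: KnowledgeBase, documents: list[str]) -> None:
    for doc in documents:
        kb.chunks.append(doc)
        kb.vectors.append(embed_text(doc))

# One knowledge base per source, rather than one giant shared index.
confluence_kb = KnowledgeBase("confluence")
slack_kb = KnowledgeBase("slack-support-channels")

ingest(confluence_kb, ["VH6 is Zoox's sixth-generation vehicle."])
ingest(slack_kb, ["Q: How do I enable AutoAssist? A: Invite the ZI bot to the channel."])
```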

We now have these knowledge bases, so how does the platform use them? This is where we introduce an agent. Agent is a big word in the industry right now, but it's really simple in practice. It's just a loop with an LLM and a bunch of tools it can choose from to reach a certain goal. In this case, our tools are knowledge APIs. We also exposed the knowledge APIs separately, because people can still build plain semantic search applications; you don't need LLMs all the time. In this case, these tools exist: a Confluence tool, a Slack tool, and so on. When someone asks what VH6 is, the agent's system prompt and the tool description help it reason that this sounds like a question whose answer might be on Confluence, and it decides to invoke only that tool. That's it. People call this agentic retrieval. There are many flavors of this, but at a high level, this is what it is. It's not magic. It's simply clean tool design. This is great for static documentation, but what about real-time questions? For example, a question like, who is on call for Zoox Intelligence? You can't put this in documentation. It's real-time information that changes often. The solution is simple: we just add more tools. We added an on-call tool that talks to our internal on-call service. Now the agent has a new capability. In addition to the knowledge bases, it now knows who is on call for a given service. When the LLM sees a question like this, again, the tool description helps it reason that this needs the on-call tool and not a knowledge lookup. It calls the on-call tool and returns the answer. This is the pattern we follow for integrating all the other services. We could now integrate a whole set of tools that are internal to Zoox.
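
Here is a rough sketch of that loop in Python. The tool registry, the descriptions, and the llm_choose_action placeholder stand in for a real model call and the real Cortex tools; they are assumptions for illustration only.

```python
# Minimal sketch of the agent loop described above: an LLM plus a registry of
# tools, each with a name and a description the model uses to pick its next
# action. llm_choose_action is a placeholder for a real model call.
TOOLS = {
    "confluence_search": {
        "description": "Look up static documentation in the Confluence knowledge base.",
        "run": lambda query: f"Confluence results for: {query}",
    },
    "oncall_lookup": {
        "description": "Return the current on-call engineer for a given service.",
        "run": lambda service: f"On-call for {service}: alice@example.com",
    },
}

def llm_choose_action(question: str, tools: dict) -> tuple[str, str]:
    """Placeholder for the model reasoning over the tool descriptions."""
    if "on call" in question.lower():
        return "oncall_lookup", "Zoox Intelligence"
    return "confluence_search", question

def run_agent(question: str) -> str:
    tool_name, tool_input = llm_choose_action(question, TOOLS)
    observation = TOOLS[tool_name]["run"](tool_input)
    # A real agent would loop, feeding the observation back to the model until
    # it decides it has enough information to answer.
    return observation

print(run_agent("Who is on call for Zoox Intelligence?"))
```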

At this point, people have inference APIs, knowledge APIs, and a bunch of tools they can build applications with. If someone wants to know, for instance, why a build failed, they simply add a new tool. Same for GitHub. If someone wants the details of a given pull request, you add a GitHub tool and give the agent the capability to understand pull requests. Each new capability is just another tool the agent can call. This brings us to the key question: how do we make this platform contributor-friendly? How do we make this one agent useful for every team and every workflow? Most teams do not want to deploy and operate their own agent stack. It's cumbersome to bring up a service and adopt a framework just to write all these tools. They just want an agent they can access, with access to only the specific tools they care about. So we built agents as an API. If you've used agent frameworks like Google ADK or Strands, they all require customers to define the agents in their own applications and figure out how to deploy them on their own. Here, we take all of that away and put it on the platform. That way, people can simply invoke a REST API with a model, a prompt, and the list of tools they want that particular agent to have access to. In this example of an infra agent, it has access to tools that are specific to infrastructure-related issues. Teams still build tools when they need new capabilities, but they define each tool once, only in this one central registry. With a good tool description and name, it becomes available to every agent in the company. Whoever needs it simply lists it in their agent call. This is something we've been experimenting with as a new paradigm for defining agents, where customers never have to worry about deploying or running them themselves. They simply define the tool. Defining a tool is something they would have to do anyway, but now they do it just once in our central registry. Whenever a team wants an agent, they just call this API and Cortex spins up a lightweight agent with that configuration. This cleanly separates responsibilities. Teams focus on their business logic and the tools that matter for their workflow. Cortex handles execution, scaling, safety, and all the other benefits a platform typically provides. Since this is just an API, they can build any sort of application: a chatbot, a frontend, a notebook, a CLI tool, any environment. This is the state of the platform today. We have our inference APIs, knowledge bases, and our agentic API. To run this safely inside an enterprise, you need a lot more than agents and vector stores. Things like human-in-the-loop tools and rate limiting: how do you manage all the requests you get because it is a single platform? Managed quotas: each LLM has certain quotas enforced by the provider, so how do you divide that up between clients across the whole company? Batch inference. Evals, which of course is a really deep topic in itself. Guardrails, observability. All of these are baked into the platform, and we've worked on making sure they are useful to all the teams, and that we run them safely and securely without a single client misusing the platform.
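
To show the shape of the agents-as-an-API idea, here is a hedged sketch of what such a call could look like from a client's point of view. The endpoint URL, field names, and tool names are hypothetical, not the actual Cortex API.

```python
# Hedged sketch of "agent as an API": the caller supplies a model, a prompt,
# and the tool names it wants, and the platform runs the agent. The endpoint,
# field names, and tool names are hypothetical.
import requests

payload = {
    "model": "claude-sonnet",
    "system_prompt": "You are the infra agent. Help engineers debug build failures.",
    "tools": ["build_log_lookup", "github_pull_request", "oncall_lookup"],
    "message": "Why did the latest build fail?",
}

response = requests.post(
    "https://cortex.internal.example.com/v1/agents/invoke",  # hypothetical endpoint
    json=payload,
    timeout=60,
)
print(response.json()["answer"])
```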

I want to show you one piece that makes a big difference in practice: human-in-the-loop tools. Here's a simple read-only tool for on-call, the one we just talked about. You can see there's a run method, and there's a name given to the tool. This is a read-only method. There are no side effects, so it's safe to run. What about actions that can change things? For example, creating a Jira ticket. You don't want an agent firing off hundreds of tickets accidentally, or emailing your CEO accidentally. We introduced something super simple: using a decorator, we can make any tool a write tool. If you see this require-confirmation decorator here, what it does is, whenever the agent decides to invoke this tool, it notices that it's a write tool and that it requires confirmation, and it goes from there. This is how it looks in practice. I ask the bot to create a Jira ticket. It gives me all the details it's going to use to create that ticket. Only after I confirm does it actually call Jira and return the link. This is really good in practice, in the sense that read-only tools are useful for fetching information, but most of the time, people want to automate tasks that they would otherwise do manually. Looking up information becomes an easier task, but there are other things like creating calendar meetings from a Slack thread, creating Jira tickets, sending emails, creating IT requests, and so on. For human-in-the-loop tools, I highly recommend following some framework. There are many frameworks that already support this natively, but it's also really simple to build yourself in practice. This definitely prevents mistakes, protects our customers, and builds trust. Nobody wants a model accidentally going rogue. We had an agent once that just sent a message in a public channel saying to the legal team, I want rights. You don't want such things. You want that to happen in a controlled manner.
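
Here is a minimal sketch of how a confirmation decorator along those lines could work, assuming a plain Python tool function; the actual Cortex implementation may differ.

```python
# Minimal sketch of a human-in-the-loop "write tool" decorator, assuming a
# design along the lines described; not the actual Cortex implementation.
import functools

def require_confirmation(func):
    """Mark a tool as having side effects; the agent must surface the arguments
    and get an explicit human confirmation before the tool actually runs."""
    @functools.wraps(func)
    def wrapper(*args, confirmed: bool = False, **kwargs):
        if not confirmed:
            # In practice the agent returns this preview to the user and only
            # re-invokes the tool once they approve.
            return {"needs_confirmation": True, "args": args, "kwargs": kwargs}
        return func(*args, **kwargs)
    wrapper.requires_confirmation = True
    return wrapper

@require_confirmation
def create_jira_ticket(project: str, summary: str) -> str:
    # Placeholder for the real Jira API call.
    return f"Created {project}-123: {summary}"

preview = create_jira_ticket("ZI", "AutoAssist gives stale on-call answers")
result = create_jira_ticket("ZI", "AutoAssist gives stale on-call answers", confirmed=True)
```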

This is the full platform. Each part supports a different requirement from the developer lifecycle we talked about earlier. Building this wasn't easy. We faced several challenges, and as an autonomous vehicle company, we have a very specific set of them. First, large media inputs. We have, like I said, a lot of images. If they're high-resolution images, hundreds of these requests can easily bring down your platform. How do you manage these? Without strict validation and pre-processing, the gateway can easily fall over. The same goes for videos. Then, vehicle imagery makes it even harder. Our vehicles capture images from the cities they are in. Daytime images are easier to reason about, but nighttime images, especially fisheye images, are hard even for humans. I look at some of these images for labeling purposes, for instance, and I can't tell what's inside. It's a hard task for an LLM. Then there's real-time versus batch inference. Many teams want to run millions of images through, for data labeling, for instance. That should not go through a real-time gateway like Cortex. It belongs on cheaper batch systems, which are provided by Bedrock and GCP, for instance. Part of our job is also to say, this use case does not belong on Cortex, and that's ok. As a platform provider, you should know the limitations, and then guide people to the right path. Then there is on-demand versus provisioned latency. A slow Slack bot is annoying but bearable. Anything slow inside a vehicle is unacceptable. For safety-critical workflows, you must choose provisioned throughput and design for predictable latency. Then, my favorite topic, tools versus MCPs. You might notice that Cortex tools are essentially function calling; MCP is an evolution of function calling. I've consistently been cautious about migrating to MCP yet. I think the industry is also realizing that maybe MCP is not the right way. In my opinion, it tried to do too many things, and if it had been a one-MCP-server-is-one-tool paradigm, I think it would have been better. Tools work fine. People can package them up however they want and make them accessible to Cortex. Finally, diverse workstreams. We are an autonomous vehicle company, but we have teams doing a lot of different things. For instance, I'm on an infrastructure team, so what I do is very similar to what infrastructure teams do at other companies. We also have teams doing work that's specific to autonomous vehicles. Cortex today serves applications and infrastructure, fleet operations, autonomy software, and much more. Each has different expectations and constraints, so designing with all of them in mind and bringing them onto a converged platform is a really hard problem. We've figured out how and where to do that using essentially isolated APIs, plus a lot of training for different teams. The architecture has to be flexible. These challenges shaped what we built, but they also shaped what we chose not to build.
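
On the large-media-input point above, here is a hedged sketch of the kind of pre-validation a gateway might do before any inference call. The limits and function names are illustrative assumptions, not Cortex's actual values.

```python
# Hedged sketch of gateway-side validation so a burst of high-resolution images
# cannot bring down the real-time platform; limits are illustrative only.
MAX_IMAGE_BYTES = 5 * 1024 * 1024   # reject anything larger before inference
MAX_IMAGES_PER_REQUEST = 4

def validate_media_request(images: list[bytes]) -> None:
    """Raise before any model call if the payload should not hit the real-time path."""
    if len(images) > MAX_IMAGES_PER_REQUEST:
        raise ValueError("Too many images; use the batch inference path instead.")
    for img in images:
        if len(img) > MAX_IMAGE_BYTES:
            raise ValueError("Image too large; downscale or pre-process before calling the gateway.")
```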

Applications

Before we dive into the applications, let me share the two principles that guided all of this. These two rules saved us hours, dollars, and a lot of meetings. First, build what you cannot buy. If a vendor tool exists that does 80% of what you need, buy it. Save your engineering time for the problems your company is uniquely positioned to solve, the parts that only you can build. Then there is this Amazon principle I really like: 1 is greater than 2 is greater than 0. One good solution is better than two competing ones, but two competing ones are still better than having nothing. In the AI era, speed matters a lot. If building a duplicate application in one month helps a team move fast, that is fine, just converge later. To make sense of everything we built on Cortex, we group applications into two categories. One is AI workflows. AI workflows are deterministic applications. Something happens, like an alert firing in this case. Step one, the script calls AI to summarize logs. Step two, the script pages on-call. The AI is just one predictable step in a fixed chain. It does not decide what the next action should be. The second bucket is agents. An agent is non-deterministic, which is the biggest caveat of agents. It has tools, a model, a prompt, and context, like we saw earlier. It decides which tools to call, in what order, and when to stop. That makes agents flexible and powerful, but also harder to control and debug. Most teams at Zoox start with workflows. They're easier to reason about, easier to test, and often good enough. They only move to agents when they really need that extra autonomy and decision-making. Let's zoom back out to the application map. We have three categories: AI workflows and agents, other things we build, and vendor tools we buy. We bought vendor tools like Cursor and Claude Code for software development. I'll talk about Cursor a lot in a bit. A year ago, we had built something internally using Code Llama and Continue, which is an open-source VS Code extension, because Cursor still wasn't around back then and we needed a solution. That's when we decided to build, because there was nothing to buy. As soon as Cursor came along, I shut down that project and bought Cursor, and it's much better. Since then, teams across Zoox have built more than 50 applications across the entire developer lifecycle.
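
As a quick illustration of the deterministic workflow shape (alert fires, AI summarizes, script pages on-call), here is a minimal Python sketch; summarize_with_llm and page_oncall are hypothetical placeholders for an inference call and a paging integration.

```python
# Sketch of the deterministic "AI workflow" shape: the LLM is one fixed step in
# a scripted chain and never decides what happens next. Helpers are placeholders.
def summarize_with_llm(logs: str) -> str:
    # Placeholder for a single inference call to the platform.
    return f"Summary: {logs[:80]}..."

def page_oncall(service: str, message: str) -> None:
    # Placeholder for the on-call/paging integration.
    print(f"Paging on-call for {service}: {message}")

def handle_alert(service: str, logs: str) -> None:
    summary = summarize_with_llm(logs)   # step 1: AI summarizes the logs
    page_oncall(service, summary)        # step 2: script pages on-call, unconditionally

handle_alert("cortex-gateway", "ERROR: upstream timeout from model provider ...")
```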

Who here uses at least one AI tool every day? Everyone. Same story at Zoox: everyone wants AI, especially for the parts of their job they secretly dislike. We focused on these painful moments and built applications that make them easy. We won't go into all of them, of course, but I'll highlight a couple that I think you can go back and build in your own company as well, if you haven't already. The first one is Humblebrag. You know that moment where you're staring at a blank screen for 4 hours during performance review season, wondering if you've done anything at all? Of course you did, you shipped a lot of features and fixed a lot of bugs, but your brain doesn't remember that, because it had to make room for all the new trauma you've been getting. Humblebrag remembers. It looks at your activity in GitHub, Jira, Slack, and many other sources, and pulls together everything you shipped, fixed, answered, or unblocked. It is aware of your job function. For instance, it looks at different sources for a TPM versus a PM versus a software engineer. Then it helps people tell their story without all the stress. It helps you write your self-review and peer reviews during performance review season. One of the advantages of AI is that you're automating all of the lookup you would typically spend hours doing yourself. Then, in customer support, we built ZI AutoAssist, which automatically answers questions in support channels on Slack at Zoox. Let's look at that one. If you run a support channel on Slack, you know the toil and the pain. You get constant, repetitive questions that pull you out of deep work. Context switching is the biggest blocker for productivity. AutoAssist fixes that. In the first instance here, a user asked, how do I enable AutoAssist in a Slack channel? A bit of a meta question. Zoox Intelligence, the Slack bot here, drops in and gives an answer, and people give us feedback with these emojis. In the second instance, someone's asking a question specific to Zoox about CI checks, and the ZI Slack bot answers that as well. This is a culmination of everything we built on the platform: agents, knowledge bases, tools, everything, because it has context from all the static knowledge bases as well as real-time access to all the internal services, and it knows how to answer questions automatically. The results were clear for us. Lower support load for most teams, fewer interruptions, less context switching, which is always better. Teams stay focused. More importantly, customers also get their answers much faster. It's instantaneous instead of waiting for someone to get to it.

Accelerating Adoption

That is the landscape of applications. We have tools for code, tools for support, and tools for workflows across the company. We have built the platform, and we have built the applications. The question now becomes, how do we get people to actually use them? Because, again, building it is only half the story, half the battle. This is accelerating adoption. Once we talk about adoption, the next question is how. How do you actually move the company toward using AI every day, in every workflow, without actually forcing it? This is where our blueprint comes in. This is not just about building tools, it is also about evangelizing them. As an AI lead, I think more than half of your job should be being a developer evangelist or AI evangelist. This is what I do most of the time now, instead of building everything. You obviously build everything you need to first, and then you spend a lot of time selling it inside the company. To do that, we identify AI champions. You, of course, can't work with the whole company yourself or with just your team, so you need to identify AI champions in every single department, people who understand the needs of their orgs better, and they can help shape your platform and your AI roadmap. We incentivize usage and overshare everything, especially the wins and the failures. Transparency builds trust, and trust drives adoption. We host team-specific sessions based on work functions. For example, what a TPM needs from AI is very different from what a software engineer needs, so you need these work-specific sessions to help them learn everything you want them to. Then, dashboards are a big part of this. Build dashboards, track usage. This is how you get leadership buy-in. Show who's using AI, and more importantly, who's not using AI, broken down by level, by work function, everything. Build these dashboards to make that easier and show who's adopting it. Then, you need to measure the impact these tools have been having, both the tools you've been building yourself and the ones you've been buying. If you cannot measure it, you cannot improve it. We want every team to see the value they are getting in real time. The most difficult problem here is, of course, how do you measure productivity? These are some of the metrics we capture today. They can be subjective, to each their own, in a way. Transitively, they represent some movement in productivity, at least we've observed that to be the case. Beyond these metrics, one thing that always matters is qualitative feedback. If developers are just much happier using AI tools, that's a really good thing to be working on. Happy developers ship more, regardless of how much they get from the tool itself. One more note on productivity: it's very important to break these metrics down by dimension. Do staff software engineers use the tools differently than senior software engineers? Do TPMs use them differently than software engineers? Which models are being used? I'll share some of the dashboards we built without really revealing anything. Dashboards are very important.

Then, I spend a lot of time these days making sure people use these tools the right way. My goal is to create a 100x impact by making everyone else more productive, not by building all the applications myself. You're not going to get far with that approach. Tools like Cursor make that really possible. I spend a lot of time working with the Cursor team and all the people who use Cursor at Zoox, doing a lot of things to help people get the most out of Cursor. For instance, I run Cafe Cursor sessions, which are internal sessions where Cursor power users share their lessons and how they use Cursor, so that others can learn a thing or two. Then we built a lot of dashboards around Cursor. One of the caveats of all these vendor tools is that they showcase usage metrics that don't really tell you the whole story. Yes, the number of lines accepted or the number of requests made is ok, but it's not really telling you the story. You, as an AI lead or someone who runs or owns Cursor at your company, need to build dashboards that connect the data from these vendor tools to your organizational data. Show the breakdown: who's using Cursor more? Is it L5s, L6s, L3s? Which models are being used, and how differently? For instance, I built a Cursor extension, an in-IDE extension, because Cursor doesn't provide this, to bridge the gap between Cursor and Zoox. If you see this, it shows people's usage data, which there are other avenues for seeing. More importantly, you see the feedback form here. We want to make it easy for people to share their learnings. It shouldn't feel like an additional task for Cursor power users, for example, to share a tip or two with others. We built this extension to make it super easy, right within the editor, to share anything they would like with everyone else at Zoox. We show certain alerts: you spent $50 today, or $200 of usage today, what did you learn? Giving all of this within the IDE gets you more feedback. Then, all of that feedback is visible for everyone in the company to view.

These are some of our adoption analytics, where we break down adoption by work function and by team, which is even more important. Which teams use it more? Is it the infrastructure teams? Because we typically create more pull requests than someone who's working on our autonomy software, for instance. How does that factor in? Things like that. These are Cursor analytics: which models are being used? How many daily active users? What files is Cursor being used in the most? Because people often see different performance. For example, it doesn't do that well with C++, with any of the models. It's not Cursor's fault. Showing these, especially this one here, daily model usage trends, matters because at Zoox people are very curious about trying out new models. Any time a new one pops up, I immediately see usage for it going up. People come here and see, people are using 4.5 Sonnet these days, so let me just use that and see how I can get the most value out of it. That's Cursor. I recommend doing this not just with Cursor, of course, but with every vendor tool you buy. We are starting to do this with a few others as well. Have a dashboard that connects the data from these vendor tools with your organizational data. Then, events. If there is one thing I would like you to take from this talk, it is that you need to run hackathons. A hackathon is the fastest way to generate dozens of high-impact AI tools and get the entire company excited. Here, for instance, is a catalog of all the applications we built and the dashboard we put up, which came out of a hackathon. We had more than 50 applications come out of one hackathon a few months ago. We are having another one in a couple of weeks. Organize all these events. It's a double-edged sword: you need to organize events, but there are people who might miss them, and not everyone gets the most out of them, so you need to keep them recurring and keep them fresh in people's minds. If you're an AI lead, this is the single most important discipline. Resist the hype and focus on impact. It is the only way to scale AI inside a company.

Key Takeaways

What are the takeaways? Build for contributors. You need a platform that everybody at your company feels they can contribute to and help you build more AI tools with. Empower your AI champions, essentially multiplying yourself. Overshare wins and failures. It's very important to establish that some things are just not possible using an LLM. I get requests everywhere I go, like, can AI cook me dinner? No, it can't. It's important to set these expectations right. Again, resist the hype and focus on impact. There are new things coming every day. There's a new model, Gemini 3, for instance. It doesn't mean you have to jump on it right away. It's a weird position as an AI lead. You need to be aware of every single thing that happens in the world every single day, but you also have to resist acting on it right away. We have, for instance, Cursor and a lot of internal APIs, the inference APIs I mentioned, with Sonnet 4.5 as the default model. Just because a new model comes out doesn't mean we have to switch right away. That's where the organized approach comes in: your evals and everything else. Evaluate them, take your time, and then make the change. Again, if there's one thing, it's hackathon, hackathon, hackathon. You want 50 apps in a day, run a hackathon.

Questions and Answers

Participant 1: You mentioned something about realizing that MCPs might not actually be the way to go. This is actually the first time I've heard someone say that. Can you maybe elaborate more on that?

Amit Navindgi: The goal is, again, integration, even with MCPs. MCPs are important to companies that provide services, for instance, Confluence or GitHub and all of these. For you to build a platform that works with your internal services, you don't really need MCPs. The reason I say that is MCP is not just one thing. It's not just tools or MCP servers. It's a whole ecosystem around the protocol itself. There are clients, servers, tools, all of that. With MCP servers, to consume them, you need to build an MCP client. That makes it challenging. Everyone in the company, to consume any MCP server, now needs to start building MCP-compliant clients. We already have hundreds of applications built at Zoox, for instance, a lot of frontends, a lot of Slack bots. Getting them converted to MCP-compliant clients is a lot of work for very little benefit. Where it does help is on the platform side: for our own sake, we can integrate MCPs there. If I want to replace the knowledge base we built with the Confluence-specific MCP server they provide, it's easy to do that on the platform side. The clients don't need to worry about it. They don't need to update their clients to be MCP-compliant. They can simply invoke REST. Every client speaks REST today. Using REST APIs is just much better. This link has some of the Cursor slash commands we use, which I highly recommend as well.

Participant 2: When you are tailoring the AI agent to your platform needs, like the Zoox Intelligence that you were talking about, what's your opinion on using an open-source model and fine-tuning that model, versus using a closed-source model, but using RAG as a knowledge base? Or do you use a combination of these two to build that agent, like Zoox Intelligence, to your specific platform needs?

Amit Navindgi: It's a different take, in the sense that I think RAG has worked well for us so far. Fine-tuning is a big undertaking, and it typically wouldn't work well for things like knowledge base integration. RAG is a proven architecture. We do have teams, for instance, Hien here supports teams that do fine-tune large language models, or generative models. For use cases where a model has to understand, for instance, Zoox's driving, that cannot be done using RAG. There are instances where it is helpful, just not for the use cases we were dealing with, simple knowledge base retrievals.

Participant 3: You mentioned that PII is a big concern for your agents. Does Cortex scrub the PII before it sends it to Bedrock, or?

Amit Navindgi: Yes.

Participant 3: Then, do you have insight into how you identify what PII is being sent to these models?

Amit Navindgi: It's a lot of regexes, a lot of rules. We also rely on an LLM, actually, to identify whether there's any PII. Again, we work with the security team at Zoox to come up with the heuristics that identify what these are. Also, a quick tip there is integration with your observability platforms, because typically there are certain fields that help you identify what they are. It's less of a concern now with Bedrock specifically, because it's still isolated and in our own network. Recently we started exploring whether we can use Anthropic APIs directly, and that is when these become more important.
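
As a rough illustration of the rule-based side of that scrubbing, here is a minimal Python sketch. The patterns are illustrative examples only, not Zoox's actual rules, which as described are developed with the security team and supplemented by an LLM-based check.

```python
# Hedged sketch of rule-based PII scrubbing before a request leaves for a model
# provider; patterns are illustrative, not the actual production rules.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def scrub_pii(text: str) -> str:
    """Replace anything matching a known PII pattern with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label.upper()}]", text)
    return text

print(scrub_pii("Rider jane.doe@example.com called 415-555-1234 about her trip."))
```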

Participant 4: One other concern I've seen raised with MCP is latency or efficiency. Could you just maybe give your top three techniques or ways to mitigate latency with tool calls or calls to other agents?

Amit Navindgi: Isolated tools. I think a tool should have a single responsibility, and that's it, nothing else. The problem with MCPs is, again, an MCP server can have 10 or 20 tools. I think this is very common with the GitHub MCP, for instance. You don't need all of those tools. There is a latency part as well. There's also tool pollution, or MCP pollution as they call it. You're simply throwing more context at the LLM and giving it more options to choose from, as opposed to letting it choose from a limited subset of actions it can take. For latency, we have some rules, for instance we don't allow any tool to make external API calls that take more than about 3 seconds. It's simply giving each tool a single responsibility and sticking with that. The GitHub MCP, for instance, we don't use. It obviously has a lot of tools, but we don't need all of them. All we need is the ability to create a pull request or get the diff from a pull request. As long as you define these clearly and separately, you should be fine.

Participant 5: You shared how the entire company is focused on adopting AI and sharing new tools, but how do you deal with access to information?

Amit Navindgi: Access to the information the platform processes, you mean?

Participant 5: How do you deal with who has access to what? In terms of information, can all of the different engineers access all of the information within the company?

Amit Navindgi: No. Actually, we are still working on that, on role-based access control at the platform level. For instance, the Slack bot here today processes only the information that's available to everyone in the company. You need to have OAuth enabled on your client application so that information gets passed from every client over to the server, and then you handle RBAC on the server side. We haven't implemented that yet, but we are getting there.

Participant 6: You mentioned that for the rest of the developers you have specific classes or workshops. Can you get a little more specific about what types of workshops those may be, and who you might target first, like specific groups of developers?

Amit Navindgi: For software engineers, it can be as advanced as possible, with vendor tools or even how to use our APIs. One that's more abstract is the one we run for product managers and TPMs, for instance, where we teach them how to use no-code tools. Replit, for instance, is something we are using at Zoox. These are easy to onboard onto because they're not inherently technical. It's easy to teach them how to use it. They use Replit, and it's: type what you want and build the application you want. We don't really expect them to build production-grade applications, but it's enough to get them to running prototypes. We divide based on tools and work functions, essentially.

Participant 6: Do you ever break it down even more specifically to different groups of software engineers, like different types of software engineers?

Amit Navindgi: Not today, no.

Participant 7: You said hackathons are the number one thing for us to remember out of this. Do you have any tips for how you're running the hackathons that lead to so much success from them?

Amit Navindgi: Yes. It's important to identify different themes that you want your applications to be in. For instance, like I mentioned, Zoox has a lot of teams that do a bunch of different things. The way to make them feel included is to include a theme that's relevant to them. Then, also make it very easy for non-technical people to participate. Because these days, I think everyone has an idea, and it's easy to translate an idea into a prototype. You want them to participate, so make it easy for them to participate. One thing we are trying this time is also working with the vendors. Anthropic and Cursor are helping us run workshops, workshops that teach non-technical people how to use Cursor, for instance, and the same with Replit. We are running a workshop ourselves on how to use this platform for software engineers, things like that. Make them feel included, and obviously, give them good incentives, have really good prizes. Nothing motivates more than that, I think.

Participant 8: How do you use Cortex exactly? Is it used internally by different systems, which use it for their specific purposes? Or is it customer-facing, where a customer can use an API or it is exposed to a customer through a UI? What does the process look like if I want to add my agent and my knowledge base to that tool? What are the quick steps I can take?

Amit Navindgi: What is Cortex? It's essentially a microservice with a bunch of APIs exposed. Like I mentioned, there are inference APIs, chat completion APIs, text completion APIs, and image understanding APIs, which accept a payload with your message body, and then we process it and send the response back. The process is actually very simple. To create a knowledge base, we have a framework, we call it the connector framework, where you essentially write a Python class that says, read the data from this source and put it in this destination. The framework takes care of everything else, because the connector itself is custom to where the data needs to be pulled from. Customers have to write that connector themselves. For example, we use the same embedding model for all the data today; again, that's something we are exploring. Using an embedding model, creating embeddings, storing them in the vector store, and exposing the knowledge base is all automated. All you need to do is write a connector. As for creating an agent, you don't really create an agent, you create the tool if it doesn't exist. If you want the agent to have a specific functionality, you add a tool for that. Again, this is pretty much what I showcased with the other tool. The HITL function is, again, a simple Python class, where there's a run method that defines the business logic of what it should do. You can do simple automation there, or call external APIs. That's it. Customers only create that tool. Invoking it is, again, just invoking the agent using the REST API.
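
For illustration, here is a hedged sketch of what a connector along those lines could look like. The class and method names are hypothetical, not the actual Cortex connector framework API.

```python
# Hedged sketch of the connector idea: a team writes one class that knows how to
# pull documents from its source, and the framework handles chunking, embedding,
# and the vector store. Names are illustrative, not the real framework API.
from abc import ABC, abstractmethod
from typing import Iterator

class Connector(ABC):
    source_name: str

    @abstractmethod
    def read(self) -> Iterator[dict]:
        """Yield documents as {'id': ..., 'text': ..., 'url': ...} records."""

class WikiConnector(Connector):
    source_name = "team-wiki"

    def read(self) -> Iterator[dict]:
        # Placeholder: in practice this would page through the wiki's API.
        yield {"id": "vh6-overview", "text": "VH6 is the sixth-generation vehicle.", "url": "https://wiki/vh6"}

# The platform would then embed each record and expose the result as a
# knowledge base named after source_name.
```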

Participant 9: Are you using the vendor-offered APIs, or do you have your own custom API and custom system for running batch?

Amit Navindgi: We are using the vendor-provided APIs. We are making it super easy for customers to do it, because each vendor has its own format for how the prompts need to be structured. The teams don't need to be aware of that. We enforce a common schema for all the teams, and provide tools for them to do it. Then we take care of the rest, and notify them when the labeling has been done.

Participant 10: Can you tell us a little bit about the organizational changes you had to undergo in order to implement this, and the timeline? For example, is the team that developed and maintains Cortex the same team, or people that also maintain the vendor relationships with Cursor, and Anthropic, and all that?

Amit Navindgi: It's a mix. This whole initiative started a year ago, when it was just me and my manager at the time building it. Software here is usually built by the infrastructure team, and I started this Zoox Intelligence initiative. We built that team up over time, so there is a software team that builds the platform, maintains it, and onboards applications. For vendor tools, some of them are led by me and my team, and some of them are led by IT and others. Lately, the way we've been targeting this is with a group of people from different areas of expertise. For example, I go in as the lead from the Zoox Intelligence team, someone from IT is part of the team, someone from procurement, someone from finance. Everyone comes together to create this cross-functional team and makes it a priority to go procure the tool. As far as running adoption, onboarding, support, and all of that is concerned, it depends on the tool. Cursor, Claude Code, and Replit are things I run and maintain myself. We also have tools like Google Gemini and Google Agentspace, which are more enterprise tools and are part of Google Workspace, where there is very little for us to do. In those cases, we don't get involved.

Participant 11: You have your corporate data that everybody can view. How do you foresee that moving from that public data repository to something more user-specific?

Amit Navindgi: Is it in the realm of role-based access control?

Participant 11: Yes, or say I have specific access to my infrastructure cloud that others don't. How would that tool work, or how do you foresee that happening?

Amit Navindgi: This is a really hard problem to solve. It all depends on where the data we are dealing with lives, and whether those data sources provide access control lists, or ACLs. If they do, then it's relatively easy. The hardest part is updating all the clients. You need a common way to make all these clients authenticate and pass that user information to Cortex, for instance. We haven't quite figured that part out yet. Some vendors make it really easy to pull data with ACLs, some just don't. It becomes a challenge there. I think we've found it easier to do it based on work function or team, as opposed to the individual user, so far.

Participant 12: I see that you have some complex inputs such as video and images, and I can also see there's a lot of back and forth between agent and human. Are there any tips for managing all of this in memory, like the context, and how to make the context more efficient?

Amit Navindgi: Thank God for all the scaling compute we have. It's a big challenge. We actually had a PEF, I think two months ago, where someone just DDoSed us with a lot of images. We run the new version of Cortex in Kubernetes, for instance, and autoscaling there definitely helps. Like I mentioned, at the load balancer layer, it's important to just reject requests if they don't adhere to your payload limits. Again, you talk to the clients and figure out if they actually, really need real time. We haven't had that need yet, but we will when our fleet size grows, for instance. In many cases, we've identified this and moved teams to batch inference, because it's unintentional. Most of the time, it's been after the fact. It's still a challenge for us. Scaling definitely helps us there.

 


 

Recorded at:

May 14, 2026
