
Choosing Your AI Copilot: Maximizing Developer Productivity


Summary

Sepehr Khosravi discusses the current state of AI-assisted coding, moving beyond basic autocompletion to sophisticated agentic workflows. He explains the technical nuances of Cursor’s "Composer" and Claude Code’s research capabilities, providing tips for managing context windows and MCP integrations. He shares lessons from industry leaders on shrinking process time beyond just writing code.

Bio

Sepehr Khosravi is a software engineer at Coinbase working on machine learning infrastructure, and an instructor at UC Berkeley where he teaches courses on generative AI and rapid product development. He's also the founder of AI Scouts, a free program teaching students how to build AI-powered apps from scratch.

About the conference

Software is changing the world. QCon San Francisco empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Sepehr Khosravi: Before we go into maximizing developer productivity using AI today, I think it would make sense for us to get a gauge of where we are today with AI. It's going to be a little survey we're going to take to see how far into AI productivity we are as a group.

First question. What level of AI-assisted coding best describes you? You can say none if you use little to no AI. Beginner if you occasionally use some ChatGPT or Claude. Intermediate if you're regularly using AI as your copilot. Advanced if AI is your default and you're mainly reviewing and iterating on the code it produces. Then expert if you're building full-on AI workflows, agents, and tool integrations. It seems that most of us here are intermediate, around 60%, which is pretty good. Only 2% are at none, which is awesome to hear.

Moving forward to the next question, what percentage of your daily coding would you estimate is generated by AI today? I'm seeing 25%, 50%, 75%. This one is lower than I expected from the other one. Around 50% of you seem to have 0% to 25% of your code generated with AI.

Then, last question, this one's pretty open-ended. What developer productivity tool do you use the most? You can have up to five responses here. Just go ahead and put in whatever tools you're using the most, and we'll see a word cloud start to pop up here. Seeing a lot of Copilot. The bigger the word gets, the more people wrote that one in. I'm seeing Copilot, a lot of ChatGPT, Claude, Cursor, GitHub, Cider, Astra. Somebody just said code. That works. Not as wide a variety as I would have thought. It seems like most of us are on Copilot. It makes sense.

I want to compare where we are today to the latest survey. Stack Overflow did a survey on AI tools in the development process. What they found is that roughly one in three engineers use AI coding tools less than once a month, which is a lot higher than I expected. That's one in every three people you might see here barely using AI to code. I think from what we saw, we're a little bit over that. Also, there might be some bias because it is a Stack Overflow survey; people who are using Stack Overflow may be using AI a little bit less. Still, this is the best data we had out there. What's also really interesting is that although AI usage has continuously been going up over the past three years, sentiment has actually decreased in 2025.

In 2023, 2024, sentiment was above 70%. This year in 2025, we've gone down to just 60%, which is interesting because, as we know, the AI tools are the best they've ever been in 2025. Why is that? I think a lot of it is due to headlines like this and a lot of CEOs coming out and making really bold claims about AI. Zuckerberg went on to the Joe Rogan podcast and talked about how AI will replace mid-level engineers soon, maybe by the end of 2025. Then because of that, I think naturally there's this hype that gets created on one end. Then there's a counter reaction to that, where people start to say, no, this isn't the case. AI coding is a little bit overhyped. It's not all it was planned to be. The pendulum swings this way. I think that's where we're at right now, with a lot of people being a little bit apprehensive with using AI to code. The reality, like most things, usually lies somewhere in the middle.

Outline

For the agenda, we're going to talk a little bit about the current state of developer productivity, what realistic gains can you expect to make. Then we'll talk a little bit about choosing your AI copilot. Then, finally, go into two of my recommended tools, which are Cursor and Claude. Then I'll share some lessons that I learned from the Databricks CEO.

Current State of Developer Productivity

Developer productivity, first off. This is another long-term study, this one done by Stanford on over 100,000 employees. They wanted to see how much productivity is actually being gained in the code that gets written. I won't get into the exact methodology, but they didn't just measure commit numbers or lines of code changed; they had a panel review the code and try to actually understand what level of productivity was gained from that code, not just the quantity. What they found is about 30% to 40% more code being generated by using AI.

However, they also realized that 15% to 25% of that was code that ends up getting reworked, whether because it has bugs or because it gets deleted later on. They estimated net overall software engineering productivity gains of about 15% to 20% from AI. I think that number can be even higher if you learn how to use these tools, and at a bare minimum, it's something you can expect to gain from them. Looking into the tools, I think there are really three categories.

The first category is the all-in-one tools aimed at non-developers, which anybody can use. I think with this category, we really do have a 100x gain in productivity. This is where I spend a lot of my time teaching at UC Berkeley. I also run a nonprofit that teaches kids how to use AI tools. We see people with zero experience come to these courses, build a business, and start making thousands of dollars off software they wrote. Is it the most insane software? No, but it does its job, and it makes some money. Same with kids. We have 11-year-olds coming in and building apps that their friends start using, and they gain a few real users. There really is a 100x productivity gain here, where people can now do things they just weren't able to do before. That may also contribute a little bit to the AI overhype, because we don't see those same 100x gains for developers. We still do see gains. I would split our developer tools into two segments.

One is the IDE layer: tools built on top of foundation models, things like Copilot, which most of you are using, Cursor, Windsurf, IntelliJ, Cline, and most recently, Google Antigravity. Then, on the other end, we have a tier of tools that are terminal-based CLIs. These are typically made by the foundation model companies themselves: Claude, ChatGPT, Google, and Kimi each ship their own CLI tools. I'm going to go over a lot of stuff that many of you might already know, since we are at the intermediate level. Hopefully, all of you can move up one step here, and by the end of it, pick up at least one CLI tool and one IDE tool if you haven't already.

Choose Your AI Copilot

Getting into choosing your AI Copilot. As we talked about, tons of tools. Most people are using Visual Studio Code, from that same Stack Overflow survey, about 75%. What was really interesting is they went one level deeper. They asked those people who are using Visual Studio Code, what tool do you want to work with in the future coming up next? The top responses were Claude Code, Cursor, IntelliJ IDEA, and Neovim. The two that we'll go over are Claude Code and Cursor, not because they were chosen by the survey, but because I also think those are probably some of the top tools out there right now.

Cursor

We're going to do a speed run of the top 10 Cursor tips, so that if you've never used Cursor before, after this session you can download it and be a pro. Starting off with tip number one, tab. I think 2% of the people in here said they use no AI for coding. For those people, I would really recommend you start with tab. Cursor has built its own specialized custom model for this, and it's really good. A lot of times, you will get 10 to 20 lines of code by just hitting tab, not lifting a finger.

It gets suggestions based on your recent changes, your linting, and the edits you've accepted. It's really great. If you hate AI, just download that, let it show you what it's going to generate, and try it out from there. Two is the Cursor Agent. I'm sure a lot of you have seen and used this. What's really great is you can choose what model you want to use with the Cursor Agent; you can try Gemini or ChatGPT. What I really love about this agent is all of the tooling it comes with: it can read different files, it can search the web, it can run commands in your terminal, and it can use MCPs. All of these tools are what make the agent so great. A recent feature they launched, which is also really awesome, is multi-agent mode, where you type in one prompt and it will generate three, four, however many different responses to the same prompt you want.

Actually, I went ahead and did this for a couple of the most popular models to see what they would generate. First up, I have Composer. Some of you might not have heard of Composer before; it's an LLM that Cursor made themselves. While it's maybe not as good in code quality as some of the other top-tier models, what it really specializes in is speed. A lot of the changes you're making in Cursor are simple changes where you don't need that smart of an AI, so this really helps. I asked them all to generate a landing page for a MacBook M5 Pro release, and Composer generated this output in 17 seconds. In comparison, Claude Sonnet took about a minute, and this is what it generated.

Then, finally, ChatGPT Codex took about two minutes, and this is what it generated. This is a small sample size; it's one prompt, so it's not a full study, just to give you a general idea of what these outputs might look like and how long they take. If you put them all side by side, this is how it looks, and you can take your pick of what you like the most. I might like Composer the most out of all of them. Then, just for fun, since Gemini 3 came out, I tested that one as well. This is what it generated in 34 seconds, which might be the best design, but we'll have to do more testing to see how good it really is.

Tip number four is Shift Tab. By default, the chat is in agent mode, but if you hit Shift Tab, you can switch it to ask or plan mode. These are really helpful as well. If you're trying to just understand your codebase without making any changes to it, or maybe just use AI as a thought partner, you turn on ask mode and start chatting with it. You can even use multi-agent chat here too, so you're getting different perspectives: Gemini is giving you some advice, ChatGPT is giving you some advice, and you see what the different ones say.

Then there's plan mode, where once you know what you want to do, you type it in, and Cursor will generate a plan for you. This will be a markdown file with all of the steps it's going to take; it's going to implement every step, test it, and keep going forward. Before it goes into it, it gives you a chance to review its plan. This is really good for those really complex tasks. One more thing that I really like with all of these: when a new model comes out, I love using multi-agent to shadow that model against the previous one I was using.

For example, right now, I'm liking Composer mixed with Claude a lot. Gemini came out, and I'm wondering, is it better? What I'll do is have all of my prompts generate twice, once with my regular LLM and one time with Gemini. I'll try it out for a week, see which one I like better, and switch over from there. I think that's a really good way, every time these new LLMs come out, of testing which one you like the best. For plan mode, you can see this is what it's going to generate: a checklist, and you'll see it in real time as it goes through and checks off everything it completed. Again, really good for complex features.

Tip number five is turn on Cursor Sound. This one might be underrated; a lot of people don't know about it. The biggest pain point of producing code with these LLMs is the wait time. You'll put in a prompt, you've got to wait two minutes, and then you forget about it and come back. In fact, it's so much of a problem that YC recently funded this company called Clad Labs, which has a brainrot IDE that lets you play video games and watch TikToks while you're waiting for your code to generate. This company raised a ton of money for this, so you can tell it really is a problem. I wouldn't recommend it as my go-to IDE, though; just turn on Cursor Sound instead. Tip number six is Custom Commands.

Another thing that's awesome about Cursor: for prompts that are repeatable and that you're using a lot, you can go ahead and make custom markdown files. For example, if you want to create a PR and you have a certain format you always create your PRs in, you can create a markdown file that specifies that. Then instead of having to tell Cursor the same thing every time about how you want your PR format to look, you just type /, put in the command, and Cursor has the context on that.

Similar to commands are rules. Rules are a little bit different. They're not markdown files in Cursor; they're .mdc files. What this allows you to do is give a description of each rule and also give it a rule type. You can choose if you want your rule to always apply: if you click that, the rule applies to every chat you have. This is good for things like asking the AI not to generate comments. If you don't like your LLMs generating comments, you put this in so it always knows, don't generate any comments for me. Then we have the intelligent apply, where the agent itself decides when it should apply the rule. Based on the description you give the rule, the agent will read it and decide, for this task, is this a good rule to use or not? Sometimes it works. Sometimes it doesn't work the best.

The other one is apply to specific files. You can point it at a specific folder, and when certain files are being touched, that's when the rule activates. Finally, you can apply it manually. That one basically just becomes a command; it's almost the same thing, except for manually applied rules you type @ instead of a slash and tell it, this is the context I want you to add to this prompt.
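To make the rule types concrete, here's a rough sketch of what a Cursor rule file can look like. The file name and wording are my own illustration, not from the talk; check Cursor's docs for the exact metadata fields:

```markdown
---
description: Keep generated code free of explanatory comments
globs:
alwaysApply: true
---

- Do not add comments to generated code.
- If a comment is truly essential, keep it to one line.
```

In this sketch, `alwaysApply: true` corresponds to the "always apply" type; leaving it off and relying on `description` gives the agent-decided behavior, and filling in `globs` scopes the rule to specific files or folders.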

Another thing that's really awesome about rules is that there are project-level rules, user-level rules, and team-level rules. I highly recommend sharing these with your teams. The ones that are very specific to you, keep local. Others, for example creating PRs in the same format, are great to share with your team so you aren't all creating the same rules over and over again. What's also really great is the AGENTS.md format, which is basically the README for agents and has become a convention for a lot of these coding tools: Codex, Cursor, Gemini CLI, Copilot. A lot of these are adopting AGENTS.md. What this allows is for the rules you create to work across all of these different agents, instead of each of them having a different format and you having to copy the same rule into each one. The sad part is Claude Code doesn't support this yet, but hopefully sometime soon.
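As a sketch of the convention, an AGENTS.md is just plain markdown instructions at the repo root that participating tools read automatically. Everything below is an illustrative placeholder, not from the talk:

```markdown
# AGENTS.md

## Project conventions
- TypeScript strict mode; run `npm test` before committing.
- PR titles follow the "type(scope): summary" format.

## Things agents should not do
- Never edit generated files under `dist/`.
- Never commit directly to `main`.
```

Because it's a shared convention rather than a tool-specific format, the same file can steer Codex, Cursor, Gemini CLI, and Copilot without duplication.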

Then tip number seven is writing good Cursor Rules. It's similar to prompt engineering, but the main thing is the context you give it: instruct it the way you would in your regular documentation, giving it all the details. Keep it short, under 500 lines; if it's bigger than that, split it. You can nest rules, where one rule references another rule within it, so go ahead and do that. Give it concrete examples and avoid anything vague.

Some examples of Cursor Rules. One could be a refresh.md, where, if a bug persists, you give the AI instructions to search all over the codebase and dig a little deeper into each area; sometimes this really helps when you're stuck on a bug. You can do a no-comments.md, where you say, don't add any extra comments, I don't like the comments. Or you can do a prd.md for when you just want to generate a PRD: you put in the format PRDs follow at your company, and a lot of the time Cursor saves is actually on things like this that aren't even coding.

Tip number eight, MCPs. These are great. You gain so much more functionality when you take the time to set these up; I think this is where you make a lot of that expert-level gain. I highly recommend taking the time to set them up. One caveat: there is a maximum of 80 tools on Cursor, and even then, I wouldn't recommend having 80 tools. You really start to see the models deteriorate when you give them too much context, and they have trouble figuring out which tool to use. Don't add too many, or if you do add a lot, make sure you turn them off whenever you don't need them. The top MCPs that I would recommend: number one is a document store. I think this is the biggest one by far; it helps so much. There is so much context missing from the code itself that lives only in documentation, which the AI won't otherwise have and only you know. Once you hook up Confluence, Google Docs, or wherever you have your documentation stored, this is a really big unlock.

The second one is version control. This is basically just GitHub, which is obviously great to have. Third is any project management tool you're using: Linear, Asana, Jira. You can automatically grab tickets and have it solve them, or you can create tickets; that saves a lot of time as well. Another helpful one is any sort of database MCP, preferably set to read-only so you don't end up wiping out your database by accident. Things like Snowflake or Supabase: if you need to query something or understand something from your database real quickly, this is also very helpful. Then any observability tools you have: Datadog, Prometheus, whatever. When you're trying to debug, having access to those logs really helps the LLM as well.
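Wiring these up is mostly a JSON config. A hedged sketch of a project-level Cursor MCP config, typically `.cursor/mcp.json` (the exact server packages, connection strings, and token handling here are illustrative; check each vendor's docs, and note the read-only database user, which matches the advice above):

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>" }
    },
    "postgres-readonly": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-postgres",
        "postgresql://readonly_user@localhost:5432/mydb"
      ]
    }
  }
}
```

Each entry is just a command Cursor spawns; the tools that server exposes then show up in the agent's toolbox, which is also why it's worth disabling servers you aren't actively using.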

Another thing is manage your context window. A lot of times, when I see people saying this AI isn't working well for me, it's because they open one Cursor Agent chat and start typing in it. They type a little bit, then come back later and type the next thing into the same chat. Now this agent has so much context in it that it just starts to deteriorate. Whenever you're switching tasks, make sure to open a new agent so it has fully fresh context, because it really does have a big impact. Prompt engineering is important, but much more important is context engineering: what context you're giving your LLM matters a lot more than the specific way you format what you say to it. Tip number 10, Cursor Checkpoints. It's a pretty simple one; we already have Git. Sometimes you have a chat that's doing really well, then you give it the wrong prompt, and it completely sidetracks in the wrong direction. It's just good to know you can restore back to a previous point in the chat and continue from there.

Another thing that I love about Cursor is that it indexes your codebase. Immediately when you open your codebase, at least 80% of it gets indexed, and from there, when you add new files, it adjusts; when you delete files, it adjusts. Sometimes, for really large or complex files, it leaves them out just for the sake of performance. Tip number 12, the Cursor Slack integration is awesome, or any type of AI Slack integration. It's for really small things: maybe you make a PR, somebody comments on it, you forgot to change the styling format, or they say, can we update this config variable? It makes life a lot easier: you just tag Cursor, and you don't need to go into your codebase for simple changes like that. Tip number 13 is Cursor Browser. They recently launched this one too, where you can actually see your app live next to your agent, which is great, because now it has access to the console logs and the network traffic. It can actually test your applications for you.

Then tip number 14, use at your own caution: YOLO mode, which is auto-accept. You tell the AI, auto-accept whatever changes you make. You probably want to stay away from this, but one case I've seen it useful for is writing tests: you tell it, write some tests, then code, then run the tests, see if they work, and iterate, and you can get a bunch of tests generated by the AI just looping back and forth testing your app.

Claude vs. Cursor: A Real-World Comparison

We've gone through Cursor, and if Cursor is just so great, why do we even need Claude Code? I want to share a real-world example I had of Claude versus Cursor and where these two tools shine in different areas. I can't get into the specifics of what I was doing, but I was looking to implement a feature, and I gave the same basic prompt to Claude Code and Cursor. What Cursor did was just select a single solution. It told me about it, and then it executed on this non-optimal design.

Then I tried Claude Code, on the other hand, and it searched the web for open-source repos. It presented three different options for me with pros and cons, really high-quality analysis, and it saved me a ton of time. I think that's where Claude Code really shines in comparison to Cursor. For small changes, Claude Code actually sucks; it will over-engineer things and research too deeply. But when it comes to complex features and research, Claude Code is a lot better. It does burn a lot more tokens, but I really think it's worth it. If you have some big task or hard design you're trying to implement, I think Claude Code should be the buddy you're chatting to. Cursor, on the other hand, is really good for those quick outputs using the Composer LLM. And if you want to try different LLMs, to see whether, when one doesn't work, another might have an answer for you, Cursor shines there again. You get all the visual bonuses that come along with it as well.

Then the section on Claude Code. A lot of it is similar to what we talked about for Cursor, so I'm just going to go over the core four items you probably need to know for Claude: skills, subagents, commands, and plugins. Skills are similar to rules: in Cursor we had .mdc files for rules and could mark whether they're auto-applied. Skills are basically those auto-invoked rules, where if Claude should note something when x comes up while we're discussing y, that's when you would use a skill. Subagents are explicit workflows: if you want some specific task to be run, that's when you use a subagent. What's really cool about subagents, which we'll hop into later, is you can give them specific access to different MCPs.

Then we have commands, which we just chatted about; same thing in Claude Code. Finally, plugins, which are Claude's way of distributing packages: you can bundle together your skills, your agents, and any commands you have, make that a plugin, and other people from your team or outside your team can download and use it. Starting off with skills, again, similar to rules. A skill that I have up here is converting a blog to the template I wanted. The rule here is: if I'm generating a blog post, how should it look to match my company format? The skill makes that conversion for me. Second is commands. A command might be a PR command where I tell it, just create a PR for me, so I don't have to type the three or four commands it takes to make one.
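As a sketch of that blog-conversion skill (the name, path, and wording are my own illustration), a Claude Code skill is a folder containing a SKILL.md whose frontmatter description tells Claude when to auto-invoke it, for example `.claude/skills/blog-formatter/SKILL.md`:

```markdown
---
name: blog-formatter
description: Use when the user asks to convert a draft or blog post into the company blog template.
---

When converting a blog post:
1. Apply the heading structure from the company template.
2. Keep the author's wording; only restructure, don't rewrite.
3. Output the result as markdown.
```

The body only gets loaded into context when the description matches what's being discussed, which is what makes skills the "auto-invoked rules" described above.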

Then, three, and this is the fun part, is subagents, which I think is really one of the big benefits of Claude. We have our main Claude agent in our terminal chat, and then we have subagents it can call. For example, if you're getting paged, you can have a PagerDuty subagent that has MCP integrations with Slack and Datadog, checks what page was triggered, goes and investigates the logs, and tries to find the root issue for you. Or you can have a documentation subagent where, whenever you make a PR, you tell it, go ahead and update our docs according to these changes as well. Or you could have a Karen subagent that goes through Slack and Jira, sees whether everybody finished their tasks, and then eliminates or notifies the users.

For subagents, you want to use these when they have one very specific purpose and you want them to have specific access to certain MCPs. What's really great is that each one has its own context window, so it doesn't pollute the context of your main agent. Then, finally, plugins. Just a quick overview: you can bundle your agents, skills, and commands together, and other people can come and download the result. It's a visual representation of everything you can have.
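The documentation subagent from the examples above could be defined as a small markdown file under `.claude/agents/`. This is my own sketch, not from the talk; the `tools` line is what restricts which tools and integrations the subagent can touch:

```markdown
---
name: docs-updater
description: After a PR is created, update the project docs to reflect the changes.
tools: Read, Grep, Glob, Edit, Write
---

You update documentation. Given a diff or PR description:
- Find the docs pages affected by the change.
- Edit them to match the new behavior.
- Never modify source code, only documentation.
```

Because the subagent runs with its own context window, the main agent just delegates "update the docs for this PR" and gets a summary back, without the doc files polluting its own context.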

We've gone through Cursor. We've gone through Claude. Are these the best two tools? For me, personally, yes. It's so head-to-head that it changes just about every week, and it really depends on personal preference at the end of the day. Two other ones that I would say are really close to these are Cline and Codex. Cline is a little bit cheaper than Cursor. I prefer Cursor because of the Composer model it has and the indexing it does on your codebase, which Cline does not. But I've heard a lot of people say, I get way better results on Cline than Cursor. Try both of them out on your own and see what you like better. Same with Claude Code versus Codex. I've heard a lot of people say, I get better results on Codex than I do on Claude Code. I personally think Claude Code is a little bit better at knowing when it's wrong and not confidently stating some answer. I also feel it goes a little deeper on the educational aspect: it will explain what it's doing to you better than Codex does. Again, I don't have any statistics on this; this is personal experience. You'll have to try it out on your own and see what you like better.

Then, finally, Antigravity from Google. I would highly recommend this one, too. It's really cool, because this is the first time we're seeing a foundation model company build an IDE. You get access to the Gemini foundation model within Google Antigravity, and they're also allowing you to tap into ChatGPT, Claude, and other models. I feel like this might end up being one of the best ones for that reason. It's the first of its kind in this way.

Bonus - Documentation, PR Review, Evals, and More

A couple of bonus categories, actually, even beyond coding. I've been in industry for roughly a year now, and this, I think, has helped me more than anything: having AI for not only docs, but just explainability. So many times, documentation is lacking at a company, and I'll just use Claude Code to walk through the codebase, chat with me, and explain it to me. That really helps in areas where otherwise I might have had to ask other people for help. One tool that's really awesome is DeepWiki. It's AI docs for any repo; they have over 20,000 repos already indexed. I was working on an open-source project that had essentially zero documentation. Not zero, they had one page, and it was quite terrible. DeepWiki wasn't perfect, but it really helped me out in building my project with these AI-generated docs. It also has a chatbot next to the docs where you can ask questions. I would highly recommend it.

Second is an AI code reviewer. I haven't gone too deep into any of these; I've heard that CodeRabbit is the best. Just in general, having an AI code reviewer does save a lot of time. It can help you catch some of those small syntax errors and styling errors, or if you have specific formats for certain PRs and certain checks you need to make, these are great for that.

Another thing is low-code tools. Part of our responsibility as developers is to help out our non-developer friends at work, introducing them maybe to Cursor, but more likely to tools like Lovable and n8n, because these are really like a superpower for them. If you just introduce them and give a little tip, they're going to love you forever for the things they'll be able to produce with this. I highly recommend that. A quick look at what n8n is: n8n is basically a workflow tool. You can build AI automations and workflows on it. What's really awesome is that it's very low code, and they have integrations with just about any app you can think of. You can hook it up to your email: you just click on that node, and it's a dropdown box where you pick different variables, no code really necessary. If you do want to build something more complex, they have JavaScript and Python code modules you can put in there, where you can actually go in and type code. Again, for non-technical business people, this helps them set up AI agents pretty easily and has a lot of impact.

Another thing I want to go over is evaluating impact. How do you find out if AI is actually having a productive impact on what you're doing and what your company is doing? I've seen a lot of people try to figure out the right metric for this, and I think the conclusion is there really isn't one. What's more important than anything is finding different metrics that you can track so that you can reference them later when you need them. A lot of times, you will have a story that you can tell through your qualitative experience, and these metrics will help you back up that story when the time comes to present it.

Finally, evaluating costs. I didn't really go into this too much, partially because a lot of you will be spending company money anyway, but also because I think most people are underutilizing AI right now, so it just makes sense to overspend. Go ahead, invest heavily in it. Overspend for the first six months, see what the results and gains are, and from there, adjust and cut back if you need to. One shoutout: the Kimi model is particularly good for low-cost, high-quality outputs. However, if you're using Kimi with Cursor, the way they do their tool calling is different, so you won't get its full capability. I wouldn't recommend using it anywhere other than its own CLI.

A couple of other things I wanted to touch on from that Stanford research paper we talked about earlier. These productivity gains are going to differ depending on what you're working on. On greenfield, low-complexity tasks especially, you want to be using AI almost every single time; that's where the majority of the productivity gains are made. If you're working on brownfield, high-complexity tasks, sometimes you might want to ditch it; it might not be that helpful. You can try, but it really depends case to case. Another factor is language popularity. Python, Java, the highly popular languages perform a lot better. If you're working in older, less popular languages, it's going to be hard for the AI to create anything.

Beyond Writing Code (Lessons from Databricks CEO)

Then I want to go beyond writing code. I got to be a speaker at an event for a bunch of CEOs, and the CEO of Databricks, Ali Ghodsi, was also there. He shared a story that really stuck with me, and I wanted to share it with you. At Databricks they build these connectors, and it typically takes them four quarters to launch one. Some new AI tool came out; I don't know which one it was. He went home, tried it, and in about a day he was roughly able to get a connector working, not 100%, maybe 80%. He thought, this is great, took the tool, and passed it on to his teams: "All right, guys, let's cut down the time". They went and researched the tool, came back, and said, "Yes, it's good. We can cut down from four quarters to three quarters". He didn't really understand why and tried pushing back a little, but they said, this is just what it is; because of XYZ reason, we're not going to be able to do better. He wasn't really sure why, but he gave up.

Then he talked about one German employee who came in and completely revamped the whole process. Through a couple of things he did, he took the time from four quarters for one connector down to seven connectors in one quarter. That's a 28x gain in productivity. What are the takeaways, and what did he say? The top one is that people are just people. We're all human at the end of the day, even the best engineers among us, and we're resistant to change. When you're trying to make these AI evolutions, one thing that really helps is bringing in a fresh set of eyes and having them re-evaluate assumptions. Within our own teams, we might not want to make a change because it's going to cost us a lot of work. When somebody else comes in and makes it happen, they don't care how much work it costs you, and in the end that leads to more productivity for you.

Number two, he said there are yaysayers and naysayers. He didn't bash the naysayers at all; he said both are right, and both come up with logical reasons that make sense for why you should or shouldn't do something. With that said, he added: I've learned that almost every single time it has to do with pushing boundaries in AI, you find the yaysayer and put them in the position to lead. Why I share that with you: for the few of you who maybe aren't using AI the most, if CEOs out there are saying they're going to put the yaysayers in the positions of power, then even just for your own personal gain you might want to consider exploring it a little, even if you don't think it's the best option, because that is likely what your management wants.

Then three is that they treat software as actual software. He said the cost of coding is lower now than it has ever been; sometimes it's the other tasks that take up the time. In this case, they actually found that about 80% of the time for building these connectors was PM work, not the actual software work. That's how they were able to cut the process down: by cutting out a lot of the user interviews and cutting out writing the PRDs. They just took a risk. They said, we're going to build this software, and if it doesn't work, we'll build it again, because the cost of building it is so much lower than it used to be. The real cost now is in the PRD generation. So let's risk it, take a shot, iterate, and build it again or scrap it if we need to. That's learning number three: shrink the process, not just the code.

Then, finally, a couple of tips. He said that in every situation now, we want to reassess all previously made assumptions. There are so many assumptions we've made about our systems that are simply no longer true with AI today, and we just don't go back to re-evaluate them, which is where we miss out on a lot of productivity gains. That's tip number one. Tip number two is that every company is dying to hire that German guy. If you're looking for a promotion or trying to move up in your career, try to be that German guy. Every business person is looking for one.

AI - Absolutely Imperfect

Then, finally, I want to talk about AI: absolutely imperfect. Of course, I'm not trying to make it seem like it's all bad, but it can have a lot of downsides as well. You can get unintended changes and suboptimal design. It hallucinates very confidently a lot of the time. Your own skills may start to erode as you use AI more and more, which is another downside. There are security threats. Sometimes you have dependency risks too, where people ship code they have no idea how it works, and when you need to go fix it, issues come up. Yes, there are definitely tradeoffs, as there are with anything in life or in software. Overall, though, the gain is going to be worth the tradeoff.

Takeaways

The takeaways I want to leave you with: one, don't just look at using AI to speed up your coding; really look into the tasks beyond the coding to see what you can speed up. Two, I hope all of you coming out of this session will try an AI-powered IDE. Three, try an AI-powered CLI tool; you might be shocked. I think a lot of the apprehensiveness about these AI tools comes from having tried one a year ago, when it wasn't really there yet, and never going back. In the last 12 months they've made so many advancements that you might really be shocked, especially with something like Claude Code. Four, add some rules, add some skills, try them out, find some repetitive tasks, and see how they work in your workflow. Five, continue to reassess any previously made assumptions in your workflows.
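As a starting point for the "add some rules" takeaway, a rules file can be as simple as a few plain-English constraints checked into the repo. Below is a hypothetical sketch of a project-level `CLAUDE.md` (the file Claude Code reads for project guidance; Cursor reads similar guidance from files under `.cursor/rules/`). The specific conventions and paths here are illustrative, not from the talk:

```markdown
# CLAUDE.md — project guidance for the coding agent

## Conventions
- Run `make lint test` before declaring a task done.
- Never edit files under `gen/` — they are code-generated.
- Prefer small, reviewable diffs over large rewrites.

## Repetitive tasks
- "new connector": scaffold from `connectors/_template/`,
  then register it in `connectors/registry.go`.
```

The value is in encoding your team's repetitive tasks and house rules once, so the tool stops re-asking and you stop re-explaining.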

Questions and Answers

Participant 1: When you were talking about Claude versus Cursor, you had compared those two on the same prompt, but was it the same LLM model underneath or different fundamental model?

Sepehr Khosravi: It is the same LLM underneath, but there is additional fine-tuning they do for Claude Code, which makes it go a lot deeper than it typically goes with just Claude inside Cursor. You'll see that in the number of tokens it uses as well.

Participant 2: Since you're working in Coinbase, I'm thinking it's a regulated industry. Which part of your business function do you use Claude Code or any of these tools to write code?

Sepehr Khosravi: In general, things like writing PRDs, any documentation writing, research planning, all of these things, I typically start with Claude Code or Cursor.

Participant 2: I was just asking from a coding standpoint. These are all documentation and that kind of thing?

Sepehr Khosravi: At Coinbase, we are using Cursor and Claude Code. Yes.

Participant 3: You talked about AI-native IDEs, Cursor, that kind of thing. How mature are those IDEs in terms of legacy features? I use JetBrains Rider a lot. They have a lot of refactoring features and a lot of nice things to analyze your code. Do those IDEs provide that as well? Are they mature enough, or are we just relying on the AI capabilities to do everything? If I want to refactor a class name, for instance, does it have a deterministic feature that goes in and refactors everything, or is it just going to ask the agent to do that? That sounds like a waste of energy, asking an AI agent to do a renaming, which is pretty standard in the industry.

Sepehr Khosravi: I haven't seen the refactoring tools on Cursor. I'm not sure if they exist or not. IntelliJ also has their own IDEA AI model, which you can use. With refactoring, AI is really strong with it, at least for the cases that I've used it for. Like you said, maybe it's a waste of tokens that you're putting into it for doing those things. I'm not sure. I'd have to check the Cursor refactoring.

Robinson: I just want to add to that: if you are using an IDE like Rider, you can run Claude Code inside the terminal within Rider, which is something our Netflix developers do as well.

Sepehr Khosravi: I also have some QR codes up here if you want to connect with me on LinkedIn. I also have some marketplaces for Claude plugins and Cursor Rules that you can check out if you just want to see what other people are using and what's been working for them.


Recorded at: Apr 09, 2026
