
Directing a Swarm of Agents for Fun and Profit

45:58

Summary

Adrian Cockcroft explains the transition from cloud-native to AI-native development. He shares his "director-level" approach to managing swarms of autonomous agents using tools like Cursor and Claude Flow. Drawing on real-world experiments in BDD, MCP servers, and language porting, he explains why the future of engineering lies in building platforms that orchestrate AI-driven development.

Bio

Adrian Cockcroft is a technologist and strategist with broad experience from the bits to the boardroom. He’s best known as the cloud architect for Netflix during their trailblazing migration to AWS and was a very early practitioner and advocate of DevOps, microservices, and chaos engineering, helping bring these concepts to the wider audience they have today.

About the conference

Software is changing the world. QCon San Francisco empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Adrian Cockcroft: The first Netflix talk was here in 2010. Back then, most people said, "You guys are crazy. What are you doing trying to run an enterprise on a cloud?" This was in the days when pretty much you got on the cloud with a credit card. Netflix was one of the first companies that actually had an enterprise license with AWS. They had to create a sales team to give us a license and figure out how to do this so it wasn't just buying instances on a credit card. That was 15 years ago. We've come along a little bit since then. I retired from Amazon in 2022. I'm now independent. I can do whatever I feel like. That's what retirement looks like to me. I'm still doing some advisory work, playing around.

One of the reasons that I didn't do much coding in the later years was because of my RSI; my arms hurt too much. Too much keyboard time was a bad thing. I used to go around doing talks and things like that. Being a manager in meetings was less keyboard time. That was a physical reason why I pushed my career in that direction. Now with these nice AI tools, I just have to type something in and watch it for a while, while I'm trying to learn guitar or watching a YouTube video. Every now and again, I poke it and tell it to do something else. I've started doing a lot more coding recently.

I'm going to talk about the title of this. Why did I choose this title? I'm directing a team of agents. If you're a director, you don't read code. You have a team of people. You're trying to do something for your business. You don't watch everything they do. The way I'm using these coding agents is like a director-level manager. The agents build things I didn't ask for. Sometimes they're things I didn't know to ask for. I still have to keep nagging them to say, no, I really want you to run all the tests. No, I want 100% of the tests. Over and over again, I'm nagging these things to try and get them to do the right things. They behave pretty much like human developer teams. If you've ever been a director-level manager of developers, they do all these annoying things. It feels similar.

The main difference is that they do several days' work in 15 minutes. Then you have to figure out whether they actually built what you wanted. That's the directing part. The fun and profit part, I think, is a bit more subtle. I'm playing around, because I can do that in public with my own code, with my pet projects, because the tooling is changing so fast that if you're not playing around, you're not keeping up. That's the fun part. I'm having a lot of fun coding with AI. I'm experimenting on my pet projects. I'm spending a few dollars. I think that's worth it to keep up. When you look at the for-profit side of this, code needs to be safe. Commercial deals take time to put into place. We've got to be a lot more careful and responsible for paid work. People look at this and say, "We couldn't possibly do that. We're an enterprise". Yes, I know you couldn't do that.

This is the fun stuff I'm playing around with now; in five to seven years, you'll be doing it in an enterprise. You're not necessarily doing this stuff today. If you can learn how to go faster by playing around and experimenting in your own time and having fun building things that you've always wanted to build, but you couldn't persuade anyone else to build for you, that's the thing that I'm talking about here. Of course, the future is here, it's just unevenly distributed. There's way too much happening. It's going far too fast. It's very confusing. It's very complicated. Every month, there's a new, better way to do AI. I saw in the news, GPT-5.1 Codex came out. Great. Maybe that's better than Claude. Who knows? Now we've got yet another new tool. We've got to go and try that out. That's a monthly occurrence.

I think you get an order of magnitude cost reduction maybe every quarter or so. The costs are going down, the quality is going up. If you're not experimenting continuously with these tools, you'll get stuck in some dead end, and everyone else will have disappeared off into the distance. It's very hard to buy this expertise from outside, because the consultants are running behind the curve themselves. I think this is something you have to do in-house wherever you are. Then the key thing here is to share what you've learned. I put most of my code up as just open source on GitHub for people to look at if they want to. I'll have links to a bunch of that as I go along.

The Early Days - Computing and Developer Demand

Going back to the early days, think about 2010, maybe even before that. If you needed a computer, you would go and say, I need a computer. The Ops people would say, I'll go find the sales guy to find out what we should buy. We'll get finance approval. We'll buy a computer. It'll arrive a few months later, and then you've got it for three years. I used to work in sales at Sun Microsystems back in the 1990s. That was how you got computing resources to run things. Nowadays, you make an API call, a machine appears, you use it for a few minutes or whatever you want, and then you get rid of it. When we were at Netflix, this is a bit controversial, we called this NoOps, because we stopped having to talk to the Ops people. I don't mean the Ops people disappeared. It's just they weren't in our path of what we needed to do.

As we grew the dev organization, we actually took over the AWS account responsibility. The Ops people were doing things like making sure Wi-Fi works in the buildings and running the ERP and HR tools, but the development of the product was completely owned by developer engineering, because we didn't have to talk to Ops every time we needed another terabyte of disk space or a machine, or something. We still needed people to run this code. What we have nowadays is a cloud platform service within your organization with a DevOps or SRE team.

All those people that were Ops people years ago have somehow become SREs or platform engineers or DevOps people or something like that. What they put together is policy, security, guardrails, incident management, all these things you need to run a platform. Right now, this is pretty much matured. It's slow changing. There's still things happening. Kubernetes is this big monster thing. We just had KubeCon, and I forget how many thousands of people were there, but it's pretty well understood. If you wanted an app built, you'd say, I need a developer to write an app. I'll set up a meeting with the recruiters. We'll do a hiring spec, finance approval. In a few months' time, the developer would appear. That's how you get developers.

Then once you've finished needing the developer, you have to do something with them, fire them, or put them on another project or something like that. Nowadays, if I want a developer, I just do this line of gibberish, which is how Claude Flow works, to say, write this app. A few minutes later, it spawns up five or eight developer agents, writes a bunch of code. Fifteen minutes later, I'm looking at the code or running it and seeing whether it works, seeing if it's what I wanted. I could call this no devs, maybe. Maybe that would make you people unhappy, if you're a developer. I don't need to hire developers to write code anymore. I still need developers, but I don't need to go and hire them. We still need people to write code. What I think you're doing now is you're going to need to build a development service, a platform that operates development. You're going to move your headcount from application development largely to platform development.

That platform's going to do policy, security, guardrails, pre-built components, all of those things. When a product manager wants to run an experiment or wants to build a feature, they'll actually use the platform. They'll spin up a bunch of developers to go build the next version of it or iterate on it, or whatever. That'll take an hour or so, something like that. However, this platform is not at all like the cloud platforms we've been doing recently. It's incredibly fast changing. It's incredibly chaotic, poorly understood.

The tooling and the best practice will change every month. It's probably a big pile of MCP servers or something, all randomly programmed to do something or other. This is a callout: if you're playing around and building these things, I want to talk about what you're doing in this space, because I think there's something emerging here, a set of practices and patterns that we don't quite have yet. In the same way as cloud native development appeared after the cloud appeared, I think what we have is an AI native development process and organizational structure, where mostly your human engineers are working on trying to get their arms around this platform that's changing at a crazy rate. Then the AI agents are doing the bulk of the application development for the product side of things. That's what I think's going on.

Making it Work

If that's where we're going to end up, then we have to make this work. The rest of the talk is about what I've done, the things I've built, and how I've made things work. This is the fun bit. This is the easy fast path code. If you want to just build a script that sucks some data out of a web page and mashes it into a CSV, this took me 15 minutes. Some of the prompts are in here. The code is public. I've got 600 lines of Python, and it works. It's just frictionless. It's so nice. This is easy. I would spend forever hacking away, trying to figure out, rummage around in the DOM to pull out the data and code all this stuff. I'm really not a Python programmer. I can read Python reasonably, but I can't write it. I don't know all of the patterns and how to do that. This is the fast part. This is the easy thing. This is where people are having a lot of success, the "I just need a script" case. At one of the companies I know of, the VP of HR is writing scripts like this themselves. They had a little training class. They have Cursor. When they want to do some analytics on their databases, they're just sucking data up, writing their own scripts, doing their own analytics. No programming experience whatsoever. That is the easy bit, where you're just doing a bit of analytic stuff.

Here's some of the tips I have for things that work. Start with Python. It just works. It's the thing you're most likely to get working. I have an allergic reaction to JavaScript and TypeScript, and maybe some people are more into that than I am. What I've seen is TypeScript builds and builds, and there's a hairball that's hard to understand, whereas Python tends to stay more structured, so you get a bit more maintainability. The Python code almost always just works first time. That's what I've seen. Other languages, like I did Swift, and it writes syntax errors and stuff like that, and it takes a while to get it to work. I've heard that Go maybe works reasonably well, but Python really works well. Porting between languages works well, and I've got some examples of that.

Then if you're doing that, and you've got a fairly large complicated system, then hopefully you've got tests. If you haven't got tests, tell it to write the tests. Tell it to build a bunch of tests, get the tests working, port the tests to the new language, or point your test framework at whatever you're building, and then build the thing. It's a two-step process. I was doing some analytics in R. I've known R for a long time, and I was trying to do this analysis code. I spent quite a bit of time figuring out exactly what the analysis was. Then I wanted to give it to somebody else to run. In the R environment, you have to install RStudio, and it's a big mess and whatever. I said, make a Python version of this. Five minutes later, I had the Python version, sent it to them. They could run it. It did all the same thing as the R version. This is super easy to do. This is the stuff that works really easily. It wasn't a particularly big piece of code, but that was a nice, easy, fun thing.

Another thing I've done: lots of people are figuring out they should do test-driven development. I go one step beyond that and use behavior-driven development. The BDD tests are actually a much nicer structure for driving the behavior. I've had a few people go, yes, I'm doing this too. This actually does work better. When you do BDD, you structure your test as: given this setup, when this happens, then this should happen. When you structure it that way, it gives the agent more structure. It's harder to fake the results. Everything basically falls into place. I've seen much better quality results. You can do it against the unit tests. It mocks all the backends and things like that to get your functionality right.

Then I say, do that again, but do it end-to-end, integration test with a live database at the back, the whole running system. I want the same BDD tests. Then it finds a whole bunch of other things that don't work. Again, you're now testing your whole system with behaviors. The behaviors become effectively the spec for your system. If you can build more and more of those behaviors to test the corner cases and get everything to work, then you could pretty much delete your whole codebase and start again once you have a good set of BDD specs for it.
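As an invented illustration of that given/when/then shape, a unit-level BDD test in plain Python might look like this (the `Thermostat` class here is a toy stand-in, not code from the talk):

```python
# Minimal given/when/then structure for a unit-level BDD test.
# The Thermostat class is a made-up example to show the shape.

class Thermostat:
    """Toy thermostat: heats when temperature drops below the setpoint."""
    def __init__(self, setpoint):
        self.setpoint = setpoint
        self.heating = False

    def report_temperature(self, temp):
        self.heating = temp < self.setpoint


def test_heating_turns_on_below_setpoint():
    # Given a thermostat set to 20 degrees
    stat = Thermostat(setpoint=20)
    # When the room temperature drops to 18 degrees
    stat.report_temperature(18)
    # Then the heating should switch on
    assert stat.heating


def test_heating_stays_off_at_setpoint():
    # Given a thermostat set to 20 degrees
    stat = Thermostat(setpoint=20)
    # When the room is already at the setpoint
    stat.report_temperature(20)
    # Then the heating should stay off
    assert not stat.heating
```

Because the given/when/then steps name concrete behaviors, the same scenarios can later be pointed at a live backend for the end-to-end pass he describes.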

The Python BDD scripts are really readable. You don't need an additional copy of the definition. You just read the code. It's very simple to understand. The other thing I found is that when the agent is reading the code, it will sometimes reintroduce bugs. It'll go around in loops. It'll do things over and over again. What I tell it to do is maintain a context block at the beginning of each piece of code. When it reads a source code file, the first thing it finds is a 100 to 200 line block comment that says everything that it knows about that code. It maintains it. It writes it in there.

You just say, put a context block in there. It lists what this does, the APIs, the version history, and you tell it to maintain that every time it changes the code. Every time a different agent comes along, or you switch from Cursor to Codex or something, the context is there. It goes through. The first thing it does is it sees the context, and then it reads the code. It's already figured out a lot about that code before it gets into it. It's not trying to reverse engineer everything from just reading the code. Makes a huge difference. Again, you just make forward progress more quickly.
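An invented example of what such a context block might look like at the top of a source file (the module's purpose, APIs, and version history here are all hypothetical):

```python
# =============================================================================
# CONTEXT BLOCK - maintained by the agents; read this before the code below.
#
# Purpose:   Pulls device readings from exported JSON and writes a CSV
#            summary for the analytics scripts.
#
# Key APIs:  rows_to_csv(rows) -> str   # list of dicts to CSV text
#
# Invariants:
#   - Column order is alphabetical so diffs between runs stay stable.
#   - Every row dict has the same keys as the first row.
#
# Version history (illustrative):
#   2025-06-02  Initial version generated from the BDD spec.
#   2025-06-09  Switched to csv.DictWriter to handle quoting correctly.
#
# Agents: update this block every time you change the code below.
# =============================================================================
import csv
import io


def rows_to_csv(rows):
    """Render a list of dicts as CSV text with an alphabetical header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The point is that the next agent to open the file starts from the summary at the top instead of reverse engineering the whole module.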

I had a chat with Kent Beck at one point. He said, you have to do this tidy first thing. I looked at him and I said, yes, that's definitely a good way of doing it. You do a little bit of coding. Maybe you spend like an hour generating some new code. Then you spend the rest of the day tidying it up, making sure it's got tests, cleaning it up, getting rid of all those doc files that it sprinkled all over the directories as it was writing it, making sure every error, every logging entry, all the warnings have gone away, all the random deprecation warnings, any warnings and errors, just keep tidying it up. Then you've got your code. You built that. Maybe it took several hours building it after it originally got coded. You're still faster than if you'd written it yourself. You can ask for code reviews. There's a very interesting asymmetry here. It's much easier to criticize something than to build it. Everyone knows that.

The same thing works for the AI agents. They are really much better at telling whether something's any good than building it in the first place. You use the agents as code reviewers. They're like, this is stupid code. Who wrote that? It was you just now. I was working on Swift, and the view file got huge and huge and huge, because it kept throwing stuff in it. I told it, do a code review. It said, yes, I should put all that in the models and the controllers. It shrunk the size of this view file from 900 lines to 300 lines or something as a process of tidying it up. Just every now and again, do that tidy up, because it's going to be much better at doing that, and archiving all the documents, keeping it clean. I had this a few times with Python processing code. You get something that just takes a long time to run. You say, speed that up. It says, yes, there's a bunch of linear things. It goes and rewrites it a bit. Then it runs in a couple of seconds instead of 30 seconds. It just does that for you.

Agentic Coding Tools

Cursor is a pretty nice tool. I have a license for that. That's been my main coding tool for the data science-y stuff I do. This is mostly manipulating CSVs and doing a bunch of analysis, observability analysis. It's single-threaded, so you watch everything go by. You can see what it's doing. It's had multiple agents since last month, but I haven't really tried that yet. It's more interactive. This is more like normal coding. When you're using Claude Code, it's going too fast and everything's moving around. You don't actually have time to watch what it's doing. There's too much stuff going on usually. I ended up with the $200 a month version of this, because it's really annoying when it runs out of tokens halfway through doing something and conks out. It's not that much.

If you think of how much developer time you get for $200, this is quite a cheap way of buying a month's worth of as many developers, pretty much, as you can kick off. The other trick, and this is something I got from just using the agents, is that you want to run Claude in dangerously skip permissions mode. For the Google one, it's called YOLO mode. Basically, you want to do that somewhere safe, because you just want it to run. You don't have to keep stopping and telling it you can do things. What I do is I run it in Codespaces. You have your GitHub repo where you're storing everything. I've got a little thing here. There's a little green button that says Code.

When you click on it, there's local, and there's Codespaces. You click over to Codespaces, and you hit plus, and it makes you a little thing on Azure with your repo in it, and a web browser, and the web interface to this Linux machine with your repo in it. You install Claude in that, and you tell it to do dangerous things. All it can do is commit to the repo. It can't do anything else. It's basically a safe place to do that. You get $20 a month for free. I'd have all my stuff in this. One month, when I was going absolutely crazy, I ended up using $25 and had to pay Microsoft $5 for the extra. Most months, I'm not even hitting $20 a month of the capacity.

That gets to Claude Flow. Reuven Cohen is actually one of the early cloud pioneers. I met him in 2009, I think. He's always one of those guys that's at the leading edge of technology. He's been playing around with this stuff for a while. Last June, I think, he released Claude Flow. This is an agent framework. It basically takes each copy of Claude, gives each one a different MCP server that personalizes it to be a coder or a developer or an architect or a researcher or a Hive queen who's a line manager or whatever, or a backend tester, a DevOps one. A DevOps one will just know how to go build everything into a Dockerfile and do the builds and all that stuff. They communicate via shared memory and to-do lists.

This piece of code, he's building this system using itself. It's really hard to keep up. It's adding features every week or every month and going crazy. He's spinning out projects. It's ridiculous. I think he currently has four accounts for the maxed-out Claude thing, and he's using them 100%. When you do this, you get significantly better behavior on large projects. Why is that? It's because they're working in parallel, so stuff happens even faster. Also, because each agent is single-minded, it's easier to focus, it's just doing one job.

Then they check on each other's work. You've got a coder who's writing the code. There's a tester that's writing the tests. There's another agent running the tests. The one running the test tells the tester, that test broke. Basically, they share all this knowledge, and then you get better tests, and then you get better code. Then the tester's one purpose is just to make sure the test passed. All that kind of thing means that you don't have the shared context pollution where one agent is trying to think of too many things at once and ends up faking stuff or just trying to find ways around or confusing itself.

LLM as Product Manager and UX Designer

The other thing you can do with LLMs is basically product manager-y stuff. When I'm wandering around and I have an idea, I pull out my phone, and I have ChatGPT and Claude, just as the mobile app, not the Claude Code thing. I just type in, how would you do this? Or, here's an idea. I basically iterate on ideas on my phone just at random times. You think of something and just iterate on the idea. How would you do this? Then at the end of that, you say, write that out as a file somewhere. Then I can go to my computer later, my desktop, my laptop, pull up that file, save it into a repo, and say, go build this thing. It's a super-effective way of just being very exploratory.

Then, as a UX designer, I was developing some user interface stuff. I went, build me a UX guide thing. I don't know what these are. I know some UX people, but I've never been one. It made up personas, onboarding flows, massive detail. It was actually very on point. I tweaked it a little bit where I wanted something specific that it didn't come up with. Then the other thing I got it to do was write a human testing guide that says, it's going to be a GitHub repo. When you visit the repo, there should be a guide saying, how do you do this? What does this thing do? I want to be able to run the test myself. I should be able to download this repo and work through some steps. What should I see when I do these steps? It makes it work through and generate those steps for you.

Then you go and run it yourself and say, no, it didn't quite work because you were in the wrong directory, and you debug it for it. The UX designer stuff I thought was quite powerful. I showed it to a real UX person. They said that was a plausible looking UX structure because they all follow the same structure. As long as the domain you're operating in makes sense to the LLM in general, then it will come up with plausible sounding things. What I was doing was house management. It was like, there's the person that's written the code. There's the main admin for the house. Then there's somebody that's just like a visitor to the house that doesn't know how anything works. Then there's somebody who's non-technical in the house who just wants stuff to work and doesn't have to think about it. Those are the personas, and it worked through that.

Then the other thing about directing swarms is you have to tell them what not to do. Like, I want detailed step-by-step plans, but don't build it yet because I want to check the plans before you build it. Things like that. The prompts I use are like that at the bottom there. Use BDD, update the plans, push the whole repo to GitHub when you're done, and then let it go off and run. If you let it just build something too big, you end up with a monolithic ball of mud kind of thing. This is why we did microservices in the first place, because the monoliths were getting too big and too hard to modify and too tangled up together.

If you have several people working on the same codebase, you'll be going so fast you'll stomp on each other's code too much. You really have to break it up into separate repos and have each person that's driving a team or each team of agents working in a separate repo with stable APIs and clean interfaces or whatever it is between them. You want independently deliverable single function services to make this stuff work. Most people right now are just tinkering on their own. I've done a couple of times where I've been working with somebody else on the same codebase, and we've basically had to partition off where we worked, because it's too hard to figure out what state everything is in between you.

The Hello World of LLMs is to Build an MCP Server

The Hello World program is, build me an MCP server for this. This is the simple, repeatable exercise that you can go play around with. If there's a piece of knowledge or something or a tool or whatever, tell it to build an MCP server for it. Whatever you've got that's some complicated thing, you can build an MCP server, then you attach that server to one of the agents, and it will figure out how to use whatever that thing is. This is tooling. The first one I did was this persona-based LLM development. It says, what's the best way to introduce chaos engineering into an organization? Actually, this is just an online service. If you go to https://soopra.ai/Cockcroft, you can type whatever you wanted.

Mostly, you can ask me questions about what we did at Netflix 15 years ago, which is mostly what I get when I go on podcasts. If I get a pre-set list of questions when I go on a podcast, I type them into Soopra, and then I've got my cheat sheet of all those things, because it remembers all the stuff I said, and I don't have to type it all out again, because I've done so many podcasts and blog posts and whatever. That's persona as a service, which is an interesting thing. I also got all the data I wanted to put in that, and I said, could this be made into an MCP server? This was earlier this year, quite a long time ago. There's an MCP server there that has all of Adrian's content in it, and you can attach an LLM to it and ask your local LLM questions if you want. It's an alternative to using the Soopra service. Pretty old code. Then if anyone else has too much content, too many presentations, and wants to do this themselves, MeGPT is a generic framework for taking an author's content and turning it into an MCP server.
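Under the hood, MCP is JSON-RPC, and the heart of a tool server is just two methods: tools/list and tools/call. Stripped of the transport and the official SDK, the shape is roughly this (a stdlib-only sketch; the lookup_device tool and DEVICES data are invented, and a real server would use the `mcp` Python package):

```python
# Sketch of the tool interface an MCP server exposes, without the SDK.
# A real server wires this dispatch into a stdio or HTTP transport.

TOOLS = {
    "lookup_device": {
        "description": "Look up what a house device or app actually does.",
        "inputSchema": {"type": "object",
                        "properties": {"name": {"type": "string"}}},
    },
}

# Hypothetical backing data; a real server would query a knowledge graph.
DEVICES = {
    "Hayward Omni": "swimming pool controller",
    "Flair": "smart air vent controller",
    "Ting": "electrical fire monitor from the insurance company",
}


def handle(request):
    """Dispatch one JSON-RPC-style request (MCP's tools/list and tools/call)."""
    method = request["method"]
    if method == "tools/list":
        result = {"tools": [dict(name=n, **t) for n, t in TOOLS.items()]}
    elif method == "tools/call":
        name = request["params"]["arguments"]["name"]
        text = DEVICES.get(name, "unknown device")
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}
```

Attach something like this to an agent and it can call lookup_device to answer "what does the Hayward Omni app do?" on its own.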

Consciousness as an Observability Model

Then I did this thing a while ago, thinking about consciousness as an observability model, because human observability basically comes through us being conscious. Because you can ask people, how are they? What are they thinking about? You can only get a response from somebody if they're conscious. If they're asleep, you don't get an answer, because you're unconscious when you're asleep. For me, consciousness is the thing that goes away when you go to sleep or get knocked out or whatever. When you're conscious, I can interact with you, and you become observable. Your internal state can be shared with me. There's this idea of if you want to make systems more observable, you could add a consciousness system to them that you could interrogate to find out what's going on in them, and whether they're happy and whatever.

I started building a consciousness system for my house, because my house is far too complicated and has too many IoT devices. Nobody can figure out which one's which. Sometimes I can't even remember which app does what. I figured I wanted a knowledge graph of everything in my house shared as an MCP server so that I could talk to it via an LLM. I was chatting to Reuven Cohen about this. He said, I'll show you how to use Claude Code to do this. I wrote this up as a blog post last June. I built 150,000 lines of Python that day. It ran. It didn't actually do quite what I wanted it to do, because I tried to build far too much in one go and it just let the thing go crazy. The side effect of this is I'm now getting job offers on LinkedIn as one of the top 100 Python programmers on GitHub, because it's all open source and sitting there. It's like, I'm not a Python programmer at all but I have checked a lot of Python into GitHub recently as open source.

The code is sitting there if you want to go look at it. It's fairly messy. I abandoned that. Second attempt, I decided to do something different. I used to be an iOS developer. That was Objective-C. It was about 10, 15 years ago. I knew roughly how to get iPhone apps to go. I still had a developer account. I built a native Swift app. It runs. I got it working. It has weather, HomeKit, text-to-speech, voice recognition, all written in Swift as a native app. That was quite cool. Now I'm playing around with that. This is the app that I want to have figuring out what my house is doing. I needed to build a backend for that.

Then I decided I wanted to build something that would be as high quality as I could build it. Not just a throwaway prototype, but something that would be structured to be sensible. This is a distributed portable MCP service. What I've got is a knowledge graph. The naming scheme for this is based on the fact that Python is named after Monty Python. Nothing to do with snakes. British TV shows from the 1970s. For those people who know about British TV shows, there's a show called The Goodies. The Goodies released a bunch of singles with stupid names. That's what my projects are named after. I have a Funky Gibbon, Inbetweenies.

That's where the naming scheme comes from. PowerPoint managed to mangle the fonts again. It's supposed to say Port, not Por, with the t on the next slide. What I have is a Python server, which actually has a complete knowledge graph in it with all the entities that you might find in a house, including some blobs, like pictures of something in the house and PDF manuals of things that are in the house. The idea here is something like, I have a thermostat on the wall. I take a photo of the thermostat. I find the manual for the thermostat. I upload that in. Now that thermostat controls the temperature, but which rooms? I can actually put that in the knowledge graph. These four rooms are controlled by that one thermostat.

Then that's a piece of information that's stored in this knowledge graph. The same goes for all of the random things. There's an app called something like Comfort, or some random strange name, that has nothing to do with thermostats. I have an app called Hayward Omni something. You know what that does? Anyone here know what a Hayward Omni does? It's a swimming pool controller. It's like, how are you supposed to know that? The icon doesn't even look like a swimming pool or anything. You can see it, that little H thing on the bottom right. The idea is you link all these things together, and you have that knowledge graph. I want that knowledge graph in a server in my house, but then I also want it on my phone. I want it on my phone when the Wi-Fi is down, because one of the things I want to be able to do is record on my phone how to get the Wi-Fi working again. It has to work in a disconnected mode, and I want the entire knowledge graph stored on my phone locally. That seems like a reasonable thing to do.
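The sort of linking described here, a thermostat entity related to the rooms it controls, can be sketched as a tiny entity/edge store (the entity ids, attributes, and relation labels are invented for illustration, not his actual schema):

```python
# Tiny entity/relationship store in the spirit of the house knowledge graph.

class KnowledgeGraph:
    def __init__(self):
        self.entities = {}   # id -> attributes (type, name, blob refs, ...)
        self.edges = []      # (subject_id, relation, object_id) triples

    def add(self, eid, **attrs):
        self.entities[eid] = attrs

    def relate(self, subject, relation, obj):
        self.edges.append((subject, relation, obj))

    def objects(self, subject, relation):
        """Everything `subject` is linked to by `relation`."""
        return [o for s, r, o in self.edges if s == subject and r == relation]


kg = KnowledgeGraph()
kg.add("thermostat-hall", type="device", name="Hall thermostat",
       photo="blobs/thermostat.jpg", manual="blobs/thermostat-manual.pdf")
kg.add("app-omni", type="app", name="Hayward Omni",
       notes="Swimming pool controller, despite the name and icon")
for room in ["kitchen", "living-room", "study", "hall"]:
    kg.add(room, type="room", name=room)
    kg.relate("thermostat-hall", "controls-temperature-of", room)
```

Now `kg.objects("thermostat-hall", "controls-temperature-of")` answers "which rooms does this thermostat control?", which is exactly the kind of question an attached LLM would route to the graph.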

Then I wanted this client to be on my phone, which means I now have two copies of the knowledge graph and I need to synchronize them. What I built was this server, a protocol, and a client that lazily synchronize knowledge graphs. It decided, without me even telling it, to use vector clocks, with last-write-wins conflict resolution and a whole bunch of other machinery. It's tested, and it's got 225 tests. It's as tested as I could make it. It's as tidy as I could make it. Then I have authentication. I said, do a security audit. It said, we will do an OWASP Top 10 audit.
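The vector-clock-plus-last-write-wins combination can be sketched like this. This is my own simplified rendering of the technique, under assumed data shapes, not the code from the repo: vector clocks detect whether one replica's edit strictly supersedes the other's, and only genuinely concurrent edits fall back to timestamp comparison.

```python
def compare(a, b):
    """Order two vector clocks, given as {replica_id: counter} dicts.
    Returns 'before', 'after', 'equal', or 'concurrent'."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"   # neither dominates: true conflict

def resolve(entry_a, entry_b):
    """Merge two replicas' versions of one entry: the vector clock decides
    causal order; concurrent edits fall back to last-write-wins on timestamp."""
    order = compare(entry_a["clock"], entry_b["clock"])
    if order == "after":
        return entry_a
    if order in ("before", "equal"):
        return entry_b
    return max(entry_a, entry_b, key=lambda e: e["ts"])   # last write wins

# Server and phone edited the same entry while disconnected:
server = {"clock": {"server": 3, "phone": 1}, "ts": 100, "value": "72F"}
phone  = {"clock": {"server": 2, "phone": 2}, "ts": 105, "value": "70F"}
print(resolve(server, phone)["value"])   # concurrent edit, newer write wins → 70F
```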

I'd vaguely heard of OWASP, but had no idea what a Top 10 audit was. It said, you need better logging. We'll add that then. It's OWASP Top 10 compliant, it's got audit logging, rate limiting, authentication. I decided I wanted this thing to be as enterprise quality level as I could make it. I also wanted guest mode read-only access in the protocol, where a QR code will give you a token that gets you access to it. That means a guest in the house could have a read-only version of how to operate this house. Think about this as something you could extend to another thing. The house is just an example. Think of some large, complicated system that you operate. Would you want to have a knowledge graph about that, that you could edit on your phone to add information to, and have all this ripple back and forth and synchronize? That was what I wanted to build. I built it all in Python, because that works.
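A guest token like that could be as simple as an HMAC-signed blob that the QR code carries. This is a hedged sketch of the idea using only the standard library; the key, claim names, and token format are all assumptions of mine, not the project's actual protocol.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"house-server-secret"   # assumed server-side signing key

def make_guest_token(ttl_seconds=86400):
    """Mint a read-only token; the string would be rendered as a QR code."""
    claims = {"scope": "read-only", "exp": int(time.time()) + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig

def check_token(token, operation):
    """Allow an operation only if the token is untampered, unexpired,
    and its scope permits it."""
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False                      # signature doesn't match: tampered
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if claims["exp"] < time.time():
        return False                      # expired
    return operation == "read" or claims["scope"] != "read-only"

token = make_guest_token()
print(check_token(token, "read"), check_token(token, "write"))  # True False
```

The point is that the guest never gets a credential that can write: the scope is baked into the signed payload, so editing it invalidates the signature.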

Then I ported the protocol to Swift. I ported the client to Swift. I'm in the middle of building the app onto that client. I still had some work to do on that. This is all open source, Apache licensed on GitHub, if you want to go read the code. You can see some of my IoT apps that I have, random things. Does anyone know what Flair does? Anyone got Flair? You've got these air vents around your house, and each vent has a thermostat that controls it, opening and closing it, so individual rooms have their own thermostat control. It's called Flair. How do you know that? This is the kind of thing that I find annoying. I wanted to have a knowledge graph to hook everything up. Ting. Anyone here got Ting? Ting is a thing your insurance company will give you to monitor your power to make sure you don't have electrical fires. You see why we need this.

Demo Agentic Coding at Nubank

One of the reasons I haven't finished this is that I do some advisory work for Nubank. I've actually been spending a bunch of time with them. I was in Brazil and I went to do a demo about all this wonderful agentic programming I was doing. I decided, I want to do a demo. Why don't I make an MCP knowledge graph about Brazilian football? I said to Claude, if I wanted to do that, can I get data? It said, blah, blah, blah, sure. This is just for a demo. I want to connect this MCP server to Claude. I need some example questions. This is literally about how long it took. Save this document. I'll put this up in a repo. I'll build this as a demo. That was about how long that took. These are the kinds of questions you get, without any other prompting. If there are any Brazilians here, they might recognize some of these names and players. For the audience I did this for, this was like, yes, these make sense. All the backgrounds in this I did with Canva. I decided I like Canva, the Canva magic background thing. That's how I got all the backgrounds, because my slides look terrible otherwise. I used AI to make my slides look vaguely ok. I created a repo. It's public. You can go read it. Started a Codespace, installed the various things.

Then, for the purposes of my presentation, basically that morning, I just hit go. This is the command I gave it. The thing that I generated by chatting to Claude was the guide document. Implement the phases. Test it using BDD. Put block comments in it. Push it all to GitHub. About an hour later, it was done. It was actually a bit slower than I thought it should be. It was running one agent most of the time. I thought it could have gone a bit faster. You can go see it. Then I tinkered with it a bit to figure out what it had done and fix a few things. It did ok. This was part of the to-do list as it was doing it. Yes, initialize it, figure out the guide, set it up, figure out how to connect to Neo4j. Neo4j has a fairly weird query syntax, Cypher, which I can make sense of but don't want to have to write. It knows how to write those things. It built all this stuff. Yes, it did a reasonable job.
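For a sense of what the agents write for you, here is the shape of a Cypher query wrapped in Python. The schema (`Player`, `Club`, `PLAYED_FOR`) is my guess at what a football graph might look like, not the actual schema in the demo repo, and the function assumes a session object like the one the official `neo4j` driver provides.

```python
# Which players played for a given club, ordered by when they joined.
# Cypher's MATCH pattern syntax is the "weird" part the agent handles:
QUERY = """
MATCH (p:Player)-[r:PLAYED_FOR]->(c:Club {name: $club})
RETURN p.name AS player, r.from_year AS from_year
ORDER BY r.from_year
"""

def players_for_club(session, club):
    """Run the query with a neo4j driver session; $club is a parameter,
    so names are never string-interpolated into the query text."""
    result = session.run(QUERY, club=club)
    return [record["player"] for record in result]
```

Parameterized queries like `$club` are also what you'd want for the security-audit side of things, since they rule out Cypher injection.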

Then I decided I could use this as a benchmark. The first try was September 30th. I took just the starting things, that file and the Neo4j setup, and I did it again, this time using a different option in Claude Flow. You can just do a swarm, or you can do a Hive mind, which is supposed to be better and more coordinated. It puts a queen in there. It's like a line manager. It uses more memory. Sometimes it runs out of memory when you're running it on a default Codespace. If it keeps crashing on you, start again in a Codespace with more memory, just as a quick hint. It seems to work better. This completed more quickly, looked better, seemed plausible. I'm just going to keep running this over time as my benchmark for what it looks like. It's a bit like, I don't know if you've seen the guy that does the pelicans on bicycles thing, you know what I'm talking about? Every time a new LLM comes out, he says, draw me a picture of a pelican riding a bicycle, and they just look hilarious. Go search for it.

Re-coding a Big Project from TypeScript to Python

Then I got diverted into another thing. There was a startup I'm working with. This is all offline stuff. It's still stealth, so I can't tell you what it was. Their idea was that they wanted to build a demo of their app so when they went for funding, they'd have it running. I said, that's about a week or so's work for somebody that's full-time. Chris Fregly, he's just finished writing this book. I was sitting there at dinner with these guys, chatting with them, and saying, I wonder what Chris is doing? I text him, and he says, I just finished writing the book, and I'm not doing anything for the next couple of weeks. I said, would you fancy playing around with this idea? He went, sure. He hacked away, built this thing. It's a pretty sophisticated app. This is a 1,000-page book on how AI works. He knows what he's doing much better than I do. This book has just been signed off. I did a praise quote for it, and I read about half of it before I just ran out of time reading a 1,000-page book. It takes too long. He built it, but he built it mostly in TypeScript, and now he's done with it, and he's off doing something else. Ok, I don't want to have to deal with this thing in TypeScript. I just said, translate the whole thing into Python. This is about 150,000 lines of JavaScript and TypeScript.

I said, I just want it all in Python, please, and it's currently in the middle. It's probably about 80% or 90% of the way through. It's converted all the tests and got most of them running. It's just crunching through, turning the whole thing into Python for me, and then eventually I'll just say, make sure it runs and get rid of all that TypeScript stuff that I don't like. Polyglot, maybe? Like, just do whatever you want. I was trying to get it going. This is a reason why enterprises are going to have trouble doing this kind of coding. This laptop is a Nubank laptop, and it's very locked down. It goes to sleep after three minutes and locks the screen, and you can't turn off the sleep setting unless you watch a YouTube video on it. Last night I left it watching a three-hour-long YouTube video while Claude was working away, and just put it somewhere I couldn't see it with the screen brightness turned down. Stupid hacker things where you're just like, the agents just have to be free to run when you're not looking at them, and I don't have somewhere else to run it at this point.

What's Missing?

What's missing right now? I'm doing too many really repetitive, mindless management tasks. It's like the director says, make 100% of the tests pass. It keeps saying, I'll make 90% of the tests pass. No, I want 100%. How many times have I told you I want 100%? What's the point of only having 90% of the tests passing? Tidy it up. Yes, I do want you to keep going until you finish. I tell it to archive the old docs. It skipped the bit that didn't work. I don't want you to skip that bit that didn't work. Keep reminding it to push changes to GitHub, and then go compare it to the TypeScript code again. Yes, there are some more tests there I didn't do.

Whatever. Basically, I need a director agent, and so I may have a go at Claude Flow and see if I can come up with a director-level agent that just nags everything else into position. I think we need this systemic platform for managing agent development. This is going to emerge organically, probably in the next few months, because everything tends to take a few months. Basically, it's breaking down tasks, managing to guardrails and policies, and then automating the need for development management away. Maybe we have NoMan. How about that? We've got NoOps, NoDev, NoMan. We don't need line managers anymore if we automate them all away. Something like that.
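The director idea is essentially a loop. This sketch is hypothetical: `run_agent` stands in for whatever CLI or API (Claude Code, Claude Flow, or similar) actually drives the coding agent, and the status dict is an assumed shape, but it captures the "keep nagging until 100%" behavior described above.

```python
# The nags are fixed policy, re-issued every round rather than typed by a human:
NAGS = [
    "Make 100% of the tests pass, not 90%. Keep going until you finish.",
    "Do not skip the bit that didn't work. Fix it.",
    "Archive the old docs and push all changes to GitHub.",
]

def direct(run_agent, max_rounds=10):
    """Re-prompt the agent until it reports every test passing,
    then return how many rounds of nagging it took."""
    for round_number in range(1, max_rounds + 1):
        status = run_agent("\n".join(NAGS))   # assumed to return {"pass_rate": float}
        if status["pass_rate"] >= 1.0:
            return round_number               # done: everything passes
    raise RuntimeError("agent never reached 100% -- escalate to a human")

# Stand-in agent that needs three rounds of nagging before it finishes:
results = iter([{"pass_rate": 0.8}, {"pass_rate": 0.9}, {"pass_rate": 1.0}])
print(direct(lambda prompt: next(results)))   # → 3
```

A real version would also need the guardrail side: timeouts, budget caps, and a hard stop that hands off to a human instead of looping forever.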

Developer Carbon Footprint

There's a lot of fuss about all this energy being put into data centers. How much energy am I using to write this code, versus if I hired human engineers to write it? What is that comparison? I just had this long conversation with ChatGPT, and I've done a lot of work on sustainability, and I know the numbers, and what it was doing was plausible. For a human developer in the U.S., your carbon footprint is about 20 tons a year, depending on how many international flights you take, pretty much. That's about 10 kilograms of carbon dioxide per hour of work. On a good day, 1,000 tokens an hour is pretty good for a human developer. There were some vague arm-waving estimates to get to that: lines of code you could generate, and so on. A million tokens is about what Claude does in an hour, roughly, if you arm-wave a bit.

Even if you say Claude is 10 times less efficient per token than that estimate, using Claude is still something like 5,000 times better from a carbon footprint point of view. It's running at 100 times the speed, or really at 1,000 times the speed, but most of the tokens are being wasted because it's rummaging around rereading the same things. Then again, you probably do that too. I'm not quite sure how efficiently human tokens work. Still, you can go and play with the arguments here any way you want. I think it's pretty clear that developing code with AI uses less carbon than developing it with humans. That was what I was trying to get to. The numbers here are so far apart that I don't care whether it's 100 times or 1,000 times or a million times better. It is still a lower carbon way of developing. That's just developing; running the code is a separate thing. That old developer joke: you're going to be replaced by a small shell script, or whatever it was.
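The back-of-envelope arithmetic works out like this. The human-side numbers come from the talk; the Claude-side carbon figure is an assumption of mine purely to make the ratio concrete, and you can swap in your own.

```python
# Human side, from the talk: ~20 tonnes CO2/year over ~2,000 work hours
human_kg_co2_per_hour = 10
human_tokens_per_hour = 1_000        # "a good day" for a human developer
claude_tokens_per_hour = 1_000_000   # roughly what Claude does in an hour

human_g_per_token = human_kg_co2_per_hour * 1000 / human_tokens_per_hour
print(human_g_per_token)             # 10 g CO2 per human-generated token

# Assumed (not from the talk): charge the model a generous 1 kg CO2/hour
# of data-center footprint. Even then it's orders of magnitude lower:
claude_g_per_token = 1 * 1000 / claude_tokens_per_hour
print(human_g_per_token / claude_g_per_token)   # → 10000.0
```

Cutting the assumed model efficiency by another factor of 10 still leaves a thousand-fold gap, which is the talk's point: the exact multiplier doesn't matter when the numbers are this far apart.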

References

Some references here to the various things. If you find my GitHub account, I have a /slides thing. I'll put them up on there. It's adrianco/slides on GitHub. Then you can click through these various links.


Recorded at:

Apr 02, 2026
