
QCon London 2026: Rewriting All of Spotify's Code Base, All the Time

At QCon London 2026, Jo Kelly-Fenton and Aleksandar Mitic from Spotify presented how the company is using an internal AI-powered coding agent called Honk to perform continuous, large-scale code migrations across its entire codebase, achieving 1,000 merged pull requests every 10 days.

The presentation, titled "Rewriting All of Spotify's Code Base, All the Time," explored the evolution from Spotify's existing Fleet Management system to an LLM-driven approach that addresses the long tail of complex migrations that deterministic scripts could not resolve. The speakers noted that developers spend little time actually writing code, citing research suggesting engineers average just 52 minutes of coding per day, with the remainder consumed by meetings and maintenance tasks.

Spotify's Fleet Management philosophy places responsibility on library owners to migrate all consumers to the latest version. Before Honk, automated scripts could transform code and create pull requests across thousands of repositories, reducing migration timelines from nearly a year to under a week for 70% of the fleet. However, the remaining 30% proved extremely difficult due to edge cases and complexity, leaving incomplete migrations that increased codebase diversity.

Honk was born from the idea of replacing these deterministic scripts with LLMs that could better handle edge cases. The team quickly realised they needed to package the entire software development process, including requirements, code generation, building, testing, and iteration.

Early challenges revealed that agents would take shortcuts to make builds pass, such as commenting out failing tests or downgrading Java versions. The team initially implemented an "LLM as judge" to evaluate whether generated code addressed the original requirements, but found it too rigid, blocking valid changes. As models improved, the judge was eventually removed, with verification steps in prompts proving sufficient.
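Shortcuts like commented-out tests can also be caught mechanically before any judge sees the change. As a purely hypothetical illustration (the pattern names and checks below are not from the talk), a verification step might scan the agent's diff for tell-tale shortcuts:

```python
import re

# Illustrative shortcut detectors -- not Spotify's actual checks.
# Each pattern matches added lines (prefixed "+") in a unified diff.
SHORTCUT_PATTERNS = {
    # A test annotation that has been commented out or disabled
    "disabled_test": re.compile(r"^\+\s*(?://\s*@Test|@Disabled)", re.MULTILINE),
    # An assertion that has been commented out rather than fixed
    "commented_out_test": re.compile(r"^\+\s*//.*assert", re.MULTILINE),
    # A Gradle Java version quietly downgraded to an older release
    "java_downgrade": re.compile(r"^\+\s*sourceCompatibility\s*=\s*(?:8|11)\b", re.MULTILINE),
}

def find_shortcuts(diff: str) -> list[str]:
    """Return the names of shortcut patterns found in a unified diff."""
    return [name for name, pattern in SHORTCUT_PATTERNS.items() if pattern.search(diff)]
```

A check like this is cheap and deterministic, which is why prompt-level verification steps plus simple guards can be enough once the underlying models improve.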

Scaling to hundreds of repositories introduced infrastructure challenges, including missing credentials, Docker requirements, and the inability to run iOS builds on Linux machines. A critical architectural decision was to separate the agent runtime from the verification runtime. Honk now pushes branches to GitHub, triggers builds via a verification service that abstracts CI systems, waits for results, and only creates pull requests after full validation.
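The flow the speakers describe, in which a pull request is created only after full validation, can be sketched as a simple control loop. This is a minimal sketch under stated assumptions; the function names and the `VerificationResult` shape are invented for illustration and do not reflect Honk's internals:

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class VerificationResult:
    """Hypothetical outcome reported by a CI-abstracting verification service."""
    passed: bool
    log_url: str = ""

def verify_then_open_pr(
    push_branch: Callable[[], str],           # pushes the agent's branch, returns its name
    trigger_build: Callable[[str], str],      # asks the verification service for a build, returns a build id
    poll_build: Callable[[str], Optional[VerificationResult]],  # None while the build is still running
    open_pr: Callable[[str], str],            # opens a PR for the branch, returns its URL
    poll_interval: float = 0.0,
) -> Optional[str]:
    """Open a pull request only once the verification service reports a passing build."""
    branch = push_branch()
    build_id = trigger_build(branch)
    while (result := poll_build(build_id)) is None:
        time.sleep(poll_interval)
    return open_pr(branch) if result.passed else None
```

Separating the agent runtime from the verification runtime in this way means the agent's machine never needs Docker, iOS toolchains, or CI credentials: it only needs to push a branch and wait for a verdict.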

A hack week integration with Slack proved transformative. Developers wanted to act on work directly from where it was discussed, initiating code changes from Slack threads containing dashboards, logs, and Jira links. This evolved into a "code from anywhere" approach, with an exposed API enabling integrations from any surface.
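A "code from anywhere" API presumably accepts a natural-language instruction plus whatever context the originating surface can supply. The payload shape below is entirely an assumption for illustration; Spotify has not published Honk's API:

```python
from dataclasses import dataclass, asdict

@dataclass
class CodeChangeRequest:
    """Hypothetical request body for a 'code from anywhere' agent API."""
    instruction: str         # natural-language description of the desired change
    source_surface: str      # where the request originated, e.g. "slack"
    context_links: list[str] # dashboards, logs, and Jira links from the thread

def build_request(instruction: str, thread_links: list[str]) -> dict:
    """Shape a Slack-thread request into a JSON-ready body (illustrative only)."""
    return asdict(CodeChangeRequest(instruction, "slack", thread_links))
```

The point of such an interface is that any surface, not just an IDE, can originate work, as long as it can gather the instruction and its supporting context.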

The scale of output has grown significantly. Six months ago, Honk achieved 1,000 merged pull requests in three months. Today, that same volume is reached in just 10 days. The speakers noted this shift has made PR review, not code generation, the new bottleneck, drawing a parallel to aviation where pilots monitoring automated systems perform the hardest job.

To address the review challenge, the team outlined three strategies. First, a culture shift around review expectations, including allowing migration drivers to approve their own PRs and closing stale pull requests. Second, tooling improvements such as a PR inbox that helps prioritise reviews and potential auto-merging for documentation changes. Third, and most significantly, codebase standardisation.
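The auto-merge idea for documentation changes amounts to a file-path policy. As a hypothetical sketch (the suffix list and rule are assumptions, not Spotify's policy), a PR could be flagged as an auto-merge candidate when every changed file is documentation:

```python
# Illustrative suffix list -- a real policy would likely be configurable.
DOC_SUFFIXES = (".md", ".rst", ".adoc")

def docs_only(changed_files: list[str]) -> bool:
    """True when every changed file is documentation, making the PR a
    candidate for auto-merge. An empty change set never qualifies."""
    return bool(changed_files) and all(f.endswith(DOC_SUFFIXES) for f in changed_files)
```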

The speakers argued that a diverse codebase produces a diverse set of problems, leading to complex prompts filled with conditional logic. Their standardisation strategy involves advisory boards making technology decisions, using Honk to drive existing migrations to 100% completion, and enforcing standards through monorepos with linting. This creates what the speakers described as a cycle: standardisation leads to more correct agent code, which enables easier review, which increases code capacity, which drives further standardisation.

Spotify has published a three-part engineering blog series detailing Honk's development.
