HubSpot’s Sidekick: Multi-Model AI Code Review with 90% Faster Feedback and 80% Engineer Approval


HubSpot engineers introduced Sidekick, an internal AI-powered code review agent that analyzes pull request changes and provides automated feedback to developers. The system uses large language models to review code and post comments directly in repositories on GitHub. According to the engineering team, the tool reduced time-to-first-feedback on pull requests by approximately 90 percent while helping developers identify issues earlier in the review process.

Code review is essential in software development, but reviews can stall when human reviewers are unavailable. At HubSpot, engineers found that AI coding assistants had sped up code creation while manual reviews lagged behind. Sidekick provides immediate pull request feedback, letting human reviewers focus on architecture and higher-level design, which improves efficiency and reduces review bottlenecks.

As Emily Adams explained in a company blog post,

What we found might surprise you: our AI code reviewer catches real issues, understands HubSpot‑specific context, and maintains a high signal to noise ratio, often leaving no comments at all.

The first version of the system ran on an internal platform called Crucible. Large language model agents operated in Kubernetes environments and interacted with GitHub repositories via the command line. The agents retrieved pull request changes and generated review comments using prompts to identify potential issues or improvements. While this approach demonstrated that LLMs could provide useful feedback, it introduced operational complexity. Each review required separate containerized workloads, increasing latency and infrastructure overhead, and limited control over agent interactions with developer tooling and internal services.

To address these limitations, the engineering team migrated the system to a Java-based agent framework called Aviator. It integrates with HubSpot’s development platform, letting review agents run within existing services rather than isolated workloads. Aviator supports multiple model providers, including Anthropic, OpenAI, and Google, enabling experimentation and fallback options. Through RPC-based tool abstractions, agents retrieve repository context such as configuration settings and coding conventions, improving the relevance and accuracy of automated review comments.
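The multi-provider design with tool-based context retrieval described above can be sketched roughly as follows. This is an illustrative Python sketch, not HubSpot's actual Aviator code (which is Java-based); every class, method, and field name here is a hypothetical assumption made for the example.

```python
from dataclasses import dataclass

class ProviderError(Exception):
    """Raised when a single model provider fails; triggers fallback."""

@dataclass
class ReviewComment:
    path: str
    line: int
    body: str

class ModelProvider:
    """Abstract LLM provider (e.g. Anthropic, OpenAI, Google) -- hypothetical interface."""
    name: str = "base"
    def review(self, diff: str, context: dict) -> list[ReviewComment]:
        raise NotImplementedError

class ReviewAgent:
    def __init__(self, providers: list[ModelProvider], tools):
        self.providers = providers  # ordered by preference; later entries are fallbacks
        self.tools = tools          # RPC-style tool abstraction for repository context

    def review_pull_request(self, diff: str) -> list[ReviewComment]:
        # Gather repository context through tool calls rather than letting the
        # agent shell out to the repository directly, as in the earlier design.
        context = {
            "config": self.tools.get_repo_config(),
            "conventions": self.tools.get_coding_conventions(),
        }
        last_error = None
        for provider in self.providers:
            try:
                return provider.review(diff, context)
            except ProviderError as err:
                last_error = err  # fall through to the next provider
        raise RuntimeError(f"all providers failed: {last_error}")
```

The ordered provider list gives the experimentation and fallback behavior the article mentions: if the preferred model is unavailable, the agent transparently retries with the next one.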

A key challenge identified during deployment was feedback quality. Early versions produced verbose or overly positive comments that developers treated as noise. To address this, the team introduced a "judge agent" that evaluates comments before they are posted to pull request discussions. According to HubSpot engineers, this evaluator pattern reduced low-value comments and improved the signal-to-noise ratio. Developers can also react to automated comments, providing feedback that guides prompt adjustments and model selection. The system has recorded a consistent 80% thumbs-up rate from engineers, indicating strong adoption and trust.


Review Agent to Judge Agent evaluation loop (Source: HubSpot Blog Post)
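The evaluation loop shown above can be sketched in a few lines. This is a hypothetical Python illustration of the evaluator pattern, assuming the judge returns a usefulness score for each candidate comment; in practice the judge would be another LLM call with its own prompt, and none of these names come from HubSpot's implementation.

```python
def judge_comments(candidates, judge, threshold=0.7):
    """Return only the candidate comments the judge scores at or above the threshold.

    `judge` is any callable mapping a comment to a usefulness score in [0, 1].
    An empty result is a valid outcome: a high signal-to-noise review may
    legitimately post no comments at all.
    """
    posted = []
    for comment in candidates:
        if judge(comment) >= threshold:
            posted.append(comment)
    return posted
```

Keeping the judge separate from the review agent means its threshold and prompt can be tuned independently, which is what lets developer thumbs-up/thumbs-down reactions feed back into the filter without retraining or re-prompting the reviewer itself.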

Brian L, VP of Engineering at HubSpot, noted on LinkedIn:

The most impactful change was adding a second agent to evaluate reviews before posting. The result: fewer, better, and more actionable comments. We knew we’d gotten it right when engineers started asking to see Sidekick’s feedback even before opening a PR.

HubSpot engineers mention that future work includes adding persistent memory for review agents and expanding context retrieval across repositories to improve understanding of related code changes.
