Meta has applied large language models to mutation testing to improve compliance coverage across its software systems. The approach integrates LLM-generated mutants and tests into Meta's Automated Compliance Hardening (ACH) system, addressing the scalability and accuracy limits of traditional mutation testing. The system is intended to keep products and services safe while meeting compliance obligations at scale, helping teams satisfy global regulatory requirements more efficiently.
Mutation testing evaluates the effectiveness of test suites by introducing small, deliberate changes, known as mutants, into code and checking whether the tests detect them. Traditional mutation testing has seen limited adoption due to excessive mutant counts, high computational costs, and the prevalence of equivalent mutants that add little value. Meta's approach uses LLMs to generate context-aware mutants and corresponding tests, reducing noise and focusing engineering effort on high-value code paths.
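To make the idea concrete, consider a minimal Kotlin sketch (the function names and the scenario are illustrative, not taken from Meta's systems). A mutant flips a comparison operator, and only a test that exercises the boundary value detects, or "kills," it:

```kotlin
// Original production code.
fun isAdult(age: Int): Boolean = age >= 18

// A mutant: the same logic with the comparison operator flipped (>= to >).
fun isAdultMutant(age: Int): Boolean = age > 18

fun main() {
    // A boundary test kills this mutant: the two versions disagree only
    // at age == 18, so a test asserting isAdult(18) passes on the
    // original but would fail if the mutant replaced it.
    check(isAdult(18))          // passes: 18 >= 18
    println(isAdultMutant(18))  // false: the mutant survives only if no test checks the boundary
}
```

A surviving mutant therefore signals a gap in the test suite rather than a bug in the code itself.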
Before LLM guidance, mutation testing relied on static, rule-based operators. These generated large volumes of mutants indiscriminately, many semantically equivalent to the original code, overwhelming test infrastructure and developer workflows.
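The equivalent-mutant problem is central here. An equivalent mutant changes the code's text but not its behavior, so no test can ever kill it; a hypothetical Kotlin example:

```kotlin
// Original: absolute value.
fun magnitude(x: Int): Int = if (x < 0) -x else x

// Equivalent mutant: flipping < to <= changes which branch handles
// x == 0, but since -0 == 0 the behavior is identical for every input.
// No test can kill this mutant, so any time spent compiling and running
// the suite against it is pure overhead.
fun magnitudeMutant(x: Int): Int = if (x <= 0) -x else x

fun main() {
    // Identical output for every input, including the boundary.
    for (x in listOf(-3, 0, 3)) check(magnitude(x) == magnitudeMutant(x))
    println("mutant is equivalent on all tested inputs")
}
```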
Meta’s ACH system uses LLMs to generate realistic mutants and targeted tests, focusing on privacy, safety, and regulatory concerns. An LLM-based equivalence detector filters redundant mutants, while a test generator produces unit tests that engineers can review rather than write manually, significantly reducing operational overhead. Early deployment across Facebook, Instagram, WhatsApp, and Meta’s wearables platforms produced tens of thousands of mutants and hundreds of actionable tests.
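Meta describes the pipeline only at a high level; a minimal sketch of the stages it implies might look like the following, where all type and function names are assumptions for illustration rather than Meta's published interfaces:

```kotlin
// Hypothetical sketch of an ACH-style pipeline; these types and names
// are illustrative assumptions, not Meta's published API.
data class Mutant(val file: String, val diff: String)
data class GeneratedTest(val name: String, val source: String)

interface MutantGenerator {      // LLM proposes realistic, fault-like mutants
    fun generate(file: String, concern: String): List<Mutant>
}

interface EquivalenceDetector {  // LLM judges whether a mutant actually changes behavior
    fun isEquivalent(mutant: Mutant): Boolean
}

interface TestGenerator {        // LLM writes a unit test intended to kill the mutant
    fun generateTest(mutant: Mutant): GeneratedTest?
}

// Generate concern-focused mutants, drop the equivalent ones, and hand
// engineers candidate tests to review rather than write from scratch.
fun harden(
    file: String,
    mutants: MutantGenerator,
    equivalence: EquivalenceDetector,
    tests: TestGenerator,
): List<GeneratedTest> =
    mutants.generate(file, concern = "privacy")
        .filterNot(equivalence::isEquivalent)
        .mapNotNull(tests::generateTest)
```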

Architecture overview of the ACH system (Source: Meta Tech Blog)
Since incorporating these research findings into ACH, Meta has presented the work at FSE 2025 and EuroSTAR 2025, demonstrating how LLMs overcome the barriers that previously limited mutation testing at scale. Traditionally used only to assess the quality of existing tests, mutation testing can now leverage generative AI to produce tests as well, making the technique more practical and scalable.
As emphasized by the Meta Engineering Team:
From October to December 2024, we ran a trial deploying ACH for privacy testing across Facebook, Instagram, WhatsApp, and Meta's wearables platforms. Out of thousands of mutants and hundreds of generated tests, privacy engineers accepted 73% of the tests, with 36% judged as privacy relevant.
Building on ACH, Meta introduced the Catching Just-in-Time Test (JiTTest) Challenge to explore the use of LLMs in automated software testing. The system generates hardening tests to prevent regressions and catching tests to detect faults in new or changed code. Tests are produced for review just before pull requests reach production, addressing the Test Oracle Problem while retaining human oversight. Meta's paper, "Harden and Catch for Just-in-Time Assured LLM-Based Software Testing: Open Research Challenges," presented at FSE 2025, details the JiTTest Challenge and related open research problems.
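The difference between the two test kinds can be illustrated with a short hypothetical Kotlin example (the retention scenario and names are assumptions, not Meta's code):

```kotlin
// Changed code under review: keep records whose age is within the
// retention limit. The diff contains a boundary bug (< should be <=).
fun retentionFilter(agesInDays: List<Int>, maxAgeDays: Int): List<Int> =
    agesInDays.filter { it < maxAgeDays }

fun main() {
    // Catching test: generated just in time for the pull request, it
    // fails on this buggy diff (a record exactly at the limit is dropped)
    // and passes once the condition is fixed to `it <= maxAgeDays`.
    check(retentionFilter(listOf(30), maxAgeDays = 30).isNotEmpty()) {
        "record at the retention limit was dropped"
    }

    // Hardening test: asserts behavior that is already correct today, so
    // a future regression that retains expired records would be caught.
    check(retentionFilter(listOf(45), maxAgeDays = 30).isEmpty()) {
        "expired record was retained"
    }
}
```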
Meta says LLMs have helped streamline and optimize compliance and risk management frameworks by transforming time-consuming, error-prone processes into more efficient systems. Ongoing work includes expanding ACH beyond privacy testing and Kotlin to additional domains and languages, improving mutant generation through fine-tuning and prompt engineering, and addressing the Test Oracle Problem. Meta is also studying how developers interact with LLM-generated tests to enhance adoption and usability. Further results will be presented at upcoming conferences, including Product@Scale.