Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ


Choose your language

InfoQ Homepage News Slack Combines ASTs with Large Language Models to Automatically Convert 80% of 15,000 Unit Tests

Slack Combines ASTs with Large Language Models to Automatically Convert 80% of 15,000 Unit Tests

Slack's engineering team recently published how it used a large language model (LLM) to automatically convert 15,000 unit and integration tests from Enzyme to React Testing Library (RTL). By combining Abstract Syntax Tree (AST) transformations and AI-powered automation, Slack's innovative approach resulted in an 80% conversion success rate, significantly reducing the manual effort required and showcasing the potential of AI in streamlining complex development tasks.

This transition was prompted by Enzyme's lack of support for React 18, necessitating a significant shift to maintain compatibility with the latest React version. The conversion tool's adoption rate at Slack reached approximately 64%, saving considerable developer time of at least 22% of 10,000 hours. While this figure represents a significant saving, Sergii Gorbachov, senior software engineer at Slack, speculates that, in reality, the figure is probably much higher:

It's important to note that this 22% time saving represents only the documented cases where the test case passed. However, it's conceivable that some test cases were converted properly, yet issues such as setup or importing syntax may have caused the test file not to run at all, and time savings were not accounted for in those instances.

The team initially attempted to automate the conversion using Abstract Syntax Tree (AST) transformations, aiming for 100% accuracy. However, the complexity and variety of Enzyme methods led to a modest success rate of 45% in automatically converting code. One factor contributing to the low success rate is that correct conversion depends on contextual information regarding the rendered Document Object Model (DOM) under test, to which the AST conversion has no access.

The AST representation of `wrapper.find('selector');` (Source)

Next, the team attempted to perform the conversion using Anthropic's LLM, Claude 2.1. Despite efforts to refine prompts, the conversion success rates varied significantly between 40% and 60%. Gorbachov notes that "the outcomes ranged from remarkably effective conversions to disappointingly inadequate ones, depending largely on the complexity of the task."

Following the unsatisfactory results, the team decided to observe how human developers approached converting the unit tests. They noticed that human developers had access to a broad knowledge base on React, Enzyme and RTL, and they combined that knowledge with access to context on the rendered React element and the AST conversions provided by the initial version of the conversion tool.

Slack's engineers then adopted a hybrid approach, combining the AST transformations with LLM capabilities and mimicking human behaviour. By feeding the rendered React component under test and the conversions performed by the AST tool into the LLM as part of the prompt and creating a robust control mechanism for the AI, they achieved an 80% conversion success rate, demonstrating the complementary nature of these technologies.

The modified pipeline flowchart (Source)

Claude 2.1 is an LLM model announced in November 2023 by Anthropic. It included a 200K token context window, significant reductions in rates of model hallucination, and system prompts and allowed for the use of tools. Anthropic has since introduced the Claude 3 family models consisting of three distinct models, multimodal capabilities, and improved contextual understanding.

An Abstract Syntax Tree (AST) is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node in the tree denotes a construct occurring in the source code. A syntax tree focuses on the structure and content necessary for understanding the code's functionality. ASTs are commonly used in compilers and interpreters to parse and analyze code, enabling various transformations, optimizations, and translations during compilation.

About the Author

Rate this Article