Key Takeaways
- Software testing is a data-processing effort, similar to the way AI consumes inputs through models and generates outputs
- We built BAIT (Bot for AI Testing), a QA bot powered by AI that supports testers with boring, difficult, and tedious testing tasks
- BAIT is very good at testing rendering issues such as missing textures, missing text strings, overlapping graphical elements, and rendering artifacts
- Building bots powered by AI requires organizations to introduce the machine learning developer as a new role
- Bots give testing organizations an army of virtual testers, scaling testing efforts to a new level
King's successes include Candy Crush Saga and its sister game Candy Crush Soda Saga. The main factor behind them is our creative and innovative culture, which fosters engaged and motivated people who use cutting-edge solutions to create fun games that delight our users.
An environment with lots of freedom and trust – with space for experiments, exploration, and learning – makes people happy. Happy people are more likely to be open to new ideas, more cooperative, and more creative. This is crucial for a company like King.
Our games are constantly evolving, so the challenge is to scale testing to keep pace with new feature development. Automated tests are vital for keeping up, which is why we are constantly looking for new and better ways to test.
What made us decide to use AI in testing and what problems we hoped to solve
King is a data-driven company; all business decisions are based on insights from data collected on our players' performance. With the rise of AI in recent years, there are opportunities to use that data in testing. Our finding is that there is a parallel between testing and AI: both consume inputs (training data / test cases) through models (AI models / testing heuristics) and generate an output (predictions / test results).
To make this effort appealing to testers, we focused on developing a QA bot called BAIT (Bot for AI Testing), powered by AI, that would support testers with boring, tedious testing tasks, cover areas left untested for lack of resources, and take on areas previously seen as impractical or impossible to test manually.
How we use bots in testing
We started in one corner of testing; the low-hanging fruit was localization, where we use BAIT to test more than 20 languages. BAIT is designed to traverse our games by taking a screenshot of each new screen and, with a trained AI model, recognizing all elements such as buttons, text strings, and relevant game icons.
Example of elements (green rectangles) detected by the element-classification AI model
After that, it creates a list of all detected elements and clicks on each of them. BAIT uses a similarity algorithm to determine whether a click has led to a new screen; if so, the graph is updated with a new node representing the newly identified state. BAIT clicks on all identified elements on every new screen, in random order, until no new screens are found, which makes it ideal for exploratory testing that detects functionality issues, crashes, and performance problems.
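To make the traversal concrete, here is a minimal sketch of such an exploration loop. It is not King's implementation: the helpers `screenshot`, `detect_elements`, `tap`, and `navigate_to` are hypothetical stand-ins for the device automation and the element-detection model, and the hash-based screen key is a crude substitute for the similarity algorithm described later in this article.

```python
import hashlib
import random

def screen_key(png_bytes):
    """Stand-in for BAIT's similarity algorithm: collapse a screenshot to a
    comparable key. A hash treats any pixel change as a new screen, which is
    why the real bot blends visual and text similarity instead."""
    return hashlib.md5(png_bytes).hexdigest()

def explore(screenshot, detect_elements, tap, navigate_to, max_clicks=500):
    """Build a graph of screens by clicking detected elements at random.
    All four helper arguments are hypothetical, not King's API."""
    graph = {}       # screen key -> list of (element, destination key)
    unclicked = []   # (screen key, element) pairs not yet tried

    first = screenshot()
    graph[screen_key(first)] = []
    unclicked += [(screen_key(first), e) for e in detect_elements(first)]

    for _ in range(max_clicks):
        if not unclicked:
            break                        # no new screens left to discover
        src, element = unclicked.pop(random.randrange(len(unclicked)))
        navigate_to(src)                 # return to the element's screen
        tap(element)
        after = screenshot()
        dst = screen_key(after)
        if dst not in graph:             # similarity says: a new screen
            graph[dst] = []
            unclicked += [(dst, e) for e in detect_elements(after)]
        graph[src].append((element, dst))  # record the transition
    return graph
```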
A graph generated after BAIT traversed a couple of levels in Candy Crush Saga
A graph of BAIT traversing more than 4,500 levels in Candy Crush Saga
Another area where we have seen BAIT help out is testing rendering issues such as missing textures, missing text strings, overlapping graphical elements, and rendering artifacts caused by the different resolutions and orientations of mobile handsets.
An example of a missing texture (pink rectangle) detected by the similar-contour area algorithm, and a missing text string (string id in brackets) detected using the Google Vision API
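King hasn't published the detector itself; the sketch below shows one plausible reading of a contour-area check, assuming missing textures render as a near-uniform placeholder color like the pink rectangles above. The color bounds and minimum area are illustrative assumptions.

```python
import cv2
import numpy as np

# Assumption: missing textures render as a near-uniform magenta/pink
# placeholder, a common engine default. These BGR bounds are illustrative.
PINK_LOW = np.array([200, 0, 200])
PINK_HIGH = np.array([255, 100, 255])

def find_missing_textures(frame_bgr, min_area=400):
    """Return bounding boxes of contiguous placeholder-colored regions."""
    mask = cv2.inRange(frame_bgr, PINK_LOW, PINK_HIGH)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for contour in contours:
        if cv2.contourArea(contour) >= min_area:  # ignore stray pixels
            boxes.append(cv2.boundingRect(contour))  # (x, y, w, h)
    return boxes
```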
What challenges we encountered and how we have addressed them
I would say we had multiple challenges getting started with AI-powered testing. First, we did not have the right skill set in-house to develop our bot, so we recruited two machine learning students to build a proof of concept of a vague idea: that a QA bot could help out with testing. I think that for the first couple of months the students were unsure that what they were building would work, because it was an untested concept.
The second challenge we discovered was the importance of involving people with the determination to make it work. With AI projects it's easy to take on too big a problem, which delays the point at which you can see actual results and determine whether the problem was solved. To avoid this, we applied a mini agile cycle approach, where we iterated on small achievable goals in 2-3-day iterations so we could evaluate whether we were making progress. If an algorithm did not show promising results after one or two days, we tried another one. Here it's important to be critical; we wanted algorithms able to solve problems in general, in order to remove manual supervision. After 2-3 months, we had a prototype that was able to traverse a game the way we wanted.
The third challenge was to pick a problem BAIT could solve to prove it could add value to our testing. After discussing it with our game teams, one problem they wanted help with, and which is tricky to test manually or to automate in a traditional way, was detecting missing textures. We realized that visual artifacts were ideal for BAIT to detect: there are algorithms suited to this task, which we implemented as a test capability, and we have also been using the Google Vision API to detect text strings, which helps with localization testing.
The AI/ML algorithms we tried and which ones worked for us
Initially, we used a blob-detection algorithm, Maximally Stable Extremal Regions (MSER), to detect UI elements and interactive (e.g. clickable) regions. MSER was not very reliable: it behaved differently across devices with different colors and resolutions, and it was very hard to fine-tune.
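For reference, a minimal OpenCV MSER pass over a screenshot looks roughly like this; the parameters here are OpenCV defaults rather than the team's actual tuning, and the difficulty of finding settings that generalize across devices is what made the approach fragile.

```python
import cv2

def mser_candidate_regions(screenshot_path):
    """Detect stable blob regions that might correspond to UI elements.
    Parameters are OpenCV defaults, not King's actual tuning."""
    gray = cv2.imread(screenshot_path, cv2.IMREAD_GRAYSCALE)
    mser = cv2.MSER_create()
    regions, bboxes = mser.detectRegions(gray)
    # Each bbox is (x, y, w, h); in practice many overlap or catch noise,
    # which is one reason MSER proved unreliable across devices.
    return bboxes
```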
So, we decided to train a neural network model to detect and categorize the desired UI elements (e.g. buttons) in a game scene, using transfer learning and SSD networks. The training data consists of game-scene images from multiple mobile games (non-King games) which we manually collected and labeled by specifying a bounding box around each interactive area.
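The article doesn't name the exact SSD variant or checkpoint. Purely as an illustration, running an off-the-shelf SSD MobileNet detector from TensorFlow Hub over a screenshot looks like the sketch below; the model URL and score threshold are assumptions, and King fine-tuned their own network on labeled game scenes rather than using a stock model.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Illustrative stock checkpoint; King trained their own SSD on manually
# labeled game-scene screenshots instead of a COCO-trained model.
detector = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")

def detect_ui_elements(image_path, score_threshold=0.5):
    """Return detection boxes above a confidence threshold."""
    image = tf.io.decode_image(tf.io.read_file(image_path), channels=3)
    result = detector(image[tf.newaxis, ...])  # batch of one uint8 image
    boxes = result["detection_boxes"][0].numpy()   # normalized y1,x1,y2,x2
    scores = result["detection_scores"][0].numpy()
    return boxes[scores >= score_threshold]
```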
A challenge we faced was detecting buttons rendered over mixed backgrounds. A close (X) button overlapping both a popup dialog and the main scene was harder to detect with high accuracy than an OK button sitting on a single-color background. To address this, we first trained close buttons separately, adding more training data to try to make the detection algorithm better at finding buttons rendered over multiple backgrounds.
Left: a button with a simple background. Right: a close button with a complex background
But we still had cases where close buttons were not detected, owing to the wide variety of dynamic backgrounds, and we realized we had too little training data to tackle the problem with that approach. To further improve the accuracy of the detected regions, we trained a Mask R-CNN model in which we defined the interactive areas with a mask instead of a bounding box. That resolved the earlier detection problems, since with a mask the algorithm effectively learns each button against a simpler background.
A Mask R-CNN model used to define a mask instead of a bounding box for buttons with multiple backgrounds
We trained another neural network model to classify game scenes into different types (e.g. road-map scene, level scene) so we could perform suitable actions in each (e.g. win a level, scroll through the road map). For this we used the pre-trained MobileNet V2 model and added a few more layers for fine-tuning.
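A minimal Keras version of that setup might look as follows; the input size, layer widths, and number of scene classes are assumptions for illustration.

```python
import tensorflow as tf

NUM_SCENE_TYPES = 5  # assumption: e.g. road map, level, popup, shop, other

# Pre-trained MobileNet V2 used as a frozen feature extractor.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

# A few extra layers on top, fine-tuned on labeled game-scene screenshots.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(NUM_SCENE_TYPES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```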
Technical details on different technologies we used in our bot framework
To compare images and avoid duplicate nodes in the graph the bot produces, we use a combination of visual similarity and text similarity. We needed a mixed algorithm because of dynamic scenes with animations, which make pure visual comparison unreliable, so text similarity was blended in to give a better judgment of whether two game scenes are the same.
We use the Google Vision API to extract text from the images, and OpenCV for visual similarity between images.
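The article doesn't specify the exact metrics. One plausible sketch of such a blended comparison uses OpenCV histogram correlation for the visual part and word-set overlap for the OCR'd text; the weights and threshold below are illustrative assumptions, not King's values.

```python
import cv2

def visual_similarity(img_a, img_b):
    """Histogram correlation in [-1, 1]; tolerant of small animations but
    fooled by scenes that share a color palette."""
    hists = []
    for img in (img_a, img_b):
        hist = cv2.calcHist([img], [0, 1, 2], None,
                            [8, 8, 8], [0, 256, 0, 256, 0, 256])
        hists.append(cv2.normalize(hist, hist).flatten())
    return cv2.compareHist(hists[0], hists[1], cv2.HISTCMP_CORREL)

def text_similarity(words_a, words_b):
    """Jaccard overlap of word sets extracted via OCR
    (e.g. the Google Vision API)."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def same_scene(img_a, words_a, img_b, words_b,
               w_visual=0.5, threshold=0.8):
    # Blend both signals; weights and threshold are illustrative only.
    score = (w_visual * visual_similarity(img_a, img_b)
             + (1 - w_visual) * text_similarity(words_a, words_b))
    return score >= threshold
```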
We currently do not use a NoSQL graph database; instead, we save the graph information in JSON format, accessible through a REST API. This is something we will definitely change and improve in the future.
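As a toy illustration of that storage choice (not King's actual schema), the traversal graph can be dumped as plain JSON and served read-only over REST; the Flask app, route, and file name here are all assumptions.

```python
import json
from flask import Flask, jsonify

GRAPH_FILE = "bait_graph.json"  # hypothetical file name

def save_graph(graph, path=GRAPH_FILE):
    """Persist the traversal graph as plain JSON, e.g.
    {"nodes": [...], "edges": [{"src": ..., "dst": ..., "element": ...}]}"""
    with open(path, "w") as f:
        json.dump(graph, f, indent=2)

app = Flask(__name__)

@app.route("/graph")
def get_graph():
    # Read-only REST access to the stored graph.
    with open(GRAPH_FILE) as f:
        return jsonify(json.load(f))
```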
TensorFlow and Keras were used to implement the neural networks.
OpenCV was used in some image-manipulation steps to identify buggy scenes.
Benefits we gained from testing Candy Crush using AI
Multiple benefits emerged that were not clear to us when we started. Being able to test visual issues is a big win, as it frees up resources to focus on other areas of testing. And because a bot simply runs, we can scale up our testing efforts by running multiple bots covering different areas in parallel, increasing test coverage.
With traditional automation, it’s been tricky to test areas such as audio, code coverage, and rendering issues, and now we can cover those areas more easily.
What is wrong with this screen?
BAIT can easily detect that the particle effect on the right is rendered in front of the transparent background layer instead of behind it
What’s next for BAIT
We are starting to look at questions such as, "If one instance of BAIT finds one crash after running for a couple of hours, what will happen if 1,000 BAIT instances run for the same amount of time? Will we find more crashes?"
Having a QA bot powered by AI enables us to have these kinds of test-strategy discussions and try-outs. My gut feeling is that it will benefit us in the future, and I think it's a game-changer for QA, taking testing to a new level with this cutting-edge approach.
Conclusion
I think bots will replace all the predictive testing that exists today. The important questions we will continue to struggle with, and where humans will continue to play an important part, are: does this look good? Is this fun? Is this healthy? I think a future QA role will require a softer skill set, acting more as a testing coach for intelligent testing bots.
We would like to contribute to a global QA AI community that will benefit us all. I am happy to help out if anyone has questions about how to get started with AI testing or in general about BAIT. Feel free to contact me via LinkedIn.
About the Author
Alexander Andelkovic works as a senior agile testing lead at Sweden-based King, developer of the popular mobile game Candy Crush Saga. Andelkovic has worked on multiple complex test projects, ranging from using session-based test management to quality-assure med-tech devices for life-critical systems, to establishing a world-class approval process for Spotify apps used by Fortune 500 companies. He now teams up with developers to test big data, business analytics, and game-level regression using AI. Andelkovic performs both system testing and exploratory testing, with a focus on helping teams make high-quality deliveries.