Automating Visual Testing with Appraise

Teams developing applications where the look and feel is key to success might benefit from automated visual testing. Appraise, an open source tool hosted on GitHub and licensed under MIT, applies the approach of specification by example to visuals. It helps teams review and approve changes to web pages using visual inspection.

Gojko Adzic, the creator of Appraise, gave the opening keynote "Painless Visual Testing" at the European Testing Conference 2018. InfoQ interviewed Gojko about what makes visual testing hard, and asked him about Appraise.

InfoQ: What are the problems that testers often have when testing visuals?

Gojko Adzic: Historically, testing visual aspects of an application required a human eye, and human judgement. This makes visual testing expensive and slow, so it’s very important to have good heuristics to work out the correct coverage.

With visuals, it’s often difficult to express correctness. Someone can design lots of specific checks for, say, element size, positioning and styling, but the whole composition on the screen might still be wrong, weird, or just ugly. Though people can easily spot when something is off, because it’s difficult to express expected results, automation traditionally wasn’t very helpful.

Even when the expected results are clearly defined and deterministic, testing the visuals often requires building and deploying the whole stack of an application, end to end, making such tests slow and brittle.

InfoQ: How do testers deal with these problems?

Adzic: A common solution is to automate as much as possible below the user interface, letting people focus manual efforts on the remaining part that requires a human eye. Because user-interface automation tends to be slow and brittle, software teams tend to decouple functionality from look and feel and push testing as much as possible below the visual interface. This is where the famous test automation pyramid, first presented by Mike Cohn, has been really helpful to communicate what’s taken as a best practice in software test automation today.

This works well for business applications or enterprise software, where the user interface isn’t really the unique differentiator. But teams working on applications where the look and feel is key for success still have an enormous task in front of them, and not a lot in terms of good techniques or practices to deal with it. Technically, there are plenty of tools for checking the visual aspects of an application, but taking humans out of the equation still isn’t possible.

While working on MindMup, we ended up really struggling with this problem. Because it’s a consumer application, and a large portion of the user base are school-age children, visuals and look and feel are critically important for the overall experience. We’re a tiny team, building, supporting and maintaining the application. As the code base grew, the time required to test the visuals was really taking a toll, keeping us from working on new features. We had a ton of automated tests, and even had tests for most of the visual aspects of the app automated using in-browser tests that check DOM elements, but those tests were very difficult to maintain.

For example, changing some common aspects of mind maps visuals, such as fonts or margins, would cause hundreds of tests to break. Each of those test cases would be deterministic, and given a few hours and a calculator I could easily work out the new expected state, but I’d rather spend my time improving the product than fixing test cases. We could, of course, abstract those things away and design smaller unit tests that do not depend on real visual look and feel styling, but that approach provides very little confidence that things won’t be weird or ugly, so we’d still have to review the final result visually, and that took a lot of time.

In fact, because the other tests were so brittle, we mostly ended up having a dozen or so mind maps that would demonstrate all the key look and feel aspects, and then load them up manually to verify that things looked OK. If they did, we’d more or less just copy the actual results from the failing code-oriented tests and declare those the new expected values.

This got us thinking about how we’d be able to speed up that process, and perhaps design an automation pipeline that was there to assist us in doing the visual review, so we could do it much faster and more effectively. We figured that there’s not much point designing automated tests aimed at removing humans from this effort, but if we could design some automation to assist us in doing it, that would be great.

Humans are great at evaluating if something looks OK or not, but we don’t really need to be involved in the process of collecting all that information, so we started automating everything apart from the final decision. We built a tool that could perform the grunt-work of setting up the application to try out lots of different visual test cases, and then just present it to us for approval. Once we had that, we could review many more cases. Instead of a dozen or so maps that would help us check for lots of things at the same time, we were able to design tests around individual aspects -- kind of visual unit tests -- and let the automation piece just present the differences.

InfoQ: What is Appraise?

Adzic: Appraise is the tool we built for MindMup, and we ended up open-sourcing it so that other people struggling with the same problems could apply a similar approach.

Appraise takes the approach of specification by example, but applies it to visuals. We take concrete examples, push them through an automation layer to create executable specifications, and then use headless Chrome to take a screenshot clip and compare it with the expected outcome. Unlike the usual specification-by-example tools, where any difference between expected and actual results is wrong, Appraise takes the approval testing approach. It shows all differences to humans to review, and lets people decide if the difference is correct, and if so, approve it.

Approving a failing test case makes the actual result the expectation for the next test run, so test maintenance becomes much cheaper. Instead of hours of recalculating SVG positions, I can just push a button and make the new screenshot my new expected state. So we benefit from effective test design and automation typical for specification-by-example tools, but also from easy test-maintenance typical for approval testing. That means that we can do lots and lots of test cases quickly, and manage them easily.
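The approval cycle described above can be sketched in a few lines. This is only an illustration of the general approval-testing idea, not Appraise's implementation: the function names (`check`, `approve`), the directory layout, and the byte-for-byte comparison via a hash are all assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def digest(path: Path) -> str:
    """Return the SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check(expected: Path, actual: Path) -> str:
    """Classify a case as 'new', 'match' or 'needs-review' (hypothetical labels)."""
    if not expected.exists():
        return "new"  # no baseline yet, so a person has to look at it
    return "match" if digest(expected) == digest(actual) else "needs-review"


def approve(expected: Path, actual: Path) -> None:
    """Make the actual result the expectation for the next run."""
    expected.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(actual, expected)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        expected = Path(tmp) / "expected" / "node.png"
        actual = Path(tmp) / "actual" / "node.png"
        actual.parent.mkdir(parents=True)
        # Stand-in bytes; a real run would write a screenshot here.
        actual.write_bytes(b"screenshot after a font change")
        print(check(expected, actual))  # baseline missing
        approve(expected, actual)       # the "push a button" step
        print(check(expected, actual))  # baseline now equals the actual result
```

The point of the design is that `approve` is a file copy: nobody recalculates expected values by hand, they only decide whether the new result is acceptable.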

Appraise can help teams review and approve changes to web pages, visual layouts and browser components quickly through visual inspection. It’s designed to automate acceptance/regression tests for visual look and feel in a visual language, rather than xUnit style code, so cross-functional groups of designers, testers, UX experts and product managers can collaborate on the tests.

In addition, by taking a specification-by-example approach, Appraise lets teams create easily maintainable/verifiable developer documentation with visual examples. The tool uses Markdown as the executable specification format, meaning that teams can easily publish their living documentation to GitHub or as a static HTML site.

Appraise is licensed under MIT, and the code is hosted on GitHub. The current state of the product is alpha-quality. What I mean by that is that it’s very useful for our use case, we use it in anger for testing MindMup look and feel, but it likely needs a bit of polish to be useful for other products and team workflows. It’s currently limited to JavaScript and running in Chrome, but it should be pretty easy to extend it for other browsers and execution runtimes.

InfoQ: How can testers do visual testing with Appraise?

Adzic: It’s not a tool primarily aimed at testers, but at cross-functional teams. People can start from visual examples using sketches, either hand-drawn, wireframe, or from a graphic tool, and then have a good discussion on related examples, boundary values and edge cases. This is where the true power of BDD and Specification by Example comes in: the discussions.

Those sketches then go into a Markdown document to create an executable specification, which becomes a semi-automated test. Developers would then build the features and potentially some test fixtures, similar to FitNesse fixtures or Cucumber steps, to link the examples to the application.
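To make the idea of a Markdown document doubling as a specification more concrete, here is a small sketch that pulls expected-result images out of a Markdown page. The interview does not describe Appraise's actual syntax, so this assumes, purely for illustration, that a standard Markdown image reference marks the expected visual for an example; the function name `expected_images` and the sample page are hypothetical.

```python
import re
from typing import Dict

# Standard Markdown image syntax: ![alt text](path)
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def expected_images(markdown: str) -> Dict[str, str]:
    """Map each image's alt text to its path (assumed to be the expected result)."""
    return {name: path for name, path in IMAGE_REF.findall(markdown)}


# Hypothetical specification page: prose for people, an image as the expectation.
SPEC = """
# Node styling

A selected node has a thicker border.

![selected node](images/selected-node.png)
"""

if __name__ == "__main__":
    for name, path in expected_images(SPEC).items():
        print(f"{name}: compare the rendered result with {path}")
```

Because the page stays ordinary Markdown, the same file renders as readable documentation wherever Markdown is displayed.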

Appraise will run the application, grab screenshots and clip them, then compare with the expected visuals. If everything matches, it won’t complain. If it spots any differences, it will show them to people for approval. Of course, for hand-drawn sketches the tests would initially fail, but we’ve made it very easy to compare differences, so a UX designer or a product expert could then review things and approve the ones that are correct -- this would become the baseline for the next run. For the incorrect ones, developers would be able to easily inspect what went wrong using typical developer tools in Chrome, and fix it up.
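The compare-and-present step can be illustrated with a minimal pixel diff. This is a sketch under stated assumptions rather than how Appraise compares screenshots: images are modelled as plain nested lists of RGB tuples so the example needs no imaging library, and `diff_box` is a hypothetical helper that reports where two images differ so a reviewer knows where to look.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels


def diff_box(expected: Image, actual: Image) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) of the changed region, or None if identical.

    A pixel present in only one of the two images counts as changed,
    so a size mismatch shows up as a difference.
    """
    height = max(len(expected), len(actual))
    width = max(
        max((len(row) for row in expected), default=0),
        max((len(row) for row in actual), default=0),
    )

    def at(image: Image, x: int, y: int) -> Optional[Pixel]:
        # None stands for "no pixel here" when the image is smaller.
        return image[y][x] if y < len(image) and x < len(image[y]) else None

    changed = [
        (x, y)
        for y in range(height)
        for x in range(width)
        if at(expected, x, y) != at(actual, x, y)
    ]
    if not changed:
        return None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    white, black = (255, 255, 255), (0, 0, 0)
    expected = [[white] * 4 for _ in range(3)]
    actual = [row[:] for row in expected]
    actual[1][2] = black  # one pixel changed at x=2, y=1
    print(diff_box(expected, expected))  # None
    print(diff_box(expected, actual))    # (2, 1, 2, 1)
```

A result of `None` means there is nothing to show anyone; any other result is a region to highlight for a person to approve or reject.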

In this way, Appraise is designed to support cross-functional conversations and make it easy for people to spot and review differences.

Another tool that can be used for visual testing is Applitools Eyes. It uses artificial intelligence that mimics the human eye and brain. More information about a recent release can be found in Applitools Expands Application Visual Management Capabilities.

Other visual testing tools are under development; examples are Chromatic and Screener.
