Why Visual AI Beats Pixel and DOM Diffs for Web App Testing

Key Takeaways

  • Web application test automation depends on applying a test condition and measuring the application response.
  • Given the visual nature of web applications, testers require visual validation to automate web application tests.
  • For visual validation, pixel and DOM diffs suffer from false positives: they flag structural or rendering differences that users cannot see.
  • Visual AI technology uses computer vision technologies to algorithmically compare screen regions, rather than individual pixels, for snapshot comparison.  
  • Visual AI technology creates repeatable visual validation for functional testing.

How do you automate visual testing of your web apps?

Visual testing means comparing the output of two test runs for visual differences. It’s necessary because so much can change between test runs: differences in HTML, CSS and JavaScript all affect what the user sees. Yet many engineers believe that functional test automation is the best they can achieve.

Engineers complain that the visual testing tools they have tried don’t work with test automation.

Why is Visual Testing Necessary?

Check out this example from Instagram:

The code to handle this sponsored link passed all functional tests. And, if you inspect the code, all the images and all required strings are present, but they’re just in the wrong place, blocking revenue since no one would click an ad that looks like this.

Functional testing alone cannot help you find unexpected additions to your page. In the new version of the page as shown in the example below, terms and conditions get added at the bottom of the form. The new content requires a test for the new T&C external link (which may link to a new page, or to a hovered text box). If you forget to add a test for the new link, you are blind to its existence and blind to whether or not it behaves correctly.

How do you automate spotting visual differences? What technologies can help you?

Three Visual Test Technologies

Three technologies can provide visual comparisons between different runs of a web page. These are:

  • Pixel Comparisons - take a snapshot of the screen and compare it with the pixels on a previous version of the page, or a version of the page from a different browser.
  • DOM Comparisons - capture the DOM of the page and compare it with the DOM capture of a previous version of the page.
  • Visual AI - capture the screen image, break it into visual elements using AI, compare those elements with the elements of an older screen image broken down the same way, and identify visible differences.

Today’s Pixel and DOM comparison tools build on a few decades of numeric and text analysis tools. These tools report on a simple calculation. For pixel diff - with a given viewport size, have pixels changed, and, if so, where? For DOM diff - has the DOM changed, and, if so, where?
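The pixel diff calculation described above can be sketched in a few lines. This is a minimal illustration, not the implementation of any particular tool; the `pixel_diff` name and the tiny 2x2 images are assumptions made for the example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), 0-255 each
Image = List[List[Pixel]]      # rows of pixels for one viewport size


def pixel_diff(baseline: Image, checkpoint: Image) -> List[Tuple[int, int]]:
    """Return the (x, y) coordinates of every pixel whose value changed."""
    if len(baseline) != len(checkpoint) or any(
        len(a) != len(b) for a, b in zip(baseline, checkpoint)
    ):
        raise ValueError("images must share the same viewport size")
    return [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(baseline, checkpoint))
        for x, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]


# Hypothetical 2x2 captures: one pixel differs by a single step of red.
WHITE, BLACK, NEAR_BLACK = (255, 255, 255), (0, 0, 0), (1, 0, 0)
baseline = [[WHITE, BLACK], [WHITE, WHITE]]
checkpoint = [[WHITE, NEAR_BLACK], [WHITE, WHITE]]

changed = pixel_diff(baseline, checkpoint)
print(changed)  # [(1, 0)]
```

The comparison answers exactly the two questions in the text: whether pixels changed, and where. It has no notion of whether a person could see the change.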

Each of these approaches has limitations, which we will describe below. The bottom-line limitations are:

  • Browsers render HTML in pixels. Different browsers render pixels differently. Test engineers encounter obvious cross-browser differences, browser version rendering differences, and even rendering differences with viewport sizes. At the core, it is possible to have pixel differences that users won’t identify as visual differences.
  • DOM snapshots do not equal visually identical browser outputs. Identical DOM structures can have different visual output. Different DOM outputs can render identically.  
  • A combination of Pixel and DOM diffs can mitigate some of these limitations (e.g. identify DOM differences that render identically), but is still susceptible to a large number of false positive results.

In contrast, Visual AI uses computer vision technology that has been applied in everything from security systems to self-driving cars. Visual AI identifies visual elements that make up a screen or page capture. Rather than inspect pixels, Visual AI recognizes elements as elements with properties (dimension, color, text) and uses the properties of a checkpoint element to compare it to the baseline. The screens get compared at the element level, rather than the pixel level. DOM inspection helps Visual AI identify visual elements for comparison, but Visual AI ignores DOM differences.  With Visual AI, you discover visible differences and ignore trivial differences.

After reading this, you’ll wonder why you would ever consider something other than Visual AI.

The Problem with Pixels

What is pixel diffing or pixel comparison? They’re basically the same thing, and work like this...

Pixel diffs work by assuming that an unchanged page produces identical pixels from one capture to the next.

If you tested an old display with monospace characters, like the Raspberry Pi screen below, you could tell immediately whether pixels were on or off.

An "A" looks the same every time you capture that screen. If a pixel is in the wrong state, it’s obvious.

But you’re not running a terminal app. You’re using a graphical web app running in color with both text and images. You represent text in fonts that scale cleanly - meaning that your system uses anti-aliasing interpolation to render fonts. Your color images consist of pixels rendered by interpolated calculations in 24-bit color.

Font rendering can vary from browser to browser or version to version. Each rendering engine renders fonts slightly differently, and the rendering technique can change the values of many pixels. Check out this font rendering based on standard and subpixel rendering:

The image comes from The Ails of Typographic Anti-Aliasing, a useful read for understanding why rendering differences exist between browsers or even browser versions.

How can pixel comparisons fail?

  • Font anti-aliasing
  • Image rescaling
  • Browser rendering
  • Graphics card rendering

No matter which system you use, pixel differences are possible - even likely.

For example, here is a case from Google Chrome. Chrome 67 rendered the image on the left. Chrome 68 rendered the image on the right. Same page. Same image. And, they look the same to the naked eye. But, between the versions, Chrome changed the rendering engine. To a pixel diff, these images are different.

Differences shown by a pixel comparator

Another kind of difference involves pixel displacement. Between one page and another, one item becomes larger - shifting the remaining pixels down on the page. The comparison shows the differences - so every shifted pixel now appears different.
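A small sketch makes the displacement effect concrete. Here each string stands in for one horizontal band of pixels; the row labels and the `changed_rows` helper are hypothetical, chosen only to illustrate a position-by-position comparison.

```python
from typing import List


def changed_rows(baseline: List[str], checkpoint: List[str]) -> List[int]:
    """Compare rows position by position and return the indices that differ."""
    height = max(len(baseline), len(checkpoint))
    diffs = []
    for y in range(height):
        a = baseline[y] if y < len(baseline) else None
        b = checkpoint[y] if y < len(checkpoint) else None
        if a != b:
            diffs.append(y)
    return diffs


baseline = ["header", "banner", "title", "body-1", "body-2", "footer"]
# The banner grows by one row; nothing else on the page changes.
checkpoint = ["header", "banner", "banner", "title", "body-1", "body-2", "footer"]

shifted = changed_rows(baseline, checkpoint)
print(shifted)  # [2, 3, 4, 5, 6]
```

One real change (a taller banner) is reported as five changed rows, because every row after the change sits at a new position.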

Pixel diffs struggle with false positives - on image files, font anti-aliasing, and even a single change that displaces every following pixel on the page. And false positives can suck the joy out of any tester’s life, wasting time, effort, and energy on phantom issues. Any test system with too many false positives cannot get automated.

As an added complication, once the subsequent pixels differ, the reported differences can mask a real issue further down the page. Because all the pixels are different, a real difference gets lumped into the "different pixels" category. Pixel diff tools cannot discern further differences once a pixel reports as different.

You must also consider the issue of dynamic content. For example, what happens when you have a blinking cursor? What happens when you have a publication that updates regularly? How do you compare pages when the page is always changing?

Also, you must handle the issue of cross-browser testing. When you have the identical web page on two different browsers, you must account for the rendering engine each browser uses. What happens when the browsers render differently? What happens when you’re going from a desktop machine to a mobile device?

When you use a pixel diff tool, for each pixel difference, you must decide:

  1. What caused this difference?
  2. Are other differences embedded inside this difference?
  3. Does this difference require repair?
  4. How do I prioritize this difference?

Whether or not the differences impact a user, or result from rendering differences invisible to the user, these differences get reported, investigated, evaluated, and prioritized. That’s a lot of engineering effort for each reported difference. And, for false positives, it’s clearly a waste of time.

If you are looking to try a pixel comparator tool, imagine you get just 10 false positives per page. Think of the size of your web application, which can span multiple screens per page and multiple pages per application. How do you handle that workload?

The DOM Decision

What is DOM comparison or DOM diffing? This technique compares the Document Object Model (DOM) representation of one web page with an earlier representation of that page. The idea behind a DOM diff might seem brilliant. After all, a browser reads the HTML, CSS and JavaScript to build the DOM and render the page - so why not inspect the DOM and identify the differences?

Why not? The DOM is the DOM - not what the user sees. Your browser builds the DOM from the HTML, CSS and JavaScript, and then renders the page from it. So, the browser rendering matters. Can two different DOM structures render identical web pages? Yes.

Your DOM contains both rendered and non-rendered content. Not only that, you can change the DOM substantially - names, structures, XPaths, CSS, etc. - and still render the content identically. A simple page restructure renders your old DOM obsolete, and fills your DOM comparisons with correctly-identified differences that are false positives.
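To illustrate, the sketch below parses two hypothetical markup fragments with Python's standard `html.parser` and compares their tag structure separately from their visible text. The markup, the `DomSummary` class, and the `summarize` helper are assumptions for the example. The sketch does not render anything, so matching text only suggests, and does not prove, that the two fragments look the same; that depends on the CSS applied.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class DomSummary(HTMLParser):
    """Collects the tag/attribute structure and the visible text separately."""

    def __init__(self) -> None:
        super().__init__()
        self.structure: List[Tuple[str, tuple]] = []
        self.text: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.structure.append((tag, tuple(attrs)))

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def summarize(html: str) -> DomSummary:
    parser = DomSummary()
    parser.feed(html)
    parser.close()
    return parser


# Hypothetical before/after markup for the same visible message.
old_page = '<div id="promo"><p class="msg">Sign up today</p></div>'
new_page = '<section class="promo-v2"><span>Sign up today</span></section>'

old, new = summarize(old_page), summarize(new_page)
dom_changed = old.structure != new.structure
text_changed = old.text != new.text
print(dom_changed, text_changed)  # True False
```

A DOM diff reports every element here as changed, even though the content a user reads is the same.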

As with pixel comparison tools, for each DOM difference you must decide:

  1. What caused this difference?
  2. Are other differences embedded inside this difference?
  3. Does this difference require repair?
  4. How do I prioritize this difference?

People and companies that use DOM diffing do so with some automated infrastructure they use to build web pages and web applications. The more structured your app build process, the more you might be tempted to use a DOM diff for your app test technology. After all, how often do you make major changes to your build tool?

At the same time, the DOM diff remains oblivious to rendering changes. You may have changed a JPG file and kept the same name, and the DOM diff sees no difference - even though the user sees a difference on the rendered page.
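The replaced-image case can be sketched as follows. The byte strings are hypothetical stand-ins for two versions of the same file, and the file name is invented for the example; the point is only that the markup comparison and the content comparison disagree.

```python
import hashlib

# Hypothetical bytes standing in for two versions of /img/hero.jpg.
old_image_bytes = b"\xff\xd8 old hero artwork \xff\xd9"
new_image_bytes = b"\xff\xd8 new hero artwork \xff\xd9"

# The markup that references the image is unchanged between runs.
old_dom = '<img src="/img/hero.jpg" alt="Hero">'
new_dom = '<img src="/img/hero.jpg" alt="Hero">'


def digest(data: bytes) -> str:
    """Fingerprint the file contents so two versions can be compared."""
    return hashlib.sha256(data).hexdigest()


dom_changed = old_dom != new_dom
image_changed = digest(old_image_bytes) != digest(new_image_bytes)
print(dom_changed, image_changed)  # False True
```

The DOM comparison passes while the picture the user sees has changed.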

Some other differences that a DOM diff misses:

  • IFrame changes but the filename stays the same
  • Broken embedded content
  • Cross-browser issues
  • Dynamic content behavior (DOM is static)

DOM comparators exhibit two clear deficiencies:

  1. Code can change and yet render identically, and the DOM comparator flags a false positive.
  2. Code can be identical and yet render differently, and the DOM comparator ignores the difference.

In short, DOM diffing ensures that the page structure remains the same from one version of the page to the next. DOM comparisons on their own are insufficient for ensuring visual integrity.

Merging Pixel and DOM Comparators

There is also a class of tools that use both Pixel and DOM comparison technologies to provide visual checks.

Combined, these two technologies can assist each other in certain cases. For example:

  1. You have identical DOM structures and your pixel comparison shows small differences. Your tool might tell you that this difference is likely a small rendering difference because the DOM comparator reports the identical page structures.
  2. You have DOM differences but your pixel comparator shows only small pixel changes. This might overcome the false positive issue for the DOM difference.
  3. You have made changes to image files referenced by the DOM comparator. The DOM comparator shows no change, but the pixel comparator shows clear differences.
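The three cases above amount to a small decision table. The sketch below is one way such a combined tool might triage a result; the `triage` function, its labels, and the 1% threshold for a "small" pixel change are all assumptions for illustration, not the behavior of any specific product.

```python
def triage(dom_changed: bool, pixel_diff_ratio: float, small: float = 0.01) -> str:
    """Classify a checkpoint from a DOM-diff flag and the share of changed pixels.

    `small` is a hypothetical cutoff for what counts as a small pixel change.
    """
    if not 0.0 <= pixel_diff_ratio <= 1.0:
        raise ValueError("pixel_diff_ratio must be between 0 and 1")
    if pixel_diff_ratio == 0.0:
        return "ignore: DOM changed but renders identically" if dom_changed else "pass"
    if pixel_diff_ratio <= small:
        if dom_changed:
            return "low priority: DOM changed, pixels barely moved"   # case 2
        return "low priority: probable rendering noise"               # case 1
    if dom_changed:
        return "review: DOM and pixels both changed"
    return "review: pixels changed under an identical DOM"            # case 3


print(triage(False, 0.002))  # low priority: probable rendering noise
print(triage(True, 0.002))   # low priority: DOM changed, pixels barely moved
print(triage(False, 0.30))   # review: pixels changed under an identical DOM
```

Note that every branch except "pass" and "ignore" still ends with a person looking at the result.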

You likely realize that, together, these tools provide better results than either one of them alone.

Adding a visual check to a DOM comparator ensures that real changes don’t escape, and that visually identical output generated by different DOM structures doesn’t get flagged for inspection.

However, this combined approach doesn’t solve the underlying problems. Your visual comparator still works at the pixel level, and pixel comparators still show rendering differences that may not be visually relevant. Even when the DOM comparator tells you nothing has changed, you still need to inspect the pixel differences.

So, what’s better than pixel comparators, DOM comparators, or a combination of pixel and DOM comparators?

Visual AI for Visual Testing

When a manual tester runs a web page test, she or he captures a screen after executing a step and inspects the page for visible differences versus an expected page. The test engineer identifies color, text, broken elements, broken links, and other issues that would affect a real user. Pixel-level differences may not matter.

While manual testers represent the user, manual testers have limitations. How consistently does a manual tester execute a test script? How accurately does a manual tester compare two different screen captures? What kind of throughput can a manual tester achieve with consistency? How about two different testers - do they come to the same conclusions consistently?

If we could automate a manual tester, we could get all the benefits of a manual tester - evaluating content, comparing visualizations, identifying problems - with none of the drawbacks. And, if we could automate a manual tester, we would avoid the downsides of false positives that exist in pixel diffs, DOM diffs, or the combination of tools.

How then do we automate a manual tester?

This question drove the development of Visual AI. Visual elements make up a page. Just as a self-driving car needs to make out stop lights, street signs, crosswalks, other cars, and pedestrians, Visual AI needs to identify text, text boxes, icons, menus, headers, and other elements on the page. Visual AI has the DOM to guide it, as the DOM represents what should be on the page - and where. Visual AI marries the presentation and the representation - repeatably.

Visual AI uses the DOM to identify the layout - location and spacing. Within the layout, Visual AI identifies elements algorithmically. For any checkpoint image compared against a baseline, Visual AI identifies all the layout structures and all the visual elements. Visual AI will identify differences in the layout, as well as differences within the visual elements contained within the layout. 

Each given page renders as a visual image comprised of visual elements. Visual AI treats elements as they appear:

  • Text, not a collection of pixels
  • Geometric elements (rectangles, circles), not a collection of pixels
  • Pictures as images, not a collection of pixels
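Comparing at the element level, rather than the pixel level, can be sketched as below. This is only an illustration of the comparison step under strong assumptions: the elements are handed in already recognized, whereas finding them in a screen capture is the hard part and is not shown. The `Element` fields follow the properties named earlier (dimension, color, text); the names and sample values are hypothetical and do not describe how any Visual AI product is implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Element:
    name: str                     # hypothetical label for the region
    kind: str                     # "text", "shape" or "image"
    text: str
    color: Tuple[int, int, int]   # dominant color as (red, green, blue)
    size: Tuple[int, int]         # width, height


def element_diff(baseline: List[Element], checkpoint: List[Element]) -> Dict[str, List[str]]:
    """Report which properties differ for each element, keyed by element name."""
    before = {e.name: e for e in baseline}
    after = {e.name: e for e in checkpoint}
    report: Dict[str, List[str]] = {}
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name), after.get(name)
        if old is None:
            report[name] = ["added"]
        elif new is None:
            report[name] = ["removed"]
        else:
            changed = [f for f in ("kind", "text", "color", "size")
                       if getattr(old, f) != getattr(new, f)]
            if changed:
                report[name] = changed
    return report


baseline = [
    Element("heading", "text", "Store hours", (0, 0, 0), (200, 24)),
    Element("status", "text", "Closed", (200, 0, 0), (60, 20)),
]
checkpoint = [
    Element("heading", "text", "Store hours", (0, 0, 0), (200, 24)),
    Element("status", "text", "Open", (0, 160, 0), (60, 20)),
]

report = element_diff(baseline, checkpoint)
print(report)  # {'status': ['text', 'color']}
```

The report names the element and the properties that changed, which is closer to what a manual tester would write down than a list of pixel coordinates.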

At the micro level, visual AI avoids almost all of the false positives that plague pixel comparisons. Font anti-aliasing on different systems can result in different pixel values, but visual AI algorithmically determines that loops, angles, and density appear the same - even if the pixels are different.

At the macro level, visual AI can compare two images scaled using slightly different scaling algorithms and determine that they are, in fact, the same image - even if the pixel values don’t line up identically. Visual AI determines that the two images are visually identical, where pixel tools report visual differences that require investigation.

Internally, Visual AI depends on visual recognition algorithms. These algorithms behave comparably to facial recognition and other machine vision tools. Unlike facial recognition, which looks for a pattern that could be anywhere in a pixel field, Visual AI looks for a visual match in a location determined by the DOM.

Using algorithmic comparison of the light and dark regions, colors, color density, and relationships of these regions, Visual AI can compare real images that render differently and determine whether they are identical. This capability matters when you use real-world images that may resize or be represented differently by different browsers, or sometimes even by different builds of the same browser.

When Google upgraded from Chrome 67 to Chrome 68, the image rendering engine changed. To the naked eye, the images look identical. But, to the pixel diff, the images were substantially different. Remember - each pixel is a 24-bit representation of a color value. One bit could be different and the pixel has changed.

Chrome 67 and 68 render them differently on a pixel-by-pixel basis:

The pixel diff sees a large number of pixel differences. Why? Remember, a single pixel is a 24-bit representation - 8 bits each of red, green and blue. All of these must be identical for the pixel to have an identical value. Visual AI compares the images - proportions and variations - based on the border of the image inward. In this case, the images are visually indistinguishable, even if the pixels are not.
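The 24-bit point is easy to check with a little arithmetic. In this sketch the `pack_rgb` helper and the sample color are assumptions, as is the channel order used for packing (red in the high byte).

```python
def pack_rgb(red: int, green: int, blue: int) -> int:
    """Pack three 8-bit channels into one 24-bit pixel value."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits")
    return (red << 16) | (green << 8) | blue


a = pack_rgb(200, 120, 64)
b = pack_rgb(200, 120, 65)   # blue differs only in its lowest bit

differing_bits = bin(a ^ b).count("1")
print(f"{a:06x} {b:06x}", a == b, differing_bits)  # c87840 c87841 False 1
```

A one-step change in one channel is far below what a person can see, yet an exact comparison treats the two pixels as different.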

Visual AI also helps with mobile apps as well as web apps. Here is a real-world pair of mobile app screens that we want to compare:

Okay, let’s spot the differences. Some are obvious:

  • The red bubble "Closed" on the left vs. the green bubble "Open" on the right
  • The text reading "Closed" across from Saturday-Sunday in the right screen, with nothing on the left.

When we do a pixel diff of these two pages we see the following:

These two mobile screens show many differences in magenta. The narrower line on the right shifted all pixels up the page - causing the pixel diff to highlight everything below the line as differences.

Here, the two images get compared by Visual AI. Visual AI can be configured to either pay attention to or ignore the effect of displacement. Here, with displacement effects accounted for, Visual AI can ignore everything except the changed parts of the page below. In this case, it sees the "Closed" next to "Saturday - Sunday".
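The idea of accounting for displacement can be illustrated with ordinary sequence alignment. The sketch below uses Python's standard `difflib.SequenceMatcher` on hypothetical row labels that loosely mirror the screens above; it is a stand-in technique chosen to show the effect, not the algorithm Visual AI uses.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def aligned_changes(baseline: List[str], checkpoint: List[str]) -> List[Tuple[str, List[str], List[str]]]:
    """Align the two row sequences first, then report only the rows that differ."""
    matcher = SequenceMatcher(a=baseline, b=checkpoint, autojunk=False)
    return [
        (op, baseline[i1:i2], checkpoint[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Hypothetical rows: the divider line is thinner on the right-hand screen,
# and "Closed" appears next to Sat-Sun.
baseline = ["title", "line", "line", "Mon-Fri 9-5", "Sat-Sun", "map"]
checkpoint = ["title", "line", "Mon-Fri 9-5", "Sat-Sun Closed", "map"]

changes = aligned_changes(baseline, checkpoint)
print(changes)
# [('delete', ['line'], []), ('replace', ['Sat-Sun'], ['Sat-Sun Closed'])]
```

Because the rows are aligned before they are compared, the unchanged content below the thinner line is not reported, and the "Closed" text stands out as the difference that matters.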

Why Use Visual AI?

Most people care that their web app behaves well for users. Well-behaved means both functional behavior and visual behavior. Visual AI draws conclusions about the visual behavior of an app, and also identifies significant differences that affect app behavior.

DOM-based tools don’t make visual evaluations. DOM-based tools identify DOM differences. These differences may or may not have visual implications. DOM-based tools result in false positives - differences that don’t matter but require human judgment to render a decision that the difference is unimportant. They also result in false negatives - they will pass something that is visually different, similar to the Instagram example above.

Pixel-based tools don’t make evaluations, either. Pixel-based tools highlight pixel differences. They are liable to report false positives due to pixel differences on a page. In some cases, all the pixels shift due to an enlarged element at the beginning - pixel technology cannot distinguish the elements as elements. Pixel technology cannot see the forest for the trees.

Visual AI breaks regions of pixels into rendered elements for comparison purposes, similar to how humans view web pages. As a result, Visual AI can compare any kind of image on a page.


First, functional testing requires visual validation - otherwise it’s myopic to changes you expect and blind to changes that you didn’t expect.

Second, the reason you likely don’t do visual validation is that the tools you have today produce too many false positives. That’s because your technologies don’t see what your users see. Comparing pixels wastes time because so many things can cause pixel differences. Comparing the DOM misses key content, because users see the rendered page, not the DOM representation. Comparing pixels and the DOM together is better, but still produces too many false positives.

Visual AI overcomes the problems of pixel and DOM for visual validation, and has enough accuracy to be used in production functional testing.

Now is the time to investigate Visual AI for your web app testing.

About the Author

Michael Battat is a technologist, marketer, and test industry veteran. He blogs about testing and software test strategy for Applitools.  

