
Unit-Testing XML

Posted by Stefan Bodewig on Jun 11, 2007

Software creates XML output on many occasions: XML documents are used for data interchange between applications, and web applications create (X)HTML output or respond to AJAX requests with small XML snippets. Wherever XML is generated, that output has to be tested as much as any other part of the application.

There are several approaches for testing the generated XML, and each of these approaches has its flaws when used in isolation.

For example you can:

  • Validate the generated XML against a DTD, an XML Schema or one of the other grammar alternatives. Unfortunately such a grammar doesn’t always exist for your documents, and even if it does, validation only ever tests the structure of the output, not its contents.
  • Compare the generated output with an expected result. Unfortunately the serialized forms of two XML documents that represent the same tree of information can be quite different: an element without child nodes may be collapsed to an empty element or serialized with an opening and a closing tag, and there may be differences in whitespace or character encoding, for example.
  • Use XPath queries to extract parts of the generated document and make assertions on their values. This can become tedious if a large amount of generated content needs to be tested.
  • Programmatically walk the document - for example using its DOM - and assert each node’s content. A test written this way is very specific and may require bigger adjustments when the output structure changes.
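
The second point is easy to reproduce with nothing but the JDK's DOM parser. The following is only a minimal sketch and is not part of XMLUnit; the class name SerializationDemo and the order/item documents are made up for illustration. Two strings that differ in attribute quoting and in how the empty element is written still parse to equal trees.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SerializationDemo {

    // Parse an XML string into a namespace-aware DOM Document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True if both strings describe the same tree of nodes.
    static boolean sameTree(String a, String b) {
        return parse(a).getDocumentElement()
                .isEqualNode(parse(b).getDocumentElement());
    }

    public static void main(String[] args) {
        String collapsed = "<order><item id='1'/></order>";
        String expanded = "<order><item id=\"1\"></item></order>";
        System.out.println(collapsed.equals(expanded));    // false
        System.out.println(sameTree(collapsed, expanded)); // true
    }
}
```

A plain string comparison would report a failure here even though both documents carry the same information.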

In addition, the existing APIs for these tasks are often inconvenient. For example, in Java prior to JAXP 1.3 (i.e. before Java SE 5) a document could only be validated against a DTD or an XML Schema while it was being parsed from a byte stream or character stream into a DOM Document instance or a stream of SAX events.

XMLUnit

XMLUnit is an open source project licensed under the BSD license. It provides a small library of interrelated classes that simplify each of the approaches to testing XML outlined in the previous section. Special APIs are provided to simplify writing unit tests with JUnit or NUnit, but the library itself is fully usable without any testing framework at all.

There is a Java and a .NET version of XMLUnit, but the Java version is more mature and provides more features. This article will only focus on the Java version and all examples will use Java.

XMLUnit was founded by Tim Bacon and Jeff Martin in 2001 and developed as a testing framework for their own projects. The first stable release of XMLUnit for Java appeared in March 2003. In the four years that followed this 1.0 release XMLUnit was used in many open as well as closed source projects, but its active development stalled.

At the same time the XML ecosystem changed. XMLUnit 1.0’s validation classes focus strongly on DTDs; XML Schema was only supported as an afterthought. Likewise the simplistic XPath engine that was part of XMLUnit 1.0 didn’t support XML Namespaces at all.

In autumn 2006 development of XMLUnit was picked up again. A first beta of XMLUnit 1.1 was released in April 2007 and a final release is expected soon. Further development beyond this release is already being discussed on the XMLUnit mailing list.

The examples in the remainder of this article use XMLUnit 1.1, but many will apply to XMLUnit 1.0 as well.

Providing XML as Input to XMLUnit

XMLUnit’s APIs accept “pieces of XML” as input in several different forms. In most cases they can be provided as InputStreams, Readers, Strings, InputSources or already parsed DOM Document instances.

XMLUnit also provides a Transform class that can be used to apply an XSLT transformation to an existing input (in one of the forms listed above) and use the output of this transformation in further tests.

Transform tr = new Transform("",
                             new File("xml/example1.xsl"));
Document d = tr.getResultDocument();
assertEquals("example1", d.getDocumentElement().getTagName());

where the stylesheet consists of something like



Example 1: Using Transform to Test the Result of an XSLT Transformation
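
A comparable check can be sketched with plain JAXP (javax.xml.transform) instead of XMLUnit's Transform class. This is only an illustration under assumptions: the class name TransformDemo, the inline stylesheet and the foo input are hypothetical, chosen so that the transformation always emits an example1 root element.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class TransformDemo {

    // Hypothetical stylesheet: replaces any input with an <example1/> root.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/'><example1/></xsl:template>"
        + "</xsl:stylesheet>";

    // Apply the stylesheet and return the tag name of the result's root element.
    static String rootNameAfterTransform(String inputXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            DOMResult result = new DOMResult();
            t.transform(new StreamSource(new StringReader(inputXml)), result);
            Document d = (Document) result.getNode();
            return d.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootNameAfterTransform("<foo/>")); // example1
    }
}
```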

Validating XML

XMLUnit can validate an XML document against a DTD or W3C XML Schema. Later versions of XMLUnit will leverage the javax.xml.validation package added with JAXP 1.3 and thus potentially offer validation for RELAX NG, Schematron or other grammars as well.

For either form of validation XMLUnit’s Validator class is used.

Validating against a W3C XML Schema

Since DTD validation is XMLUnit’s default setting, Schema validation has to be explicitly enabled by setting Validator’s useXMLSchema attribute to true.

In order to validate against an XML Schema the document under test must declare an XML namespace using the Schema’s URI. The document can also provide a schemaLocation attribute which tells the XML parser where to find the Schema’s definition.


xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="file:///opt/schemas/example.com/order.xsd"/>

Example 2: An XML Document with namespace declaration and schemaLocation attribute

If no schemaLocation has been given, the XML parser will try to use the namespace’s URI as a URL and read the Schema’s definition from there.

Unfortunately there are often cases where it is feasible neither to provide a schemaLocation attribute nor to use a valid URL as the namespace’s URI. The generated XML output may be processed on a different machine (for example at a customer’s site), so a local file reference wouldn’t work, and this machine may be without network access, so a public http URL wouldn’t work either.

Fortunately JAXP 1.2 (i.e. Java 1.4) provides a well hidden feature that makes it possible to provide the schema location programmatically. The location(s) of the schema(s) can be specified as a File or a URL or even as a chunk of bytes.

String example = ""
    + ""
    + "";
Validator v = new Validator(example);
v.useXMLSchema(true);
v.setJAXP12SchemaSource(new File("xml/example3.xsd"));
assertTrue(v.toString(), v.isValid());

Example 3: Validating an XML Document against a W3C XML Schema, Providing the schemaLocation programmatically
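
For comparison, the same idea of handing the schema to the validator in code can be sketched with the javax.xml.validation package of JAXP 1.3. This is not XMLUnit's Validator; the class name SchemaDemo, the inline schema and the order/item vocabulary are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SchemaDemo {

    // Hypothetical schema: an order holds one or more items with an integer id.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://example.com/order'"
        + " elementFormDefault='qualified'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        + "<xs:element name='item' maxOccurs='unbounded'><xs:complexType>"
        + "<xs:attribute name='id' type='xs:int' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compile the schema from the string; no schemaLocation is involved.
    static Schema schema() {
        try {
            return SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the document is valid according to the schema above.
    static boolean isValid(String xml) {
        try {
            schema().newValidator()
                    .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // validation error reported by the parser
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ns = " xmlns='http://example.com/order'";
        System.out.println(isValid("<order" + ns + "><item id='1'/></order>"));
        System.out.println(isValid("<order" + ns + "><item id='x'/></order>"));
    }
}
```

The instance documents carry only the namespace declaration; where the schema comes from is decided entirely by the test code.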

XMLUnit currently only supports validating XML Schema instance documents; you can’t validate that a Schema definition itself is a valid XML Schema. There are plans to extend the support significantly in future versions.

Validating against a DTD

XMLUnit supports DTD validation in a number of different scenarios. In the most basic case the document under test contains a document type declaration that provides a SYSTEM identifier.


		"file:///opt/schemas/example.com/order.xsd" >

Example 4: An XML Document with DOCTYPE declaration with SYSTEM and PUBLIC Identifiers

In this case the parser will locate the DTD using the given identifier.

For the same reasons as outlined in the XML Schema section this may not be desirable and so XMLUnit allows you to provide a SYSTEM identifier of your own. If you do so, it will even override an existing SYSTEM identifier. By specifying the SYSTEM identifier you can also validate documents that don’t contain any DOCTYPE declaration at all.

String example = "
      + " \"http://example.com/order\">"
      + ""
      + ""
      + "";
Validator v = new Validator(example,
                            new File("xml/example5.dtd")
                                .toURI().toURL().toString());
assertTrue(v.toString(), v.isValid());

Example 5: Validating an XML Document against a DTD, Providing the location programmatically

As an alternative you can specify a SAX EntityResolver that will provide the location of the DTD. This can be used to defer the resolution to an OASIS catalog using Apache’s XML Resolver library, for example.

String example = "
      + " \"http://example.com/order\">"
      + ""
      + ""
      + "";
Validator v = new Validator(example);
XMLUnit.setControlEntityResolver(new CatalogResolver());
assertTrue(v.toString(), v.isValid());

with a catalog like


xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

uri="example5.dtd"/>

Example 6: Using an OASIS catalog to resolve the DTD location

Comparing Pieces of XML

When XMLUnit compares two pieces of XML the result can be one of three states:

  1. the two pieces of XML are identical
  2. the two pieces of XML are similar
  3. the two pieces of XML are different

XMLUnit classifies each kind of difference it detects as either recoverable or not (see below). The XML pieces are only identical if no differences have been found at all. If all differences that have been found are recoverable, the documents are said to be similar, otherwise they are different.

By default XMLUnit will consider only a few kinds of differences recoverable. For example, if the two documents use different prefixes for the same namespace they are considered to be similar but not identical. A full list of all detected differences can be found in XMLUnit’s user’s guide. One thing that may seem surprising is that XMLUnit considers two documents as similar if they contain the same elements in a different order.
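
Why a prefix difference is a natural candidate for a recoverable difference can be seen with plain DOM. The sketch below is not how XMLUnit is implemented; the class name PrefixDemo and the two documents are hypothetical. A strict node comparison notices the different prefixes, while the namespace URI and local name, which carry the actual meaning, are the same.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class PrefixDemo {

    // Parse with namespace support and return the root element.
    static Element root(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Strict comparison: prefixes and xmlns attributes count as well.
    static boolean strictlyEqual(String a, String b) {
        return root(a).isEqualNode(root(b));
    }

    // Comparison of what the root element means: namespace URI plus local name.
    static boolean sameExpandedName(String a, String b) {
        Element ra = root(a);
        Element rb = root(b);
        return Objects.equals(ra.getNamespaceURI(), rb.getNamespaceURI())
                && Objects.equals(ra.getLocalName(), rb.getLocalName());
    }

    public static void main(String[] args) {
        String first = "<a:order xmlns:a='urn:order'/>";
        String second = "<b:order xmlns:b='urn:order'/>";
        System.out.println(strictlyEqual(first, second));    // false
        System.out.println(sameExpandedName(first, second)); // true
    }
}
```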

String expected = "";
String actual = "";

Diff d = new Diff(expected, actual);
assertTrue(d.identical());

actual = "" + actual;
d = new Diff(expected, actual);
assertFalse(d.identical());
assertTrue(d.similar());

XMLAssert.assertXMLEqual(expected, actual);

Example 7: Comparing Two Pieces of XML

The last line in Example 7 shows the convenience method assertXMLEqual provided by the XMLAssert class. There are several overloaded methods for XML comparisons and the other XML test scenarios supported by XMLUnit that simplify the API for a combination of XMLUnit and JUnit 3.x even more. Note that “assertXMLEqual” is a bit of a misnomer since the method provides a test for similarity, not equality. assertXMLIdentical would fail if the two pieces of XML were similar but not identical.

XMLUnit provides several extension points that provide more control over the comparison’s outcome.

DifferenceListener

By providing an implementation of the DifferenceListener interface you can decide for yourself which type of difference is significant in your context. You may “upgrade” differences in element order to irrecoverable or choose to ignore differences in comments.

String expected = "";
String actual = "";

Diff d = new Diff(expected, actual);
assertFalse(d.similar());

d = new Diff(expected, actual);
d.overrideDifferenceListener(new DifferenceListener() {
    public int differenceFound(Difference difference) {
        // treat differing comment text as if the nodes were identical
        if (difference.getId()
            == DifferenceConstants.COMMENT_VALUE_ID) {
            return RETURN_IGNORE_DIFFERENCE_NODES_IDENTICAL;
        }
        // keep XMLUnit's own verdict for every other difference
        return RETURN_ACCEPT_DIFFERENCE;
    }
    public void skippedComparison(Node control, Node test) {
        // nothing to do
    }
});
assertTrue(d.identical());

Example 8a: Comparing Two Pieces of XML, Ignoring Differences in Comments

Since ignoring comments is such a common requirement XMLUnit provides a simple option to ignore them completely.

String expected = ""; 
String actual = "";

Diff d = new Diff(expected, actual);
assertFalse(d.similar());

XMLUnit.setIgnoreComments(true);
d = new Diff(expected, actual);
assertTrue(d.identical());

Example 8b: Comparing Two Pieces of XML, Ignoring Differences in Comments

ElementQualifier

Given that XMLUnit doesn’t consider the order of elements as significant it is not always obvious which children of a given node need to be compared to each other. By default XMLUnit will try to compare elements to each other that have the same tag name, but in some cases this may lead to undesirable results.


text
some other text

some other text
text

Example 9: When the Element’s Tag Name is Not Good Enough

In the example above the textual content of the elements needs to be used in addition to their tag names to pick the correct elements; this can be achieved using ElementNameAndTextQualifier.

String expected = ""
    + " text"
    + " some other text"
    + "";
String actual = ""
    + " some other text"
    + " text"
    + "";

Diff d = new Diff(expected, actual);
assertFalse(d.similar());

d = new Diff(expected, actual);
d.overrideElementQualifier(new ElementNameAndTextQualifier());
assertTrue(d.similar());

Example 10: Using ElementNameAndTextQualifier

ElementNameAndTextQualifier is one of several implementations of the ElementQualifier interface that is part of the XMLUnit distribution. In addition you can provide an implementation of your own if the logic that identifies comparable nodes is too specific.
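
The idea behind qualifying elements by name and text can be illustrated without XMLUnit. The following sketch is not XMLUnit's algorithm; the class name QualifierDemo and the list/entry documents are invented for the example. It builds a "name|text" key per child element and shows that pairing children by position fails on reordered content, while pairing them by key succeeds.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class QualifierDemo {

    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // One "name|text" key per child element, in document order.
    static List<String> keys(String xml) {
        List<String> keys = new ArrayList<String>();
        NodeList children = root(xml).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                keys.add(n.getNodeName() + "|" + n.getTextContent());
            }
        }
        return keys;
    }

    // Pair the first child with the first child, the second with the second, ...
    static boolean matchByPosition(String a, String b) {
        return keys(a).equals(keys(b));
    }

    // Pair children that agree in tag name and text, wherever they occur.
    static boolean matchByNameAndText(String a, String b) {
        List<String> ka = keys(a);
        List<String> kb = keys(b);
        Collections.sort(ka);
        Collections.sort(kb);
        return ka.equals(kb);
    }

    public static void main(String[] args) {
        String expected = "<list><entry>text</entry>"
            + "<entry>some other text</entry></list>";
        String actual = "<list><entry>some other text</entry>"
            + "<entry>text</entry></list>";
        System.out.println(matchByPosition(expected, actual));    // false
        System.out.println(matchByNameAndText(expected, actual)); // true
    }
}
```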

DetailedDiff

The examples so far have only validated whether two pieces of XML are the same. Another use case when comparing two pieces of XML is to enumerate all differences between them. This is the task of DetailedDiff.

String expected = ""
    + " "
    + " text"
    + " some other text"
    + "";
String actual = ""
    + " some other text"
    + " text"
    + "";

DetailedDiff dd = new DetailedDiff(new Diff(expected, actual));
List l = dd.getAllDifferences();
for (Iterator i = l.iterator(); i.hasNext(); ) {
    Difference d = (Difference) i.next();
    System.err.println(d);
}
assertEquals(6, l.size());

Example 11: Finding all Differences Between Two Pieces of XML

DetailedDiff is a subclass of Diff, so it can be used to classify the two documents as similar or different as well. Unlike DetailedDiff, however, Diff stops the comparison as soon as a non-recoverable difference has been encountered, so if you don’t need to find all the differences you should use Diff for improved performance.

Both Diff and DetailedDiff calculate the difference between two pieces of XML on demand, and cache the results. This means that you need to create a new Diff instance if you want to repeat a comparison using a different set of options.

More Configuration Options

Most of XMLUnit’s configuration is done via static methods of the XMLUnit class. Any changes of the default values will apply until the values are explicitly reset. If you are modifying the default settings in unit test cases it is good practice to reset them after each test (for example in the tearDown method if using JUnit 3.x) so that different tests don’t affect each other.

The option that you most probably want to change is handling of whitespace.

String expected = "";
String actual = "\n"
    + " \n"
    + "";

Diff d = new Diff(expected, actual);
assertFalse(d.similar());

XMLUnit.setIgnoreWhitespace(true);
d = new Diff(expected, actual);
assertTrue(d.identical());

Example 12: Element Content Whitespace

In the example above the two pieces of XML would be considered different because the element in one of them contains nested text (a newline character) while in the other it doesn’t.

By setting XMLUnit’s ignoreWhitespace property to true you can suppress the difference and the two documents would be considered identical.
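
Where the extra nested text comes from can be shown with the JDK's DOM parser alone. This is a minimal sketch, not XMLUnit code; the class name WhitespaceDemo and the a/b documents are hypothetical. A non-validating parser keeps the whitespace between tags as text nodes, so the indented document has more child nodes than the compact one until whitespace-only text is skipped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class WhitespaceDemo {

    static NodeList rootChildren(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getChildNodes();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of child nodes of the root element, whitespace text included.
    static int childCount(String xml) {
        return rootChildren(xml).getLength();
    }

    // Same count, but text nodes that contain only whitespace are skipped.
    static int childCountIgnoringWhitespace(String xml) {
        NodeList children = rootChildren(xml);
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            boolean blankText = n.getNodeType() == Node.TEXT_NODE
                    && n.getNodeValue().trim().isEmpty();
            if (!blankText) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String compact = "<a><b/></a>";
        String indented = "<a>\n <b/>\n</a>";
        System.out.println(childCount(compact));                    // 1
        System.out.println(childCount(indented));                   // 3
        System.out.println(childCountIgnoringWhitespace(indented)); // 1
    }
}
```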

Other options include ignoring comments or treating CDATA sections and “normal” nested text as one kind of content. I.e. in the example below both assertions will pass.

String expected = ""; 
String actual = "";
  
Diff d = new Diff(expected, actual); 
assertFalse(d.similar());
  
XMLUnit.setIgnoreDiffBetweenTextAndCDATA(true); 
d = new Diff(expected, actual);
assertTrue(d.identical());

Example 13: Comparing CDATA Sections and “Normal” Text

XPath Tests

Traditionally XMLUnit used an XPath engine of its own that was based on XSLT. XMLUnit 1.1 will now favor JAXP 1.3’s javax.xml.xpath if it detects that it is available at runtime, but fall back to the internal one if it is not.

Regardless of which XPath engine is used under the covers, XMLUnit supports obtaining the result of applying an XPath expression to a piece of XML either as a DOM NodeList or as a String. In general, the latter form is more appropriate if you expect there to be only a single result and this result is the value of an attribute or nested element text.

XpathEngine eng = XMLUnit.newXpathEngine();

String input = "";
Document doc = XMLUnit.buildControlDocument(input);

assertEquals("1", eng.evaluate("/order/item[1]/@id", doc));
XMLAssert.assertXpathExists("/order/item[1]/@id", input);
XMLAssert.assertXpathEvaluatesTo("1", "/order/item[1]/@id", input);

assertEquals(2, eng.getMatchingNodes("/order/item", doc).getLength());

Example 14: Testing XPath Queries
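
The two result forms map directly onto javax.xml.xpath, the engine XMLUnit 1.1 prefers when it is available. The sketch below uses that JDK API directly rather than XMLUnit's XpathEngine; the class name XPathDemo and the sample order document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathDemo {

    // Hypothetical input: an order with two items.
    static final String INPUT =
        "<order><item id='1'/><item id='2'/></order>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Result as a String: suits a single attribute or text value.
    static String evaluate(String expression, String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, parse(xml));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Result as a NodeList: suits expressions that select several nodes.
    static int countMatches(String expression, String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, parse(xml), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("/order/item[1]/@id", INPUT)); // 1
        System.out.println(countMatches("/order/item", INPUT));    // 2
    }
}
```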

XMLUnit 1.0’s XPath engine didn’t work properly on namespaced documents, in particular if a document contained several namespaces at once. XMLUnit 1.1 introduces the NamespaceContext interface and a simple Map based implementation that helps map prefixes to namespace URIs.

String input = ""
    + ""
    + "";
Document doc = XMLUnit.buildControlDocument(input);

HashMap m = new HashMap();
m.put("x", "urn:order");
SimpleNamespaceContext ctx = new SimpleNamespaceContext(m);
XMLUnit.setXpathNamespaceContext(ctx);
XpathEngine eng = XMLUnit.newXpathEngine();

assertEquals("1", eng.evaluate("/x:order/x:item[1]/@id", doc));
XMLAssert.assertXpathExists("/x:order/x:item[1]/@id", input);
XMLAssert.assertXpathEvaluatesTo("1", "/x:order/x:item[1]/@id", input);

assertEquals(2, eng.getMatchingNodes("/x:order/x:item", doc).getLength());

Example 15: Testing XPath Queries on Namespaced Documents

When using NamespaceContext it is important to keep in mind that only the namespace’s URI is relevant; the prefix is not. The prefix provided in a NamespaceContext applies to the XPath selector, not to the document itself. The prefix used inside the document is ignored completely.
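
The same rule holds for the JDK's own javax.xml.namespace.NamespaceContext, which makes it easy to try out. The sketch below is not XMLUnit's SimpleNamespaceContext; the class names NamespaceXPathDemo and MapContext and the sample document are hypothetical. The document uses the prefix o, the XPath expression uses x, and the query still matches because both are bound to urn:order.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceXPathDemo {

    // The document binds urn:order to the prefix "o".
    static final String DOC = "<o:order xmlns:o='urn:order'>"
        + "<o:item id='1'/><o:item id='2'/></o:order>";

    // Minimal prefix-to-URI lookup; only getNamespaceURI matters for XPath.
    static final class MapContext implements NamespaceContext {
        private final Map<String, String> uris;
        MapContext(Map<String, String> uris) { this.uris = uris; }
        public String getNamespaceURI(String prefix) {
            String uri = uris.get(prefix);
            return uri != null ? uri : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String uri) { return null; }
        public Iterator<String> getPrefixes(String uri) {
            return Collections.<String>emptyList().iterator();
        }
    }

    // Evaluate the expression with the prefix "x" bound to urn:order.
    static String evaluate(String expression, String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new MapContext(
                    Collections.singletonMap("x", "urn:order")));
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("/x:order/x:item[1]/@id", DOC)); // 1
    }
}
```

An expression without prefixes, such as /order/item[1]/@id, selects nothing in this document and evaluates to the empty string.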

Programmatic Tests on DOM Trees

Occasionally the generated XML is very hard to test by comparing it with predefined results, and testing it with XPath would become too convoluted because too many nodes would have to be tested individually.

For this situation XMLUnit provides a very powerful way of testing that lets you programmatically test each node of the generated XML using a simple interface.

In the following example the generated XML is supposed to contain a GUID (represented as eight hexadecimal digits) in the id attribute of all item elements. The test verifies that the attributes’ values match the expected format and that the ids are unique within the generated document.

private class GuidTester extends AbstractNodeTester {
    // eight lowercase hexadecimal digits
    private static final String pattern = "[0-9a-f]{8}";
    private Set visitedIds = new HashSet();

    public void testElement(Element element) throws NodeTestException {
        if (element.getTagName().equals("item")) {
            String idAttr = element.getAttribute("id");
            if (!idAttr.matches(pattern)) {
                throw new NodeTestException("id attribute: " + idAttr
                                            + " is not in correct format");
            }
            if (visitedIds.contains(idAttr)) {
                throw new NodeTestException("id attribute: " + idAttr
                                            + " is not unique");
            }
            visitedIds.add(idAttr);
        }
    }
}

public void testUniqueIds() throws Exception {
    String works = ""
        + ""
        + ""
        + "";
    NodeTest nt = new NodeTest(works);
    nt.performTest(new GuidTester(), Node.ELEMENT_NODE);

    String badPattern = ""
        + ""
        + "";
    nt = new NodeTest(badPattern);
    try {
        nt.performTest(new GuidTester(), Node.ELEMENT_NODE);
        fail("expected exception");
    } catch (NodeTestException ex) {
        assertTrue(ex.getMessage().indexOf("format") > -1);
    }

    String notUnique = ""
        + ""
        + ""
        + "";
    nt = new NodeTest(notUnique);
    try {
        nt.performTest(new GuidTester(), Node.ELEMENT_NODE);
        fail("expected exception");
    } catch (NodeTestException ex) {
        assertTrue(ex.getMessage().indexOf("not unique") > -1);
    }
}

Example 16: Validating XML Documents Using NodeTester

Putting Things Together

Any software that creates XML should have tests for its output just like it needs tests for any other part of its functionality.

Testing XML can take several different approaches, and sometimes a combination of more than one approach yields the best results. For simple cases comparing the generated output with an expected output is probably enough. For more complex cases formal validation of the output structure should be combined with content tests that involve either XPath queries (for small outputs) or programmatic tests.

The APIs for dealing with XML in Java are often inconvenient to use. XMLUnit provides a simplified API for all test approaches outlined in this article.

This article couldn’t cover all aspects of XMLUnit. For example there is support for HTML documents that are not well-formed XML via implementations of special DocumentBuilder and SAXParser classes. In addition to the XMLAssert class that is shown in some examples there is also an XMLTestCase class that extends JUnit’s TestCase and provides methods similar to those of XMLAssert.

You can learn more about XMLUnit at the project’s website and in its User’s Guide.

Stefan Bodewig is Chief Developer at WebOne Informatik GmbH in Essen, Germany, where he is responsible for the architecture and development of applications based on the Microsoft .NET platform. Stefan also is a contributor to several Open Source projects including XMLUnit and Apache Ant.


Is there an equivalent SoapUnit ? by anjan bacchu

Hi There/Stefan,

Seems to be a useful tool. I've used my home-grown XML utilities for command line validation of XML files.

command line usage : I recently came across validate-xml.jar from woodstox "xml Processor" project. This seems to be a decent tool that points out where in an xml file a warning/error lies -- sort of like a compiler output.

It would be nice to have a good WebServices/SOAP Unit Testing tool for the java world. Is there one ? I will be working on a project where I will be developing(exposing) some web services as well as consuming some. It will be nice to have a tool that will test my webservices before the actual client interop tests. What will be nice is to have

a) a command-line client to test some basic webservices functionality like validity, list operations, etc

b) something like XMLUnit which helps in junit/TestNG unit tests.

What do people in the .NET world do ?

Thanks

BR,
~A

Re: Is there an equivalent SoapUnit ? by Steve Loughran

Anjan,
0. Ant has <schemavalidate> to check XSD files; Jing has a relax NG validation task.

1. You can use XMLUnit under TestNG as well as JUnit.

2. SOAP testing? Why would we need to test SOAP? I don't think it's really time to improve the testing/debug facilities in SOAP land, because after so many years, you still end up using a TCP trace tool to work out why your messages aren't being understood by the far end. Better to start working on the tools we need to make testing RESTy systems easier.

Sorry :)

-Steve

Steve Loughran,
Ant development team,
Author of Ant in Action

Re: Is there an equivalent SoapUnit ? by Stefan Bodewig

as for SOAP specific testing tools, I'm not aware of any.

People in the .NET world use NUnit or MBUnit (very few may use Team Test) - together with XMLUnit for .NET, I hope.

Re: Is there an equivalent SoapUnit ? by Andreas A

soapUI worked pretty well for testing our web services. Tests can be extended with Groovy scripts for automatic input/output transitions between test steps.

The listings are a bit cumbersome... by Krzysztof Witukiewicz

...because xml tags are interpreted by the browser :/ If anybody has better 'viewing experience', then please tell me what app do you use (I tried Firefox 2.0, IE6 and Opera 9.02)

Re: Is there an equivalent SoapUnit ? by Priyanka Grover

SoapUI is extremely good.

Re: The listings are a bit cumbersome... by Lars Huttar

I have the same question. I can get to the XML by using "View Source", but as you say it's cumbersome. Firefox 5.0 is not doing any better. Can't blame the browser... InfoQ is outputting the XML unescaped, as if it were part of the HTML. :-S

Could an editor please format the XML code properly?

Ignore special characters while comparing xmls by shireesh adla

Hi,

Not sure if this thread is still active.

I am using XMLUnit to compare XML strings,

Diff diff = new Diff(actualXML, expectedXML);

the comparison fails for similar XMLs if they contain special characters (e.g. both XMLs have "abc-xyz").

Is there any way to ignore special characters while comparing?

Thanks in advance

Shireesh

InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.