Transforming XML from one format to another is a common task for many developers. To do this, most of them leave the confines of their general-purpose language and make calls to an XSLT library. But what if they didn't have to?
LINQ to XML makes it much easier to manipulate XML from C# and VB. Eric White describes how to perform XSLT-style transformations using C# 3.0.
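For context, the most direct way to transform XML with LINQ to XML, without any annotations, is functional construction: a query over the source tree builds a brand-new tree. The sketch below is illustrative only and is not taken from Eric's post; the `Orders`, `Order`, and `Sale` names and the 100 threshold are made up.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FunctionalConstructionSketch
{
    // Builds a new <Sales> tree from a query over <Orders>; the source is only read.
    // Element and attribute names are hypothetical, chosen for illustration.
    public static XElement Transform(XElement orders) =>
        new XElement("Sales",
            from o in orders.Elements("Order")
            where (decimal)o.Attribute("total") > 100m
            select new XElement("Sale",
                new XAttribute("id", (string)o.Attribute("id")),
                (decimal)o.Attribute("total")));

    public static void Main()
    {
        XElement orders = XElement.Parse(
            "<Orders><Order id='1' total='250'/><Order id='2' total='40'/></Orders>");
        // Prints <Sales><Sale id="1">250</Sale></Sales>
        Console.WriteLine(Transform(orders).ToString(SaveOptions.DisableFormatting));
    }
}
```

This style works well when the output can be described as one query; the annotation technique targets cases where many independent queries each decide on a local change.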
The key to Eric's method is the ability to annotate XML nodes with additional information. Instead of altering the tree piecemeal, each pending change is stored as an annotation on the XElement it will replace. Eric writes:
One advantage to taking this approach - as you formulate queries, you are always writing queries on the unmodified source tree. You need not worry about how modifications to the tree affect the queries that you are writing.
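A minimal sketch of the annotation step, assuming each pending change is stored as an `XElement` annotation via `AddAnnotation` and read back with `Annotation<T>()`. The `Para` and `p` names and the shape of the replacement are hypothetical; this is not Eric's code.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AnnotationSketch
{
    // Records a pending replacement on every <Para> element as an annotation.
    // Annotations do not change the tree's structure, so adding them while the
    // query is being enumerated is safe. The <p> replacement shape is hypothetical.
    public static void AnnotateParas(XElement root)
    {
        foreach (XElement para in root.Descendants("Para"))
            para.AddAnnotation(new XElement("p", (string)para));
    }

    public static void Main()
    {
        XElement root = XElement.Parse("<Root><Para>Hello</Para><Para>World</Para></Root>");
        AnnotateParas(root);

        // Later queries still run against the unmodified source tree...
        Console.WriteLine(root.Elements("Para").Count());   // 2
        // ...and each element carries its pending change.
        foreach (XElement para in root.Elements("Para"))
            Console.WriteLine(para.Annotation<XElement>());  // <p>Hello</p>, then <p>World</p>
    }
}
```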
Once all of the pending changes have been generated, they are applied all at once. This is done by the XForm function, which creates a copy of the tree and makes the replacements where appropriate.
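Here is one way an XForm-style copy could be written, as a sketch rather than Eric's actual function (which is on his blog). It assumes each pending change is an `XElement` annotation and treats a hypothetical `<ApplyTransforms/>` marker inside a replacement as "insert the transformed children of the original element here"; that marker's semantics, and the `Para`/`p` names, are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XFormSketch
{
    // Hypothetical marker: inside a replacement, <ApplyTransforms/> stands for the
    // transformed children of the element being replaced.
    public static readonly XName ApplyTransforms = "ApplyTransforms";

    // Returns a transformed copy of 'source'; the source tree is never modified.
    public static XElement XForm(XElement source)
    {
        // Pending change: build the replacement instead of copying the element.
        // Assumes the replacement's root is an ordinary element, not the marker itself.
        XElement replacement = source.Annotation<XElement>();
        if (replacement != null)
            return (XElement)Expand(replacement, source).Single();

        // No pending change: copy the element and transform its child nodes.
        return new XElement(source.Name,
            source.Attributes(),
            source.Nodes().Select(n => XFormNode(n)));
    }

    // Elements are transformed recursively; text, comments, etc. are returned as-is
    // (LINQ to XML clones a node that already has a parent when it is added).
    static XNode XFormNode(XNode node) =>
        node is XElement e ? XForm(e) : node;

    // Copies a replacement node, swapping each ApplyTransforms marker for the
    // transformed children of 'original'.
    static IEnumerable<object> Expand(XNode node, XElement original)
    {
        if (node is XElement e)
        {
            if (e.Name == ApplyTransforms)
                return original.Nodes().Select(n => XFormNode(n));
            return new[] { new XElement(e.Name, e.Attributes(),
                e.Nodes().SelectMany(c => Expand(c, original))) };
        }
        return new[] { node };
    }

    public static void Main()
    {
        XElement root = XElement.Parse(
            "<Root><Para>Hello</Para><Para>World</Para><Note>keep</Note></Root>");
        foreach (XElement para in root.Elements("Para"))
            para.AddAnnotation(new XElement("p", new XElement(ApplyTransforms)));

        // Prints <Root><p>Hello</p><p>World</p><Note>keep</Note></Root>
        Console.WriteLine(XForm(root).ToString(SaveOptions.DisableFormatting));
    }
}
```

Because the result is built from scratch, the source tree and its annotations are left untouched, so every query that produced an annotation saw the original document.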
You can learn more about this technique and get a copy of the XForm function from Eric White's blog.
Community comments
Why use objects
by Frank Cohen
Interesting move by the XLinq team. Anything that makes it easier to work with XML in an object-oriented environment is a good thing. I could not tell from Eric's blog whether this approach has an impact on runtime performance. I investigated using XML and XQuery in an architecture I named FastSOA as a way to mitigate the performance problems I find in Java approaches to working with XML. My findings are in my book FastSOA, from Morgan Kaufmann Publishers.
The basic problem I found was all the object instantiation needed to move from serialized XML to objects, to business logic operating on the data, and then back to XML or RDBMS data formats. There's just too much object instantiation going on. I recommend using a domain-specific language like XQuery, because certain XQuery implementations compile XQuery to Java bytecode and avoid objects entirely. (I also never did get my head around the non-procedural nature of XSLT, oh well.)
-Frank Cohen
www.pushtotest.com
Re: Why use objects
by Jonathan Allen
As far as performance is concerned, it is way too early to tell. Because they are implemented so differently, I can easily see minor tweaks to a transformation shifting the advantage back and forth between the various techniques.
I think the real gain from this technique is that you don't have to marshal all the data you might need from C# to XQuery or XSLT. If your transformation needs to do something expensive, like a database call or a hard calculation, one time out of ten, you only pay the cost when you actually need it.
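To illustrate paying only when needed: because the transform is ordinary C#, an expensive step can sit behind a condition and run only for the elements that need it. A minimal sketch; `ExpensiveLookup` is a hypothetical placeholder for a database call, its return value is made up, and the `Item`/`price` names are invented. It applies to LINQ to XML transforms in general, not only to the annotation technique.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class OnDemandCostSketch
{
    // Counts how often the expensive path actually runs.
    public static int LookupCalls;

    // Hypothetical stand-in for a database call or heavy calculation;
    // the returned value is a placeholder.
    static decimal ExpensiveLookup(string sku)
    {
        LookupCalls++;
        return sku.Length * 10m;
    }

    // Copies <Item> elements, filling in a missing price. The ?? operator only
    // evaluates ExpensiveLookup when the price attribute is absent.
    public static XElement Transform(XElement items) =>
        new XElement("Priced",
            items.Elements("Item").Select(i =>
                new XElement("Item",
                    new XAttribute("sku", (string)i.Attribute("sku")),
                    (decimal?)i.Attribute("price") ?? ExpensiveLookup((string)i.Attribute("sku")))));

    public static void Main()
    {
        XElement items = XElement.Parse(
            "<Items><Item sku='a' price='5'/><Item sku='abc'/><Item sku='b' price='7'/></Items>");
        // Prints <Priced><Item sku="a">5</Item><Item sku="abc">30</Item><Item sku="b">7</Item></Priced>
        Console.WriteLine(Transform(items).ToString(SaveOptions.DisableFormatting));
        Console.WriteLine(LookupCalls); // 1
    }
}
```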
On the other hand, XSLT is still a lot cleaner if you think in terms of "This is what my results are supposed to look like" rather than "This is how I transform my source."
Needless to say, this is why I didn't conclude with "And this is why you should use X".
this seems to be a step back ....
by Ke Jin
In the Java and C++ world, this has been a common and obvious practice for years. For simple transformations, we have the DOM and SAX APIs to parse XML documents. Based on the parse result, one can use pure imperative Java and/or C++ code to generate DOM objects, or simply write out XML text streams, without using an XSLT transformer API and certainly without XSLT style sheets.
However, for generic applications that use XSLT today, declarative XSLT style sheets are much more cost-effective to develop and maintain than imperative Java or C++ code. Also, XSLT is much easier to learn and use for non-programmers, namely business domain experts who have no Java or C++ programming skills, let alone knowledge of the DOM or SAX APIs.
Re: this seems to be a step back ....
by Ke Jin
Also, declarative XSLT style sheets are much easier for UI tools to generate and verify than imperative Java/C++ or C# code.
Re: this seems to be a step back ....
by Eric White
FWIW, I absolutely agree about the usage scenarios for XSLT. In the LINQ to XML documentation, I have at least 4 or 5 examples that show how to use XSLT to transform an XML tree. XSLT transforms create a new tree, so XSLT does not alleviate the problem of too many short-lived objects.
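For comparison, applying an XSLT stylesheet to a LINQ to XML tree typically goes through `XslCompiledTransform`: the source is read with `CreateReader`, and the output is written into a new `XDocument` through `CreateWriter`, so the result is indeed a second tree. A minimal sketch; the stylesheet and the `Root`/`Para`/`p` names are illustrative.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

public static class XsltOnLinqToXmlSketch
{
    // Illustrative stylesheet: renames <Para> to <p> under <Root>.
    const string Xsl =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:template match='/Root'><Root><xsl:apply-templates/></Root></xsl:template>" +
        "<xsl:template match='Para'><p><xsl:value-of select='.'/></p></xsl:template>" +
        "</xsl:stylesheet>";

    // Reads the source through CreateReader and writes the result into a new XDocument.
    public static XDocument Transform(XDocument source)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(Xsl)))
            xslt.Load(stylesheet);

        var result = new XDocument();
        // The writer must be closed before the new tree is complete.
        using (XmlWriter writer = result.CreateWriter())
            xslt.Transform(source.CreateReader(), writer);
        return result;
    }

    public static void Main()
    {
        XDocument source = XDocument.Parse("<Root><Para>Hello</Para><Para>World</Para></Root>");
        // Prints <Root><p>Hello</p><p>World</p></Root>
        Console.WriteLine(Transform(source).Root.ToString(SaveOptions.DisableFormatting));
    }
}
```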
With respect to processor cost, I haven't taken any measurements. However, when doing XSLT transforms with LINQ to XML, the XML tree has to be converted into an XPathDocument internally, which is a substantial transformation in itself. Then the XPath expressions have to be evaluated, and only then is the transform carried out. In contrast, the LINQ to XML queries that you use to add annotations are quite efficient (due to lazy evaluation of LINQ queries and the semantics of LINQ to XML axes), and adding annotations is cheap. I'd bet that the pure LINQ to XML approach is more efficient. However, I'm not going to test this until my current deadlines are met :-)
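The lazy-evaluation point can be seen directly: LINQ to XML axes such as `Descendants()` are deferred, so a query walks only as much of the tree as its consumer asks for. A small sketch with a made-up tree and a counter added purely for illustration; it demonstrates deferred execution and says nothing about the relative cost of XSLT.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class LazyQuerySketch
{
    // Counts how many elements the predicate has inspected.
    public static int PredicateCalls;

    // Returns the first descendant named 'name'. Descendants() and Where() are
    // deferred, so First() stops walking the tree as soon as a match is found.
    public static XElement FirstNamed(XElement root, XName name) =>
        root.Descendants()
            .Where(e => { PredicateCalls++; return e.Name == name; })
            .First();

    public static void Main()
    {
        XElement root = XElement.Parse("<Root><A/><B/><C/><B/></Root>");
        Console.WriteLine(FirstNamed(root, "B")); // <B />
        Console.WriteLine(PredicateCalls);        // 2: only <A/> and the first <B/> were examined
    }
}
```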
One more note: this technique can be expanded significantly. Possible improvements are:
- add modes, à la XSLT: annotations are marked with modes, ApplyTransforms takes a mode as an attribute, and the XForm function can also take a mode.
- allow for annotations on other types of nodes: attributes, text nodes (and processing instructions and comments, for completeness).
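The first suggested extension, modes, might look roughly like this. It is a hypothetical sketch: `ModedReplacement` is an invented annotation type, and for brevity each replacement is a function of the original element rather than an element containing an `ApplyTransforms` marker. The second idea would fit the same pattern, since `XAttribute` also supports `AddAnnotation`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical annotation type: a pending change that applies only in one mode.
public sealed class ModedReplacement
{
    public string Mode { get; }
    public Func<XElement, XElement> Replace { get; }

    public ModedReplacement(string mode, Func<XElement, XElement> replace)
    {
        Mode = mode;
        Replace = replace;
    }
}

public static class ModeSketch
{
    // Copies the tree, applying only the annotations registered for 'mode'.
    // Replace functions receive the original element and must not modify it;
    // this sketch does not transform the children inside a replacement further.
    public static XElement XForm(XElement source, string mode)
    {
        ModedReplacement pending = source.Annotations<ModedReplacement>()
            .FirstOrDefault(a => a.Mode == mode);
        if (pending != null)
            return pending.Replace(source);

        return new XElement(source.Name,
            source.Attributes(),
            source.Nodes().Select(n => n is XElement e ? XForm(e, mode) : n));
    }

    public static void Main()
    {
        XElement root = XElement.Parse("<Root><Para>Hi</Para></Root>");
        XElement para = root.Element("Para");
        para.AddAnnotation(new ModedReplacement("html", p => new XElement("p", p.Nodes())));
        para.AddAnnotation(new ModedReplacement("wiki", p => new XElement("line", "* " + (string)p)));

        Console.WriteLine(XForm(root, "html").ToString(SaveOptions.DisableFormatting)); // <Root><p>Hi</p></Root>
        Console.WriteLine(XForm(root, "wiki").ToString(SaveOptions.DisableFormatting)); // <Root><line>* Hi</line></Root>
        Console.WriteLine(XForm(root, "none").ToString(SaveOptions.DisableFormatting)); // <Root><Para>Hi</Para></Root>
    }
}
```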
My only point in this post is that this is simply one approach to transforming XML trees with LINQ to XML. It may be useful in some scenarios, but in others XSLT may well be better.